Using REXML to Parse XML Documents

August 31, 2014

Yesterday I wrote about using Nokogiri to parse the XML file that contains a purchase order to retrieve details about the Shipping address. Today I thought I would go ahead and do the same exercise using REXML.

Here is the description of REXML:

REXML is a pure Ruby, XML 1.0 conforming, non-validating toolkit with an intuitive API. 
REXML passes 100% of the non-validating Oasis tests, and provides tree, stream, SAX2, 
pull, and lightweight APIs. REXML also includes a full XPath 1.0 implementation. 
Since Ruby 1.8, REXML is included in the standard Ruby distribution.

So REXML ships with the Standard Library, and some other Ruby XML libraries build on it (Nokogiri is a notable exception, since it wraps the C library libxml2). REXML was inspired by the Electric XML library for Java, which has an easy-to-use API, is small, and is quite fast. REXML was designed with the same philosophy as Electric XML and shares these features.

REXML supports both tree and stream document parsing. Stream parsing is about 1.5 times faster. However, with stream parsing, you don’t get access to features such as XPath.
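
To make the difference concrete, here is a minimal sketch of stream parsing. The TagCollector class, the stream_tag_names helper and the inline XML string are all hypothetical names made up for this illustration; only REXML::StreamListener and REXML::Document.parse_stream come from REXML itself. The parser calls back into the listener as it reads each tag, so no tree is built and there is nothing to run XPath against.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener: records the name of every opening tag it is told about.
class TagCollector
  include REXML::StreamListener

  attr_reader :tags

  def initialize
    @tags = []
  end

  # Called by the stream parser each time an element opens.
  def tag_start(name, _attributes)
    @tags << name
  end
end

# Hypothetical helper: streams the XML through a TagCollector and
# returns the element names in document order.
def stream_tag_names(xml)
  collector = TagCollector.new
  REXML::Document.parse_stream(xml, collector)
  collector.tags
end

# Assumed sample input, not the purchase order from the post.
xml = "<PurchaseOrder><Address Type='Shipping'><Name>Jane Doe</Name></Address></PurchaseOrder>"
p stream_tag_names(xml)
```

The listener only sees one event at a time, so anything like "the Name inside the Shipping address" has to be tracked by hand, which is why the rest of this post sticks with tree parsing.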

We will use the same document that I used in the previous post on parsing XML. To remind you of the document we are parsing, here it is:

Our requirement is simple: pull out the shipping address and display the name, street, city, state, zip and country.

Since REXML is included in the Ruby Standard Library, there is no need to install any gems, but you still need to require ‘rexml/document’ before you can parse the file. In the code, I use the ‘include REXML’ directive for ease of use. The ‘purchase_order.xml’ file is read into memory and is accessible via the file variable.

To get access to the XML document, I create a new instance of the Document class, passing in the contents of the ‘purchase_order.xml’ file.
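
As a rough sketch of these setup steps, assuming a purchase order shaped like the one the XPath expression in this post implies: the SAMPLE_XML contents and the parse_order helper are hypothetical stand-ins, used so the snippet runs without the real file on disk.

```ruby
require 'rexml/document'
include REXML # lets us write Document instead of REXML::Document

# Hypothetical stand-in for purchase_order.xml; element names and values are assumed.
SAMPLE_XML = <<~XML
  <PurchaseOrder>
    <Address Type="Shipping">
      <Name>Jane Doe</Name>
      <Street>1 Example Street</Street>
      <City>Springfield</City>
      <State>CA</State>
      <Zip>90000</Zip>
      <Country>USA</Country>
    </Address>
    <Address Type="Billing">
      <Name>John Doe</Name>
      <Street>2 Example Avenue</Street>
      <City>Shelbyville</City>
      <State>PA</State>
      <Zip>10000</Zip>
      <Country>USA</Country>
    </Address>
  </PurchaseOrder>
XML

# Hypothetical helper: builds the in-memory tree from a String (or an IO).
def parse_order(source)
  Document.new(source)
end

# With the real file this would be: parse_order(File.read('purchase_order.xml'))
doc = parse_order(SAMPLE_XML)
puts doc.root.name
```

Document.new accepts either a String or an open File, so passing the file object directly should work just as well as reading it into a string first.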

Just as I did with Nokogiri, I pull out the Shipping address using the XPath expression

/PurchaseOrder/Address[@Type='Shipping']

Unlike with Nokogiri, I read all the details I need (the name, street, city, state, zip and country of the Shipping address) using the map method. Then we can display these details using the ‘puts’ method.
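
One way the XPath lookup and the map step might fit together is sketched below. This is a guess at the shape of the code rather than the original listing: ORDER_XML, FIELDS and shipping_details are hypothetical names, and the child element names (Name, Street and so on) are assumed from the fields listed above.

```ruby
require 'rexml/document'
include REXML

# Hypothetical purchase order; the real purchase_order.xml may differ.
ORDER_XML = <<~XML
  <PurchaseOrder>
    <Address Type="Billing">
      <Name>John Doe</Name>
      <Street>2 Example Avenue</Street>
      <City>Shelbyville</City>
      <State>PA</State>
      <Zip>10000</Zip>
      <Country>USA</Country>
    </Address>
    <Address Type="Shipping">
      <Name>Jane Doe</Name>
      <Street>1 Example Street</Street>
      <City>Springfield</City>
      <State>CA</State>
      <Zip>90000</Zip>
      <Country>USA</Country>
    </Address>
  </PurchaseOrder>
XML

# Assumed child element names, in the order we want to display them.
FIELDS = %w[Name Street City State Zip Country]

# Hypothetical helper: finds the Shipping address and maps each field
# name to the text of the matching child element.
def shipping_details(xml)
  doc = Document.new(xml)
  address = XPath.first(doc, "/PurchaseOrder/Address[@Type='Shipping']")
  FIELDS.map { |field| address.elements[field].text }
end

# puts prints each entry of the array on its own line.
puts shipping_details(ORDER_XML)
```

The predicate [@Type='Shipping'] is what keeps the Billing address out of the result, even though it appears first in this sample.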

REXML is easy to use and comes bundled in the Standard Library; therefore, it would be my first choice when it comes to parsing XML. Having said that, I still feel Crack had the best API for my needs and was a lot easier to use than either Nokogiri or REXML.



My name is Deon Heyns and I am a developer learning things and documenting them in realtime. Python, Ruby, Scala, .NET, and Groovy are all languages I have written code in. I appeared in the New York Post once. I host my code up at GitHub and Bitbucket so have a look at my code, fork it and send those pull requests.
