Parsing XML from a shell script can be a PITA. Enter xml2, a tool that converts XML (and HTML) to and from a line-oriented format. That flat format is much easier to handle with the usual Unix pipeline tools (grep, awk, sed, etc.).

Consider the following snippet of XML, taken from this old site:

<?xml version="1.0" encoding="UTF-8"?>
    <channel>
    <title>Jan-Piet Mens</title>
    <link>http://blog.fupps.com</link>
    <sy:updatePeriod>hourly</sy:updatePeriod>
    <sy:updateFrequency>1</sy:updateFrequency>
    <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" href="http://feeds2.feedburner.com/jpmens" type="application/rss+xml" />
    <item>
      <title>Snow White meets Steve Jobs</title>
      <link>http://feedproxy.google.com/~r/jpmens/~3/mqyDdQMpe5k/</link>
      <comments>/2009/03/31/snow-white-meets-steve-jobs/#comments</comments>
      <pubDate>Tue, 31 Mar 2009 12:49:27 +0000</pubDate>
      <dc:creator>Jan-Piet Mens</dc:creator>
      <category><![CDATA[Entertainement]]></category>
      ...

If I run that through xml2, I can easily search for post titles only:

$ curl -sf http://blog.fupps.com/feed/ | xml2 | grep item/title
    /rss/channel/item/title=Snow White meets Steve Jobs
    /rss/channel/item/title=UIT opens Alternative DNS Servers' Web site
    /rss/channel/item/title=Die 3 Minuten sind um
    /rss/channel/item/title=Chicken a la JP
    /rss/channel/item/title=The cooks aboard HMS Belfast
    /rss/channel/item/title=Polite notice on window
    /rss/channel/item/title=Observations on travel (to London)
    /rss/channel/item/title=Tidbits
    /rss/channel/item/title=Did you know?
    /rss/channel/item/title=A different kind of oops
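
One caveat: a plain `grep item/title` matches that text anywhere on the line, including inside a value. A slightly stricter filter anchors the pattern to the start of the line and to the `=` that ends the path. The following is only a sketch: the function name `item_titles` is my own, the path is the one seen in the output above, and the description line in the sample input is made up for illustration.

```shell
#!/bin/sh
# Keep only lines whose *path* is exactly /rss/channel/item/title.
# The path comes from the output shown above; other feeds may differ.
item_titles() {
    grep '^/rss/channel/item/title='
}

# Sample input in the flat format. The description line is hypothetical;
# it shows a value that happens to contain the text "item/title".
item_titles <<'EOF'
/rss/channel/title=Jan-Piet Mens
/rss/channel/item/title=Snow White meets Steve Jobs
/rss/channel/item/description=not a title, but mentions item/title
/rss/channel/item/title=Tidbits
EOF
```

Only the two real title lines should get through; the channel title and the description are dropped.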

Adding a bit of awk or cut gives me just the titles themselves:

$ curl -sf http://blog.fupps.com/feed/ | xml2 | grep item/title | cut -d= -f2
    Snow White meets Steve Jobs
    UIT opens Alternative DNS Servers' Web site
    Die 3 Minuten sind um
    Chicken a la JP
    The cooks aboard HMS Belfast
    Polite notice on window
    Observations on travel (to London)
    Tidbits
    Did you know?
    A different kind of oops
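
Splitting on `=` and keeping only the second field would cut a title short if the title itself contained an `=`. A variant using awk, which removes everything up to and including the first `=` and leaves the rest alone, might look like this. Treat it as a sketch: the function name `strip_path` is my own, and the second sample title is invented to show the edge case.

```shell
#!/bin/sh
# Drop everything up to and including the first "=", keeping the rest of
# the line intact even if the value itself contains "=".
strip_path() {
    awk '{ sub(/^[^=]*=/, ""); print }'
}

# The second sample title is hypothetical; it contains an "=" on purpose.
strip_path <<'EOF'
/rss/channel/item/title=Tidbits
/rss/channel/item/title=Why 1+1=2
EOF
```

In a real pipeline this would simply take the place of the `cut` stage after the `grep`.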

Very nice.

The documentation for xml2 consists of examples of its flat format and a few usage examples.
