My Frustrations with REXML: Ruby’s Standard Library for Reading/Writing XML
March 24, 2008 | 02:59 pm

Edit: Once you get the gist of this rant, jump to the comments for a slightly more reasoned approach. Or my follow-up post which attempts to re-open the dialog.


So, I’m trying to do a little bit of XML reading/writing. Nothing major: read in an XML document, grab out some values, and then store the raw XML into the database. I’m doing pretty much the same thing in Groovy, and the XmlSlurper made that blissfully easy.

Since the core library comes with the REXML parser, I figured that it was a nice, stable library, and I’d roll with it. The interface wasn’t as nice as XmlSlurper, but it seems like it would do.

This was the start of the pain.

In fact, the pain pissed me off enough to share my frustrations with the world. Hopefully someone finds this useful, and they can avoid the pain and suffering I put up with. And yes, I could spend the time I’m griping going through and fixing up all the bugs, but I shouldn’t have to for a language as mature as Ruby. Core libraries are supposed to be stable, reliable beasties. If I wanted to spend all my time debugging half-baked implementations or rolling my own solutions, I’d never leave OCaml. I come to Ruby for the community support. That’s supposed to be the big advantage.

Anyway, here we go:

Problem #1 came along when I tried to parse XML. First of all, the API documentation completely sucks: if you look at the top-level REXML package, it’s totally worthless. If you manage to figure out that it’s REXML::Document that you probably want, you’re still not much better off. If you check out #new, which is really what you probably want, you’re rewarded with one word: “Constructor.” You also have some “@param” tags that ran together and tell you things like the second argument, called “context”, should be a Hash of the context. That clears up a lot! And, seriously, if you’re telling me that it should be a Hash in the documentation, why aren’t we just doing implied static typing and being done with it?

Anyway, I retreated to Google, found the REXML tutorial, and managed to figure it out from there.

But then I kept having this annoying bug: when I called Element#text(), it was not only ignoring my instructions to leave entities alone (i.e. don’t turn “&lt;” into “<”), but it then seemed to go through and attempt to re-parse it, because it was complaining about unbalanced tags! Principle of Least Surprise my ass(1)! I’m not sure why the second part of that was happening, but the first part is apparently documented, so I stopped using the easy-to-read convenience method and went to Element#write.
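
For anyone hitting the same wall, here is a minimal sketch of the two views of a text node. It assumes a current REXML release (not the 2008 one this post is about) and a made-up `<note>` document; the helper name `text_views` is hypothetical. `Element#text` returns the value with entities expanded, while `to_s` on the underlying `REXML::Text` node keeps them escaped.

```ruby
require "rexml/document"

# Hypothetical helper: returns [unescaped, escaped] for the root element's
# first text node.
def text_views(xml)
  root = REXML::Document.new(xml).root

  # Element#text hands back the unescaped value of the first text node.
  unescaped = root.text

  # Element#get_text returns the REXML::Text node itself; its to_s is the
  # escaped form, which is safe to embed in XML again.
  escaped = root.get_text.to_s

  [unescaped, escaped]
end

unescaped, escaped = text_views("<note>1 &lt; 2</note>")
puts unescaped
puts escaped
```

This does not reproduce the “unbalanced tags” complaint described above; it only shows where the entity expansion comes from.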

This is where the real pain began. See, Element#write is broken. Deprecated and broken, actually. But the tutorial still tells you to use it. The solution is to use their Formatter approach. Except (ready for it?) that’s broken, too! No, I’m not kidding. In this language’s core library, both versions are broken! The solution is for me to reach in and make a change to the core library so that we avoid a null. In the standard Ruby deployment, using the standard core XML processing library, there is no way to write out XML. It is impossible because of bugs in the library.
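
For reference, the Formatter approach looks roughly like the sketch below. It assumes a current REXML release, where `REXML::Formatters::Default#write` takes a node and any output object that responds to `<<`; it says nothing about the 2008 release described above. The `serialize` helper and the `<order>` document are made up for illustration.

```ruby
require "rexml/document"
require "rexml/formatters/default"

# Hypothetical helper: serializes one element to a String via a formatter
# instead of calling Element#write.
def serialize(element)
  out = +""                                   # mutable String used as the output sink
  REXML::Formatters::Default.new.write(element, out)
  out
end

doc = REXML::Document.new("<order><note>1 &lt; 2</note></order>")
puts serialize(doc.root)

# Element#to_s is the shortest route when no formatting options are needed.
puts doc.root.to_s
```

The default formatter writes the tree back without adding whitespace, so escaped entities in text nodes come out still escaped.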

The worst part?

THAT STUPID BUG IN THEIR CORE LIBRARY WOULD HAVE BEEN FIXED WITH STATIC TYPING(2). Even more if you have a type system which can check nulls for you. Null pointers/“nil when you didn’t expect it!” errors are totally solvable problems. The fact that our industry hasn’t moved past this painful left-over from C is driving me crazy. The next person who tries to tell me that dynamic typing is the best thing since sliced bread is going to get an earful. It is a flat-out wrong position, and I’m done hearing otherwise from anyone.
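
The failing line itself isn’t shown here, so the following is only a generic sketch of a “nil when you didn’t expect it” error in Ruby, not the actual REXML bug. Both helpers and the `:indent` option are hypothetical.

```ruby
# Hypothetical helper: doubles an indent setting read from an options hash.
# If :indent is missing, options[:indent] is nil and nil has no * method,
# so this raises NoMethodError at runtime.
def indent_for(options)
  options[:indent] * 2
end

# Same helper with the nil case handled explicitly via a default value.
def safe_indent_for(options)
  indent = options.fetch(:indent, 0)
  indent * 2
end

puts safe_indent_for({})

begin
  indent_for({})
rescue NoMethodError => e
  puts "unexpected nil: #{e.class}"
end
```

Nothing flags the first version until a caller actually omits the key, which is the kind of failure an option type forces you to handle up front.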

(1) As much as I’d love to claim that quote, it actually comes from Paul Cantrell’s excellent exploration of closures in Ruby.
(2) Or with the right test and a CI server guarding the production-bound branch. But that’s apparently not happening…which is where static typing comes in.


  • Nick

    The answer, my friend, is hpricot. Or to just abandon Ruby altogether since it’s dynamically typed and all the airbag ranting in the world isn’t going to change that.

  • Brian Hammond

    otherwise

  • http://www.AboutJustin.com Justin Bozonier

    There’s a difference between dynamic typing being bad versus strong typing being necessary at times.

    I think what we’re seeing is a growing need to be able to type loosely at times, statically at other times, and sometimes dynamically.

  • http://www.linkedin.com/in/robertfischer Robert Fischer

    @Justin

    > There’s a difference between dynamic typing being bad versus strong typing being necessary at times.
    Care to elaborate? I’m missing the distinction you’re making. And when is it necessary to dynamically type?

    Personally, I’m just burnt out on the extra work and flakiness that comes with dynamic languages. I had these problems back with CPAN, but Ruby is flat-out worse.

  • James

    Ruby’s not for everyone.

    Find a tool that better fits your world-view and you’ll be happier.

  • tg

    I had problems with REXML too. This time in XPath predicate processing. I wrote an angry post about it here:

    http://arrogantgeek.blogspot.com/2008/01/why-ruby-sucks-1.html

  • Pingback: Sp3w » Blog Archive » Linkage 2008.03.25 AM

  • http://www.linkedin.com/in/robertfischer Robert Fischer

    Yeah, I’d like to get off Ruby, but the reality is that I haven’t encountered anything else out there better than Rails. And it’s not that Rails is the best web framework out there, but that the community provides a lot of strong support. I just don’t see another framework with the level of support, advanced set of plugins, etc., etc.

    My ranting was really 3-fold:
    1) To feel better. I was really frustrated with REXML, and although I had long considered Rails to be flaky, discovering that the Ruby core library is flaky, too, really upset me.
    2) To push back against the overwhelming popularity of dynamic languages right now. Ruby/Rails, like most things in IT, is just a temporary solution until there’s some better framework. It’d be really nice if the next framework was a bit smarter about typing and safety.
    3) To document my problem. There wasn’t a lot of documentation for my problem on the net, so I wanted to add some more. Also, my memory is horrible, so I wanted to have the story locked down somewhere for posterity — people seem to act like I’m just a reactionary freak when I gripe about dynamic languages, and it’s nice to have a solid example to demonstrate the issue.

  • http://www.linkedin.com/in/robertfischer Robert Fischer

    If you’re reading this, you probably want to also check out the comments for this post on Reddit. That walled garden has a nascent (although potentially interesting) thread.

  • http://www.gandcb.com Zach

    What interests me here is how you manage to go from “there is a major bug in this core library” to “an entire model for programming is wrong” in a single sentence. “THAT STUPID BUG IN THEIR CORE LIBRARY WOULD HAVE BEEN FIXED WITH STATIC TYPING.” doesn’t, to me, seem to be a useful enough point to warrant the conclusion you have drawn. You could write that sentence a hundred different ways (“the stupid bug could have been avoided if they used a hash instead of a set,” “they completely would have avoided readability problems if they had used black instead of blue!”), and that doesn’t immediately nullify the value of the two choices. No one would claim that a set or the color blue were inherently poor simply because they were not proper solutions to specific problems.

    This frustrates me because I have run into many things in Ruby and Ruby’s core libraries that make me wary of considering it a “mature” language or one I’d really be interested in designing production systems in, and you were well on your way to aptly pointing that out, until you chose to use your rhetorical sword to slay a dragon far out of range.

  • http://www.linkedin.com/in/robertfischer Robert Fischer

    Oh, and let’s not forget that this comes hot on the heels of another problem I had with name collisions at runtime.

  • http://www.linkedin.com/in/robertfischer Robert Fischer

    Check the Reddit comments for other links to complaints and pithy comments about REXML. It’s apparently just a complete mess.

  • http://www.linkedin.com/in/robertfischer Robert Fischer

    @Zach

    You’re welcome to tackle Ruby and its problems. For my part, I was mainly trying to vent (see above for more on that). I’d encourage you to post your stuff on Ruby and Ruby’s core libraries: I’d be curious to see what you’ve bumped into.

    Now that I’m a little calmer, let me add some more. I may rework this and some other thoughts into a more coherent post in the near future — I haven’t decided.

    Here’s the thing: open source libraries are one thing. I’m used to those being of dubious quality, and I’m willing to put some effort in to fix them. Even more, I’ve cut Ruby on Rails a lot of slack, because the bugs and the awkwardness are the price that you pay to hang out with the beer-swilling hipsters that make up that community and roll with the rapid rate of development. And, really, it is the best web framework I’ve dealt with, mainly because it provides the most comprehensive and extensible set of functionality of any framework, and it’s got a solid community to back it up.

    However, it’s doing this all by constantly just barely working. That’s the price you pay: the development constantly pushes to the very edge of tolerance, and APIs you took for granted in the last release may fade away, and there’s this constant concern about stability.

    But, the argument ran, static typing really doesn’t get you anything. It’s the unit tests that do it. The testing will exercise the API and give you — for all practical purposes, anyway — the protection that static typing will. So all this clinging to static typing is just old fogies who can’t pry their insecurities away from their enterprisey languages long enough to see how people really get productive.

    And that argument isn’t one I really buy. But I hear it a lot. And I hear people griping about how bad static typing is, for reasons that have zero applicability to any language whose type system postdates the Reagan administration. And I try to mention things to engage in dialog, and it’s really gotten nowhere. But, y’know, hey, if the unit tests really do provide the same protection, then it’s no big deal.

    But then this went down. If Ruby cannot even keep their standard library in check — if they, as the leaders of the language, can’t manage to keep themselves together, and keep something as common as reading and writing XML working — if they fall apart in ways that straightforward static type checking would solve, and if it burns a full day of my precious and limited time trying to figure it out, then I feel like all that tolerance and engaging in dialog just bit me in my ass.

  • http://www.pavleck.com/blog Jeremy D Pavleck

    What a timely post actually. So I decided to poke around with Ruby finally, and see if it could work with one of my projects I have in mind, which is mainly grabbing an RSS feed, parsing it, re-writing it to a database, which I’ll use later to construct some neat things – just for the hell of it.

    Glad I saw this!

  • Pingback: Enfranchised Mind » Working on Rebuilding the Dialog

  • http://www.linkedin.com/in/robertfischer Robert Fischer

    @Jeremy D Pavleck

    Here’s the summary of my experience: I haven’t used the core library’s built-in RSS feed, so I don’t know how that works out. REXML is to be avoided. Hpricot is pretty nice, as long as you don’t need to deal with namespaces.

  • Jonas

    As someone pointed out earlier: don’t use REXML. Its documentation is terrible, it doesn’t really work, and it’s ridiculously complicated to use. Also, don’t use XML-Simple, since it suffers from the exact same problems.

    Instead, use Hpricot.XML, which is awesome, seriously fast, ridiculously easy to use and awesome.

    http://code.whytheluckystiff.net/hpricot/

    You can select stuff with CSS or XPATH selectors, change it, print it out. It doesn’t get any easier.

  • http://www.linkedin.com/in/robertfischer Robert Fischer

    @jonas

    Got some documentation on Hpricot and namespaces? I haven’t found any support.

  • http://www.linkedin.com/in/robertfischer Robert Fischer

    @jonas

    Oh, and if REXML sucks so bad, why does the Ruby team make sure that it ships with every Ruby distribution? What does it say about the language that they allow something so woefully broken to be a backbone of their language?

  • Pingback: libxml-ruby 0.8.0 Released: Ruby Gets Fast, Reliable XML Processing At Last

  • Caligula

    People that complain about dynamic languages because they found a bug that static typing would have solved make me sleepy.

  • Chandon

    > Even more if you have a type system which can check nulls for you. Null pointers/”nil when you didn’t expect it!” errors are totally solvable problems.

    What language are you thinking of? The only mechanism that I’ve encountered that comes anywhere near this claim is Haskell’s Maybe, and having a type system quite that powerful seems like overkill a lot of the time.

  • http://www.linkedin.com/in/robertfischer Robert Fischer

    @Chandon

    OCaml also does null checking for you via an Option type. Its type system is really astoundingly nice to work in, because it allows mutability and defaults to eager evaluation, but has enough of a natural functional programming flow to really warp your mind.

    Post where I address this:
    http://enfranchisedmind.com/blog/2008/04/14/useful-things-about-static-typing/

    General presentation on OCaml:
    http://enfranchisedmind.com/blog/2008/07/07/rubymn-presentation-of-ocaml/

  • Brian Hurt

    Haskell’s Maybe and OCaml’s option are exactly what he was thinking of. And I disagree that having a type system “that powerful seems like overkill a lot of the time”. If you care enough about code quality to be writing unit tests, you care enough about code quality to be using static typing.

  • http://www.linkedin.com/in/robertfischer Robert Fischer

    @Brian

    There is a point when the noise of static typing outweighs its gains. Of course, this has to be determined on a language-by-language basis given the particular noise and the particular gains that were accomplished. For instance, Java is a lot noisier with a lot fewer gains than OCaml.

    (On the dynamic side, there is a point where the incomprehensibility/unmaintainability of dynamic typing outweighs its simplicity.)

    I’m increasingly coming to conceive of languages on a continuum of static-type-ness. After all, there is *some* static typing in Ruby — see classes vs. modules. But it’s just that there’s extremely little. Perl has slightly more — scalars, lists, hashes, subroutines. And you can run the full gamut over to proof-focused languages, where every aspect of the application is wrapped into the type system in one way or another.

    As per your Programming Languages are PC OSs circa 1986 and There is no One Answer posts, I think the question is trying to find the sweet spot on that continuum for the given problem space.

    I’m not willing to go so far as to say that things are all relative — Ruby is flat-out unmaintainable due to dynamism (see My Frustrations with REXML: Ruby’s Standard Library for Reading/Writing XML), and the proof systems aren’t flexible enough to handle real-world demands for agility. But I’m willing to hear arguments in the middle.

  • Brian

    Granted, there is a point where static typing isn’t worth it. There’s a lot of code that people write that doesn’t need to be statically typed or unit tested. I mean, when you type “find . -name \*.ml | xargs grep -l foobar” into the command line, you’ve written a little program. Is it worth statically typing it? No. Nor is it worth unit testing it, I’d add. It goes higher than that, as well: build scripts, sysadmin-style scripting, etc., don’t benefit much, if at all, from typing and unit testing.

    My point is that the benefits of static typing go much farther down the hierarchy than most people think. Given that static typing (with type inference and a real type system, i.e. OCaml/Haskell-style static typing) is cheap (in terms of programmer time), it should actually be used before unit testing (which is relatively expensive in terms of programmer time). Certainly, by the time you’re writing unit tests, you should be statically typing (IMHO).

  • http://www.linkedin.com/in/robertfischer Robert Fischer

    I do agree with you there: insofar as static typing is less pain than unit testing, it stands to reason that it should be adopted earlier.

  • Pingback: Ronin 0.2.1 “notashellscript” released « House of Postmodern

  • Pingback: Enfranchised Mind » April Fails or April Fools Epic Win? On “Ruby is the Future”

  • Pingback: Ruby is the Future

  • steve

    I know this is an old post, but it made me laugh. Static typing gives you no protection except in precious few languages, and even then things still slip through.

    If you are relying on static typing in lieu of proper testing, and it sure sounds like you are, retire.

  • Pingback: from Hpricot to nokogiri | Bibliographic Wilderness