Java Takes Its DTDs Seriously

Wed Jun 29 10:24:54 EDT 2011

Around 8:45 PM last night, my main XPages app stopped responding. The browser would sit there waiting for the server for about 30 seconds or a minute until the server finally gave up and handed out a Command Not Handled Exception. When I first started looking into it, I saw a rogue process taking up the whole CPU, but after killing it and bouncing the server for good measure, the problem remained.

I'll leave out the hour's worth of hair-pulling and cut right to the chase: I had added a doctype line to my faces-config file and hadn't gotten around to removing it. This normally isn't much of a problem, but java.sun.com became unavailable yesterday (and still is at the moment). The server was opening the faces-config.xml file as XML and, as XML parsers are supposed to do, attempting to fetch the DTD to validate it. After waiting 30 seconds or so, it would give up the ghost, spit a misleading error to the console along the lines of "Can't parse configuration file:xspnsf://server:0/database.nsf/WEB-INF/faces-config.xml", and declare that it couldn't handle the command. As soon as I removed the doctype line, everything started working perfectly again.
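For the curious, the same behavior is easy to reproduce, and sidestep, with the stock JAXP parser outside of Domino. This is only a sketch: the class name, the method name, the doctype's public ID, and the dtd.invalid URL are all made up for illustration, and I have no idea whether the XPages runtime exposes a hook like this for its own config parsing. The idea is to hand the parser an EntityResolver that answers every external lookup with an empty document, so it never goes to the network.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of XPages or Domino.
public class OfflineDtdDemo {

    // Parses XML that declares an external DTD without contacting the DTD's host.
    static Document parseWithoutFetchingDtd(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();

        // Answer every external entity request (including the DTD) with an
        // empty stream instead of letting the parser open a network connection.
        builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));

        return builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        // Placeholder doctype: the public ID and URL are invented for this example.
        String xml = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE faces-config PUBLIC \"-//Example//DTD Hypothetical 1.0//EN\" "
                + "\"http://dtd.invalid/faces-config.dtd\">\n"
                + "<faces-config/>";

        Document doc = parseWithoutFetchingDtd(xml);
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

Without the resolver, the default parser would try to open the doctype's URL before handing back a document, which is the 30-second stall described above. In my case the simpler fix was just deleting the doctype line.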

I'm reasonably certain this is Larry Ellison's fault.

The Problem I'm Having

Sun Jun 12 12:16:42 EDT 2011

The main non-work project I've been working on for the last... forever, it seems, has been a forum app for my guild. It started out as a raid-composition utility, then grew to have loot handling, and then added a forum. Most of it is pretty WoW-specific, but the upshot of the whole thing is that it's a moderately-complex XPages application, which has done wonders for my knowledge of the environment.

For programmatic convenience and, in some cases, performance, I've written a ton of wrapper classes, such that I almost never use the standard <xp:dominoDocument/> and <xp:dominoView/> data elements directly, instead using Java beans and collections that implement List. This allows me to do things like #{post.topic.forum.title} if I want to without having to worry about the XPage knowing about how to fetch each parent document in a view.
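To give a rough idea of the shape of these wrappers, here is a stripped-down sketch. Everything in it is hypothetical: the class names, the IDs, and especially the in-memory maps, which stand in for whatever Domino view lookups the real classes do. The point is only that each bean exposes its parent through an ordinary getter, so the expression language can walk #{post.topic.forum.title} as getTopic().getForum().getTitle().

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; the real wrappers read from Domino documents.
public class ForumBeans {

    // Stand-ins for view lookups keyed by document ID (an assumption of this sketch).
    static final Map<String, Forum> FORUMS = new HashMap<>();
    static final Map<String, Topic> TOPICS = new HashMap<>();

    public static class Forum implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String title;

        public Forum(String title) { this.title = title; }

        public String getTitle() { return title; }
    }

    public static class Topic implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String forumId;

        public Topic(String forumId) { this.forumId = forumId; }

        // The parent is resolved on demand, so the page never has to know how.
        public Forum getForum() { return FORUMS.get(forumId); }
    }

    public static class Post implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String topicId;

        public Post(String topicId) { this.topicId = topicId; }

        public Topic getTopic() { return TOPICS.get(topicId); }
    }

    public static void main(String[] args) {
        FORUMS.put("f1", new Forum("Raid Planning"));
        TOPICS.put("t1", new Topic("f1"));
        Post post = new Post("t1");

        // Equivalent to what #{post.topic.forum.title} resolves to on a page.
        System.out.println(post.getTopic().getForum().getTitle());
    }
}
```

Holding only the parent's ID rather than the parent object keeps each bean small and serializable, at the cost of a lookup per hop.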

However, I've run into an annoying problem. Namely, this problem, wherein the Java classloader gets confused about different versions of some classes and stubbornly declares that a class is incompatible with itself, even when I didn't change anything with the class. From what I gather, this is something that you're sort of bound to run into with XPages once you start mucking about with scoped beans enough. It's gotten to the point in my app where I can trigger this behavior 100% of the time by simply re-saving any XPage document. Fortunately, I can clear it up by re-saving a Java class file, but that means that every change I make to an XPage or a Custom Control has to be followed by a re-save of a Java class, which is a step I'd rather not have to take. Worse, it seems to also happen when Domino document data is modified via a mechanism other than the XPages themselves (say, via replication), which is kind of a serious problem.

For the most part, I've learned to work around this. When developing, I re-save a Java class file after each XPage, and I (horrifyingly) wrote a cron script that, every 15 minutes, checks a page I know exhibits the error when it happens and restarts Domino as necessary (which is why I wrote the special login DSAPI filter). These are terrible things to have to do, though, and so I'm really trying to figure out how to actually fix the problem.

Clearly, something I'm doing in my code has caused this, since it doesn't happen in my other apps, but it's not clear what it is and if it's really something I'm doing wrong. I've tried switching all of my managed beans to request scope, implementing Serializable all over the place, changing the way XPages are cached, fiddling with the compiler target version in the Eclipse project, eliminating all Domino classes from object instance variables, installing server fix packs as they come out, and, heck, switching from Windows to Linux, just in case.

At this point, my workarounds pretty much work, but it still drives me nuts, especially the server-bounce script. Since I suspect the culprit is related to all the managed beans, I'm going to try cutting down the data-type collections (Posts, Topics, etc.) and replacing them with <xp:dataContext/>s on the pages that use them, but I don't have terribly high hopes for that. Maybe whatever is causing it on the server side will be fixed in 8.5.3, but I'd still love to fix it before then. I may have to resign myself to it, though, and hope the problem does indeed go away once I'm no longer actively working on it.