I Went Crazy and Made a Small View Indexer
Sat Aug 04 11:45:00 EDT 2012
Warning: this post is almost entirely pie-in-the-sky nonsense, untethered from reality. Probably.
So yesterday, for some reason, I got to thinking about Domino's view indexing and whether or not it can be readily done better. The current view indexer does its intended task with aplomb (most of the time), but it has its problems. For one, iterating over view indexes in Java (read: everything you should do in Domino today) is dog slow. More theoretically, since it deals only with summary data for performance reasons, its ability to deal with structured data is limited - you can't deal with rich text or, for example, store any sort of Map in a Domino document and deal with it in a view (realistically).
I had the notion that it may be faster, at least in some cases, to do your own indexing: make a Java collection of your "view" data and serialize the result into a note. I decided to try a little test: take a view that consists only of a sorted column of 100k documents' UNIDs and compare it to a TreeSet of the same. My initial results were promising: iterating over the collection of documents (via getNextDocument() and a forall with getDocumentByUNID(), respectively) was about twice as fast with the Java version, and the serialized set was stored in about 17% of the space, presumably thanks to document compression.
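As a rough sketch of the "serialize a collection" half of that idea (not the code from the test itself): the class below round-trips a sorted set of UNID-like strings through standard Java serialization. The class name, the fake UNIDs, and the GZIP step are all assumptions made for illustration; the post credits the space savings to Domino's own document compression, and actually writing the bytes into a note is left out entirely.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.TreeSet;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: turns a sorted "index" into bytes and back.
final class SerializedIndexSketch {

    /** Serializes the sorted set and gzips the result (gzip is an assumption, not Domino's compression). */
    static byte[] toBytes(TreeSet<String> index) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(new GZIPOutputStream(buffer))) {
            out.writeObject(index);
        }
        return buffer.toByteArray();
    }

    /** Reverses toBytes(): gunzips and deserializes the set, which keeps its sort order. */
    @SuppressWarnings("unchecked")
    static TreeSet<String> fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                new ObjectInputStream(new GZIPInputStream(new ByteArrayInputStream(data)))) {
            return (TreeSet<String>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Made-up 32-character hex strings standing in for document UNIDs.
        TreeSet<String> index = new TreeSet<>();
        for (int i = 0; i < 1000; i++) {
            index.add(String.format("%032X", i * 7919L));
        }

        byte[] stored = toBytes(index);
        TreeSet<String> restored = fromBytes(stored);

        System.out.println("bytes stored: " + stored.length);
        System.out.println("round trip intact: " + restored.equals(index));
    }
}
```

In a real test, each restored UNID would then be handed to getDocumentByUNID() to fetch the document.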
To be fair, that was really stacking the deck in favor of Java: I didn't care about reader fields or any entry metadata like position, and fetching every document in a large view is possibly the most expensive way to use it. Presumably, fetching just the first entry would make the view version much faster. Still, it shows that it might be useful in some cases, so I decided to try something a bit more complicated.
I made another view with 15k documents (task requests) and two columns: the date of the request, sorted, and the summary line. I matched that up with a TreeSet containing Lists of Maps and iterated over both to fetch and print the entry data. Again, the Java version ended up way faster, taking about 25% of the time.
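The post doesn't show the exact shape of that structure, so the following is one guess at it rather than the original code: each entry is a List with one Map per column, and the TreeSet is given a Comparator because Lists aren't Comparable on their own. The column names, the hidden "unid" column used as a tie-breaker, and the sample data are all hypothetical.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

final class TwoColumnIndexSketch {

    /**
     * Orders entries by request date, then by UNID. Without the tie-breaker,
     * the TreeSet would silently drop entries that share a date. The comparator
     * is Serializable so the whole set could still be serialized.
     */
    static final class EntryOrder
            implements Comparator<List<Map<String, Object>>>, Serializable {
        private static final long serialVersionUID = 1L;

        @Override
        public int compare(List<Map<String, Object>> a, List<Map<String, Object>> b) {
            int byDate = Long.compare((Long) a.get(0).get("value"), (Long) b.get(0).get("value"));
            if (byDate != 0) {
                return byDate;
            }
            return ((String) a.get(2).get("value")).compareTo((String) b.get(2).get("value"));
        }
    }

    /** One column of an entry: a name plus its value. */
    static Map<String, Object> column(String name, Object value) {
        Map<String, Object> col = new HashMap<>();
        col.put("name", name);
        col.put("value", value);
        return col;
    }

    /** Assumed layout: [0] request date (millis), [1] summary line, [2] hidden UNID. */
    static List<Map<String, Object>> entry(long requestedMillis, String summary, String unid) {
        List<Map<String, Object>> columns = new ArrayList<>();
        columns.add(column("requested", requestedMillis));
        columns.add(column("summary", summary));
        columns.add(column("unid", unid));
        return columns;
    }

    static TreeSet<List<Map<String, Object>>> newIndex() {
        return new TreeSet<>(new EntryOrder());
    }

    public static void main(String[] args) {
        TreeSet<List<Map<String, Object>>> index = newIndex();
        index.add(entry(2000L, "Replace toner", "UNID-B"));
        index.add(entry(1000L, "Reset password", "UNID-A"));
        index.add(entry(2000L, "Order monitor", "UNID-C"));

        // Iteration follows the comparator: date first, then UNID.
        for (List<Map<String, Object>> e : index) {
            System.out.println(e.get(0).get("value") + "  " + e.get(1).get("value"));
        }
    }
}
```

Printing the entry data straight from the set, without opening any documents, is the analogue of reading column values from a view entry.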
Now it has me wondering about whether it'd be worth investigating further. It's fair to assume that Java can be treated as the new "native" language of Domino, so serialization is potentially a legitimate mechanism. Furthermore, a real implementation of an alternative view index wouldn't have to be as hopelessly naive as the one I put together: the basic java.util.* collections are almost definitely not the best storage mechanisms for a database index, and, better still, this is well-trodden territory. Since normal views would still exist in this magic future world, these fancy views could eschew a lot of the UI and build-performance requirements of the normal ones in favor of fancy tricks like native knowledge of serialized data structures. Imagine storing structured data like a table of line items in a MIME entity but then being able to query that in a "view". Reader fields could be handled by storing the applicable names in the "entry" and then comparing that to the user's names list at runtime, presumably like standard views do. A server task could handle monitoring document changes and making incremental changes (like happens now, really).
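The reader-field idea could look something like the sketch below. Everything in it is an assumption for illustration: the Entry class, the rule that an entry with no stored names is visible to everyone, the case-insensitive comparison, and the sample names. It also takes the user's names list as a plain collection of strings rather than fetching it from Domino.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class ReaderFilterSketch {

    /** Hypothetical index entry: a UNID plus the names copied from the document's reader fields. */
    static final class Entry {
        final String unid;
        final Set<String> readers = new HashSet<>();

        Entry(String unid, String... readerNames) {
            this.unid = unid;
            for (String name : readerNames) {
                readers.add(name.toLowerCase(Locale.ROOT));
            }
        }
    }

    /** An entry is visible if it stores no reader names or shares at least one name with the user. */
    static boolean canRead(Entry entry, Collection<String> userNames) {
        if (entry.readers.isEmpty()) {
            return true;
        }
        for (String name : userNames) {
            if (entry.readers.contains(name.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    /** Filters the index at read time, keeping the index's own order. */
    static List<String> visibleUnids(List<Entry> index, Collection<String> userNames) {
        List<String> visible = new ArrayList<>();
        for (Entry entry : index) {
            if (canRead(entry, userNames)) {
                visible.add(entry.unid);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<Entry> index = Arrays.asList(
                new Entry("UNID-A"),                          // no reader restriction
                new Entry("UNID-B", "[Admin]"),               // role only
                new Entry("UNID-C", "CN=Jane Doe/O=Example")); // one named reader

        // Made-up names list: the user's own name plus a group.
        List<String> names = Arrays.asList("CN=Jane Doe/O=Example", "Managers");
        System.out.println(visibleUnids(index, names));
    }
}
```

The cost of this approach is that every read pays for the comparison, which is one reason a real implementation would want something smarter than a linear scan.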
So it's theoretically possible. Would it be worth it? I'm not sure - maybe. It's fun to think about, in any event.
Tim Tripcony - Sun Aug 05 03:10:18 EDT 2012
No... words... they should have sent a poet.
Nathan T. Freeman - Sun Aug 05 11:11:44 EDT 2012
I've got 3 words for you: Concurrent Skip List
By the way, unless you need these serializations to be replicatable, there's no reason to store them in MIME entities. Just write them directly to a folder on the server like DAOS does. Regular view indices aren't replicated -- they are unique to each instance of the NSF, and many of us have been asking IBM to get them OUT of the NSF for quite some time. So don't make the same mistake with your own indexer strategy.
Jesse Gallagher - Sun Aug 05 13:41:15 EDT 2012
Oh crap, yeah. Even though I knew it wouldn't have to be replicatable, I still had the notion in my head that I'd have to get to the index data remotely via NRPC, like from a Notes client or across servers. Since that's not necessarily the case, I can make things easier on myself.