frostillic.us

Showing posts for tag "views"

Meandering Musing About Views

Sun Jul 20 23:19:24 EDT 2014

Tags: views

As happens periodically, I've been thinking about Domino views lately. When I get into one of these moods, I find it helps to take a step back to look at what an NSF is.

An NSF is, in its heart of hearts, a key/value store. Each entry has several keys, of which the useful ones are the Note ID and the UNID: 32-bit and 128-bit integers, respectively, with the Note ID fixed and the UNID mutable. Each entry's value is a multimap with string keys and values that are either effectively blobs or multi-value strings, numbers, or date/times+ranges, plus metadata.
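To make the key/value framing concrete, here's a rough Java model of it. It's a conceptual sketch only, not how an NSF is laid out on disk or what the Domino API looks like: `NoteEntry` and `ToyNsf` are hypothetical names, the UNID is held in a `java.util.UUID` simply because that's a convenient 128-bit container, and summary flags and other metadata are left out.

```java
import java.util.*;

// Conceptual model of one NSF entry (hypothetical; not Domino's on-disk format or API).
final class NoteEntry {
    final int noteId;   // 32-bit Note ID: fixed, but only meaningful within this replica
    final UUID unid;    // 128-bit UNID (mutable in Domino; kept final here for simplicity)
    // Item name -> values; Domino item names are case-insensitive, hence the comparator.
    final Map<String, List<Object>> items = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    NoteEntry(int noteId, UUID unid) {
        this.noteId = noteId;
        this.unid = unid;
    }

    void replaceItemValue(String name, Object... values) {
        items.put(name, new ArrayList<>(Arrays.asList(values)));
    }

    List<Object> getItemValue(String name) {
        return items.getOrDefault(name, Collections.emptyList());
    }
}

// The store itself only answers "by Note ID", "by UNID", and "all".
final class ToyNsf {
    private final Map<Integer, NoteEntry> byNoteId = new HashMap<>();
    private final Map<UUID, NoteEntry> byUnid = new HashMap<>();

    void put(NoteEntry e) {
        byNoteId.put(e.noteId, e);
        byUnid.put(e.unid, e);
    }

    NoteEntry getByNoteId(int noteId) { return byNoteId.get(noteId); }
    NoteEntry getByUnid(UUID unid) { return byUnid.get(unid); }
    Collection<NoteEntry> all() { return byNoteId.values(); }
}
```

Everything else in this post (selection, views, full-text) has to be layered on top of those three lookups.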

What it does not intrinsically have (conceptually) is an ability to collect, index, and query documents other than "all" or by key (db.search can be thought of as a specialized instance of indexing). Layered on top (again: conceptually) of the NSF are two IBM-supported indexing schemes: the view indexer (NIF) and the full-text index. Though these services are baked into Domino at an API level (including data transit over the same wire), they are in many ways no different from LDAP, the RnR manager, or third-party Domino addins: they are independent services that provide additional capabilities to the server.

This is a long-winded way of saying: there's nothing stopping anyone from writing their own indexer, particularly now that the primary use would be within an XPage running on the server directly, not a legacy client. So what would such an indexer need? The way I see it, there are four conceptual parts: document selection, index entry creation, index updating, and querying. Each of the two built-in methods handles these tasks in its own way. The full-text index's answers are:

Document selection: all of them.
Index entry creation: all processable string data in the document, with variations configured at setup time. There is no user-defined index-stored metadata and the resultant index is effectively a black box.
Index updating: "immediately", by which it means "eventually". Specifically, it's handled by an updater task that I'll charitably assume batches changes for group processing. Can also be done on schedule or manually.
Querying: a custom string-based DSL that allows for selection of documents and sorting by "relevance" or creation date. It also attempts to provide location/highlight information for matched data in the document, but it's best not to think about that.

NIF's answers are:

Document selection: formula language, with the limitation that it can select only based on summary data.
Index entry creation: a combination of formula language (also with the summary limitation) and column and view configuration, resulting in a combined array+tree structure of individual entries, each of which is a combined array+map structure. This also involves specifying categories and keys for later querying.
Index updating: this is a bit more reliably quick than the FT update, and operates on similar lines: by default, it's triggered by DB change events, but can be updated manually and set to update less frequently.
Querying: querying is done via a series of operations to read the index structure. These operations focus on getting a single column's values, selecting entries/documents by key, and traversing entries sequentially with some hierarchical operations. The additional information included during entry creation can be used to eliminate the need to access the actual documents later in some situations.
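Those four parts map naturally onto a small Java interface. This is a hypothetical sketch (`ViewIndexer` and `SimpleIndexer` are made-up names, not part of any Domino API), with documents modeled as plain maps of item values and a naive in-memory implementation behind it:

```java
import java.util.*;
import java.util.function.*;

// Hypothetical decomposition of an indexer into the four parts described above.
interface ViewIndexer<E> {
    boolean select(Map<String, Object> doc);              // 1. document selection
    E createEntry(String unid, Map<String, Object> doc);  // 2. index entry creation
    void update(String unid, Map<String, Object> doc);    // 3. index updating (null doc = deleted)
    List<E> query(Predicate<E> filter);                   // 4. querying
}

// Naive in-memory implementation: entries are sorted with a caller-supplied comparator.
final class SimpleIndexer<E> implements ViewIndexer<E> {
    private final Predicate<Map<String, Object>> selection;
    private final BiFunction<String, Map<String, Object>, E> entryFactory;
    private final Comparator<E> order;
    private final Map<String, E> byUnid = new HashMap<>();

    SimpleIndexer(Predicate<Map<String, Object>> selection,
                  BiFunction<String, Map<String, Object>, E> entryFactory,
                  Comparator<E> order) {
        this.selection = selection;
        this.entryFactory = entryFactory;
        this.order = order;
    }

    public boolean select(Map<String, Object> doc) { return selection.test(doc); }

    public E createEntry(String unid, Map<String, Object> doc) { return entryFactory.apply(unid, doc); }

    public void update(String unid, Map<String, Object> doc) {
        byUnid.remove(unid);                        // drop any stale entry first
        if (doc != null && select(doc)) {
            byUnid.put(unid, createEntry(unid, doc));
        }
    }

    public List<E> query(Predicate<E> filter) {
        List<E> out = new ArrayList<>();
        for (E e : byUnid.values()) {
            if (filter.test(e)) out.add(e);
        }
        out.sort(order);                            // sorted on read; a real index would keep order on write
        return out;
    }
}
```

In these terms, NIF fixes `select` and `createEntry` to formula language over summary data, while the full-text index fixes `select` to "everything".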

Click-to-sort columns in views are effectively separate indexes that share much of their configuration information. NIF can also be combined with the full-text search index to intersect the pre-selected contents of the view with the FT query result.
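Conceptually, that combination is an intersection: walk the view's entries and keep only those whose document also appears in the FT result. A minimal sketch with UNID strings standing in for entries and documents; `FtIntersect` is a hypothetical name, and this says nothing about how NIF actually performs the operation internally. This version keeps the view's order rather than FT relevance order, which is a design choice of the sketch.

```java
import java.util.*;

final class FtIntersect {
    // Keep the view's order; drop entries whose UNID isn't among the FT hits.
    static List<String> intersect(List<String> viewOrderUnids, Set<String> ftHits) {
        List<String> result = new ArrayList<>();
        for (String unid : viewOrderUnids) {
            if (ftHits.contains(unid)) result.add(unid);
        }
        return result;
    }
}
```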

When I dabbled with Fancy Views years ago, I focused primarily on the first two components. For selection, I allowed either the standard formula-based selection or an FT-search query, and for the entry creation I took the framework established by NIF and expanded it to allow any JSR-223 language to return a value, to work transparently with MIMEBean values, and to allow storing any Serializable value. The updating was skeletal - basically whenever I ran the agent - and the querying was half-assed, being limited to a single sort value and then just iteration. Still, this concept has promise: because the relative expense of generating the index is dwarfed on modern systems by the value of having a better resultant index, allowing complex operations like non-summary/MIMEBean access and alternative languages is very worthwhile.
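Letting any JSR-223 language compute a column value needs nothing beyond the standard `javax.script` API: look up an engine by name, bind the document into the script's scope, and evaluate the column's expression. A minimal sketch, assuming a plain `Map` stands in for the document and the script sees it as a variable called `doc`; the `ScriptColumns` class is hypothetical. Which engines exist depends on the JVM and classpath (Nashorn JavaScript shipped with Java 8 but was removed from the JDK in Java 15), so the code reports what's installed and returns `null` instead of assuming a language is there.

```java
import java.util.*;
import javax.script.*;

final class ScriptColumns {
    private static final ScriptEngineManager MANAGER = new ScriptEngineManager();

    // Names of every JSR-223 language the current JVM/classpath can provide.
    static List<String> availableLanguages() {
        List<String> names = new ArrayList<>();
        for (ScriptEngineFactory f : MANAGER.getEngineFactories()) {
            names.add(f.getLanguageName());
        }
        return names;
    }

    // Evaluate a column expression with the document bound as "doc".
    // Returns null when no engine is registered under that name.
    static Object evaluate(String engineName, String script, Map<String, Object> doc)
            throws ScriptException {
        ScriptEngine engine = MANAGER.getEngineByName(engineName);
        if (engine == null) return null;
        Bindings bindings = engine.createBindings();
        bindings.put("doc", doc);
        return engine.eval(script, bindings);
    }
}
```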

The last couple days, I've been taking a look at CQEngine. CQEngine focuses on the final step - querying. It operates on Java Collections, which could range from an ArrayList of HashMaps to an arbitrarily-complex database index that implements Collection and for which the user provides key/value adapters. Where CQEngine shines is being able to build complex queries across multiple attributes and order the results, much like you would do in SQL.
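For a sense of what "complex queries across multiple attributes, with ordering" means, here is the equivalent of `WHERE status = ? AND priority >= ? ORDER BY due, priority DESC` written with plain `java.util.stream` over a list of records. This is deliberately not CQEngine's API: CQEngine's value is in answering this kind of query from indexes on the attributes instead of scanning the whole collection the way this sketch does. The `Task` record and its fields are hypothetical.

```java
import java.time.LocalDate;
import java.util.*;
import java.util.stream.Collectors;

record Task(String unid, String status, int priority, LocalDate due) {}

final class TaskQuery {
    // Equivalent of: WHERE status = ? AND priority >= ? ORDER BY due, priority DESC
    static List<Task> matching(Collection<Task> tasks, String status, int minPriority) {
        return tasks.stream()
                .filter(t -> t.status().equals(status))
                .filter(t -> t.priority() >= minPriority)
                .sorted(Comparator.comparing(Task::due)
                        .thenComparing(Comparator.comparingInt(Task::priority).reversed()))
                .collect(Collectors.toList());
    }
}
```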

I'm not 100% sold on the notion of CQEngine being a building block of a new view indexer, but it has some promise - and any new indexer doesn't have to be a full NIF replacement or even the only new indexer. The lack of a string-based query syntax makes it a bit awkward (would it be represented as a tree of XSP components?) and the fact that the built indexes aren't meant to be serialized means that they'd have to be rebuilt once per session (though the backing index itself wouldn't be). Combined with an initial indexer that takes the Couch* approach of a JavaScript/JSR-223 function to select documents and emit entry values and an index-update task, it could provide some interesting capabilities while being potentially much faster and more flexible than NIF for many operations.
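The Couch*-style approach folds selection and column formulas into one function: it looks at a document and calls `emit(key, value)` zero or more times. Because it can emit more than once, one document can produce several entries, which is one way to get at something like line items stored inside a document. A minimal Java sketch; `MapFunction`, `Row`, and `MapIndex` are hypothetical names, and in the real thing the function would be JavaScript or another JSR-223 language rather than a Java lambda.

```java
import java.util.*;
import java.util.function.BiConsumer;

// Hypothetical Couch-style map function: inspect a doc, emit zero or more (key, value) pairs.
interface MapFunction {
    void map(Map<String, Object> doc, BiConsumer<String, Object> emit);
}

// One emitted row: the sort key, the emitted value, and the UNID it came from.
record Row(String key, Object value, String unid) {}

final class MapIndex {
    // Run the map function over every document, then sort rows by key and UNID.
    static List<Row> buildIndex(Map<String, Map<String, Object>> docsByUnid, MapFunction fn) {
        List<Row> rows = new ArrayList<>();
        docsByUnid.forEach((unid, doc) ->
                fn.map(doc, (key, value) -> rows.add(new Row(key, value, unid))));
        rows.sort(Comparator.comparing(Row::key).thenComparing(Row::unid));
        return rows;
    }
}
```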

Though this is currently all speculation, it's satisfying to know that, like with the OpenNTF Domino API, there's nothing standing in between speculation and a real system other than doing a bit of programming. It's also just one of many potential non-exclusive paths. One of the coolest aspects of the Cambrian explosion of NoSQL technology in recent years is that each system comes with its own take on indexing/querying and associated support systems have arisen that can be used side-by-side with a document store like Domino. The latter systems also have the side effect of further opening the window to the outside world.

So will I actually try to fully build out one of these index-replacement ideas? Eh, maybe. I get the itch every once in a while, either for performance concerns or my desire to index on MIMEBean data, and having a working index replacement could open up a world of new possibilities. So we'll see. For now, I put my CQEngine tinkering up on GitHub and I expect I'll keep the concept floating around in the back of my brain for the next couple days at least.


This View Indexer Thing Might Be Worth Some Time

Sun Aug 05 16:51:00 EDT 2012

Tags: java views crazy

Aug 04 2012 - I Went Crazy and Made a Small View Indexer
Aug 05 2012 - This View Indexer Thing Might Be Worth Some Time

I've put a little more time into my small view indexer from the other day, and it seems even more promising now. The editor I made for it might give you some idea of the potential:

[Screenshot: Fancy View Builder]

Since the view building is all being done in Java, I decided to toss in the list of JSR-223-compliant scripting languages currently available. The scripting context is given a variable "doc", an instance of a DocumentWrapper class I wrote that implements Map to make it a bit easier to use. For added fun, I baked in knowledge of serialized Java objects stored in the document's items:

[Screenshot: Fancy View Objects]

Normally, the wrapper just returns the value of getItemValue(...), but when it encounters a MIME entity of type "application/x-java-serialized-object", it deserializes it and returns that instead, allowing for real structured data access.
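The special case amounts to a content-type check in front of `ObjectInputStream`. In this sketch the MIME entity is reduced to a hypothetical `MimePart` record holding a content type and raw bytes, rather than Domino's actual `MIMEEntity` class, and `ItemValues.resolve` is a made-up name; anything that isn't a serialized-object part passes through unchanged. As with any Java deserialization, it should only be pointed at data you trust.

```java
import java.io.*;
import java.util.*;

// Hypothetical stand-in for a MIME entity: just its content type and body bytes.
record MimePart(String contentType, byte[] body) {}

final class ItemValues {
    static final String SERIALIZED_TYPE = "application/x-java-serialized-object";

    // Plain item values come back untouched; serialized-object parts are turned back into objects.
    static Object resolve(Object raw) throws IOException, ClassNotFoundException {
        if (raw instanceof MimePart part && SERIALIZED_TYPE.equals(part.contentType())) {
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(part.body()))) {
                return in.readObject();
            }
        }
        return raw;
    }
}
```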

I also realized that I don't have any reason to stick to only objects available in the stock JDK for index storage. Since using this will require extra Java code anyway, I may as well make my life easier and write my own "view entry" class. Once I get that sorted out, I can work on keeping the indexes updated with a server task, dealing with reader fields, and physical storage.
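A custom view entry doesn't need to be elaborate: a sort key, the document's UNID, and the column values, plus a side map from UNID to entry so that a changed document's old entry can be found and replaced without a rescan. The sketch below (`ViewEntry` and `FancyView` are hypothetical names) covers that incremental-update path, the kind of work a server task would do on each change; reader fields and physical storage are left out. The UNID doubles as a tiebreaker in the ordering so two documents with the same sort key don't collide in the `TreeSet`.

```java
import java.io.Serializable;
import java.util.*;

// Hypothetical view entry: ordered by sort key, then UNID as a tiebreaker.
record ViewEntry(String sortKey, String unid, List<Serializable> columns)
        implements Comparable<ViewEntry>, Serializable {
    @Override
    public int compareTo(ViewEntry o) {
        int c = sortKey.compareTo(o.sortKey);
        return c != 0 ? c : unid.compareTo(o.unid);
    }
}

final class FancyView {
    private final TreeSet<ViewEntry> entries = new TreeSet<>();
    private final Map<String, ViewEntry> byUnid = new HashMap<>();

    // Called per changed document: drop its old entry (if any), add the new one (null = deleted).
    void documentChanged(String unid, ViewEntry newEntry) {
        ViewEntry old = byUnid.remove(unid);
        if (old != null) entries.remove(old);
        if (newEntry != null) {
            entries.add(newEntry);
            byUnid.put(unid, newEntry);
        }
    }

    List<String> unidsInOrder() {
        List<String> out = new ArrayList<>();
        for (ViewEntry e : entries) out.add(e.unid());
        return out;
    }
}
```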

Hmm, maybe I should make myself an editor for normal views that looks like this one. It'd probably be a lot less hassle to deal with than the legacy one.


I Went Crazy and Made a Small View Indexer

Sat Aug 04 11:45:00 EDT 2012

Tags: java views crazy


Warning: this post is almost entirely pie-in-the-sky nonsense, untethered from reality. Probably.

So yesterday, for some reason, I got to thinking about Domino's view indexing and whether or not it can be readily done better. The current view indexer does its intended task with aplomb (most of the time), but it has its problems. For one, iterating over view indexes in Java (read: everything you should do in Domino today) is dog slow. More theoretically, since it deals only with summary data for performance reasons, its ability to deal with structured data is limited - you can't deal with rich text or, for example, store any sort of Map in a Domino document and deal with it in a view (realistically).

I had the notion that it might be faster, at least in some cases, to do your own indexing: make a Java collection of your "view" data and serialize the result into a note. I decided to try a little test: take a view that consists only of a sorted column of 100k documents' UNIDs and compare it to a TreeSet of the same. My initial results were promising: iterating over the collection of documents (via getNextDocument() and a forall with getDocumentByUNID(), respectively) was about twice as fast with the Java version, and the Java version took up about 17% of the space, presumably thanks to document compression.
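The Java half of that first test can be sketched as: build a sorted set of UNIDs, serialize it to a byte array (the part that would be written into a note, for example as a MIME entity), and read it back before iterating. The note-storage step is omitted, the UNIDs here are random 32-character hex strings, and `UnidIndex` and its methods are hypothetical names; no timings are implied.

```java
import java.io.*;
import java.util.*;

final class UnidIndex {
    // A UNID is 32 hex characters; a random UUID without dashes has the same shape.
    static String randomUnid() {
        return UUID.randomUUID().toString().replace("-", "").toUpperCase(Locale.ROOT);
    }

    static byte[] serialize(TreeSet<String> index) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(index);
        }
        return bytes.toByteArray();
    }

    @SuppressWarnings("unchecked")
    static TreeSet<String> deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (TreeSet<String>) in.readObject();
        }
    }
}
```

Iterating the deserialized set in order and calling getDocumentByUNID() on each value would mirror the comparison described above.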

To be fair, that was really stacking the deck in favor of Java: I didn't care about reader fields or any entry metadata like position, and fetching every document in a large view is possibly the most expensive way to use it. Presumably, fetching just the first entry would make the view version much faster. Still, it shows that it might be useful in some cases, so I decided to try something a bit more complicated.

I made another view with 15k documents (task requests) and two columns: the date of the request, sorted, and the summary line. I matched that up with a TreeSet containing Lists of Maps and iterated over both to fetch and print the entry data. Again, the Java version ended up way faster, taking about 25% of the time.
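One plausible shape for that second test's Java-side structure (a guess, not the code that was timed) is a sorted map from request date to the entries for that date, each entry being a small map of column values. Grouping by date in a `TreeMap` keeps iteration in date order and, unlike a `TreeSet` whose comparator looks only at the date, doesn't silently drop two requests that share a date. `TaskRequestView` and the `RequestDate`/`Subject` keys are hypothetical.

```java
import java.time.LocalDate;
import java.util.*;

final class TaskRequestView {
    private final TreeMap<LocalDate, List<Map<String, Object>>> byDate = new TreeMap<>();

    void add(LocalDate requestDate, String subject) {
        Map<String, Object> entry = new LinkedHashMap<>();
        entry.put("RequestDate", requestDate);
        entry.put("Subject", subject);
        byDate.computeIfAbsent(requestDate, d -> new ArrayList<>()).add(entry);
    }

    // Walk entries in date order, as reading the view's sorted column would.
    List<String> lines() {
        List<String> out = new ArrayList<>();
        for (List<Map<String, Object>> sameDay : byDate.values()) {
            for (Map<String, Object> e : sameDay) {
                out.add(e.get("RequestDate") + " " + e.get("Subject"));
            }
        }
        return out;
    }
}
```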

Now it has me wondering whether it'd be worth investigating further. It's fair to assume that Java can be treated as the new "native" language of Domino, so serialization is potentially a legitimate mechanism. Furthermore, a real implementation of an alternative view index wouldn't have to be as hopelessly naive as the one I put together: the basic java.util.* collections are almost certainly not the best storage mechanisms for a database index, and, better still, this is well-trodden territory. Since normal views would still exist in this magic future world, these fancy views could eschew a lot of the UI and build-performance requirements of the normal ones in favor of fancy tricks like native knowledge of serialized data structures. Imagine storing structured data like a table of line items in a MIME entity but then being able to query that in a "view". Reader fields could be handled by storing the applicable names in the "entry" and then comparing them to the user's names list at runtime, presumably like standard views do. A server task could handle monitoring document changes and making incremental changes (as happens now, really).
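The reader-field check described there reduces to a set intersection: an entry with no stored reader names is visible to everyone; otherwise it's visible if any of its names appears in the current user's names list (user name variants, groups, roles). A sketch with hypothetical `SecuredEntry` and `ReaderFilter` names; it glosses over details a real implementation would have to handle, such as name-format normalization and Authors fields, which also grant read access to a document that has Readers fields.

```java
import java.util.*;

// Hypothetical entry that carries the document's reader names alongside its column values.
record SecuredEntry(String unid, Set<String> readers, List<Object> columns) {}

final class ReaderFilter {
    // No readers = unrestricted; otherwise the user needs at least one matching name.
    static boolean canRead(SecuredEntry e, Set<String> userNamesList) {
        return e.readers().isEmpty() || !Collections.disjoint(e.readers(), userNamesList);
    }

    static List<SecuredEntry> visibleTo(List<SecuredEntry> entries, Set<String> userNamesList) {
        List<SecuredEntry> out = new ArrayList<>();
        for (SecuredEntry e : entries) {
            if (canRead(e, userNamesList)) out.add(e);
        }
        return out;
    }
}
```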

So it's theoretically possible. Would it be worth it? I'm not sure - maybe. It's fun to think about, in any event.
