frostillic.us

Showing posts for tag "dgqf"

Working Domino Views Into Jakarta NoSQL

Sun Jun 12 15:33:47 EDT 2022

Tags: dgqf jakartaee jnosql

A few versions ago, I added Jakarta NoSQL support to the XPages Jakarta EE Support project. For that, I used DQL and QueryResultsProcessor exclusively, since it's a near-exact match for the way JNoSQL normally does things and QRP brought the setup into the realm of "good enough for the normal case".

However, as I've been working on a project that puts this to use, the limitations have started to hold me back.

The Limitations

The first trouble I ran into was the need to list, for example, the most recent 20 of an entity. This is something that QRP took some steps to handle, but it still has to build the pseudo-view anew the first time and then any time documents change. This gets prohibitively expensive quickly. In theory, QRP has enough flexibility to use existing views for sorting, but it doesn't appear to do so yet. Additionally, its "max entries" and "max documents" values are purely execution limits and not something to use to give a subset report: they throw an exception when that many entries have been processed, not just stop execution. For some of this, one can deal with it when manually writing the DQL query, but the driver doesn't have a path to do so.

The second trouble I ran into was the need to get a list composed of multiple kinds of documents. This one is a limitation of the default idiom that JNoSQL uses, where you do queries on named types of documents - and, in the Domino driver, that "type" corresponds to Form field values.

The Uncomfortable Solution

Thus, hat in hand, I returned to the design element I had hoped to skim past: views. Views are an important tool, but they are way, way overused in Domino, and I've been trying over time to intentionally limit my use of them to break the habit. Still, they're obviously the correct tool for both of these jobs.

So I made myself an issue to track this and set about tinkering with some ways to make use of them in a way that would do what I need, be flexible for future needs, and yet not break the core conceit of JNoSQL too much. My goal is to make almost no calls to an explicit Domino API, and so doing this will be a major step in that direction.

Jakarta NoSQL's Extensibility

Fortunately for me, Jakarta NoSQL is explicitly intended to be extensible per driver, since NoSQL databases diverge more wildly in the basics than SQL databases tend to. I made use of this in the Darwino driver to provide support for stored cursors, full-text search, and JSQL, though all of those had the bent of still returning full documents and not "view entries" in the Domino sense.

Still, the idea is very similar. Jakarta NoSQL encourages a driver author to write custom annotations for repository methods to provide hints to the driver to customize behavior. This generally happens at the "mapping" layer of the framework, which is largely CDI-based and gives you a lot of room to intercept and customize requests from the app-developer level.

Implementation

To start out with, I added two annotations you can add to your repository methods: @ViewEntries and @ViewDocuments. For example:

@RepositoryProvider("blogRepository")
public interface BlogEntryRepository extends DominoRepository<BlogEntry, String> {
    public static final String VIEW_BLOGS = "vw_Content_Blogs"; //$NON-NLS-1$
    
    @ViewDocuments(value=VIEW_BLOGS, maxLevel=0)
    Stream<BlogEntry> findRecent(Pagination pagination);
    
    @ViewEntries(value=VIEW_BLOGS, maxLevel=0)
    Stream<BlogEntry> findAll();
}

The distinction here is one of the ways I slightly break the main JNoSQL idioms. JNoSQL was born from the types of databases where it's just as easy to retrieve the entire document as it is to retrieve part - this is absolutely the case in JSON-based systems like Couchbase (setting aside attachments). However, Domino doesn't quite work that way: it can be significantly faster to fetch only a portion of a document than the data from all items, namely when some of those items are rich text or MIME.

The @ViewEntries annotation causes the driver to consider only the item values found in the entries of the view it's referencing. In a lot of cases, this is all you'll need. When you set a column in Designer to be just directly an item value from the documents, the column is by default named with the same name, and so a mapped entity pulled from this column can have the same fields filled in as from a document. This does have the weird characteristic where objects pulled from one method may have different instance values from the "same" objects from another method, but the tradeoff is worth it.

@ViewDocuments, fortunately, doesn't have this oddity. With that annotation, documents are processed in the same way as with a normal query; they just are retrieved according to the selection and order from the backing view.

Using these capabilities allowed me to slightly break the JNoSQL idiom in the other way I needed: reading unrelated document types in one go. For this, I cheated a bit and made a "document" type with a form name that doesn't correspond to anything, and then made the mapped items based on the view name. So I created this entity class:

@Entity("ProjectActivity")
public class ProjectActivity {
    @Column("$10")
    private String projectName;
    @Column("Entry_Date")
    private OffsetDateTime date;
    @Column("$12")
    private String createdBy;
    @Column("Form")
    private String form;
    @Column("subject")
    private String subject;

    /* snip */
}

As you might expect, no form has a field named $10, but that is the name of the view column, and so the mapping layer happily populates these objects from the view when configured like so:

@RepositoryProvider("projectsRepository")
public interface ProjectActivityRepository extends DominoRepository<ProjectActivity, String> {
    @ViewEntries("AllbyDate")
    Stream<ProjectActivity> findByProjectName(@ViewCategory String projectName);
}

These are a little weird in that you wouldn't want to save such entities lest you break your data, but, as long as you keep that in mind, it's not a bad way to solve the problem.
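
To make the column-name mapping concrete, here is a minimal, framework-free sketch of the idea. It is not the driver's actual code: the @ColumnName annotation, the ActivityRow class, and the ViewEntryMapper helper are hypothetical stand-ins, and a view entry is modeled as a plain map of column names to values.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;

// Hypothetical stand-in for the mapping annotation, defined here only so the
// sketch compiles without any framework on the classpath.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface ColumnName {
    String value();
}

// Trimmed-down, hypothetical analogue of the ProjectActivity entity.
class ActivityRow {
    @ColumnName("$10")
    String projectName;
    @ColumnName("$12")
    String createdBy;
    @ColumnName("subject")
    String subject;
}

class ViewEntryMapper {
    // Copies values from a view entry (modeled as column name -> value) into
    // the annotated fields of the target. Fields whose column is not present
    // in the entry are left untouched.
    static <T> T populate(Map<String, Object> entry, T target) {
        for (Field field : target.getClass().getDeclaredFields()) {
            ColumnName column = field.getAnnotation(ColumnName.class);
            if (column != null && entry.containsKey(column.value())) {
                try {
                    field.setAccessible(true);
                    field.set(target, entry.get(column.value()));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException("Cannot set field " + field.getName(), e);
                }
            }
        }
        return target;
    }
}
```

An entry that lacks the "$12" column would leave createdBy as null, which is the same effect described earlier: objects read from view entries may carry fewer values than the "same" objects read as full documents.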

Future Changes

Since this implementation was based on fulfilling just my immediate needs and isn't the result of careful consideration, it's likely to be something that I'll revisit as I go. For example, that last example shows the third custom annotation I introduced: @ViewCategory. I wanted to restrict entries to a category that is specified programmatically as part of the query, and so annotating the method parameter was a great way to do that. However, there are all sorts of things one might want to do dynamically when querying a view: setting the max level programmatically, specifying expand/collapse behavior, and so forth. I don't know yet whether I'll want to handle those by having a growing number of parameter annotations like that or if it would make more sense to consolidate them into a single ViewQueryOptions parameter or something.

I also haven't done anything special with category or total rows. While they should just show up in the list like any other entry, there's currently nothing special signifying them, and I don't have a way to get to the note ID either (just the UNID). I'll probably want to create special pseudo-items like @total or @category to indicate their status.

There'll also no doubt be a massive wave of work to do when I turn this on something that's not just a little side project. While I've made great strides in my oft-mentioned large client project to get it to be more platform-independent, it's unsurprisingly still riven with Domino API references top to bottom. While I don't plan on moving it anywhere else, writing so much code using explicit database-specific API calls is just bad practice in general, and getting this driver to a point where it can serve that project's needs would be a major sign of its maturity.

DQL, QueryResultsProcessor, and JNoSQL

Thu Jan 13 14:32:04 EST 2022

Tags: dgqf domino jakartaee jnosql

As I've been adding new technologies to and talking about the XPages Jakarta EE project, I've kind of danced around a major missing layer: data access.

Technically, the toolchain has provided Domino data access all along, by way of having the same contextual sessions and database as XPages. You could use those to access whatever data you want, and they'd do the job as well as they ever do (c'est-à-dire: poorly). Beyond that, though, there's no equivalent to the (questionable) xp:dominoDocument and xp:dominoView components of XPages, and definitely no pre-provided object-to-database mapper.

The answer is pretty clear: Jakarta NoSQL. This API isn't quite finalized, but it's been usable for a long time: I wrote a Darwino driver for it years ago, and that driver is powering this very blog. I also wrote a Domino driver years ago, but it was very much a proof-of-concept: since it pre-dated DQL, it used formula queries for everything, and thus would scale extremely poorly. It was a nice exercise, but not anything useful.

For XPages JEE, I decided to take another swing at that. The implementation of the driver will warrant a tale on its own, but for now I'd like to focus on the DQL side of it.

DQL

I talked a bit about DQL when it came out, back when it wasn't well-understood, but since then I haven't actually had much occasion to put it to use. For the times I've needed complex Domino data access since then, it's been built on pre-existing operations on top of views. While adding DQL has been something I've considered from time to time, it'd never hit the threshold of being worth it: our needs involve extracting tons of data to bulk send it to service clients, and so views have remained necessary. While we could in theory alter our querying and filtering to select documents and project those selections onto the views, it'd have been a lot of work for partial benefits.

DQL itself has gotten more capable in the intervening years, and just on its own it's a perfect match for JNoSQL needs. Since all JNoSQL operations are sent to the driver as either individual doc IDs or an arbitrary query, something like DQL is required, and it's up to the task now.

It's half of the story, though. What DQL (by way of the DominoQuery object) gives you is a DocumentCollection, effectively just the list of note IDs. You can, as I'd hypothesized about doing, apply that against a view to extract data, but that still requires you to separate out the act of view management from the act of doing queries. If you want to have data sorted or categorized, you would still have to create an equivalent or superset view.

QueryResultsProcessor

So that's where the addition of QueryResultsProcessor comes in. QRP is technically distinct from DQL - you can use it to process arbitrary document collections, for example - but they're certainly a conceptual match. If you're comparing it to a SQL statement, DQL is the "FROM foo" and "WHERE x" parts, while QRP is the "SELECT a,b,c", "ORDER BY y", and "GROUP BY z" parts.
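
The SQL analogy can be sketched with plain Java collections. This is only an illustration of the division of labor, not of how Domino implements either piece: documents are modeled as flat maps, the QueryAnalogy class and its method names are hypothetical, and every document is assumed to contain the items being sorted and grouped on.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class QueryAnalogy {
    // The DQL-like half: "FROM docs WHERE item = value".
    // It only decides which documents are in the result.
    static List<Map<String, String>> where(List<Map<String, String>> docs, String item, String value) {
        List<Map<String, String>> selected = new ArrayList<>();
        for (Map<String, String> doc : docs) {
            if (value.equals(doc.get(item))) {
                selected.add(doc);
            }
        }
        return selected;
    }

    // The QRP-like half: "SELECT columnItem ... ORDER BY columnItem GROUP BY categoryItem".
    // It shapes the already-selected documents into sorted, categorized rows.
    static Map<String, List<String>> process(List<Map<String, String>> selected,
            String categoryItem, String columnItem) {
        List<Map<String, String>> sorted = new ArrayList<>(selected);
        sorted.sort(Comparator.comparing(doc -> doc.get(columnItem)));

        // TreeMap keeps the categories themselves in sorted order
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Map<String, String> doc : sorted) {
            grouped.computeIfAbsent(doc.get(categoryItem), key -> new ArrayList<>())
                .add(doc.get(columnItem));
        }
        return grouped;
    }
}
```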

The general way it works is that you:

1. Create a QueryResultsProcessor from a Database instance (as opposed to Session - this distinction becomes important later)
2. Feed it sources of documents: DQL queries or arbitrary document collections
3. Add any desired columns to extract data. These are Domino-style columns, and you can also specify sorting and categorization here, as you would when building a view. Since data may come from multiple databases, you can also customize column formulas to account for that
4. Execute the process and retrieve the results, currently either as JSON or as a "view". More on these "views" later

When I first heard about QRPs, I had a concern with step 2: I'd thought that you could only pass a built DocumentCollection to the processor, which would significantly limit the room for Domino to add behind-the-scenes efficiencies. However, my fears were unfounded; the ability to pass in a DominoQuery object and the DQL directly and let the QRP execute it means that HCL is free to do whatever they want to make it fast. That's the sort of thing that makes SQL queries potentially so stupidly efficient: because you're just asking the database for results, the DB is free to optimize the heck out of them. This pairing potentially brings that to Domino, and that's what makes it important.

JSON Output

The executeToJson method is pretty straightforward if a somewhat-peculiar choice. It has no parameters, and returns the results of your query as reasonably-formatted JSON. It's unfortunate that this returns a String and not an InputStream, which adds some inherent inefficiency to dealing with it on the Java side, but that will only really hurt with very-large data sets.

Along with the requested fields, formula results, and aggregations, the document entries include the note ID (oddly in "formula" format) and the database file path, so you can use that to open up the document.

Anyway, this is a workmanlike format and can be potentially just sent to REST clients directly, though it'd be good form to at least strip out the DB paths and note IDs.

View Output

Now here's the spicy one. The executeToView method stores the results in a very-weird type of view. This has a few big advantages over the JSON mechanism:

The view persists in the database, up to a number of hours you specify programmatically. This allows you to essentially offload some extra caching to the database, which is ideal
You can use ViewNavigator and other efficient mechanisms to work with the view data, meaning you don't have the "here's a big result blob in memory" problem you have with the JSON format
Since it's a "view", anything that works with view data can work with it. This is presumably the reason it's implemented this way at all, rather than as some new kind of entity - building on existing mechanisms
The "anything that works with view data" doesn't just mean things like ViewNavigator: it also means the Notes client and view data sources

Now, these "views" have a lot of weird characteristics. It's useful to see the specifics listed out like that, but they all derive from a core lesson to ingest:

This is not a stored query; it is a cached result.

These views are not auto-updated, nor is there any mechanism I know of to refresh them outside of deleting and re-creating them. They're equivalent in concept to if you took the JSON from the first type and stored it in a document somewhere: it'll only change if you change it. The only way Domino will act on them is to delete them when they're expired.

Anyway, the data in these views is the same data that would go to the JSON format, just stored as Notes collation data instead of a string. It contains columns, potentially categorized and aggregated, for the data you requested, as well as hidden "$DBPath" and "$NoteID" columns at the end. The entry-level note ID (the one from entry.getNoteID()) is arbitrary and intended to not represent an actual document - since, after all, the documents may come from distinct databases. I've found the value of entry.getUniversalID() to be the doc's original UNID, but this is best treated as not a guarantee and so should not be used.

Designer Rights

So here's a fun catch: though any Reader can perform a query, you need Designer access to create a view. This seemed like a problem to me at first, since I'd want the generated results to be from a specific user for reader-field purposes, but it's not really an impediment, at least when you're in an environment like XPages.

Above, I mentioned that the fact that you create a QueryResultsProcessor object from a Database is important, and this is one of the reasons why. Though traditionally you wouldn't mix descendants of session and sessionAsSigner together, there's no actual rule against it. You can re-open your context database with sessionAsSigner, make a QRP object from that, and then feed it a DominoQuery object created from the non-signer database:

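// Current database, opened as the active user
Database database = ExtLibUtil.getCurrentDatabase();
// Re-open the same database with signer rights, which are needed to create the view
Session sessionAsSigner = ExtLibUtil.getCurrentSessionAsSigner();
Database databaseAsSigner = sessionAsSigner.getDatabase(database.getServer(), database.getFilePath());

// The query comes from the user's database handle, so it runs with the user's rights
DominoQuery dominoQuery = database.createDominoQuery();
// The processor comes from the signer's handle, so it is allowed to store the result view
QueryResultsProcessor qrp = databaseAsSigner.createQueryResultsProcessor();
qrp.addDominoQuery(dominoQuery, "some DQL", null);
View result = qrp.executeToView("some view name");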

Because the QueryResultsProcessor uses the provided DominoQuery object as the "engine" for the DQL search, the query will use the normal user's rights while the processing will use the signer rights.

Naming and Expiring Results

As seen there, you have to name your views. While you could in theory use this mechanism to kind of manage your own views for general use and name them things like "People By First Name" or whatever, you'll likely want to work with them programmatically and name them based on your query input.

In the case of this JNoSQL driver, I compute a predictable-from-input hash-based name from the name of the creating class, the current user, and the sort/skip/limit attributes of the incoming query. You could really do whatever you want here, but having at least some sort of hash like this is likely the way to go.
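
As a rough sketch of what such a predictable-from-input name could look like (not the driver's actual scheme): the ResultViewNames class, the "qrp-" prefix, the choice of SHA-256, and the truncation to 16 bytes are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class ResultViewNames {
    // Builds a stable view name from the inputs that distinguish one cached
    // result from another. The same inputs always produce the same name.
    static String viewName(String className, String userName, String sort, long skip, long limit) {
        // Join with a separator so that ("ab", "c") and ("a", "bc") differ
        String key = String.join("\n", className, userName, sort,
            Long.toString(skip), Long.toString(limit));
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(key.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder("qrp-"); // hypothetical prefix
            // Use the first 16 bytes (32 hex characters) to keep the name short
            for (int i = 0; i < 16; i++) {
                hex.append(String.format("%02x", hash[i] & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be present on every Java platform
            throw new IllegalStateException(e);
        }
    }
}
```

Including the user name in the key matters for the reader-field case: two users running the same query get separate cached views rather than sharing one.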

Now there's the matter of detecting when you need to refresh the data. In some applications, it may suffice to go with the "expire in X hours" parameter when creating the view, though that's extremely coarse and only really useful on its own for specific needs (like a daily report).

The tack I took here was to try to do an efficient check of view creation time compared to the last data modification time from the source database. The Database class only has a "last modified" time in general, but I can't very well use that when my results caches are being added as design elements: a second distinct query would "invalidate" the first even when the data hasn't changed. There might be a proper way to get this in lotus.domino, but the NAPI has a wrapper for NSFDbModifiedTimeByName: NotesSession.getLastDataModificationDateByName. That lets you get the last data-mod time in epoch seconds, and you can then compare that to the creation time of the view.
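
The comparison itself is small. A minimal sketch, assuming the two timestamps have already been obtained elsewhere (the ResultViewFreshness class and its method are hypothetical, and no Domino API is called here):

```java
import java.time.Instant;

class ResultViewFreshness {
    // Returns true when the cached view should be dropped and rebuilt.
    // viewCreated: when the result view was created
    // lastDataModEpochSeconds: last data modification of the source database,
    //   in epoch seconds
    static boolean isStale(Instant viewCreated, long lastDataModEpochSeconds) {
        Instant lastDataMod = Instant.ofEpochSecond(lastDataModEpochSeconds);
        // The data-mod time only has second resolution, so a modification in
        // the same second as view creation is treated as stale to be safe.
        return !viewCreated.isAfter(lastDataMod);
    }
}
```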

While it's unfortunate that you have to remove the view outright to refresh it instead of doing a delta update like NIF would do, I get it, and it's generally fast enough. Plus, there's enough hand-wavy stuff going on with feeding the DQL query to the QRP that Domino would be free to secretly retain results for a bit and do deltas internally if it so desires.

Storing Result Views

The other interesting aspect of creating a QRP object from a Database and not a Session is that that DB serves as the destination to house the views. While in a single-DB environment it would seem very natural to just store the views in the same place as the data, there's no particular requirement to do so. Moreover, if you're querying multiple databases, you're naturally not going to be able to do this for all docs, so you'll be forced to think about where the views live anyway.

Now, while personally I'm fine with a bunch of temporary machine-named views hanging out in the NSF (especially since the names are wrapped in parentheses to hide them from default UI listings), I can see why it could be annoying. For one, these views sync to an ODP in Designer, which I put in as an Aha idea to change, but might actually rightly be called a bug. Beyond that, while these views won't meaningfully contribute to NIF's workload (since NIF will skip them), they're unsightly and would get in the way if you're trying to tend to the design of your NSF like a garden.

So you might want to have a side database to store these views, and this could also be a way to get around the "needing Designer access" requirement if you're in an environment where you don't have a signer session. In the Notes client, you could store the results in a local NSF; on the server, you could make a "scratch" NSF somewhere to house them, and then add readers to the view design note when doing so to prevent leaking data across users and apps.

Conclusion

Anyway, this is all pretty neat. Reusing view design elements to just be static containers for collation data is weird, but I get the practical reasons why it makes sense. Importantly, this pairing solves some very-real problems with querying and extracting data from Domino. For example, if you do all of your querying via this route, you can use DQL's "EXPLAIN" capability to actually get some insight into database performance for once. You could imagine having an optional mode where you log the EXPLAIN results and execution times for all queries your app is performing, and then manually create "index" views to fix hotspots. It's quite satisfying to finally get that kind of ability in Domino. It'd be neat if that also came to QueryResultsProcessor.

I'm looking forward to expanding the JNoSQL driver further and then either using that directly in client work or adapting the code I use there. I'll definitely add such a logging capability, which will go a long way to put some numbers to the "feels slow" problem that can crop up. Beyond that, barring any show stoppers, I'm thoroughly excited by the prospect of moving away from fetching explicitly-named views in code and switching to an idiom of querying the pool of documents and letting the database make it work for me.
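
A logging mode like that could start out as little more than a timing wrapper around each query. The following is a minimal sketch under that assumption: the QueryTimingLog class is hypothetical, it records only the query text and elapsed time, and capturing the actual EXPLAIN output from Domino is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class QueryTimingLog {
    // One recorded execution: the query text and how long it took
    static final class Entry {
        final String query;
        final long elapsedNanos;

        Entry(String query, long elapsedNanos) {
            this.query = query;
            this.elapsedNanos = elapsedNanos;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    // Runs the supplied query execution, records its duration, and passes
    // the result through unchanged. Failures are timed as well.
    <T> T timed(String query, Supplier<T> execution) {
        long start = System.nanoTime();
        try {
            return execution.get();
        } finally {
            entries.add(new Entry(query, System.nanoTime() - start));
        }
    }

    // Returns the recorded executions that took at least the given time,
    // i.e. the candidates for adding an "index" view.
    List<Entry> slowerThan(long thresholdNanos) {
        List<Entry> slow = new ArrayList<>();
        for (Entry entry : entries) {
            if (entry.elapsedNanos >= thresholdNanos) {
                slow.add(entry);
            }
        }
        return slow;
    }

    // Returns a copy of everything recorded so far
    List<Entry> entries() {
        return new ArrayList<>(entries);
    }
}
```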

DGQF and DQL as I Understand Them

Thu Jul 26 12:27:59 EDT 2018

Tags: domino dgqf

At CollabSphere this year, the big information coming from HCL was detail about the Domino General Query Facility (DGQF) and its associated language, Domino Query Language (DQL). They originally announced this a few weeks ago, but it was good to have had some time to let the dust settle and to see the specifics.

Because it was discussed alongside the domino-db Node.js package and because it's one of the first real new ways we'll interact with data in a Domino DB in a while, it's a bit difficult to identify just what it is and what it is not. Here's how I understood it:

What DGQF Is

DGQF is, at least conceptually, a "meta" layer on top of the existing NIF indexing facility. It doesn't provide a core change to the actual storage of documents, but instead treats existing view indexes as (roughly) analogous to both SQL table indexes and SQL views. It trawls through the design elements of a database to analyze their selection formulae and columns to use applicable ones as implicit indexes and also to allow access to arbitrary collections within queries.

Implicit Indexes

Other than the design collection and the "optimize document table" option in a DB, an NSF doesn't really have much in the way of indexing note contents by default. So, if you have a query asking for all documents where FirstName is Bob, a program has no choice but to look through every document for that key/value match. If, however, you create a view that has a column showing the FirstName field, you now have a much-faster index you can use. It's this sort of view that the DGQF picks up on implicitly, using them to accelerate queries: views showing all documents with either a default sort or "click to sort" column showing explicitly a field (and not a formula).
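
The difference between the two lookups can be sketched in plain Java. This only illustrates the general idea of scanning versus using a sorted index, not how NIF or DGQF work internally; the FirstNameLookup class is hypothetical, documents are modeled as flat maps, and results are positions in the document list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class FirstNameLookup {
    // Without an index: every document has to be examined for each query
    static List<Integer> scan(List<Map<String, String>> docs, String firstName) {
        List<Integer> matches = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (firstName.equals(docs.get(i).get("FirstName"))) {
                matches.add(i);
            }
        }
        return matches;
    }

    // Roughly what a view sorted on a FirstName column provides: the work of
    // visiting every document is done once, up front, and kept sorted by value
    static TreeMap<String, List<Integer>> buildIndex(List<Map<String, String>> docs) {
        TreeMap<String, List<Integer>> index = new TreeMap<>();
        for (int i = 0; i < docs.size(); i++) {
            String value = docs.get(i).get("FirstName");
            if (value != null) {
                index.computeIfAbsent(value, key -> new ArrayList<>()).add(i);
            }
        }
        return index;
    }

    // With the index: a single keyed lookup instead of a pass over all documents
    static List<Integer> lookup(TreeMap<String, List<Integer>> index, String firstName) {
        return index.getOrDefault(firstName, new ArrayList<>());
    }
}
```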

Access to Arbitrary Collection Data

For those qualifying views plus others, you can reference a view by name or alias to compare to a column value by programmatic name (often either the field name for simple columns or something like $4 by default for formulas).

"In" clauses

Additionally, you can use view (and folder, I think) names to refine queries for documents that are in one or more of these collections, equivalent to an "in" subquery or view reference in SQL.

What DQL Is

In short, DQL is the human-readable query language used to access DGQF. It's reasonably SQL-like (though it is not SQL) and tends to look like FirstName='Bob' and in all ('Managers', 'Active Users'). This is the language you will use, and so "DGQF" and "DQL" will generally refer to the same thing in practice.
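
One way to read the example query is as a field match followed by set intersections. The sketch below models that reading in plain Java, assuming that "in all" means a document must appear in every listed view; the InAllSketch class is hypothetical, documents are identified by string IDs, and no DQL is actually parsed or executed.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class InAllSketch {
    // firstNameById: document ID -> value of its FirstName field
    // viewMembers:   view name -> IDs of the documents that view contains
    // Returns the IDs matching: FirstName = firstName and in all (viewNames...)
    static Set<String> matching(Map<String, String> firstNameById,
            Map<String, Set<String>> viewMembers, String firstName, String... viewNames) {
        // Start with every document whose field value matches
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, String> doc : firstNameById.entrySet()) {
            if (firstName.equals(doc.getValue())) {
                result.add(doc.getKey());
            }
        }
        // Keep only the documents that are present in each named view
        for (String viewName : viewNames) {
            result.retainAll(viewMembers.getOrDefault(viewName, Set.of()));
        }
        return result;
    }
}
```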

In practice, this is implemented as a new method on the Database class in each high-level language supported by Domino, plus a Node-styled variant in domino-db.

What DGQF and DQL Are Not

Since DGQF sits on top of NIF (and probably the FT index eventually), it's not a core change to data storage. Ultimately, the same abilities and limits of Domino remain as they are with respect to this.

Additionally, DQL is, I believe, a query language only: it does not provide a mechanism for creating, modifying, or deleting existing documents. Instead, it is essentially a super-powered and much-smarter version of database.search(…): you can use it to find documents and the processing of them is up to your program.

That last point was a bit muddied by its pairing with the domino-db Node.js package: the Node.js package provides bulk operations that are paired with DQL queries, but that is a function of that library specifically, not DQL or DGQF.

Why It's Cool

Though it's not a reworking of the core NSF, what DGQF does do is abstract away a lot of the manual looping and lookups that we've always had to do, and it allows the system to optimize and do things more efficiently than when written out procedurally. So, while there's theoretically nothing that DGQF does that we couldn't do before, it allows us to do those things with far, far less code and with automatic optimization.

This brings Domino something that SQL servers have enjoyed for a long time. With a SQL statement, you can analyze the trouble spots of a slow-running query and add indexes to improve the speed, with the tooling helping to explain what's going on. DGQF+DQL brings this along for the ride: when you execute a DQL query, you have the option to dump out this "explain" output to see what specifically the facility did, which views it used, and how long each step took. So, if you have a long-running query, you can look to see if you can add an "index" view to automatically speed it up without having to change your code. And, since the language is an abstraction over the task of querying and not the sort of "burned in" process of a normal getNextDocument loop, it can be optimized and short-circuited by the underlying system without the developer having to know the decades of built-up knowledge of how to efficiently search a DB.

All in all, this is a very welcome addition to the server, and it certainly should improve a lot of common tasks.
