DGQF and DQL as I Understand Them

Jul 26, 2018, 12:27 PM

Tags: domino dgqf

At CollabSphere this year, the big information coming from HCL was detail about the Domino General Query Facility (DGQF) and its associated language, Domino Query Language (DQL). They originally announced this a few weeks ago, but it was good to have had some time to let the dust settle and to see the specifics.

Because it was discussed alongside the domino-db Node.js package and because it's one of the first real new ways we'll interact with data in a Domino DB in a while, it's a bit difficult to identify just what it is and what it is not. Here's how I understood it:

What DGQF Is

DGQF is, at least conceptually, a "meta" layer on top of the existing NIF indexing facility. It doesn't provide a core change to the actual storage of documents, but instead treats existing view indexes as (roughly) analagous to both SQL table indexes and SQL views. It trawls through the design elements of a database to analyze their selection formulae and columns to use applicable ones as implicit indexes and also to allow access to arbtirary collections within queries.

Implicit Indexes

Other than the design collection and the "optimize document table" option in a DB, an NSF doesn't really have much in the way of indexing note contents by default. So, if you have a query asking for all documents where FirstName is Bob, a program has no choice but to look through every document for that key/value match. If, however, you create a view that has a column showing the FirstName field, you now have a much-faster index you can use. It's this sort of view that the DGQF picks up on implicitly, using them to accelerate queries: views showing all documents with either a default sort or "click to sort" column showing explicitly a field (and not a formula).

Access to Arbitrary Collection Data

For those qualifying views plus others, you can reference a view by name or alias to compare to a column value by programmatic name (often either the field name for simple columns or something like $4 by default for formulas).

"In" clauses

Additionally, you can use view (and folder, I think) names to refine queries for documents that are in one or more of these collections, equivalent to an "in" subquery or view reference in SQL

What DQL Is

In short, DQL is the human-readable query language used to access DGQF. It's reasonably SQL-like (though it is not SQL) and tends to look like FirstName='Bob' and in all ('Managers', 'Active Users'). This is the language you will use, and so "DGQF" and "DQL" will generally refer to the same thing in practice.

In practice, this is implemented as a new method on the Database class in each high-level language supported by Domino, plus a Node-styled variant in domino-db.

What DGQF and DQL Are Not

Since DGQF sits on top of NIF (and probably the FT index eventually), it's not a core change to data storage. Eventually, the same abilities and limits of Domino remain as they are with respect to this.

Additionally, DQL is, I believe, a query language only: it does not provide a mechanism for creating, modifying, or deleting existing documents. Instead, it is essentially a super-powered and much-smarter version of database.search(…): you can use it to find documents and the processing of them is up to your program.

That last point was a bit muddied by its pairing with the domino-db Node.js package: the Node.js package provides bulk operations that are paired with DQL queries, but that is a function of that library specifically, not DQL or DGQF.

Why It's Cool

Though it's not a reworking of the core NSF, what DGQF does do is abstract away a lot of the manual looping and lookups that we've always had to do, and it allows the system to optimize and do things more efficiently than when written out procedurally. So, while there's theoretically nothing that DGQF does that we couldn't do before, it allows us to do those things with far, far less code and with automatic optimization.

This brings Domino something that SQL servers have enjoyed for a long time. With a SQL statement, you can analyze the trouble spots of a slow-running query and add indexes to improve the speed, with the tooling helping to explain what's going on. DGQF+DQL brings this along for the ride: when you execute a DQL query, you have the option to dump out this "explain" output to see what specifically the facility did, which views it used, and how long each step took. So, if you have a long-running query, you can look to see if you can add an "index" view to automatically speed it up without having to change your code. And, since the language is an abstraction over the task of querying and not the sort of "burned in" process of a normal getNextDocument loop, it can be optimized and short-circuited by the underlying system without the developer having to know the decades of built-up knowledge of how to efficiently search a DB.

All in all, this is a very welcome addition to the server, and it certainly should improve a lot of common tasks.

Commenter Photo

Sean Cull - Jul 26, 2018, 1:29 PM

This is really helpful, Thanks

Commenter Photo

Knut Herrmann - Jul 26, 2018, 4:02 PM

Great review, thanks.

It is possible to update documents though. John Curtis showed in his presentation "Demo and Deep Dive: Domino General Query Facility and DominoDB NPM" how easy it is to accomplish this. I am looking forward to use DGQF instead of own REST services in the future.

Commenter Photo

Jesse Gallagher - Jul 26, 2018, 4:06 PM

That’s what I mean by how they were conflated in the presentation. As I recall, it was a bulk update call that was based on a DQL selection query, so one block of code but two steps. I get the impression that the Node adapter is geared towards that sort of combination.

Commenter Photo

Sean Cull - Jul 27, 2018, 4:58 AM

Jesse, could you explain a bit about the format of the retrieved data.

For most of our XPage views we currently use FT index searches to get a collection and then we iterate through the documents in the collection building a series of backing beans to power the views i.e. a 2 step process. It is this 2nd step of iterating through the documents that takes most time.

If these new methods were more SQL like ( SELECT CustomerName, City FROM Customers; )  we could bring back the data as part of the query results in 1 step rather than having a 2 step process.

Just for a bit more background we use 3 beans

The skinny view bean has just enough data to allow the records in the "view" to be sorted

The view page bean has enough information to display all the columns in a "view" a page at time

We also have a doc bean has all the data to display an individual document ( which may be a composite document ) 

Thank, Sean

Commenter Photo

Knut Herrmann - Jul 27, 2018, 6:53 AM

@Sean: You can define the fields in the query you'd like to receive as result. The result is JSON always.

@Jesse: You are right, the example was a bulk update. But, if the query/selection selects just one document by UNID, then you are able to update one defined document. I'd like to think that CRUD will be fully supported...

Commenter Photo

WK - Jul 27, 2018, 7:20 AM

So it's essentially an DSL that translates into old Domino API under the hood ? If You look at it that way then its not impressing at all. I was counting on something more revolutionary (modern ft search engine?).

Commenter Photo

Jesse Gallagher - Jul 27, 2018, 8:30 AM

I'll be able to tell for sure when the slides for the session are out, but I believe that the return value from the DGQF is just the document set, but then the domino-db Node module has "mget"-style operations that take that result and extract individual values into JSON objects. Looking around, the examples of DQL-the-language don't include an extraction part. Even if such a split is the case, though, the Node API could imply some new at-least-C-level calls for multi-item fetching, like the multi-item stamp all that's in there.

As for it being essentially a DSL, it is, but it still sounds very useful. While getting to the same level of performance for an even-moderately-complicated query (especially using the "in" operator for AND'ing against views) is theoretically possible without DGQF, having an underlying system that takes care of the nuts-and-bolts automatically in one call allows for transparent optimization, automatic improvement as the underlying code improves, and eliminating the "bridge tax" especially from the lsxbe classes.

Commenter Photo

myns - Jul 27, 2018, 9:42 AM

if dql is same with database.search(...) it will have some performance issues... if the optimized version, no problem. 

Commenter Photo

Jesse Gallagher - Jul 27, 2018, 9:57 AM

It's definitely not the same as database.search(...). database.search(...) is essentially fixed at the absolute worst-case scenario for a query, but DGQF automatically uses available views to speed up its searching, and you have the option to explicitly name views and folders for some of the criteria. If you have a database with no collated views at all, they'll probably be roughly comparable (though it's possible that DGQF has some extra smarts in there), but the idea will be to have "index views" to match your queries to get high performance.

Commenter Photo

John Curtis - Nov 2, 2018, 2:43 PM

So to truly elaborate would take more time than I have here.  DQL uses both view searching and ultra-high-speed database searching to gather results for each query term, then ANDs and ORs them (or NOTs them) together virtually without cost.  It also can access the entire content of views or folders and allow those contents to be ANDed or ORed like any other term.  Full text support is planned for V11.  But DQL sits atop the different search mechanisms and smartly injects results from one term into the next, assuring that the same document is searched as little as possible.

In Java and Lotusscript its output is a (Notes)DocumentCollection, useful everywhere that object can be used.  In core Domino, DQL's output is an IDTable (look at the Domino C API for details).  Post processing can provide ordering in Java/Lotusscript in V10.0.1. 

Using Node.js and the domino-db API, very fast retrieval and efficient bulk processing can be performed.  The domino-db code drops are part of the appdev pack, in beta as I write this.

But DQL is new and we believe you'll find it a breakthrough in several ways.  As always, we want to hear of your experiences.

-John Curtis (DQL inventor)

New Comment