XPages Renderers and Thread Safety

Thu May 27 09:56:13 EDT 2021

Tags: xpages

I got to thinking yesterday about renderers in XPages - the part of the stack that takes the abstract back-end representation and actually turns it into HTML. They've been an interest of mine for a good while and they have interesting characteristics worth revisiting from time to time.

One of those characteristics is their general statelessness: the way they work, there's generally only one instance of the renderer object, and then it's applied to many pages. This is reflected in the Javadoc for the base javax.faces.render.Renderer class (Java EE 5 here because that's what XPages is based on):

Individual `Renderer` instances will be instantiated as requested during the rendering process, and will remain in existence for the remainder of the lifetime of a web application. Because each instance may be invoked from more than one request processing thread simultaneously, they MUST be programmed in a thread-safe manner.

As a quick aside, that all-caps "MUST" is an application of the delightful RFC 2119, which defines common meanings for those types of words in specs. It's worth reading.

Implementation

Anyway, how does this shake out in practice? Well, we can use the Bootstrap theme as provided by IBM back when it was open source as a baseline example of how it's done. If you look at, for example, the ResponsiveAppLayoutRenderer class, you can see a ton of code, but no instance variables. Looking at any number of classes, there are static constants, but no instance variables. In general, that bundle as present on GitHub is a great example of the craft.

In general, thread safety is tricky, and the easiest way to make a class thread-safe is to not have any instance variables. These renderer examples take that route, and for good reason: since renderers are instantiated once per app, they're potentially shared across all pages in the app, multiple components within a page, and multiple instances of the same page.
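To make that concrete outside of XPages, here's a minimal self-contained sketch (the class names are my own invention, not part of any framework): a single shared renderer instance is hammered from many threads, and because all of its working state lives in local variables, the output stays consistent no matter how the calls interleave.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class RendererSafetyDemo {
    // Thread-safe: all working state lives in local variables, so
    // concurrent calls on the one shared instance cannot interfere.
    static class StatelessRenderer {
        String render(String viewId) {
            StringBuilder out = new StringBuilder(); // local state only
            out.append("<html>").append(viewId).append("</html>");
            return out.toString();
        }
    }

    public static void main(String[] args) throws Exception {
        StatelessRenderer renderer = new StatelessRenderer(); // one instance, as in JSF
        ExecutorService exec = Executors.newFixedThreadPool(8);
        List<Future<Boolean>> results = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            final String viewId = "/home" + (i % 5 + 1);
            results.add(exec.submit(() ->
                renderer.render(viewId).equals("<html>" + viewId + "</html>")));
        }
        for (Future<Boolean> f : results) {
            if (!f.get()) { throw new AssertionError("state bled between threads"); }
        }
        exec.shutdown();
        System.out.println("all renders consistent");
    }
}
```

Had `StringBuilder out` been an instance field instead, two overlapping requests could append to the same buffer and emit each other's markup.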

Demonstration

Thread safety in general and in Java in particular is a huge topic, and it's something that harries basically all programmers working in a potentially-threaded environment, even when the code you're writing doesn't seem troublesome. It's one thing if you're explicitly writing multithreaded code, divvying up tasks among executors or something, but what would be the trouble here? Well, fortunately, writing a demonstration is fairly quick. To do so, I made a new NSF with a few design elements. First, a theme:

<theme>
    <control>
        <name>ViewRoot</name>
        <property>
            <name>rendererType</name>
            <value>renderer.TestViewRoot</value>
        </property>
    </control>
</theme>

You can name the theme whatever you want, since the name of a theme is intended to be of human interest only. All that matters is that it's then selected in the Xsp Properties file.
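For reference, selecting the theme is a one-line affair in the Xsp Properties source (assuming here that the theme design element is named "TestRenderer"):

```properties
xsp.theme=TestRenderer
```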

Then, a customized faces-config.xml:

<?xml version="1.0" encoding="UTF-8"?>
<faces-config>
  <render-kit>
    <renderer>
      <component-family>javax.faces.ViewRoot</component-family>
      <renderer-type>renderer.TestViewRoot</renderer-type>
      <renderer-class>renderer.TestViewRoot</renderer-class>
    </renderer>
  </render-kit>
  <!--AUTOGEN-START-BUILDER: Automatically generated by HCL Domino Designer. Do not modify.-->
  <!--AUTOGEN-END-BUILDER: End of automatically generated section-->
</faces-config>

This has prepared us to use a custom renderer for the view root, which is the main page itself. Finally, as the last step before the renderer class, I made five XPages, named "home1.xsp" through "home5.xsp". The content is irrelevant, so I made them the simplest possible:

<?xml version="1.0" encoding="UTF-8"?>
<xp:view xmlns:xp="http://www.ibm.com/xsp/core"/>

Now, to the renderer class:

package renderer;

import javax.faces.component.UIComponent;
import javax.faces.context.FacesContext;
import javax.faces.render.Renderer;

import com.ibm.xsp.component.UIViewRootEx;

public class TestViewRoot extends Renderer {
    String viewId;
    
    @Override
    public void encodeBegin(FacesContext context, UIComponent component) {
        UIViewRootEx viewRoot = (UIViewRootEx)context.getViewRoot();
        viewId = viewRoot.getViewId();
    }
    @Override
    public void encodeEnd(FacesContext context, UIComponent component) {
        UIViewRootEx viewRoot = (UIViewRootEx)context.getViewRoot();
        String endingViewId = viewRoot.getViewId();
        if(!this.viewId.equals(endingViewId)) {
            System.out.println("finished rendering view ID " + endingViewId + " - started with " + this.viewId);
        }
    }
}

This class gets the view ID (e.g. "/home1") in encodeBegin and again in encodeEnd. When the two are different - essentially, our bug condition - it emits a message to the console. This "viewId" is a stand-in for any sort of expected shared state, to demonstrate that the renderer can make no assumptions that encodeBegin and encodeEnd are called in sequence for the same page or page instance. The latter could be demonstrated by checking viewRoot.getUniqueViewId() instead, which identifies the specific page instance and is thus distinct even across different users on the same page.
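The usual fix for this kind of thing is to move the per-request state off the shared renderer entirely - for example, stashing it in the component's attribute map, which is scoped to the view rather than shared app-wide. Here's a self-contained sketch of that pattern; `Context` and `Component` are my own minimal stand-ins for the JSF types, not the real API:

```java
import java.util.HashMap;
import java.util.Map;

public class PerRequestStateDemo {
    // Minimal stand-ins for FacesContext and UIComponent, just to show the shape
    static class Context { final String viewId; Context(String viewId) { this.viewId = viewId; } }
    static class Component { final Map<String, Object> attributes = new HashMap<>(); }

    static class SafeRenderer {
        private static final String KEY = "testRenderer.startViewId";

        void encodeBegin(Context context, Component component) {
            // Store the starting state on the component, not on the shared renderer
            component.attributes.put(KEY, context.viewId);
        }

        String encodeEnd(Context context, Component component) {
            // Each component carries its own state, so concurrent pages can't collide
            String started = (String) component.attributes.remove(KEY);
            return "started " + started + ", ended " + context.viewId;
        }
    }

    public static void main(String[] args) {
        SafeRenderer renderer = new SafeRenderer(); // one shared instance, as in JSF
        Component a = new Component(), b = new Component();
        renderer.encodeBegin(new Context("/home1"), a);
        renderer.encodeBegin(new Context("/home2"), b); // an interleaved request
        System.out.println(renderer.encodeEnd(new Context("/home1"), a));
    }
}
```

Because the state rides along with the component instead of the renderer, the interleaved second request can no longer corrupt the first one's bookkeeping.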

Then, I wrote a script that, in a multithreaded fashion, requests home1.xsp through home5.xsp randomly for a little while, and it was only a couple seconds before the messages started appearing:

[0B4C:000A-14A0] 05/27/2021 09:25:21 AM  HTTP JVM: finished rendering view ID /home4 - started with /home2
[0B4C:0023-1388] 05/27/2021 09:25:21 AM  HTTP JVM: finished rendering view ID /home2 - started with /home1
[0B4C:001E-0C88] 05/27/2021 09:25:21 AM  HTTP JVM: finished rendering view ID /home2 - started with /home1
[0B4C:001B-1784] 05/27/2021 09:25:21 AM  HTTP JVM: finished rendering view ID /home3 - started with /home1
[0B4C:001C-1AE8] 05/27/2021 09:25:22 AM  HTTP JVM: finished rendering view ID /home2 - started with /home5

Knock-On Demo

Okay, so that's bad, but there's another, subtler problem I created here, and it has to do with code reuse via subclassing. In general, renderers are subclassed out the wazoo. Just look at the type hierarchy for FacesRendererEx:

FacesRendererEx Type Hierarchy

It goes on for a while like that.

While renderers being subclass-friendly isn't a "MUST"-type rule like thread safety, it's both a natural benefit of that statelessness and a general cultural idiom. But imagine if I were to subclass my example above and override just the encodeBegin portion (to represent, say, changing just the output of the page header but leaving the footer the same):

package renderer;

import javax.faces.component.UIComponent;
import javax.faces.context.FacesContext;

public class TestViewRootEx extends TestViewRoot {
    @Override
    public void encodeBegin(FacesContext context, UIComponent component) {
        // Do something else here
    }
}

If I use that as the renderer, I get a dramatically-new look for my page:

NPE on subclass

The trouble here is that I overrode encodeBegin but didn't call super.encodeBegin - which I naturally wouldn't, since I explicitly don't want whatever the parent class is outputting. However, since encodeEnd assumes that its version of encodeBegin runs and sets this.viewId, I hit a NullPointerException when it tries to access it. Curses.

Conclusion

Anyway, this is all a long-winded way of pointing out that designing for thread safety is tricky business, and it can crop up when you wouldn't otherwise expect it. It's good here that the JSF developers included that "MUST" business in the Javadoc, but, unfortunately, Java-the-language doesn't have a way of actually enforcing it, making room for this sort of thing to creep in.

My Tortured Relationship With libnotes

Sat May 22 12:31:10 EDT 2021

Tags: c java

A tremendous amount of my work lately involves wrangling the core Notes library, variously "libnotes.dylib", "libnotes.so", or (for some reason) "nnotes.dll" (edit: see the comments!). I do almost all of my daily work in Liberty servers on the Mac loading this library, my first major use of the Domino Docker image was to use it as an overgrown carrier for this native piece, and in general I spend a lot of time putting this to use.

It ain't easy, though! Unlike a lot of native libraries that you can just kind of load from wherever, libnotes is extremely picky about its loading environment. There are a few main ways that this manifests.

Required-Library References

libnotes isn't standalone, and it doesn't just rely on standard OS stuff like libc. It also relies on other Notes-specific libraries like libxmlproc and libgsk8iccs, and some of those refer back to each other. This all shakes out in normal practice, since they're all next to the Notes/Domino executable and each other, but it makes things finicky when you're running from outside there.

This seems to be the most finicky on macOS, where the references to each library are marked with @executable_path relative paths, meaning relative to the running executable.

I've wrangled with this quite a bit over the years, but the upshot is that, on macOS and Linux, you really want to set some environment variables before you ever load your program. Naturally, since your running program can't do anything about what happened before it was loaded, this means you have to balance the teacups in your environment beforehand.

Non-Library Files

Beyond the dynamic libraries, libnotes also needs some program-support files, things like string resources and whatnot. And, very importantly, it needs an active data directory with at least an ID file, notes.ini, and names.nsf. The data-directory contents are at least a little more forgiving: though there's still a hard requirement that they be present on the filesystem (since libnotes loads them by string path, not as configurable binary streams), you could at least bundle them or copy them around. For example, for my Dockerized app runners, I tend to have some basic versions of those in the repo that get copied into the Docker container's notesdata directory during build.

Working Around It

As I mentioned, the main way I go about dealing with this is by telling whatever is running my program to use applicable environment variables, ideally in a reproducible config-file-based way. This works perfectly with Docker and with tycho-surefire-plugin in Maven, but doesn't work so well for maven-surefire-plugin (the normal unit test runner) or Eclipse's JUnit tools. In Eclipse's case, I can at least fill in the environment variables in the Run Configuration. It hampers me a bit (I can't just right-click a test case and run it individually without first running the whole suite or setting up a configuration specially), but it works.

I gave a shot recently to copying the Mac dylibs and programmatically fiddling with otool and install_name_tool to adjust their dependency paths, and I got somewhere with that, but that somewhere was an in-libnotes fatal panic, so I'm clearly missing something. And besides, even if I got that working, it'd be a bit of a drag.

What Would Be Nice

What would be really nice would be a variant of this that's just more portable - something where I can just System.load a library, call NotesInitExtended to point to my INI and ID, and be good to go. I'm not really sure what this would entail, especially since I'd also want libinotes, libjnotes, and liblsxbe. I do know that I don't have the tools to do it, which fortunately frees me up to idly speculate about things I'd like to have delivered to me.

As long as I'm wishing for stuff, I'll say what would be even cooler would be a WebAssembly library usable with Wasmer, which is a multi-language WebAssembly runtime. The promise there is that you'd have a single compiled library that would run on any OS on any processor that supports WebAssembly, from now until forever. I'm not sure that this would actually be doable at the moment - for one, I don't know if callback parameters work, which would be fairly critical. Still, my birthday is coming up, and that would be a nice present.

OpenNTF May 2021 Webinar Followup

Thu May 20 13:33:09 EDT 2021

Tags: openntf

Earlier today, I took part in OpenNTF's May webinar on recent project updates. During that, I gave a quick overview of several of the projects I've worked on in the past year, and I figured it'd be useful to put together some followup notes for future reference.

Project Links

I didn't include any actual links to the projects, which could be useful:

Slides

To begin with, I posted my slides on SlideShare:

OpenNTF Webinar May 2021 - Jesse from Jesse Gallagher

NSF ODP Tooling

I gave a brief overview of the NSF ODP Tooling, but, if anyone is interested in a more in-depth overview, my presentation from CollabSphere last year should cover it:

Import and Export for Designer

My quick mention of this refreshed project focused mainly on importing components, but generating new ones for upload is a pretty-critical part of it too. The tool comes with an option for that, which (to the credit of the original creators) does a splendid job packaging your controls and applying a license. The PDF from the project page goes over that in detail. Once the components are packaged up, you can upload them as a release on OpenNTF and I'll figure out how to add it to the download list.

More Notes on Filesystem and Charset Portability

Tue May 18 15:39:40 EDT 2021

Tags: java
  1. Java Hiccups
  2. Bitwise Operators
  3. Java Grab Bag 2
  4. Java Travelogue: The Care and Feeding of Locales
  5. More Notes on Filesystem and Charset Portability

A few months back, I talked about some localization troubles in the NSF ODP Tooling and how it's important to be explicit in your handling of this sort of thing to make sure your code will work in an environment that isn't specifically "Linux or macOS in an en-US environment".

Well, after making a bunch of little tweaks over the last few days, I have two additional tips in this arena! Specifically, my foes this round came from three sources: Windows, my use of a ZIP file filesystem, and the old reliable charset.

Path Separators

The first bit of trouble had to do with how those two things interact. For a long time, I've been in the (commonly-held) habit of using File.separator and File.separatorChar to get the default path separator for the system - that is, \ on Windows and / on most other platforms. Those work well enough - no real trouble there.

However, my problem came from using the Java NIO ZIP filesystem on Windows. Take this bit of code:

public static String toJavaClassName(Path path) {
	String name = path.toString();
	if(name.endsWith(".java")) {
		return name.substring(0, name.length()-".java".length()).replace(File.separatorChar, '.');
	}
	/* Other conditions here */
}

When Path is a path on the local filesystem, that works just fine, taking a path like "com/example/Foo.java" and turning it into "com.example.Foo". It also works splendidly on macOS and Linux in all cases, the two systems I actually use. However, when path represents a path within a ZIP file and you're working on Windows, it fails, returning a "class name" like "com/example/Foo".

This is exactly what happens when compiling an ODP using a remote Domino server running on Windows. For the portability reasons mentioned in my previous post, the client sends a ZIP of the ODP to the server and then the compilation pulls directly out of that ZIP instead of writing it out to the filesystem. The way the ZIP filesystem driver in Java is written, it uses / for its path separator on all platforms, which is consistent with dealing with ZIP files generally. But, when mixed with the native filesystem separator, that line resolved to:

return "com/example/Foo".replace('\\', '.');

...and there's the problem. The fix is to change the code to instead get the directory separator from the contextual filesystem in question:

public static String toJavaClassName(Path path) {
	String name = path.toString();
	if(name.endsWith(".java")) {
		return name.substring(0, name.length()-".java".length()).replace(path.getFileSystem().getSeparator(), ".");
	}
	/* Other conditions here */
}

A little more verbose, sure, but it has the advantage of functioning consistently in all environments.

This also has significant implications if you use static properties to store filesystem-dependent elements. This came into play in my OnDiskProject class, which contains a bunch of path matchers to find design elements to import from the ODP. Originally, I kept these in a static property that was generated by writing them Unix-style, then running them through a generator to use the platform-native separator character. This had to change, since the actual ODP store may or may not be the platform-native filesystem. This sort of thing is pervasive, and it'll take me a bit to get over my long-standing habit.
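One way to break the habit is to build the match logic from the filesystem of the path at hand rather than a static, separator-baked constant. Here's a minimal sketch using NIO's PathMatcher (the method name is just for illustration):

```java
import java.nio.file.FileSystem;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class FsMatcherDemo {
    // Build the matcher against whatever filesystem the path actually
    // lives on - default FS, ZIP FS, or otherwise - instead of baking
    // File.separator into a static pattern.
    static boolean isJavaSource(Path path) {
        FileSystem fs = path.getFileSystem();
        PathMatcher matcher = fs.getPathMatcher("glob:**.java");
        return matcher.matches(path);
    }

    public static void main(String[] args) {
        System.out.println(isJavaSource(Paths.get("com", "example", "Foo.java"))); // true
        System.out.println(isJavaSource(Paths.get("com", "example", "Foo.txt"))); // false
    }
}
```

The same `FsMatcherDemo.isJavaSource` call works unchanged on a `Path` from a ZIP filesystem, since the matcher comes from that filesystem's own `getPathMatcher`.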

Over-Interpreting Character Sets

This one is similar to the charset troubles in my previous post, but ran into subtle trouble in the ODP compiler. Here was the sequence of events:

  1. The ODP compiler reads the XSP source of a page or custom control using ODPUtil, which reads in the string as UTF-8
  2. It then passes that string to the Bazaar's DynamicXPageBean
  3. That method uses StringReader and an IBM Commons ReaderInputStream to read the content
  4. That content is then read in by FacesReader, which uses the default DOM parser to read the XML

In general, that flow worked just fine - but only because I generally write US-ASCII markup. When the page contains, say, Czech diacritics, though, this goes off the rails. Somewhere in the interpretation and re-interpretation of the file, the UTF-8-iness of it breaks.

Fortunately, this one was a clean one: XML has its own mechanism for declaring its encoding (and it's almost always UTF-8 anyway), so my code doesn't actually need to be responsible for interpreting the bytes of the file before it gets to the DOM parser. So I added a version of the Bazaar method that takes an InputStream directly and modified NSF ODP to use it, with no extra interpretation in between.
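This behavior is easy to see in miniature: hand a DOM parser raw bytes and it reads the encoding from the XML declaration itself (defaulting to UTF-8), so no manual charset interpretation is needed in between. A small self-contained example:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class XmlEncodingDemo {
    public static String rootText(byte[] xmlBytes) throws Exception {
        // Hand the parser the raw bytes; it determines the encoding from
        // the XML declaration itself, so we never decode the content with
        // a guessed charset first.
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(xmlBytes));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><p>Škoda</p>"
            .getBytes(StandardCharsets.UTF_8);
        System.out.println(rootText(xml)); // Škoda
    }
}
```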

Tinkering With Cross-Container Domino Addins

Sun May 16 13:35:54 EDT 2021

Tags: docker domino

A good chunk of my work lately involves running distinct processes with a Domino runtime, either run from Domino or standalone for development or CI use. Something that had been percolating in the back of my mind was another step in this: running these "addin-ish" programs in Docker in a separate container from Domino, but participating in that active Domino runtime.

Domino addins in general are really just separate processes and, while they gain some special properties when run via load foo on the console or OSLoadProgram in the C API, that's not a hard requirement to getting a lot of things working.

I figured I could get this working and, armed with basically no knowledge about how this would work, I set out to try it.

Scaffolding

My working project at hand is a webapp run with the standard open-liberty Docker images. Though I'm using that as a starting point, I had to bring in the Notes runtime. Whether you use the official Domino Docker images from Flexnet or build your own, the only true requirement is that it match the version used in the running server, since libnotes does a version check on init. My Dockerfile looks like this:

FROM --platform=linux/amd64 open-liberty:beta

USER root
RUN useradd -u 1000 notes

RUN chown -R notes /opt/ol
RUN chown -R notes /logs

# Bring in the Domino runtime
COPY --from=domino-docker:V1200_03252021prod /opt/hcl/domino/notes/latest/linux /opt/hcl/domino/notes/latest/linux
COPY --from=domino-docker:V1200_03252021prod /local/notesdata /local/notesdata

# Bring in the Liberty app and configuration
COPY --chown=notes:users /target/jnx-example-webapp.war /apps/
COPY --chown=notes:users config/* /config/
COPY --chown=notes:users exec.sh /opt/
RUN chmod +x /opt/exec.sh

USER notes

ENV LD_LIBRARY_PATH "/opt/hcl/domino/notes/latest/linux"
ENV NotesINI "/local/notesdata/notes.ini"
ENV Notes_ExecDirectory "/opt/hcl/domino/notes/latest/linux"
ENV Directory "/local/notesdata"
ENV PATH="${PATH}:/opt/hcl/domino/notes/latest/linux:/opt/hcl/domino/notes/latest/linux/res/C"

EXPOSE 8080 8443

ENTRYPOINT ["/opt/exec.sh"]

I'll get to the "exec.sh" business later, but the pertinent parts now are:

  • Adding a notes user (to avoid permissions trouble with the data dir, if it comes up)
  • Tweaking the Liberty container's ownership to account for this
  • Bringing in the Domino runtime
  • Copying in my WAR file from the project and associated config files (common for Liberty containers)
  • Setting environment variables to tell the app how to init

So far, that's largely the same as how I run standalone Notes-runtime-enabled apps that don't talk to Domino. The only main difference is that, instead of copying in an ID and notes.ini, I instead mount the data volume to this container as I do with the main Domino one.

Shared Memory

The big new hurdle here is getting the separate apps to participate in Domino's shared memory pool. Now, going in, I had a very vague notion of what shared memory is and an even vaguer one of how it works. Certainly, the name is straightforward, and I know it in Domino's case mostly as "the thing that stops Notes from launching after a crash sometimes", but I'd need to figure out some more to get this working. Is it entirely a filesystem thing, as the Notes problem implies? Is it an OS-level thing with true memory? Well, both, apparently.

Fortunately, Docker has this covered: the --ipc flag for docker run. It has two main modes: you can participate in the host's IPC pool (essentially like what a normal, non-contained process does) or join another container specifically. I opted for the latter, which involved changing the launch arguments for both the Domino container and the app container.

For Domino, I added --ipc=shareable to the argument list, basically registering it as an available host for other containers to glom on to.

For the separate app, I added --ipc=container:domino, where "domino" is the name of the Domino container.

With those in place, the "addin" process was able to see Domino and do addin-type stuff, like adding a status line and calling AddinLogMessageText to display a message on the server's console.

Great: this proved that it's possible. However, there were still a few show-stopping problems to overcome.

PIDs

From what I gather, Notes keeps track of processes sharing its memory by their reported process IDs. If a process joins the pool and then exits (maybe only if it exits abruptly; I'm not sure), a later process that tries to join with the same PID will fail on init with a complaint that the PID is already registered.

Normally, this isn't a problem, as the OS hands out distinct PIDs all the time. This is trouble with Docker, though: by default, the main process in a Docker container sees itself as PID 1, and will start as such each time. In the Domino container, its PID 1 is "start.sh", and that's still going, and it's not going to hear otherwise from some other process calling itself the same.

Fortunately, this was a quick fix: Docker's --pid option. Though the documentation for this is uncharacteristically slight, it turns out that the syntax for my needs is the same as for --ipc. Thus: --pid=container:domino. Once I set that, the running app got a distinct PID from the pool. That was pleasantly simple.

SIGTERM

And now we come to the toughest problem. As it turns out, dealing with SIGTERM - the signal sent by docker stop - is a whole big deal in the Java world. I banged my head at this for a while, with most of the posts I've found being not quite applicable, not working at all for me, or technically working but only in an unsustainable way.

For whatever reason, the Open Liberty Docker image doesn't handle this terribly well - when given a SIGTERM order, it doesn't stop the servlet context before dying, which means the contextDestroyed method in my ServletContextListener (such as this one) doesn't fire.

In many webapp cases, this is fine, but Domino is extremely finicky when it comes to memory-sharing processes needing to exit cleanly. If a process calls NotesInit but doesn't properly call NotesTerm (and close all its Notes-enabled threads), the server panics and dies. This is... not great behavior, but it is what it is, and I needed to figure out how to work with it. Unfortunately, the Liberty Docker container wasn't doing me any favors.

One option is to use Runtime.getRuntime().addShutdownHook(...). This lets you specify a Thread to execute when a SIGTERM is received, and it can work in some cases. It's a little shaky sometimes, though, and it's bad form to riddle otherwise-normal webapps with such things: ideally, even webapps that you intend to run in a container should be written such that they can participate in a normal multi-app environment.
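For what it's worth, the shutdown-hook route looks like this - a minimal sketch, with the Notes-specific cleanup reduced to a comment since it depends on an initialized runtime:

```java
public class ShutdownHookDemo {
    public static void main(String[] args) {
        Thread hook = new Thread(() -> {
            // In the real case, this is where you'd shut down any
            // ExecutorServices and call NotesTerm before the JVM dies
            System.out.println("cleaning up Notes runtime");
        });
        Runtime.getRuntime().addShutdownHook(hook);

        // Hooks can also be deregistered, e.g. if cleanup already
        // happened through the normal app-shutdown path
        boolean removed = Runtime.getRuntime().removeShutdownHook(hook);
        System.out.println("removed: " + removed); // true
    }
}
```

The catch, as noted above, is that hooks run in no guaranteed order and only on a reasonably orderly JVM exit, which is part of why the entrypoint-script approach ended up being the better fit.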

What I ended up settling on was based on this blog post, which (like a number of others) uses a shell script as the main entrypoint. That's a common idiom in general, and Open Liberty's image does it, but its script doesn't account for this, apparently. I tweaked that post's shell script to use the Liberty start/stop commands and ended up with this:

#!/usr/bin/env bash
set -x

term_handler() {
  /opt/ol/helpers/runtime/docker-server.sh /opt/ol/wlp/bin/server stop
  exit 143; # 128 + 15 -- SIGTERM
}

trap 'kill ${!}; term_handler' SIGTERM

/opt/ol/helpers/runtime/docker-server.sh /opt/ol/wlp/bin/server start defaultServer

# echo the Liberty console
tail -f /logs/console.log &

while true
do
  tail -f /dev/null & wait ${!}
done

Now, when I issue a docker stop to the container, the script issues an orderly shutdown of the Liberty instance, which properly calls the contextDestroyed method and allows my code to close down its ExecutorService and call NotesTerm. Better still, Domino keeps running without crashing!

Conclusion

My final docker run scripts ended up being:

Domino

docker run --name domino \
	-d \
	-p 1352:1352 \
	-v notesdata:/local/notesdata \
	-v notesmisc:/local/notesmisc \
	--cap-add=SYS_PTRACE \
	--ipc=shareable \
	--restart=always \
	iksg-domino-12beta3

Webapp

docker build . -t example-webapp
docker run --name example-webapp \
	-it \
	--rm \
	-p 8080:8080 \
	-v notesdata:/local/notesdata \
	-v notesmisc:/local/notesmisc \
	--ipc=container:domino \
	--pid=container:domino \
	example-webapp

(Here, the webapp is run to be temporary and tied to the console, hence -it, --rm, and no -d)

One nice thing to note is that there's nothing webapp- or Java-specific here. One of the nice things about Docker is that it removes a lot of the hurdles to running whatever-the-heck type of program you want, so long as there's a Linux Docker image for it. I just happen to default to Java webapps for basically everything nowadays. The above script could be tweaked to work with most anything: the original post had it working with a Node app.

Now, considering that I was starting from nearly scratch here, I certainly can't say whether this is a bulletproof setup or even a reasonable idea in general. Still, it seems to work, and that's good enough for me for now.

Replicating Domino to Azure Data Lake With Darwino

Mon May 10 10:06:09 EDT 2021

Tags: darwino

Though Darwino is an app-dev platform, one of the main ongoing uses it's had has been for reporting on Domino data. By default, when replicating with Domino, Darwino uses its table layout in its backing SQL database, making use of the various vendors' native JSON capabilities to store data in a way analogous to Domino. Once it's there, even if you don't actually build any apps on top of it, it's immediately useful for querying at scale with various reporting tools: BIRT, Power BI, Crystal Reports, what have you. Add in some views and, as needed, extra indexes and you have an extraordinarily-speedy way to report on the data.

Generic Replication

But one of the neat other uses comes in with that "by default" above. Darwino's replication engine is designed to be thoroughly generic, and that's how it replicates with Domino at all: since an adapter only needs to implement a handful of classes to expose the data in a Darwino-friendly way, the source or target's actual storage mechanism doesn't matter overmuch. Darwino's own storage is just "first among equals" as far as replication is concerned, and the protocol it uses to replicate with Domino is the same as it uses to replicate among multiple Darwino app servers.

From time to time, we get a call to make use of this adaptability to target a different backend. In this case, a customer wanted to be able to push data to Azure Data Lake, which is a large-scale filesystem-like storage service Microsoft offers. The idea is that you get your gobs of data into Data Lake one way or another, and then they have a suite of tools to let you report on and analyze it to your heart's content. It's the sort of thing that lets businesspeople make charts to point to during meetings, so that's nice.

This customer had already been using Azure's SQL Server services for "normal" Darwino replication from Domino, but wanted to avoid doing something like having a script to transform the SQL data into Data-Lake-friendly formats after the fact. So that's where the custom adapter came in.

The Implementation

Since the requirement here is just going one way from Domino to Data Lake, that took a bit of the work off our plate. It wouldn't be particularly conceptually onerous to write a mechanism to go the other way - mostly, it'd be finding an efficient way to identify changed documents - but the "loose" filesystem concept of Data Lake would make identifying changes and dealing with arbitrary user-modified data weird.

The only real requirements for a Darwino replication target are that you can represent the data in JSON in some way and that you are able to identify changes for delta replication. That latter one is technically a soft requirement, since an adapter could in theory re-replicate the entire thing every time, but it's worlds better to be able to do continuous small replications rather than nightly or weekly data dumps. In Darwino's own storage, this is handled by indexed columns to find changes, while with Domino it uses normal NSFSearch-type capabilities to find modifications since a specific date.
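The high-water-mark idea behind delta replication can be sketched generically - the names here are purely illustrative, not Darwino's actual API: remember the last-replicated timestamp, select only documents modified after it, then advance the mark.

```java
import java.util.ArrayList;
import java.util.List;

public class DeltaSyncDemo {
    // Illustrative document stub: an ID plus a last-modified tick
    static class Doc {
        final String unid;
        final long modified;
        Doc(String unid, long modified) { this.unid = unid; this.modified = modified; }
    }

    // Select only the documents changed since the previous replication run
    static List<Doc> changedSince(List<Doc> all, long since) {
        List<Doc> result = new ArrayList<>();
        for (Doc doc : all) {
            if (doc.modified > since) { result.add(doc); }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(new Doc("A", 5), new Doc("B", 12), new Doc("C", 20));
        long lastSync = 10; // read from the target's stored state before the run
        List<Doc> delta = changedSince(docs, lastSync);
        lastSync = 20; // persist the new high-water mark for next time
        System.out.println(delta.size()); // 2
    }
}
```

The efficiency question an adapter has to answer is just how `changedSince` is implemented: indexed columns on the SQL side, NSFSearch on the Domino side, and a stored timestamp file for a target like Data Lake.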

Data Lake is a little "metadata light" in this way, since it represents itself primarily as a filesystem, but the lack of need to replicate changes back meant I didn't have to worry about searching around for an efficient call. I settled on a basic layout:

Data Lake layout

Within the directory for NSFs, there are a few entities:

  • darwino.json keeps track of the last time the target was replicated to, so I can pick up on that for delta replication in the future
  • docs houses the documents themselves, named like "(unid).json" and containing the converted JSON content of the Domino documents
  • attachments has subfolders named for the UNIDs of the documents referenced, followed by the attachment and embedded images, names prefixed with the fields they're from
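As a sketch of writing out that layout (the file names and JSON keys here are illustrative, not the adapter's actual formats):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class LakeLayoutDemo {
    // Write one document as "(unid).json" inside the NSF's docs folder
    static void writeDoc(Path nsfDir, String unid, String json) throws IOException {
        Path docs = nsfDir.resolve("docs");
        Files.createDirectories(docs);
        Files.write(docs.resolve(unid + ".json"), json.getBytes(StandardCharsets.UTF_8));
    }

    // Track the last replication time so the next run can do a delta
    static void writeSyncState(Path nsfDir, long lastReplicated) throws IOException {
        Files.createDirectories(nsfDir);
        String json = "{\"lastReplicated\":" + lastReplicated + "}";
        Files.write(nsfDir.resolve("darwino.json"), json.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path nsfDir = Files.createTempDirectory("lake").resolve("names.nsf");
        writeDoc(nsfDir, "ABCD1234", "{\"form\":\"Person\"}");
        writeSyncState(nsfDir, System.currentTimeMillis());
        System.out.println(Files.exists(nsfDir.resolve("docs/ABCD1234.json"))); // true
    }
}
```

In the real thing the writes go through the Data Lake client API rather than local NIO, but the shape of the hierarchy is the same.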

Back on the Domino side, I can set this up in the Sync Admin database the same way I do for traditional Darwino targets, where the Data Lake extension is picked up:

Data Lake in Sync Admin

Once it's set up, I can turn on the replication schedule and let it do its thing in the background and Data Lake will stay in step with the NSFs.

Conclusion

Now, immediately, this is really of interest just to our client who wanted to do it, but I felt like it was a neat-enough trick to warrant an overview. It's also satisfying seeing the layers working together: the side that reads the Domino data needed no changes to work with this entirely-new target, and similarly the core replication engine needed no tweaks even though it's pointing at a fully-custom destination.