Generating Archive Files On The Fly In Java

Thu Dec 09 10:30:36 EST 2021

Tags: java liberty

When working on version 3.0 of the Domino Open Liberty Runtime, I had occasion to do something I've done in other situations, but it occurred to me that it'd make a good post on its own. Specifically, part of one of the new features involved creating archive data on the fly, purely in-memory, and that's something that comes in handy quite a bit.

Background: The Task

The task at hand in that project involved the way the runtime will deploy custom extension features for Liberty when creating the server. There are a few of these, all centered around adding integration with Domino in one way or another. For the previously-existing Liberty features, this was done in three parts:

  • The actual Liberty extension code, which is a Java project that produces a Liberty-compatible OSGi bundle.
  • A "subsystem" module, which is a code-less Maven project that uses esa-maven-plugin to embed the above bundle and generate a "SUBSYSTEM.MF" file to describe it. This ESA/subsystem bit is a mechanism for distributing packaged features from the OSGi spec.
  • A "deployment" module, which is a small Java project that provides an extension for the Domino-side runtime to house and deploy the above ESA file.

For 3.0, I wanted to make a feature that would provide Notes.jar and the NAPI to application. Since those files are proprietary and non-distributable, I couldn't include them in the actual runtime distribution and would instead have to look them up from Domino's environment at runtime. Additionally, since all I wanted to do was provide the existing API and not add any new code, there was no particular need to make a code project like the first one above.

More Background: Java Streams

The way these extensions are registered to Domino is by classes that provide some metadata about the feature and then a method called getEsaData() that returns an InputStream. Though InputStreams aren't the only way to represent arbitrary binary blobs like this, they're used everywhere by virtue of them arriving with Java 1, and they're extremely adaptable.

Basically, the idea of an InputStream is that it's just a mechanism to read a sequence of bytes from somewhere. In Domino terms, they're like NotesStream, but good.

Their utility comes from their simplicity and adaptability. Because the abstract class only deals with reading bytes and a few operations for skipping around, they can be used for all sorts of things. The prototypical use is for reading a file. For example:

1
2
3
4
Path someFile = Paths.get("/foo/bar/baz.txt");
try(InputStream is = Files.newInputStream(someFile)) {
  // read file data from the stream
}

They're not limited to that, though: the JDK comes with all sorts of InputStream variants like ByteArrayInputStream, which lets you read from a byte[] in memory.

In addition to being arbitrary as to where the byes are coming from, streams are also very composable. Many types of streams either must or may wrap an existing stream to alter it in some way. One of the more-common cases where you'd do this is when reading ZIP file data. Taking something similar to above:

1
2
3
4
5
6
7
8
Path someZipFile = Paths.get("/foo/bar/baz.zip");
try(
  InputStream is = Files.newInputStream(someFile);
  ZipInputStream zis = new ZipInputStream(is, StandardCharsets.UTF_8)
) {
  ZipEntry entry = zis.getFirstEntry();
  // work with the ZIP entries
}

The thing to note here is that, while this happens to be coming from a ZIP file on disk, it doesn't have to be: that first is could just as easily be a stream coming from HttpURLConnection or a ByteArrayInputStream.

Along with InputStream, Java also has OutputStream. Luckily, OutputStream is similarly simply designed, and has uses that are a direct mirror for everything above: there exist ByteArrayOutputStream, ZipOutputStream, and all sorts of others.

Putting It Together

Back to the original goal, my task was to create a class that would provide an InputStream containing ESA data - that is to say, a ZIP file - to the runtime, which could then deploy it as a Liberty feature. The previous extensions did this by embedding the ESA in their JAR and then returning an InputStream to that. Now, though, I wanted to do it all dynamically.

Now, I talked a big game above about how streams didn't have to have anything to do with files, and it could all be done in memory. That's still all true, but technically here I ended up using files for caching purposes. The above is still good to know, though!

So anyway, my goal was to deliver an InputStream to the runtime that represented an ESA that looks like this:

Contents of a generated ESA file

Of those entries, "corba.jar" is the CORBA API from Maven Central to make Notes.jar work on Java 9+, while "Notes.jar" comes from jvm/lib/ext and "lwpd.commons.jar" and "lwpd.domino.napi.jar" come from the OSGi framework in the running Domino server. The remaining entries - the two "MF" files and the embedded JAR - are composed on the fly.

The starting point here is that I identify a cache location within my working directory based on the current Domino build number, and I name that path out. Then, I open it up as a ZIP to fill with contents, like above:

1
2
3
4
5
6
try(
  OutputStream os = Files.newOutputStream(out, StandardOpenOption.CREATE);
 ZipOutputStream zos = new ZipOutputStream(os, StandardCharsets.UTF_8)
) {
  // work happens here
}
SUBSYSTEM.MF

The next part is to build the "SUBSYSTEM.MF" file. As implied by the extension, this file has the same syntax as "MANIFEST.MF" files, and so I can use the java.util.zip.Manifest class to handle encoding and formatting. I start out by loading a template from the current bundle's resources:

1
2
3
4
Manifest subsystem;
try(InputStream is = getClass().getResourceAsStream("/subsystem-template.mf")) { //$NON-NLS-1$
  subsystem = new Manifest(is);
}

There, I'm using the constructor from Manifest that reads from an existing stream. Often, that would be reading a "MANIFEST.MF" from an existing JAR, but it'll work with any stream.

Then, I fill it in with some details, with lines like:

1
2
3
4
5
Attributes attrs = subsystem.getMainAttributes();
attrs.putValue("Subsystem-Name", getShortName()); //$NON-NLS-1$
String featureName = getShortName() + "-" + getFeatureVersion(); //$NON-NLS-1$
attrs.putValue("IBM-ShortName", featureName); //$NON-NLS-1$
// etc.

Finally, I create an entry in the ZipOutputStream and write the contents. The way ZipOutputStream works is that its "stream-iness" counts towards whatever the most-recently-added entry is.

1
2
zos.putNextEntry(new ZipEntry("OSGI-INF/SUBSYSTEM.MF")); //$NON-NLS-1$
subsystem.write(zos);
Embedded Bundle

Alright, so far, so good. Up until now, this is the "normal" case for working with ZIP files, where you make a new entry and pour in some text data. What's neat, though, is that the encapsulation capabilities of these streams can be stacked, which is what comes up next.

Specifically, I wanted to put a ZIP file (the .jar) within this surrounding ZIP (the .esa). The way this is done is by just composing the same tools we've been working with again. Here, esa is what zos was above: the outermost package ZIP contents. I just renamed it in this method for clarity inside the code itself.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
// This will be a shell bundle that in turn embeds the API JARs
esa.putNextEntry(new ZipEntry(BUNDLE_NAME + "_" + dominoVersion + ".0.0.jar")); //$NON-NLS-1$ //$NON-NLS-2$
		
// Build the embedded JAR contents
try(ZipOutputStream zos = new ZipOutputStream(esa, StandardCharsets.UTF_8)) {
  Manifest manifest;
  try(InputStream is = getClass().getResourceAsStream("/manifest-template.mf")) { //$NON-NLS-1$
    manifest = new Manifest(is);
  }
  
  // Finish manifest
  // More work here
}

So there, I'm doing basically the same thing as I did originally, to make a ZipOutputStream. Since the ZipOutputStream really doesn't care what the stream is writing to is, it works just as well when writing to a file as when writing to another ZIP stream - the cascading streams handle their own encoding and it works out in the end.

Once I write the manifest, I can make use of the Files utility class to embed each of the JARs from the filesystem:

1
2
3
4
for(Path jar : embeds) {
  zos.putNextEntry(new ZipEntry(jar.getFileName().toString()));
  Files.copy(jar, zos);
}

Finally, I download the CORBA JAR on the fly, so for that one I use a utility function to download from the remote URL:

1
2
3
4
5
OpenLibertyUtil.download(new URL(URL_CORBA), is -> {
  zos.putNextEntry(new ZipEntry("corba.jar")); //$NON-NLS-1$
  IOUtils.copy(is, zos);
  return null;
});

Here, I use IOUtils from Apache Commons IO because it's not copying from a filesystem path, but the idea is basically the same, and exactly the same as far as the destination ZIP is concerned.

The Final Result

Once this is all written to the cached file on the filesystem, the final result is just to return a stream from it:

1
return Files.newInputStream(out);

Since the job of this extension class is only to return an InputStream, the consuming code doesn't care that the extension did all this work, as opposed to the other ones that just return a stream of an embedded resource: everything else is the same.

So, all in all, this isn't a groundbreaking new technique, but that's the point: the way these lower-level JDK components work, you get a tremendous amount of flexibility from just a few common parts.

New Comment