frostillic.us

Showing posts for tag "c"

The $ViewFormat Structure For Table Views (As I Understand It)

Wed Jul 14 16:00:49 EDT 2021

Tags: c design-api

(Updated 2021-08-01)

The last week or so, I've been working on reading Notes design elements without my longtime unreliable friend DXL. This is a task I've done in parts a few times before, but now I've set out to finish the job.

For the most part, the task is arcane but explicable: the structures are generally documented well enough in the C API docs, and most non-string items are usually only a few elements long. Views - likely because they're as old as Notes itself - are not one of these easy parts.

Background

The $ViewFormat item in a view design note contains, well, the format of the view: the overall settings (like the background colors) as well as the formats for the individual columns. Because views have gotten more complicated over the years, this item has grown. And because these types of structures have to grow in a way that's generally compatible with older code that might read it, the format is essentially an array of progressively-newer structures that alternatingly describe new traits of the view and the columns.

Some of this is described in the C API documentation. In there, you'll find "readview.c", which shows a basic example of how to read a view. Unfortunately, this example stops at the VIEW_TABLE_FORMAT2 structure, which, as we'll see shortly, is not far in. From what I can gather by looking at the "FirstVer" values in the C API documentation NSF, this file demonstrates only what existed before Notes R4.

So that example is a bust, but fortunately "viewfmt.h" has some more information. Near the top of that file is an overview of the $ViewFormat item structure in general, and it's more helpful. It's not fully helpful, though: it stops with things added in R6, and doesn't cover parts added in 6.5 and beyond. Additionally, it doesn't explain the storage of the R6-era variable data - the hide-when formulas and twistie image references - and those are a bit weird.

The mitigating factor for the documentation is that, while it doesn't explain the order of the remaining data, it does at least include definitions for the post-R6 structures, such as VIEW_COLUMN_FORMAT5 and whatnot. That is... it mostly includes that: they're fully present in "viewfmt.h", but some are missing in the NSF doc DB, and the NSF lacks some important flags. Still, it's something.

My final task remained: I had a few hundred bytes left and, armed just with some available structures and Karsten Lehmann's earlier attempt, I had to figure out how views work. And I think I did it! I haven't tested all edge cases, and in particular this will be different for calendar-type views, but this holds together. I'll update this as I find more hiccups.

Implementation Notes

I've been accessing this data as read by using the pointer returned by OSLockObject on an item value's BLOCKID. I'm also running on an amd64 Mac. Those two aspects may affect the need for ODSReadMemory - things may be different if you're on a different architecture, or perhaps the way I read it already brought it into host format. In my use, I found that ODSReadMemory was harmful in all cases, and that it was best to read the memory directly. Additionally, this is a view created using Designer V12 and read using the V12 libnotes.dylib, so anything newer won't be reflected here.

The rules around ODSReadMemory elude me at the moment. The way I'm reading this data involves the value BLOCKID as fetched by NSFItemQueryEx, which indicates that the value will be in canonical format for types including this. In theory, that should mean that ODSReadMemory is required for these structures, but it ends up harmful.

The Structure

To start out with, we have the VIEW_FORMAT_HEADER structure, which defines the format version (always 1) and whether it's a normal "table"-type view or a calendar view.

This header structure is, in table views, actually part of the VIEW_TABLE_FORMAT structure that is the true beginning (and is also where things will diverge for calendar views). This one's pretty straightforward, with no variable data, and contains the critical tidbit of the Columns value for the total number of columns, a number that we will reference again and again. Once read, advance your pointer sizeof(VIEW_TABLE_FORMAT) bytes.

Following this will be one VIEW_COLUMN_FORMAT structure for each column based on the count in the format. For each, read the structure and advance your pointer sizeof(VIEW_COLUMN_FORMAT) bytes for each. The first WORD of each of these will be equal to VIEW_COLUMN_FORMAT_SIGNATURE.

Next up is the variable data for those columns: the item name (string), the title (string), the formula (compiled formula data), and the constant value data (documented as reserved). These should be read by iterating over the above-read VIEW_COLUMN_FORMAT structures and reading byte arrays with length based on ItemNameSize, TitleSize, FormulaSize, and ConstantValueSize.

Next is VIEW_TABLE_FORMAT2, which is the last bit read by the "readview.c" example. Read sizeof(VIEW_TABLE_FORMAT2) bytes and increment your pointer.

(At this point in time, you might be inclined to ask "hey, how come VIEW_COLUMN_FORMAT has a handy signature to identify it but VIEW_TABLE_FORMAT2 doesn't?". Well, to that I say that you sure are asking a lot of nosy questions, pal, and maybe you should mind your own business. Anyway, this will be a reocurring feature.)

This is followed by one VIEW_COLUMN_FORMAT2 structure for every column in the view. Though this includes variable-data sizes, we don't read them yet - just read sizeof(VIEW_COLUMN_FORMAT2) * n in total here. These will each have VIEW_COLUMN_FORMAT_SIGNATURE2 in the first WORD.

Next is VIEW_TABLE_FORMAT3, which is a blessedly fixed-size structure, so you can just read sizeof(VIEW_TABLE_FORMAT3) bytes and move on.

Now here's where we come to the variable data from VIEW_COLUMN_FORMAT2, and where things start getting weird. To read this data, iterate over each FORMAT2 structure for your columns and check the wHideWhenFormulaSize and wTwistieResourceSize properties. If the hide-when size is above zero, read the hide-when formula first as a compiled formula. Then, read the twistie resource as CDRESOURCE composite-data structure if its size is non-zero. Fortunately, the lengths for both of these are defined in VIEW_COLUMN_FORMAT2, so the total amount to read will be wHideWhenFormulaSize plus wTwistieResourceSize from that structure.

Next, we have VIEW_TABLE_FORMAT4, which is a simple little structure. Of note here is that the RepeatType property - which I gather is how to repeat the background image of the view - references "viewprop.h", but that is not in the documentation. Oh well. Pleasantly, though, this structure includes a Length property as its first WORD, so read that many bytes (even though it should always be 8).

After this, our friend CDRESOURCE shows up again, this time as a single entity to represent the background image of the view. Since there's nothing that tells you the length of this before hand, you should glean the length from the CD record header, which is a WSIG - so the second WORD. Read that, then backtrack and read that many bytes.

Now, we have 0-to-n VIEW_COLUMN_FORMAT3 structures - one for each column that includes date formatting (from that "Style" dropdown on the fourth tab in Designer). To read these, look over your array of VIEW_COLUMN_FORMAT2 structures and, for each that has ExtDate in its Flags3 field, read one of these. Each column's structure is immediately followed by variable data for the different formatting strings, so read strings based on DTDsep1Len, DTDsep2Len, DTDsep3Len, and DTTsepLen from the structure, and then move on to the next applicable column.

These are followed by 0-to-n VIEW_COLUMN_FORMAT4 structures. These are the same idea as above, but for number formatting. Look for VIEW_COLUMN_FORMAT2 entries with NumberFormat in their Flags3 and read a VIEW_COLUMN_FORMAT4. Each of these is followed immediately by strings based on DecimalSymLength, MilliSepSymLength, NegativeSymLength, and CurrencySymLength, so read those too before moving to the next applicable column.

We've got one more of these in this sequence, and that's VIEW_COLUMN_FORMAT5: formatting for names columns. Iterate through your VIEW_COLUMN_FORMAT2s and look for NamesFormat in Flags3 and read these structures. Each one is followed immediately by the programmatic name of the column containing the distinguished person name for Sametime purposes, as defined by wDistNameColLen. Fortunately, this structure also includes a dwLength (which is a WORD despite the prefix) that contains the total length of the structure and its associated variable data, so you can read that many bytes in total here. It also, though, contains a nasty trick: you might be inclined to think that this structure also contains the name of the shared column used here, what on account of wSharedColumnAliasLen being in the structure and the documentation saying that's what it means. You'd be wrong, though - that value will be 0 and you'll get nothing from it.

This next part is a doozy. The fact that VIEW_COLUMN_FORMAT5 purports to contain the name of the shared column but in fact doesn't raises the question of where that name actually is. Well, it's right up next - you just need to read the string! But hrm: the length value in the previous structure was 0 and, besides, there won't even be one of those for non-names columns. Is it null-terminated? No: your next bytes will likely be something like 0x0700, and 0x07 is not a column name, that's for sure. What we have here is a P-string, prefixed by a WORD containing the number of characters. These strings represent both shared column titles and any titles for columns marked as "Do not display title in column header", with the latter first. So, loop through your VIEW_COLUMN_FORMAT2s again, looking for HideColumnTitle in Flags3, and, for each match, read a WORD length and then that many characters to glean the title. Then, do the same for columns with IsSharedColumn in Flags3 for your shared-column aliases.

The final structure in play that I know of is VIEW_COLUMN_FORMAT6, which fortunately is handled in basically the same way as formats 3-5. Look through VIEW_COLUMN_FORMAT2 again, this time checking Flags3 for ExtendedViewColFmt6 (I appreciate the functional name), and read a VIEW_COLUMN_FORMAT6 for each. This one again has some variable data immediately after each structure and also has a Length property, so read in that property's value in total bytes. This structure isn't in the doc NSF, but it is in "viewfmt.h", and it contains R8-era composite-app stuff.

Finally, it seems like there's a trailing 0. It makes me nervous that I end up with a single byte, but it seems to be consistent, and so my guess is that it's here to meet a WORD boundary, which is extremely common in the Notes API.

Incidentally, I don't know where VIEW_TABLE_FORMAT5 comes in, if at all. It's in the NSF and "viewfmt.h", but looks like just a duplicate of VIEW_TABLE_FORMAT4. It's possible that it sometimes shows up right after 4 when some specific thing is set, though it has no signature component and would be difficult to detect if so.

The Structure (Condensed)

If you came here to just get a quick overview of the format, I'm afraid I've pulled a "recipe blog post" on you and wrote it all in prose before the important part. In any event, here's the above but condensed:

VIEW_TABLE_FORMAT
VIEW_COLUMN_FORMAT * column count
(variable data for each column)     // Order: (item name, title, compiled formula, constant value), repeat per column
VIEW_TABLE_FORMAT2
VIEW_COLUMN_FORMAT2 * column count
VIEW_TABLE_FORMAT3
(hide-whens and twistie resources)  // Order: (compiled formula, CDRESOURCE), repeat per column where VIEW_COLUMN_FORMAT2 has wHideWhenFormulaSize > 0 or wTwistieResourceSize > 0
VIEW_TABLE_FORMAT4
CDRESOURCE                          // Represents the background image for the view
VIEW_COLUMN_FORMAT3 * x1            // One per column where Flags3 has ExtDate. Order: (VIEW_COLUMN_FORMAT3, var data), repeat per column
VIEW_COLUMN_FORMAT4 * x2            // One per column where Flags3 has NumberFormat. Order: (VIEW_COLUMN_FORMAT4, var data), repeat per column
VIEW_COLUMN_FORMAT5 * x3            // One per column where Flags3 has NamesFormat. Order: (VIEW_COLUMN_FORMAT5, var data), repeat per column
(hidden column titles)              // One per column where Flags3 has HideColumnTitle. Stored as WORD-prefixed P-strings
(shared column names)               // One per column where Flags3 has IsSharedColumn. Stored as WORD-prefixed P-strings
VIEW_COLUMN_FORMAT6 * x4            // One per column where Flags3 has ExtendedViewColFmt6. Order (VIEW_COLUMN_FORMAT6, var data), repeat per column

Conclusion

Hoo boy, this was a fun one. Anyway, assuming this holds up under further testing, my hope is that this post will be useful for the next person to come around to try to decode $ViewFormat. If that's you: good luck!

1 comment

My Tortured Relationship With libnotes

Sat May 22 12:31:10 EDT 2021

Tags: c java

A tremendous amount of my work lately involves wrangling the core Notes library, variously "libnotes.dylib", "libnotes.so", or (for some reason) "nnotes.dll" (edit: see the comments!). I do almost all of my daily work in Liberty servers on the Mac loading this library, my first major use of the Domino Docker image was to use it as an overgrown carrier for this native piece, and in general I spend a lot of time putting this to use.

It ain't easy, though! Unlike a lot of native libraries that you can just kind of load from wherever, libnotes is extremely picky about its loading environment. There are a few main ways that this manifests.

Required-Library References

libnotes isn't standalone, and it doesn't just rely on standard OS stuff like libc. It also relies on other Notes-specific libraries like libxmlproc and libgsk8iccs, and some of those refer back to each other. This all shakes out in normal practice, since they're all next to the Notes/Domino executable and each other, but it makes things finicky when you're running from outside there.

This seems to be the most finicky on macOS, where the references to each library are marked with @executable_path relative paths, meaning relative to the running executable.

I've wrangled with this quite a bit over the years, but the upshot is that, on macOS and Linux, you really want to set some environment variables before you ever load your program. Naturally, since your running program can't do anything about what happened before it was loaded, this means you have to balance the teacups in your environment beforehand.

Non-Library Files

Beyond the dynamic libraries, libnotes also need some program-support executable files, things like string resource files and whatnot. And, very importantly, it needs an active data directory with an ID, notes.ini, and names.nsf at least. The data-directory contents bits are at least a little more forgiving: though there's still a hard requirement that they be present on the filesystem (since libnotes loads them by string path, not as configurable binary streams), you could at least bundle them or copy them around. For example, for my Dockerized app runners, I tend to have some basic versions of those in the repo that get copied into the Docker container's notesdata directory during build.

Working Around It

As I mentioned, the main way I go about dealing with this is by telling whatever is running my program to use applicable environment variables, ideally in a reproducible config-file-based way. This works perfectly with Docker and with tycho-surefire-plugin in Maven, but doesn't work so well for maven-surefire-plugin (the normal unit test runner) or Eclipse's JUnit tools. In Eclipse's case, I can at least fill in the environment variables in the Run Configuration. It hampers me a bit (I can't just right-click a test case and run it individually without first running the whole suite or setting up a configuration specially), but it works.

I gave a shot recently to copying the Mac dylibs and programmatically fiddling with otool and install_name_tool to adjust their dependency paths, and I got somewhere with that, but that somewhere was an in-libnotes fatal panic, so I'm clearly missing something. And besides, even if I got that working, it'd be a bit of a drag.

What Would Be Nice

What would be really nice would be a variant of this that's just more portable - something where I can just System.load a library, call NotesInitExtended to point to my INI and ID, and be good to go. I'm not really sure what this would entail, especially since I'd also want libinotes, libjnotes, and liblsxbe. I do know that I don't have the tools to do it, which fortunately frees me up to idly speculate about things I'd like to have delivered to me.

As long as I'm wishing for stuff, I'll say what would be even cooler would be a WebAssembly library usable with Wasmer, which is a multi-language WebAssembly runtime. The promise there is that you'd have a single compiled library that would run on any OS on any processor that supports WebAssembly, from now until forever. I'm not sure that this would actually be doable at the moment - for one, I don't know if callback parameters work, which would be fairly critical. Still, my birthday is coming up, and that would be a nice present.

1 comment

How I Learned to Stop Worrying and Love C Structs

Mon Aug 04 22:38:04 EDT 2014

Tags: c java

Since a large amount of on-site client time has put my framework work on hold for a bit, I figured I'd continue my dalliance in the world of raw item data.

Specifically, what I've been doing is filling out the collection of structs with Java wrappers, particularly in the "cd" subpackage. For someone like me who has only done bits and pieces of C over the years, this is a fruitful experience, particularly since I'm dealing with just the core concept of the Notes API data structures without the messy business of actually worrying about memory or crashing programs.

If you're not familiar with structs, what they are is effectively just the data portion of a class. They're like a blueprint of related pieces of data - ints, doubles, other structs, etc. - used both for creating new elements and for a contract for the type of data received from an API. The latter is what I'm dealing with: since the raw item data in DXL is presented as just a series of bytes, the struct documentation is vital in figuring out what to expect in which spot. Here is a representative example of a struct declaration from the Notes API, one which includes bonus concepts:

typedef struct{
	LSIG	Header;		/* Signature and Length */
	WORD 	FileExtLen;		/* Length of file extenstion */
	DWORD	FileDataSize;	/* Size (in bytes) of the file data */
	DWORD	SegCount;		/* Number of CDFILESEGMENT records expected to follow */
	DWORD	Flags;		/* Flags (currently unused) */
	DWORD	Reserved;		/* Reserved for future use */
	/*	Variable length string follows (not null terminated).
		This string is the file extension for the file. */
} CDFILEHEADER;

So... that's a thing, isn't it? It represents the start of a File Resource (or CSS file, Java class, XPage, etc.). I'll start from the top:

LSIG Header: An "LSIG" is actually another struct, but a smaller one: its job is to let a processor (like my code) identify the following record, as well as providing the total size of the record. When processing the byte stream, my code looks to the first two bytes of each record to determine what to expect for the next block.
WORD FileExtLen: To start with, "WORD" here is just an alias for an "unsigned short" - a 16-bit integer ranging from 0 to 65535 (64K). The term "word" itself refers to the computer-architecture term. This field specifically denotes the number of bytes to expect after the record for the file extension of the file resource being defined.
- The "unsigned" above refers to the way the numbers are stored in memory, which is to say a series of binary digits. By default, the highest-value bit is reserved for the "sign" - a 0 for positive and a 1 for negative. Normal "signed" numbers can store positive values only half the size of their unsigned counterparts, because going from "0111 1111" (+127) wraps around to "1000 0000" (-127). This is why the "half" values - 32K, 2G - are seen as commonly as their "full" counterparts. As to why negative numbers are represented with all those zeros, you'll either have to read up independently or just trust me for now.
DWORD FileDataSize: A "DWORD" is double the size of a "WORD" (hence the "D", presumably): it's an unsigned 32-bit number, ranging from 0 to 4,294,967,295 (4G). This number represents the total size of the file attachment, and so also presumably represents the maximum size of a file resource in Domino.
DWORD SegCount: This indicates the number of "segment" records expected to follow. Segment records are similar to this header we're looking at, but contain the actual file data, split across multiple chunks and across multiple items. That "multiple items" bit is due to the overall structure of rich-text items, and is why you'll often see rich text item names (like "Body" or "$FileData") repeated many times within a note.
DWORD Flags: As the comment in the API indicates, this field is currently unused. However, it's representative of something that is used very commonly in other structures: a set of bits that isn't useful as a number, but instead has values set to correspond to traits of the entity in question. This is referred to as a bit field and is an efficient way to store a fixed number of flags. So if you had a four-bit flags field for a text item, the bits may indicate whether it's summary data, a names field, a readers field, and/or an authors field - for example "1100" for a field that's summary and names, but not readers or authors. That is, incidentally, basically how those flags are implemented, albeit with a larger bit field.
- There is a secondary type of "flag" in Notes: the "$Flags" and "$FlagsExt" fields in design elements. Those flags are less efficient - 8 bits per flag instead of 1 - but are generally more extensible and somewhat conceptually easier.
- These bit fields are what those oddball bitwise operators are for, by the way.
DWORD Reserved: Many (most?) C API data structures contain at least one block of bits like this that is reserved for future use. These are so that IBM can add features without breaking all existing code: API users are expected to leave anything there intact and to create new structures with all zeros. You can see this in action in a couple places, such as the addition of rudimentary theme support to legacy design elements, which used up a byte out of the previously-reserved block.
Variable data: This is the fun part. Many structures contain "variable" data following the "fixed" portion. Whereas the previous parts are a predictable size - a DWORD will always be four bytes - the parts following the block can be anywhere from 0 bytes to whatever is the max value of their referring entity. Fortunately, the "SIG" at the beginning of the record tells the API user the length ahead of time, so it's not required to read it all in when dealing with an API entity. Still, the code to read this can be complex and bug-prone, particularly when there are multiple variable parts packed together. In this case, though, it's relatively simple: we get the value of "FileExtLen" and read that many bytes into an array and convert that from LMBCS to a respectable Unicode string.

Dealing with a stream of structs is... awkward at first, particularly when you're used to Java amenities like being able to just pour out and read back in serialized objects without even thinking twice about it. After a while, though, it gets easier, and you get an appreciation for a lower-level type of programming. And while you still may not be happy about Domino's 32K summary-data limit, you at least get an understanding for why it's there.

So if you're in a position to dive into this sort of thing once in a while, I recommend you do so. Though the value of what I'm writing specifically is mixed - I doubt the world needs, say, a Java wrapper for a CD record reflecting a DECS field association - the benefit to my brain is immense. Dealing with web and Java programming can cause you to become very disconnected from the fundamentals of programming, and something like this can bring you back to solid ground. Give it a shot!

2 comments

The DSAPI Login Filter I Wrote Years Ago

Mon May 19 15:02:18 EDT 2014

Tags: dsapi c

In one of the conversations I had at the meetup yesterday, I was reminded of the DSAPI filter I use on my server for authentication, and remembered I'd yet to properly blog about it.

Years ago, I had a problem: I was setting up an XPages-based forum site for my WoW guild, and I wanted sticky logins. Since my guildies are actual humans and not corporate drones subject to the whims of an IT department, I couldn't expect them to put up with having to log in every browser session, nor did I want to deal with SSO tokens expiring seemingly randomly during normal use. My first swing at the problem was to grossly extend the length of Domino's web sessions and tweak the auth cookie on the server to make it persist between browser launches, but that still broke whenever I restarted the server or HTTP task.

What I wanted instead was a way to have the user authenticate in a way that was separate from the normal Domino login routine, but would still grant them normal rights as a Domino user, reader fields and all. Because my sense of work:reward ratios is terribly flawed, I wrote a DSAPI filter in C. Now, I hadn't written a line of C since a course or two in college, and I hadn't the foggiest notion of how the Domino C API works (for the record, I consider myself as now having exactly the foggiest notion), and the result is a mess. However, it contains the kernel of some interesting concepts. So here's the file, warts, unncessary comments, probable memory leaks or buffer overflows, and all:

raidomaticlogin.c

The gist of the way my login works is that, when you log in to the forum app, a bit of code (in an SSJS library, because I was young and foolish) finds the appropriate ShortName, does a basic XOR semi-encryption on it, BASE64s the result, and stores it in a cookie. The job of this DSAPI filter, then, is to look for the presence of the cookie, de-BASE64 it, re-XOR it back to shape, and pass the username back to Domino.

Now, the thing that's interesting to me is that, because you've hooked directly into Domino's authentication stack, the server trusts the filter's result implicitly. It doesn't have to check the password and - interestingly - the user doesn't actually have to exist in any known Directory. So you can make up any old thing:

CN=James T. Kirk/OU=Starfleet/O=UFP

Though I do not, in fact, maintain a Directory for the Federation, Domino is perfectly happy to take this name and run with it, generating a full-fledged names list:

Names List: CN=James T. Kirk/OU=Starfleet/O=UFP, *, */OU=Starfleet/O=UFP, */O=UFP

It gets better: you can use any of these names, globs included, in Directory groups or DB-level ACLs and they're included transparently. So I set up a couple groups in the main Directory containing either the full username or "*/OU=Starfleet/O=UFP" and also granted "*/O=UFP" Editor access and an Admin role in the DB itself. Lo and behold:

Names List: CN=James T. Kirk/OU=Starfleet/O=UFP, *, */OU=Starfleet/O=UFP, */O=UFP, Starfleet Captains, Guys Who Aren't As Good As Picard, [Admin]
Access Level: Editor

Now we're somewhere interesting! Though I didn't use it for this purpose (all my users actually exist in a normal Directory), you could presumably use this to do per-app user pools while still maintaining the benefits of Domino-level authentication (reader fields, ACLs, user tracking). With a lot of coordination, you could write your C filter to look for appropriate views in the database matching the incoming HTTP request and only pass through the username when the cookie credentials match a document in the requested app. Really, only the requirements of high performance and the relentless difficulty of writing non-server-crashing C stand in between you and doing some really clever things with web authentication.

1 comment