Bitwise Operators
Sat Apr 06 16:11:00 EDT 2019
- Java Hiccups
- Bitwise Operators
- Java Grab Bag 2
- Java Travelogue: The Care and Feeding of Locales
- More Notes on Filesystem and Charset Portability
In the "Java Hiccups" post, I briefly mentioned this little tidbit:
(short)Integer.MAX_VALUE == -1 // true
This fact betrays a lot about how numbers are stored internally in most languages. Some of those details, like the fact that the specific method is called two's complement, are fully in the range of "computer science" stuff - but it can be handy to know about a couple ways you can put this sort of thing to use.
For our purposes, let's work with the short
data type in Java, because it's painfully important to Notes. "Short" is called such because it's smaller than a standard integer type in a given language. In some languages, like C, the length of core data types like this is sometimes tied to the bitness of the processor, but in Java they're consistent across everything. Specifically, short
in Java is 16 bits long. Let's take the number 21,038, which is represented in binary as:
0101 0010 0010 1110
That's two bytes, and each byte is broken up into two groups of four bits each because it turns out to be helpful visually and because each "nibble" there can be matched to a single hexadecimal character (522E
in this case). This is the most common way you'll see binary written when you start getting into this sort of thing.
So I mentioned casting in the previous post, and what casting does with primitive integer types like this is to chop off the bits that don't fit. If you cast the above value into a byte
, it'll chop off all but the ending 8 bits, giving you:
0010 1110
Which is 46. If you cast it to a larger type, such as int
(32 bits in Java), it just adds some zeros to the front:
0000 0000 0000 0000 0101 0010 0010 1110
Which is still 21,038, but now it takes up twice as much memory.
For our uses, this stuff matters immediately in two ways: it represents the limits of some data storage and it allows us to manipulate numbers with bitwise operators.
Storage Limits
If you max out the bits in a 16-bit integer, you get 1111 1111 1111 1111
, and this value is either -1 or 65,535 (64k) depending on whether or not the value is considered "signed" by the code using it. When it's "unsigned", a 16-bit integer can represent between 0 and 65,535; when it's "signed", its range is -32,768 to 32,767 (32k). Java only has "signed" types, which is why Short.MAX_VALUE
is 32k and not 64k. C, on the other hand, lets you choose in your code which type you want to use.
The Notes C API uses a couple type names to represent consistent sizes across different platforms, and the one that's important here is WORD
: a 16-bit unsigned integer. This is used in, for example, the function used to set the value of a text item in a note:
NSFItemSetText(
NOTEHANDLE hNote,
const char far *ItemName,
const char far *ItemText,
WORD TextLength)
That WORD
at the end there is where you tell the API how much text you're providing, and is thus capped at 64k - and is why you can store 60k of text in a non-summary text field but not 70k. As to why summary data is limited to 32k and not 64k, I'm guessing that either there's a signed value in there somewhere or that last bit was needed for something else.
If you look over the list of Notes limits, you can see this reflected all over the place, along with a few 8-bit (0-255) limits for good measure.
Bitwise Operators (for real this time)
The term "bitwise" refers to a handful of operators that deal directly on the bits of a number. Java takes a cue from C here and provides &
, |
, ^
, ~
, <<
, >>
, and >>>
. These all have their uses, primarily for really-low-level stuff and efficiency, but we mainly care about &
and |
. These two may have bitten you in the past because of their similarity to &&
and ||
- and in particular because Formula Language uses the single-character versions to mean what Java means by the double-character ones. They also kind of work the same way, but aren't really the same.
To see how they work, let's take our original starting number, 21,038, and pair it up with another value, 5,000:
0101 0010 0010 1110
0001 0011 1000 1000
Visualizing the numbers in binary and stacked like this comes in extremely handy when it comes to dealing with bitwise operators. We'll start with &
, or "bitwise AND". What this operator does is take two numbers and return a number where the matching slots in each one both contain a 1
. So, in this case, it results in this internal math:
0101 0010 0010 1110
0001 0011 1000 1000
===================
0001 0010 0000 1000
Which corresponds to 4,616.
The |
operator, or "bitwise OR", will provide a result where either of the original numbers has a 1
in the slot. In this case:
0101 0010 0010 1110
0001 0011 1000 1000
===================
0101 0011 1010 1110
Or 21,422.
It's pretty rare that you want these operators for the actual numerical values, though. What they're pretty much always used for are "bit fields", an extremely-efficient way to store a set of related flags. Going back to the Notes API, the NSFItemAppend
function (a generic way to add an item value) has a parameter WORD Flags
, that matches up to a number of properties that you can assign to an item. In the C source, they're written as hexadecimal values, but I've added in the binary versions here:
#define ITEM_SIGN 0x0001 // 0000 0000 0000 0001
#define ITEM_SEAL 0x0002 // 0000 0000 0000 0010
#define ITEM_SUMMARY 0x0004 // 0000 0000 0000 0100
#define ITEM_READWRITERS 0x0020 // 0000 0000 0010 0000
#define ITEM_NAMES 0x0040 // 0000 0000 0100 0000
#define ITEM_PLACEHOLDER 0x0100 // 0000 0001 0000 0000
#define ITEM_PROTECTED 0x0200 // 0000 0010 0000 0000
#define ITEM_READERS 0x0400 // 0000 0100 0000 0000
#define ITEM_UNCHANGED 0x1000 // 0001 0000 0000 0000
So you can see that each flag there has a distinct slot filled in that each other doesn't (you can also see, like missing teeth, the bits that are probably obsolete or undocumented features). If you want to create an item that is a signed, sealed, summary, readers item, you'd do this:
WORD flags = ITEM_SIGN | ITEM_SEAL | ITEM_SUMMARY | ITEM_READERS
…which will result in this bit of internal math:
0000 0000 0000 0001
0000 0000 0000 0010
0000 0000 0000 0100
0000 0100 0000 0000
===================
0000 0100 0000 0111
This is extremely efficient, both because you can store a bunch of information in just 16 bits, but also because working with these bit fields is baked into processors at the lowest level. There's not much you can do on a computer that's faster, in fact.
When you have a value like this, you can query it for whether or not a given value is set by using the AND operator:
flags & ITEM_SUMMARY
This will result in:
0000 0000 0000 0100
…which is the same as the value of ITEM_SUMMARY
itself, since that one slot is the only bit in common. Conversely, if you check:
flags & ITEM_PROTECTED
…you'll end up with zero, since the flag doesn't match.
Use in Java
Most of the time, you don't have to think too much about the bits that make up numbers in Java, but they creep up from time to time. Generally, the main situations where they arise are when you're dealing with a low-level API or with code written by someone who spent a lot of time with C and deeply internalized its efficiencies.
For an example of the former, take a look at the com.ibm.designer.domino.napi.NotesNoteItem
class in IBM's NAPI. It contained a method called int getFlags()
, which returns exactly the value we were talking about above. If you get your hands on a summary item this way, you can do this to tell if it's a summary item:
(item.getFlags() & 0x0004) != 0
Note the extra != 0
. While the operation involved is the same as in C, C lets you treat any non-zero integer value as a boolean true
, while Java forces you to perform an actual boolean operation.
For an example of the latter, turn your eyes to the DefaultColumnDef
class used in xe:dynamicViewPanel
generation and customization, a snippet of which is:
public static class DefaultColumnDef implements ColumnDef {
public static final int FLAG_HIDDEN = 0x000001;
public static final int FLAG_LINK = 0x000002;
public static final int FLAG_ONCLICK = 0x000004;
public int flags;
public boolean isHidden() {
return (flags&FLAG_HIDDEN)!=0;
}
public boolean isLink() {
return (flags&FLAG_LINK)!=0;
}
public boolean isOnClick() {
return (flags&FLAG_ONCLICK)!=0;
}
}
This could also be represented as a bunch of boolean
properties on the object or, most idiomatically, an EnumSet
, but using a bitmask is slightly more space- and CPU-cycle-efficient. Slightly. The difference isn't enough to even think about normally, but can become worth it in cases where the code is called so often that tiny efficiencies can make a difference.
This whole topic is the sort of thing that is normally so many layers of abstraction below where we work that it's basically invisible, but it's definitely good to at least know the basics for the times when it does come up.
Ben Langhinrichs - Sat Apr 27 19:53:48 EDT 2019
Well explained. I don't have to deal with this in Java often, because I'm that bozo who works in C all the time and has "deeply internalized its efficiencies". I love it when you explain all this, though, as it is seldom laid out in a clear way.