Be a Better Programmer, Part 3

Jun 5, 2014, 11:06 AM

Tags: performance
  1. Be a Better Programmer, Part 1
  2. Be a Better Programmer, Part 2
  3. Be a Better Programmer, Part 3
  4. Be a Better Programmer, Part 4
  5. Be a Better Programmer, Part 5

After a break last week, this week's entry returns a bit to practical concepts. Specifically:

Develop a Working Knowledge of Relative Speeds

Performance of an app depends on almost-countless factors, but I've found that you mostly just need to know a few general ideas and rules of thumb to avoid the real pitfalls.

Big O Notation

The first thing to have a general working knowledge of is the naively-named Big O notation. This is one of the core concepts in a Computer Science curriculum and is often one of the more "mathy" subjects when you learn it. However, for the level of programming we do, it mostly serves as a warning to not loop too much. Specifically, avoid this sort of thing when possible:

for(int i = 0; i < someLength; i++) {
	for(int j = 0; j < someLength; j++) {
		// some stuff here
	}
}

That's O(n²): the number of times the inner loop runs grows quadratically with the size of "someLength", and it's among the worst things you can do. I recommend looking around for some basic introductions to the concept, such as this article. One thing to remember with Big O notation is that it's a general guide, not a hard-and-fast rule: an O(n²) algorithm may be faster than an O(1) one if what the latter is actually doing is really slow in some other way.
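To make that concrete, here's a hypothetical example (mine, not from any particular app): checking a list for duplicates. The nested-loop version is O(n²), while a single pass with a HashSet, whose membership checks are effectively constant-time, is O(n).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DuplicateCheck {
	// O(n^2): compares every pair of elements
	static boolean hasDuplicateQuadratic(List<String> items) {
		for (int i = 0; i < items.size(); i++) {
			for (int j = i + 1; j < items.size(); j++) {
				if (items.get(i).equals(items.get(j))) {
					return true;
				}
			}
		}
		return false;
	}

	// O(n): one pass, using a HashSet for constant-time membership checks.
	// Set.add returns false when the element was already present.
	static boolean hasDuplicateLinear(List<String> items) {
		Set<String> seen = new HashSet<>();
		for (String item : items) {
			if (!seen.add(item)) {
				return true;
			}
		}
		return false;
	}

	public static void main(String[] args) {
		List<String> names = Arrays.asList("Anne", "Bob", "Carol", "Anne");
		System.out.println(hasDuplicateQuadratic(names)); // true
		System.out.println(hasDuplicateLinear(names));    // true
	}
}
```

With a handful of elements it makes no difference, but at ten thousand elements the quadratic version does roughly fifty million comparisons where the linear one does ten thousand.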

I/O Overhead

Another key thing to keep in mind is that the difference between "levels" of activity - in-memory, DB access, etc. - is often so great that it completely dwarfs any other complexity. For example, it's usually going to be significantly faster to sort/search/mangle a data structure in memory many times over than to access a database even once. As with Big O, this is sort of a fuzzy rule, but I tend to keep a couple order-of-magnitude levels in mind:

  1. In-memory structure access
  2. Efficient database access (open connection, in-memory)
  3. Filesystem access
  4. Remote database/service access (e.g. web services)

#2 and #3 probably go back and forth in performance in different contexts, but any filesystem use is "dirty" enough conceptually that I downgrade it just for that.

This is why caching and minimizing database access are so important. The Domino programming environment makes it feel like the database is there "for free", but it has all the same pitfalls. Say you have a basic List of Maps in Java and the same data stored in a view in Domino, and you want to do what amounts to a keyed lookup. Even though the algorithm for doing a lookup by key in Domino is (presumably) very efficient from a Big O perspective, even a linear search through the List in memory is going to run laps around it.
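That in-memory linear search might look like this (a hypothetical sketch, with an invented "id" key field standing in for whatever the view is keyed on):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KeyedLookup {
	// Linear scan through a List of Maps, matching on a key field.
	// O(n) on paper, but every step stays in memory, with none of the
	// per-call overhead of going back to the database.
	static Map<String, Object> findByKey(List<Map<String, Object>> rows, String keyField, Object key) {
		for (Map<String, Object> row : rows) {
			if (key.equals(row.get(keyField))) {
				return row;
			}
		}
		return null;
	}

	static Map<String, Object> row(String id, String name) {
		Map<String, Object> m = new HashMap<>();
		m.put("id", id);
		m.put("name", name);
		return m;
	}

	public static void main(String[] args) {
		List<Map<String, Object>> rows = Arrays.asList(
			row("A-1", "Anne"),
			row("B-2", "Bob")
		);
		Map<String, Object> match = findByKey(rows, "id", "B-2");
		System.out.println(match.get("name")); // Bob
	}
}
```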

There are, naturally, limits: some database operations are particularly fast and some in-memory algorithms are particularly slow.
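The caching side of this can be as simple as a memoizing wrapper around the slow call. A minimal sketch, assuming the back-end lookup is a pure function of its key (the class and names here are invented for illustration):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class LookupCache {
	private final Map<String, String> cache = new HashMap<>();
	private final Function<String, String> backEnd;
	int backEndCalls = 0; // exposed just to demonstrate the savings

	LookupCache(Function<String, String> backEnd) {
		this.backEnd = backEnd;
	}

	// Only hit the slow back end on a cache miss; repeat requests
	// for the same key are served from the in-memory map.
	String get(String key) {
		return cache.computeIfAbsent(key, k -> {
			backEndCalls++;
			return backEnd.apply(k);
		});
	}

	public static void main(String[] args) {
		// Stand-in for an expensive database/web-service call
		LookupCache cache = new LookupCache(k -> "value-for-" + k);
		cache.get("A");
		cache.get("A");
		cache.get("B");
		System.out.println(cache.backEndCalls); // 2
	}
}
```

As Mick points out in the comments below, the extra code does carry a maintenance cost, so it's a judgment call whether a given lookup is hot enough to be worth it.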

Platform Knowledge

And in addition to the rules of thumb, sometimes it's best to just know a lot about the platform you're working with. Sometimes this comes with a knowledge of the general type of platform (e.g. key/value lookup in a document DB is fast, "relational" activity is not), and sometimes it just takes hands-on experience.

Fortunately, this becomes self-reinforcing as you learn it. Once you run across severe pitfalls (say, db.Search or using @Now in a view), you develop sort of a visceral revulsion to using them again except when they're the only option. Eventually, you build up enough working knowledge and calluses that it becomes second nature.

Once you have a good grasp of what is and isn't quick, it can really help you structure your programs more effectively. Not only does it help you write faster-executing code, but it also helps you understand when it does and does not matter: if you have code that will only execute once, you can intentionally forgo a faster-but-harder-to-maintain algorithm in favor of a clear-but-moderately-slower one as long as you know you're not falling off a performance cliff. That balance between writing for the computer and writing for other programmers is one of the crucial things to strive for.


Mick Moignard - Jun 5, 2014, 12:59 PM

Many database systems cache an awful lot of stuff in memory, and Domino isn't any different. Which may mean that the difference in performance between looking something up and caching it, when you take into account the effort involved in caching the data in the first place, might not be very much. And the code will be more complex, which has a cost-of-ownership attached to it.


Ralf M Petter - Jun 6, 2014, 4:33 AM

Hi Mick!

No matter how efficient the database cache of Domino is (in reality, compared to a relational database like DB/2, it is not very fast), access to a HashMap or similar data structure is multiple times faster.


