Mitch Richling: Example DB Programs

You will find several simple examples of programs to get you started with the traditional UNIX database libraries NDBM, GDBM or Berkeley DB. See the notes below for background information. The makefile is here.

NDBM (dbm)

mkNDBM.c: Create a simple NDBM database (works with GDBM in compatibility mode too).
rdNDBM.c: Read in and display the NDBM database created by mkNDBM.c.

GDBM (GNU DBM)

mkGDBM.c: Create a simple GDBM database.
rdGDBM.c: Read in and display the GDBM database created by mkGDBM.c.

Berkeley DB

mkBerkeleyDB.c: Create a simple Berkeley DB (BTREE).
rdBerkeleyDB.c: Read in and display the Berkeley DB created by mkBerkeleyDB.c.

Notes & FAQ

DBM, originally developed at AT&T Bell Labs and distributed with early UNIX and BSD releases, is a VERY high performance key-value pair database library. Modern programmers might think of a key-value pair database as an associative container backed by a persistent disk store. DBM doesn't support transactional operations, any form of concurrency, or general query (SQL) capabilities. DBM only allows a program to have ONE database open at any time, and data payloads are limited to 1K (i.e. the key and value must each be less than 1K). Later, NDBM, which stands for New-DBM, was developed as an improved version that includes simple access to multiple databases at the same time. Later versions of NDBM removed the limits on the size of key and value data. Unfortunately, NDBM uses a different file format than DBM. While DBM and NDBM are usually thought of as BSDisms, the interfaces are widely available on System V derived systems like Solaris, and on Linux. NDBM appeared in the Single UNIX Specification version 2, and is thus available on a wide array of systems.

Even with NDBM widely available, an improved GNU-DBM, called GDBM, has been developed. As part of the GNU project, GDBM is even more widely available and may be a useful option if portability to odd hardware or software platforms is a must. GDBM has many improvements, but it uses yet another file format. GDBM has compatibility modes for both DBM/NDBM file formats and source code. While it is tempting to use the compatibility modes and simply stick with the older interfaces, it is important to note that the compatibility modes provide compatibility with the limitations of the older libraries too!

GDBM has several notable improvements over standard NDBM. It supports multiple readers into a DB, has no size limits on records, much better tuning options, and better error reporting. GDBM provides all of this while still providing an almost identical API -- an NDBM program can be converted into GDBM mostly via search and replace. The removal of size limitations on what can be stored has come at a cost. This cost is increased memory allocation and deallocation overhead, and a requirement for users of GDBM to free memory allocated by the library. This is one of the most common sources of memory leaks in newly converted GDBM programs -- one must free the memory returned by functions like gdbm_fetch(). I highly recommend stepping up to the better capabilities of GDBM if you are going to use it -- why not, GDBM compiles on most any platform worth using!

An even more sophisticated library called Berkeley DB is commercially available. An older version of Berkeley DB, around v1.85, is included with many BSD variants and the NDBM interface is implemented using this version of Berkeley DB. As a result, the NDBM implementations on BSD systems have no size limitations on what can be stored and no user-required memory management! Modern versions of Berkeley DB support concurrency, transactions, and a variety of sophisticated features. All of this wonderful stuff is implemented while still supporting the same design principles of the other DBMs -- simple, clean, and easy to use. The API is quite different from the other DBMs -- more consistent and less difficult to use in my opinion. I HIGHLY recommend the Berkeley DB package if you are looking for uncompromising performance, ease of use, and powerful features! (Hmm, that sounded like a commercial. I represent Sleepycat in no way; I just really like the product.)

Can XDBM store things other than null terminated strings?: YES! In fact, they all store the specified number of bytes from the binary glob of data pointed to by the structure handed to them. They do NOT store strings! The example programs simply tell them, via strlen(foo)+1, to store everything pointed to by the data pointers including the terminating NUL. This is done simply to make printing out the stuff stored in the DB a simple matter in C. The complete ignorance of what is stored in an XDBM database is one of the most powerful aspects of the libraries presented here. You can put whatever you want in them: C++ objects, strings, built in types like ints, or just globs of binary data.
Will XDBM null terminate data I put into it?: NO! None of the libraries will do anything to the data you stuff into them. This is a feature. See the previous question.
Which libraries have size limitations for objects I store?: DBM is limited to 1024 bytes. Many NDBMs are too, but some are not -- BSD implementations for example. GDBM has no limitations if you don't use the compatibility functions. Berkeley DB has no limitations -- or at least none you are likely to hit.
How "free" is Berkeley DB?: Check out the license at Sleepycat! Last time I looked you could use it for free on a GPL project, but you had to pay for it if you wanted to make money off of something you used it for. BTW, IANAL (I am not a lawyer).
I don't have a "key" but I want to store things in a DB! What do I do?: Invent a string key -- from an integer. You can also use Berkeley DB and have it generate record numbers for you. This is a handy feature of Berkeley DB.
Do the DBs use hashes, btrees, or what?: They all use hashes, but Berkeley DB can also use a BTREE structure.
I am worried about leaving the DB in a bad state if my program crashes.: You can sync with GDBM and minimize this problem. With Berkeley DB you can use transactions to solve this problem.
What kind of server should I run my database on?: If you are asking this question, then you probably don't understand what kinds of DBs we are talking about. The databases discussed here are embedded and don't require a server -- they are not like Oracle, MySQL, or PostgreSQL.
How can I interact with my database with SQL?: People asking this question are usually already using an SQL database and getting inadequate performance. Somebody told them that GDBM or Sleepycat was the fastest DB on the planet and they should use that instead of the RDB they are currently using. Unfortunately, the DBs discussed here are not SQL DBs and are not drop-in replacements for such applications. You can implement SQL databases using the tools discussed here as an underlying system (MySQL, for example, has offered Berkeley DB as one of its storage engines); however, it is difficult to switch a nontrivial relational DB application over to any of the DBs we are talking about here. Sorry. If you were just curious, then the short answer is: "you can't" -- SQL would not be of much use for a key-value pair DB anyhow...
Can you help me with my DB problem?: Sure. Be prepared to send me money if the problem is not a fun one or something I don't really have time for.
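
Two of the FAQ answers above, that the libraries store raw bytes rather than strings and that a key can be invented from an integer, can be sketched without linking any DB library at all. Everything named here is hypothetical: blob is a stand-in for the pointer-plus-size datum structure the DBM-style libraries accept, and point, blob_from_string, blob_from_point, and make_key are illustrative names of my own.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a DBM-style datum: a pointer and a byte count. */
typedef struct {
    void  *ptr;
    size_t size;
} blob;

/* An arbitrary binary record. Nothing terminates it; only the size matters. */
typedef struct {
    int    id;
    double x, y;
} point;

/* Describe a C string INCLUDING its terminating NUL, as the example
   programs do with strlen(foo)+1. */
blob blob_from_string(char *s)
{
    blob b = { s, strlen(s) + 1 };
    return b;
}

/* Describe a raw struct: exactly sizeof(point) bytes, no terminator. */
blob blob_from_point(point *p)
{
    blob b = { p, sizeof *p };
    return b;
}

/* Invent a string key from an integer, zero padded to a fixed width.
   Returns the key size including the NUL, or 0 if buf is too small. */
size_t make_key(char *buf, size_t bufsz, unsigned n)
{
    int len = snprintf(buf, bufsz, "rec-%08u", n);
    if (len < 0 || (size_t)len >= bufsz)
        return 0;
    return (size_t)len + 1;
}
```

With a real library, the pointer and size would be copied into the library's own datum (or DBT, for Berkeley DB) before calling its store function. Two cautions under the same assumptions: raw struct bytes are tied to the layout and byte order of the machine that wrote them, and the zero padding in make_key only matters for ordered stores such as a BTREE, where keys typically sort bytewise; hash-based stores do not care.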