Trees

Indexes in Noblit are hitchhiker trees. The trees have the same on-disk format as memory format, which allows them to be memory mapped. A hitchhiker tree is similar to an immutable B-tree, but updates usually only allocate a single new node, rather than logB(n) nodes. This gives us the advantages of an immutable data structure, without the write amplification.

Tree nodes in Noblit are immutable. Index updates produce new nodes that succeed older nodes. Nodes can become unreachable, but they are never removed. Eventually, many nodes in a file may not be reachable. In that case the tree can be copied to a new file, omitting the unreachable nodes. This is a copying garbage collection.

Index Trees

Indexes in Noblit are sorted sets of datoms. Each datom is 32 bytes. (For large values, the datom contains a reference to a value on the heap.) The indexes store the datoms themselves: they are sorted sets, not key-values maps. In other words, indexes are covering indexes.

Trees in Noblit store every datom exactly once. Datoms in leaves are not repeated in interior nodes, unlike a B+ tree, which would store all datoms in leaf nodes, and repeat some datoms as midpoints in interior nodes.

In addition to the midpoint datoms, tree nodes store pending datoms: datoms that need to be flushed into the leaves, but which we avoid as long as possible. This modification is what turns a B-tree into a hitchhiker tree.

Disk Format

Tree nodes are 4096 bytes.

  • Byte 4088 through 4095 contains the node header, which is built up of the following 8 bytes:
  • Byte 0 contains the depth of the node (0 for a leaf, 1 for its parent, etc.).
  • Byte 1 contains the number of datoms in the node internally, say k. k is at most 102. TODO: Leaf nodes could store 127 datoms and no child page ids, as opposed to 102 midpoints.
  • The remaining 6 bytes are currently not used. TODO: I could store a checksum there.
  • At byte 0, the datom array starts. It contains k datoms in increasing order.
  • At byte 3264, the child array starts. It contains 64-bit page ids of the child nodes. The child array has k + 1 elements, such that all datoms in node children[i] order before datoms[i]. All datoms in node children[i + 1] order after datoms[i]. The special value 264 – 1 indicates that there is no child node in this slot. If this is the case for slot i, then datom i is a pending datom rather than a midpoint, and that datom should be flushed to a child node if space runs out in the node.