libavl

A library for manipulation of balanced binary trees

Ben Pfaff



Introduction to balanced binary trees

Consider some techniques that can be used to find a particular item in a data set. Typical methods include sequential searching, digital searching, hash tables, and binary searching.

Sequential searching is simple, but slow (O(n)). Digital searching requires that the entire data set be known in advance, and memory-efficient implementations are slow.

Hash tables are fast (O(1)) for static data sets, but they can be wasteful of memory. It can be difficult to choose an effective hash function. Some hash table variants also make deletion an expensive operation.

Binary search techniques work almost as quickly (O(log(n))) on an ordered table or on a binary tree. Binary trees also allow easy iteration over the data in the tree in sorted order. With hash tables it is necessary to sort the data before iterating, and after sorting, the data is no longer in hash form.

Binary trees are efficient for insertion, deletion, and searching, if data are inserted in random order. But, if data are inserted in order using a naive algorithm, binary search degenerates to sequential search.
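The degenerate case can be seen in a minimal sketch, assuming a hypothetical unbalanced tree (bst_node, bst_insert, and bst_height are illustrative names, not part of libavl). Inserting keys in ascending order produces a tree whose height equals its node count, so every search walks a linked list.

```c
#include <stdlib.h>

/* Hypothetical node type for illustration; not libavl's avl_node. */
struct bst_node {
    int key;
    struct bst_node *left, *right;
};

/* Naive insertion with no rebalancing; duplicate keys are ignored.
   Returns the root of the tree, which changes only when the tree was empty. */
struct bst_node *bst_insert(struct bst_node *root, int key)
{
    struct bst_node **link = &root;

    while (*link != NULL) {
        if (key == (*link)->key)
            return root;
        link = key < (*link)->key ? &(*link)->left : &(*link)->right;
    }
    *link = malloc(sizeof **link);
    if (*link != NULL) {
        (*link)->key = key;
        (*link)->left = (*link)->right = NULL;
    }
    return root;
}

/* Height counted in nodes: 0 for an empty tree, 1 for a single node. */
int bst_height(const struct bst_node *node)
{
    int l, r;

    if (node == NULL)
        return 0;
    l = bst_height(node->left);
    r = bst_height(node->right);
    return 1 + (l > r ? l : r);
}
```

Inserting the keys 1 through 7 in ascending order gives a height of 7, while inserting the same keys in the order 4, 2, 6, 1, 3, 5, 7 gives a height of 3.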

In turn, this problem can be solved by rebalancing the tree after each insertion or deletion. In rebalancing, nodes are rearranged via transformations called rotations using an algorithm that tends to minimize the tree's height.
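A rotation can be sketched as follows. This is an illustration on a hypothetical node type (rot_node, rotate_left, and rotate_right are not libavl names), and it omits the balance bookkeeping that a real AVL or red-black implementation must also update. The point is only that a rotation changes the shape of a subtree without changing the inorder sequence of its keys.

```c
#include <stddef.h>

/* Hypothetical node type for illustration only. */
struct rot_node {
    int key;
    struct rot_node *left, *right;
};

/* Left rotation: x's right child y becomes the subtree root, x becomes
   y's left child, and y's former left subtree becomes x's right subtree.
   Returns the new subtree root. */
struct rot_node *rotate_left(struct rot_node *x)
{
    struct rot_node *y = x->right;

    if (y == NULL)
        return x;               /* nothing to rotate */
    x->right = y->left;
    y->left = x;
    return y;
}

/* Right rotation: the mirror image of rotate_left. */
struct rot_node *rotate_right(struct rot_node *y)
{
    struct rot_node *x = y->left;

    if (x == NULL)
        return y;               /* nothing to rotate */
    y->left = x->right;
    x->right = y;
    return x;
}
```

Applied to the chain 1, 2, 3 linked through right pointers, rotate_left on node 1 returns node 2 with children 1 and 3, reducing the height from 3 to 2.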

There are several schemes for rebalancing binary trees. The two most common types of balanced tree are AVL trees and red-black trees. libavl implements both types.

The table below presents a comparison among unbalanced binary trees, AVL trees, and red-black trees. In the table, n is the number of nodes in the tree and h is the tree's height before the operation. lg is the base-2 logarithm function.

Operation                                               Binary Tree  AVL Tree                   Red-Black Tree
Time per insertion or deletion                          O(h)         O(lg n)                    O(lg n)
Time for insertion of k nodes having sequential values  O(k^2)       O(n lg n)                  O(n lg n)
Time for insertion of k nodes having random values      O(n lg n)    O(n lg n)                  O(n lg n)
Maximum number of rotations per insertion               0            1                          lg n
Maximum number of rotations per deletion                0            lg n                       lg n
Maximum h as a function of n                            n            1.44 lg (n + 2) - .328     2 lg (n + 1)
Minimum n as a function of h                            h            2^((h + .328) / 1.44) - 2  2^(h / 2) - 1

There are alternatives to AVL trees that share some of their properties. For instance, skip lists, 2-3 trees, and splay trees all allow O(log(n)) insertion and deletion. The main disadvantage of these methods is that their operations are not as well documented in the literature.

Introduction to threaded trees

Threading is a clever method that simplifies binary tree traversal.

Nodes in an unthreaded binary tree that have zero or one subnodes have two or one null subnode pointers, respectively. In a threaded binary tree, a left child pointer that would otherwise be null points to the node's inorder(1) predecessor, and a right child pointer that would otherwise be null points to its inorder successor.

In a threaded tree, it is always possible to find the next node and the previous node of a node, given only a pointer to the node in question. In an unthreaded tree, it's also necessary to have a list of the nodes between the node in question and the root of the tree.
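To make the idea concrete, here is a minimal sketch of finding the inorder successor in a threaded tree. The node layout is an assumption made for illustration (tnode and tnode_next are hypothetical names); libavl's own node structures are opaque and may be organized differently. The first node's left thread and the last node's right thread are assumed to be null.

```c
#include <stddef.h>

/* Hypothetical threaded node; not libavl's avlt_node. */
struct tnode {
    int key;
    struct tnode *left, *right;
    int left_is_thread;     /* nonzero: left points to the inorder predecessor */
    int right_is_thread;    /* nonzero: right points to the inorder successor */
};

/* Returns the inorder successor of node using only the node itself,
   or NULL if node is the last node in the tree. */
struct tnode *tnode_next(struct tnode *node)
{
    struct tnode *p = node->right;

    if (node->right_is_thread || p == NULL)
        return p;           /* the thread leads straight to the successor */
    while (!p->left_is_thread && p->left != NULL)
        p = p->left;        /* leftmost node of the right subtree */
    return p;
}
```

No stack and no parent pointers are involved: the successor is either one thread away or at the bottom of a chain of left children.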

Advantages of a threaded tree compared to an unthreaded one include:

Some disadvantages of threaded trees are:

A right-threaded binary tree is similar to a threaded binary tree, but threads are only maintained on the right side of each node. This allows for traversal to the right (toward larger values) but not to the left (toward smaller values). Right-threaded trees are convenient when the properties of a threaded tree are desirable, but traversal in reverse sort order is not necessary. Not threading the left links saves time in insertions and deletions.

Left-threaded binary trees also exist, but they are not implemented by libavl. The same effect can be obtained by sorting the tree in the opposite order.

Types

The following types are defined and used by libavl:

Data Type: avl_tree
Data Type: avlt_tree
Data Type: avltr_tree
Data Type: rb_tree
These are the data types used to represent a tree. Although they are defined in the libavl header files, it should never be necessary to access them directly. Instead, all accesses should take place through libavl functions.

Data Type: avl_node
Data Type: avlt_node
Data Type: avltr_node
Data Type: rb_node
These are the data types used to represent individual nodes in a tree. Similar cautions apply as with avl_tree structures.

Data Type: avl_traverser
Data Type: avlt_traverser
Data Type: avltr_traverser
Data Type: rb_traverser
These are the data types used by the avl_traverse family of functions to iterate across the tree. Again, these are opaque structures.

Data Type: avl_comparison_func
Every tree must have an ordering defined by a function of this type. It must have the following signature:

int compare (const void *a, const void *b, void *param)

The return value is expected to be like that returned by strcmp in the standard C library: negative if a < b, zero if a = b, positive if a > b. param is an arbitrary value defined by the user when the tree was created.
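As an illustration, a comparison function for items that are pointers to int might look like the sketch below. The name compare_ints and the use of param as an optional pointer to a "descending" flag are assumptions made for this example; libavl attaches no meaning to param.

```c
#include <stddef.h>

/* Compares two items that point to int values.  If param is non-null
   and points to a nonzero int, the ordering is reversed. */
int compare_ints(const void *a, const void *b, void *param)
{
    int x = *(const int *) a;
    int y = *(const int *) b;
    /* (x > y) - (x < y) yields -1, 0, or +1 and avoids the overflow
       that the shortcut x - y can cause for values of opposite sign. */
    int result = (x > y) - (x < y);
    int descending = param != NULL && *(const int *) param != 0;

    return descending ? -result : result;
}
```

A function like this would be passed as the compare argument when the tree is created, with param supplied at the same time.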

Data Type: avl_node_func
This is a class of function called to perform an operation on a data item. Functions of this type have the following signature:

void operate (void *data, void *param)

data is the data item and param is an arbitrary user-defined value set when the tree was created.
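Two functions with this signature are sketched below, assuming a hypothetical record type whose name field is heap-allocated. The first accumulates a total through param, which suits a walk over the tree; the second releases an item's storage, which suits the free argument of the destroy functions. The names record, sum_score, and free_record are illustrative only.

```c
#include <stdlib.h>

/* Hypothetical item type stored in a tree. */
struct record {
    char *name;     /* allocated with malloc, owned by the record */
    int score;
};

/* Adds the item's score to the long that param points to. */
void sum_score(void *data, void *param)
{
    const struct record *r = data;

    *(long *) param += r->score;
}

/* Releases an item and the string it owns; param is unused. */
void free_record(void *data, void *param)
{
    struct record *r = data;

    (void) param;
    free(r->name);
    free(r);
}
```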

Data Type: avl_copy_func

This is a class of function called to make a new copy of a node's data. Functions of this type have the following signature:

void *copy (void *data, void *param)

The function should return a new copy of data. param is an arbitrary user-defined value set when the tree was created.
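For example, if the items in a tree are null-terminated strings, a copy function could be written along the following lines. This is a sketch under that assumption; copy_string is a hypothetical name, and how libavl reacts to a null return from a copy function is not stated here, so the caller should treat allocation failure as its own concern.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated duplicate of the string that data points to,
   or NULL if memory cannot be allocated.  param is unused. */
void *copy_string(void *data, void *param)
{
    const char *src = data;
    size_t size = strlen(src) + 1;      /* include the terminator */
    char *dst = malloc(size);

    (void) param;
    if (dst != NULL)
        memcpy(dst, src, size);
    return dst;
}
```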

Macro: AVL_MAX_HEIGHT
This macro defines the maximum height of an AVL tree that can be handled by functions that maintain a stack of nodes descended. The default value is 32, which allows for AVL trees with a maximum number of nodes between 5,704,880 and 4,294,967,295, depending on order of insertion. This macro may be defined by the user before including any AVL tree header file, in which case libavl will honor that value.
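The two figures quoted for the default height can be checked with a short calculation. The sketch below uses hypothetical helper names (not libavl functions) and counts a single node as height 1. The sparsest AVL tree of height h satisfies N(h) = N(h-1) + N(h-2) + 1, and the fullest binary tree of height h has 2^h - 1 nodes.

```c
/* Fewest nodes an AVL tree of the given height can contain:
   N(0) = 0, N(1) = 1, N(h) = N(h-1) + N(h-2) + 1. */
unsigned long avl_min_nodes(int height)
{
    unsigned long prev = 0, cur = 1;    /* N(0) and N(1) */
    int h;

    if (height <= 0)
        return 0;
    for (h = 2; h <= height; h++) {
        unsigned long next = cur + prev + 1;
        prev = cur;
        cur = next;
    }
    return cur;
}

/* Most nodes any binary tree of the given height can contain: 2^h - 1.
   Valid while the result fits in an unsigned long. */
unsigned long tree_max_nodes(int height)
{
    unsigned long n = 0;
    int h;

    for (h = 0; h < height; h++)
        n = 2 * n + 1;
    return n;
}
```

For a height of 32, tree_max_nodes gives 4,294,967,295, matching the upper figure. The exact recurrence gives 5,702,886 for the sparsest tree, close to the 5,704,880 quoted above, which appears to come from the approximate formula 2^((h + .328) / 1.44) - 2.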

Macro: RB_MAX_HEIGHT
This macro defines the maximum height of a red-black tree that can be handled by functions that maintain a stack of nodes descended. The default value is 32, which allows for red-black trees with a maximum number of nodes of at least 65535. This macro may be defined by the user before including the red-black tree header file, in which case libavl will honor that value.

Functions

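libavl is four libraries in one:

  • An unthreaded AVL tree library.
  • A threaded AVL tree library.
  • A right-threaded AVL tree library.
  • A red-black tree library.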

Identifiers in these libraries are prefixed by avl_, avlt_, avltr_, and rb_, with corresponding header files `avl.h', `avlt.h', `avltr.h', and `rb.h', respectively. The functions that they declare are defined in the `.c' files with the same names.

Most tree functions are implemented in all four libraries, but threading allows more generality of operation. So, the threaded and right-threaded libraries offer a few additional functions for finding the next or previous node from a given node. In addition, they offer functions for converting trees from threaded or right-threaded representations to unthreaded, and vice versa.(2)

Tree Creation

These functions deal with creation and destruction of AVL trees.

Function: avl_tree * avl_create (avl_comparison_func compare, void *param)
Function: avlt_tree * avlt_create (avl_comparison_func compare, void *param)
Function: avltr_tree * avltr_create (avl_comparison_func compare, void *param)
Function: rb_tree * rb_create (avl_comparison_func compare, void *param)
Create a new, empty tree with comparison function compare. Arbitrary user data param is saved so that it can be passed to user callback functions.

Function: void avl_destroy (avl_tree *tree, avl_node_func free)
Function: void avlt_destroy (avlt_tree *tree, avl_node_func free)
Function: void avltr_destroy (avltr_tree *tree, avl_node_func free)
Function: void rb_destroy (rb_tree *tree, avl_node_func free)
Destroys tree, releasing all of its storage. If free is non-null, then it is called for every node in postorder before that node is freed.

Function: void avl_free (avl_tree *tree)
Function: void avlt_free (avlt_tree *tree)
Function: void avltr_free (avltr_tree *tree)
Function: void rb_free (rb_tree *tree)
Destroys tree, releasing all of its storage. The data in each node is freed with a call to the standard C library function free.

Function: avl_tree * avl_copy (const avl_tree *tree, avl_copy_func copy)
Function: avlt_tree * avlt_copy (const avlt_tree *tree, avl_copy_func copy)
Function: avltr_tree * avltr_copy (const avltr_tree *tree, avl_copy_func copy)
Function: rb_tree * rb_copy (const rb_tree *tree, avl_copy_func copy)
Copies the contents of tree into a new tree, and returns the new tree. If copy is non-null, then it is called to make a new copy of each node's data; otherwise, the node data is copied verbatim into the new tree.

Function: int avl_count (const avl_tree *tree)
Function: int avlt_count (const avlt_tree *tree)
Function: int avltr_count (const avltr_tree *tree)
Function: int rb_count (const rb_tree *tree)
Returns the number of nodes in tree.

Function: void * xmalloc (size_t size)
This is not a function defined by libavl. Instead, it is a function that the user program can define. It must allocate size bytes using malloc and return a pointer to the allocated block. It can handle out-of-memory errors however it chooses, but it must never return a null pointer.

If there is an xmalloc function defined for use by libavl, the source files (`avl.c', `avlt.c', `avltr.c', `rb.c') must be compiled with HAVE_XMALLOC defined. Otherwise, the library will use its internal static xmalloc, which handles out-of-memory errors by printing a message `virtual memory exhausted' to stderr and terminating the program with exit code EXIT_FAILURE.
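A user-supplied xmalloc meeting these requirements could be as small as the sketch below. The message text and the decision to terminate are choices made for this example; a program might instead release caches and retry, or jump to a recovery point, as long as a null pointer is never returned.

```c
#include <stdio.h>
#include <stdlib.h>

/* Allocates size bytes with malloc and never returns a null pointer;
   on failure it reports the error and terminates the program. */
void *xmalloc(size_t size)
{
    /* malloc(0) is allowed to return NULL, so request at least one byte. */
    void *block = malloc(size != 0 ? size : 1);

    if (block == NULL) {
        fputs("out of memory\n", stderr);
        exit(EXIT_FAILURE);
    }
    return block;
}
```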

Insertion and Deletion

These functions insert nodes, delete nodes, and search for nodes in trees.

Function: void ** avl_probe (avl_tree *tree, void *data)
Function: void ** avlt_probe (avlt_tree *tree, void *data)
Function: void ** avltr_probe (avltr_tree *tree, void *data)
Function: void ** rb_probe (rb_tree *tree, void *data)
These are the workhorse functions for tree insertion. They search tree for a node with data matching data. If found, a pointer to the matching data is returned. Otherwise, a new node is created for data, and a pointer to that data is returned. In either case, the pointer returned can be changed by the user, but the key data used by the tree's comparison must not be changed(3).

It is usually easier to use one of the avl_insert or avl_replace functions instead of avl_probe directly.

Please note: It's not a particularly good idea to insert a null pointer as a data item into a tree, because several libavl functions return a null pointer to indicate failure. You can sometimes avoid a problem by using functions that return a pointer to a pointer instead of a plain pointer. Also be wary of this when casting an arithmetic type to a void pointer for insertion--on typical architectures, 0's become null pointers when this is done.
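The pitfall with arithmetic types can be demonstrated in isolation. The helpers below are hypothetical and exist only to show the cast; the result of converting an integer to a pointer is implementation-defined, so the behavior described is that of typical architectures, as noted above.

```c
#include <stddef.h>
#include <stdint.h>

/* Packs an int into a pointer-sized item, the practice cautioned against above. */
void *int_as_item(int value)
{
    return (void *) (intptr_t) value;
}

/* Recovers the int from an item produced by int_as_item. */
int item_as_int(void *item)
{
    return (int) (intptr_t) item;
}
```

On typical architectures int_as_item(0) compares equal to a null pointer, so a stored 0 cannot be told apart from a "not found" result. Storing the address of the integer instead avoids the ambiguity, since the address of an object is never null.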

Function: void * avl_insert (avl_tree *tree, void *data)
Function: void * avlt_insert (avlt_tree *tree, void *data)
Function: void * avltr_insert (avltr_tree *tree, void *data)
Function: void * rb_insert (rb_tree *tree, void *data)
If a node with data matching data exists in tree, returns the matching data item. Otherwise, inserts data into tree and returns a null pointer.

Function: void avl_force_insert (avl_tree *tree, void *data)
Function: void avlt_force_insert (avlt_tree *tree, void *data)
Function: void avltr_force_insert (avltr_tree *tree, void *data)
Function: void rb_force_insert (rb_tree *tree, void *data)
Inserts data into tree. If a node with data matching data exists in tree, aborts the program with an assertion violation. This function is implemented as a macro; if it is used, the standard C header assert.h must also be included. If macro NDEBUG is defined when a libavl header is included, these functions are short-circuited to a direct call to avl_insert, and no check is performed.

Function: void * avl_replace (avl_tree *tree, void *data)
Function: void * avlt_replace (avlt_tree *tree, void *data)
Function: void * avltr_replace (avltr_tree *tree, void *data)
Function: void * rb_replace (rb_tree *tree, void *data)
If a node with data matching data, such that the comparison function returns 0, exists in tree, replaces the node's data with data and returns the node's former contents. Otherwise, inserts data into tree and returns a null pointer.
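The relationship between probing, inserting, and replacing can be sketched on a toy unbalanced tree. Everything below (toy_node, toy_tree, toy_probe, toy_insert, toy_replace, toy_compare_ints) is hypothetical and only mirrors the behavior described above; it is not libavl's implementation, performs no rebalancing, and returns a null pointer on allocation failure where libavl would rely on xmalloc.

```c
#include <stdlib.h>

struct toy_node {
    void *data;
    struct toy_node *link[2];   /* link[0] = left, link[1] = right */
};

struct toy_tree {
    struct toy_node *root;
    int (*compare)(const void *a, const void *b, void *param);
    void *param;
};

/* Example comparison function for items that point to int. */
int toy_compare_ints(const void *a, const void *b, void *param)
{
    int x = *(const int *) a, y = *(const int *) b;

    (void) param;
    return (x > y) - (x < y);
}

/* Probe: returns the address of the matching item's slot, creating a
   node that holds data if no item matches. */
void **toy_probe(struct toy_tree *tree, void *data)
{
    struct toy_node **link = &tree->root;

    while (*link != NULL) {
        int cmp = tree->compare(data, (*link)->data, tree->param);
        if (cmp == 0)
            return &(*link)->data;
        link = &(*link)->link[cmp > 0];
    }
    *link = malloc(sizeof **link);
    if (*link == NULL)
        return NULL;
    (*link)->data = data;
    (*link)->link[0] = (*link)->link[1] = NULL;
    return &(*link)->data;
}

/* Insert: returns the existing matching item, or NULL after inserting. */
void *toy_insert(struct toy_tree *tree, void *data)
{
    void **slot = toy_probe(tree, data);

    return (slot == NULL || *slot == data) ? NULL : *slot;
}

/* Replace: overwrites a matching item and returns the former one,
   or returns NULL after inserting. */
void *toy_replace(struct toy_tree *tree, void *data)
{
    void **slot = toy_probe(tree, data);
    void *old;

    if (slot == NULL || *slot == data)
        return NULL;
    old = *slot;
    *slot = data;
    return old;
}
```

Because the probe function hands back the slot rather than the item, the two convenience functions differ only in what they do with a slot that already held a different pointer.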

Function: void * avl_delete (avl_tree *tree, const void *data)
Function: void * avlt_delete (avlt_tree *tree, const void *data)
Function: void * avltr_delete (avltr_tree *tree, const void *data)
Function: void * rb_delete (rb_tree *tree, const void *data)
Searches tree for a node with data matching data. If found, the node is deleted and its data is returned. Otherwise, returns a null pointer.

Function: void * avl_force_delete (avl_tree *tree, const void *data)
Function: void * avlt_force_delete (avlt_tree *tree, const void *data)
Function: void * avltr_force_delete (avltr_tree *tree, const void *data)
Function: void * rb_force_delete (rb_tree *tree, const void *data)
Deletes a node with data matching data from tree. If no matching node is found, aborts the program with an assertion violation. If macro NDEBUG is defined when a libavl header is included, these functions are short-circuited to a direct call to avl_delete, and no check is performed.

Searching

These functions search a tree for an item without making an insertion or a deletion.

Function: void * avl_find (avl_tree *tree, const void *data)
Function: void ** avlt_find (avlt_tree *tree, const void *data)
Function: void ** avltr_find (avltr_tree *tree, const void *data)
Function: void * rb_find (rb_tree *tree, const void *data)
Searches tree for a node with data matching data. If found, returns the node's data (for threaded and right-threaded trees, a pointer to the node's data). Otherwise, returns a null pointer.

Function: void * avl_find_close (avl_tree *tree, const void *data)
Function: void ** avlt_find_close (avlt_tree *tree, const void *data)
Function: void ** avltr_find_close (avltr_tree *tree, const void *data)
Function: void * rb_find_close (rb_tree *tree, const void *data)
Searches tree for a node with data matching data. If found, returns the node's data (for threaded and right-threaded trees, a pointer to the node's data). If no matching item is found, then it finds a node whose data is "close" to data; either the node closest in value to data, or the node either before or after the node with the closest value. Returns a null pointer if the tree does not contain any nodes.

Iteration

These functions allow the caller to iterate across the items in a tree.

Function: void avl_walk (const avl_tree *tree, avl_node_func operate, void *param)
Function: void avlt_walk (const avlt_tree *tree, avl_node_func operate, void *param)
Function: void avltr_walk (const avltr_tree *tree, avl_node_func operate, void *param)
Function: void rb_walk (const rb_tree *tree, avl_node_func operate, void *param)
Walks through all the nodes in tree, and calls function operate for each node in inorder. param overrides the value passed to avl_create (and family) for this operation only. operate must not change the key data in the nodes in a way that would reorder the data values or cause two values to become equal.

Function: void * avl_traverse (const avl_tree *tree, avl_traverser *trav)
Function: void * avlt_traverse (const avlt_tree *tree, avlt_traverser *trav)
Function: void * avltr_traverse (const avltr_tree *tree, avltr_traverser *trav)
Function: void * rb_traverse (const rb_tree *tree, rb_traverser *trav)
Returns each of tree's nodes' data values in sequence, then a null pointer to indicate the last item. trav must be initialized before the first call, either in a declaration like that below, or using one of the functions below.

avl_traverser trav = AVL_TRAVERSER_INIT;

Each avl_traverser (and family) is a separate, independent iterator.
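The calling pattern, and the reason a traverser for an unthreaded tree needs storage bounded by the maximum tree height, can be illustrated with a toy iterator. All names below (it_node, toy_traverser, TOY_TRAVERSER_INIT, TOY_MAX_HEIGHT, toy_traverse) are hypothetical; this is a sketch of one common technique, an explicit stack of ancestors, and not a description of libavl's internals.

```c
#include <stddef.h>

#define TOY_MAX_HEIGHT 32       /* plays the role of AVL_MAX_HEIGHT */

struct it_node {
    int key;
    struct it_node *left, *right;
};

struct toy_traverser {
    int init;                                   /* zero until the first call */
    int top;                                    /* number of stacked nodes */
    const struct it_node *next;                 /* subtree still to descend */
    const struct it_node *stack[TOY_MAX_HEIGHT];
};

#define TOY_TRAVERSER_INIT {0}

/* Returns a pointer to the next key in sorted order, or NULL when the
   traversal is finished (or the tree is taller than TOY_MAX_HEIGHT). */
const int *toy_traverse(const struct it_node *root, struct toy_traverser *trav)
{
    const struct it_node *node;

    if (!trav->init) {
        trav->init = 1;
        trav->top = 0;
        trav->next = root;
    }
    /* Descend to the leftmost node of the pending subtree, stacking
       each ancestor so it can be visited on the way back up. */
    for (node = trav->next; node != NULL; node = node->left) {
        if (trav->top == TOY_MAX_HEIGHT)
            return NULL;
        trav->stack[trav->top++] = node;
    }
    if (trav->top == 0)
        return NULL;
    node = trav->stack[--trav->top];
    trav->next = node->right;
    return &node->key;
}
```

A caller declares a traverser with the initializer and calls the function in a loop until it returns a null pointer. Each traverser carries its own stack, which is why separate traversers can iterate over the same tree independently.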

For threaded and right-threaded trees, avlt_next or avltr_next, respectively, are faster and more memory-efficient than avlt_traverse or avltr_traverse.

Function: void * avl_init_traverser (avl_traverser *trav)
Function: void * avlt_init_traverser (avlt_traverser *trav)
Function: void * avltr_init_traverser (avltr_traverser *trav)
Function: void * rb_init_traverser (rb_traverser *trav)
Initializes the specified tree traverser structure. After this function is called, the next call to the corresponding *_traverse function will return the smallest value in the appropriate tree.

Function: void ** avlt_next (const avlt_tree *tree, void **data)
Function: void ** avltr_next (const avltr_tree *tree, void **data)
data must be a null pointer or a pointer to a data item in AVL tree tree. Returns a pointer to the next data item after data in tree in inorder (this is the first item if data is a null pointer), or a null pointer if data was the last item in tree.

Function: void ** avlt_prev (const avlt_tree *tree, void **data)
data must be a null pointer or a pointer to a data item in AVL tree tree. Returns a pointer to the previous data item before data in tree in inorder (this is the last, or greatest valued, item if data is a null pointer), or a null pointer if data was the first item in tree.

Conversion

Function: avlt_tree * avlt_thread (avl_tree *tree)
Function: avltr_tree * avltr_thread (avl_tree *tree)
Adds symmetric threads or right threads, respectively, to unthreaded AVL tree tree and returns a pointer to tree cast to the appropriate type. After one of these functions is called, threaded or right-threaded functions, as appropriate, must be used with tree; unthreaded functions may not be used.

Function: avl_tree * avlt_unthread (avlt_tree *tree)
Function: avl_tree * avltr_unthread (avltr_tree *tree)
Cuts all threads in threaded or right-threaded, respectively, AVL tree tree and returns a pointer to tree cast to avl_tree *. After one of these functions is called, unthreaded functions must be used with tree; threaded or right-threaded functions may not be used.

Author

libavl was written by Ben Pfaff <blp@gnu.org>.

libavl's generic tree algorithms and AVL algorithms are based on those found in Donald Knuth's venerable Art of Computer Programming series from Addison-Wesley, primarily Volumes 1 and 3. libavl's red-black tree algorithms are based on those found in Cormen et al., Introduction to Algorithms, 2nd ed., from MIT Press.

Index

a

  • Adel'son-Velskii, G. M.
  • Art of Computer Programming
  • author
  • AVL tree
  • avl_comparison_func
  • avl_copy, avl_copy, avl_copy
  • avl_copy_func
  • avl_count
  • avl_create
  • avl_delete
  • avl_destroy
  • avl_find
  • avl_find_close
  • avl_force_delete
  • avl_force_insert
  • avl_free
  • avl_init_traverser
  • avl_insert
  • AVL_MAX_HEIGHT
  • avl_node
  • avl_node_func
  • avl_probe
  • avl_replace
  • avl_traverse
  • avl_traverser
  • avl_tree
  • avl_walk
  • avlt_count
  • avlt_create
  • avlt_delete
  • avlt_destroy
  • avlt_find
  • avlt_find_close
  • avlt_force_delete
  • avlt_force_insert
  • avlt_free
  • avlt_init_traverser
  • avlt_insert
  • avlt_next
  • avlt_node
  • avlt_probe
  • avlt_replace
  • avlt_thread
  • avlt_traverse
  • avlt_traverser
  • avlt_tree
  • avlt_unthread
  • avlt_walk
  • avltr_count
  • avltr_create
  • avltr_delete
  • avltr_destroy
  • avltr_find
  • avltr_find_close
  • avltr_force_delete
  • avltr_force_insert
  • avltr_free
  • avltr_init_traverser
  • avltr_insert
  • avltr_next
  • avltr_node
  • avltr_prev
  • avltr_probe
  • avltr_replace
  • avltr_thread
  • avltr_traverse
  • avltr_traverser
  • avltr_tree
  • avltr_unthread
  • avltr_walk

b

  • binary tree

h

  • hash table

k

  • Knuth, Donald Ervin

l

  • Landis, E. M.

p

  • Pfaff, Benjamin Levy

r

  • rb_copy
  • rb_count
  • rb_create
  • rb_delete
  • rb_destroy
  • rb_find
  • rb_find_close
  • rb_force_delete
  • rb_force_insert
  • rb_free
  • rb_init_traverser
  • rb_insert
  • RB_MAX_HEIGHT
  • rb_node
  • rb_probe
  • rb_replace
  • rb_traverse
  • rb_traverser
  • rb_tree
  • rb_walk
  • rebalancing
  • red-black tree
  • right threads

t

  • threads

u

  • unthreaded

x

  • xmalloc

Footnotes

(1) In tree traversal, inorder refers to visiting the nodes in their sorted order from smallest to largest.

(2) In general, you should build the sort of tree that you need to use, but occasionally it is useful to convert between tree types.

(3) It can be changed if this would not change the ordering of the nodes in the tree; i.e., if this would not cause the data in the node to be less than or equal to the previous node's data or greater than or equal to the next node's data.

This document was generated on 6 October 1999 using the texi2html translator version 1.51a.