libavl
A library for manipulation of balanced binary trees
Ben Pfaff
Consider some techniques that can be used to find a particular item in a
data set. Typical methods include sequential searching, digital
searching, hash tables, and binary searching.
Sequential searching is simple, but slow (O(n)). Digital searching
requires that the entire data set be known in advance, and memory
efficient implementations are slow.
Hash tables are fast (O(1)) for static data sets, but they can be
wasteful of memory. It can be difficult to choose an effective hash
function. Some hash table variants also make deletion an expensive
operation.
Binary search techniques work almost as quickly (O(log(n))) on an ordered
table, or on a binary tree. Binary trees also allow easy iteration over
the data in the tree in sorted order. With hash tables it is necessary
to sort the data before iterating, and after sorting the data is no
longer in hash form.
Binary trees are efficient for insertion, deletion, and searching, if
data are inserted in random order. But, if data are inserted in order
using a naive algorithm, binary search degenerates to sequential search.
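The degenerate case can be reproduced with a minimal sketch of a naive, unbalanced binary search tree. The names bst_node, bst_insert, bst_height, and height_after_inserting are hypothetical and are not part of libavl; height is counted in nodes, and the nodes are deliberately never freed to keep the sketch short.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node type for an unbalanced binary search tree. */
struct bst_node {
    int key;
    struct bst_node *left, *right;
};

/* Naive insertion: walk down from the root and attach a new leaf.
   No rebalancing is done.  Duplicate keys are ignored. */
struct bst_node *bst_insert(struct bst_node *root, int key)
{
    struct bst_node **p = &root;

    while (*p != NULL) {
        if (key == (*p)->key)
            return root;
        p = key < (*p)->key ? &(*p)->left : &(*p)->right;
    }
    *p = malloc(sizeof **p);
    assert(*p != NULL);
    (*p)->key = key;
    (*p)->left = (*p)->right = NULL;
    return root;
}

/* Height counted in nodes on the longest root-to-leaf path. */
int bst_height(const struct bst_node *n)
{
    int l, r;

    if (n == NULL)
        return 0;
    l = bst_height(n->left);
    r = bst_height(n->right);
    return 1 + (l > r ? l : r);
}

/* Builds a tree from keys[0..n-1] and returns its height. */
int height_after_inserting(const int *keys, int n)
{
    struct bst_node *root = NULL;
    int i;

    for (i = 0; i < n; i++)
        root = bst_insert(root, keys[i]);
    return bst_height(root);
}
```

Inserting the keys 1 through 7 in ascending order gives a tree of height 7, which is effectively a linked list. Inserting the same keys in the order 4, 2, 6, 1, 3, 5, 7 gives height 3.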
In turn, this problem can be solved by rebalancing the tree after
each insertion or deletion. In rebalancing, nodes are rearranged via
transformations called rotations using an algorithm that tends to
minimize the tree's height.
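A single rotation can be sketched as follows. This is only an illustration of the transformation, using a hypothetical node type and function names; it is not libavl's code, and it omits the balance bookkeeping that decides when to rotate.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type, for illustration only. */
struct rot_node {
    int key;
    struct rot_node *left, *right;
};

/* Left rotation: x's right child y becomes the root of the subtree.
   The inorder sequence of the subtree is unchanged.  Returns y. */
struct rot_node *rotate_left(struct rot_node *x)
{
    struct rot_node *y;

    assert(x != NULL && x->right != NULL);
    y = x->right;
    x->right = y->left;   /* y's left subtree moves under x */
    y->left = x;
    return y;
}

/* Right rotation: the mirror image of rotate_left. */
struct rot_node *rotate_right(struct rot_node *y)
{
    struct rot_node *x;

    assert(y != NULL && y->left != NULL);
    x = y->left;
    y->left = x->right;   /* x's right subtree moves under y */
    x->right = y;
    return x;
}
```

Applying rotate_left to the root of the right-leaning chain 1, 2, 3 yields a tree rooted at 2 with children 1 and 3, reducing the height from 3 to 2.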
There are several schemes for rebalancing binary trees. The two most
common types of balanced tree are AVL trees and red-black
trees. libavl implements both types:
-
AVL trees, invented by Russian mathematicians G. M. Adel'son-Velskii and
E. M. Landis, ensure that, for each node, the difference in height
between its subtrees (the balance factor) is not greater than 1.
-
Red-black trees, invented by R. Bayer and studied at length by
L. J. Guibas and R. Sedgewick, assign each node of a tree a color (red
or black), and specify a set of rules governing how red and black nodes
may be arranged.
The table below presents a comparison among unbalanced binary trees, AVL
trees, and red-black trees. In the table, n is the number of
nodes in the tree and h is the tree's height before the
operation. lg is the base-2 logarithm function.
Operation                                              | Binary Tree | AVL Tree                  | Red-Black Tree
Time per insertion or deletion                         | O(h)        | O(lg n)                   | O(lg n)
Time for insertion of k nodes having sequential values | O(k^2)      | O(n lg n)                 | O(n lg n)
Time for insertion of k nodes having random values     | O(n lg n)   | O(n lg n)                 | O(n lg n)
Maximum number of rotations per insertion              | 0           | 1                         | lg n
Maximum number of rotations per deletion               | 0           | lg n                      | lg n
Maximum h as a function of n                           | n           | 1.44 lg (n + 2) - .328    | 2 lg (n + 1)
Minimum n as a function of h                           | h           | 2^((h + .328) / 1.44) - 2 | 2^(h / 2) - 1
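The relation between n and h in the last two rows can also be explored numerically. The sketch below assumes that height is counted in nodes along the longest path, and uses the standard recurrence for the sparsest AVL tree, N(h) = N(h-1) + N(h-2) + 1, which the closed form in the table approximates. The function names are hypothetical and not part of libavl.

```c
#include <assert.h>

/* Fewest nodes in an AVL tree of height h (height counted in nodes):
   N(0) = 0, N(1) = 1, N(h) = N(h-1) + N(h-2) + 1. */
unsigned long long min_nodes_avl(int h)
{
    unsigned long long prev = 0, cur = 1;
    int i;

    assert(h >= 0);
    if (h == 0)
        return 0;
    for (i = 2; i <= h; i++) {
        unsigned long long next = cur + prev + 1;
        prev = cur;
        cur = next;
    }
    return cur;
}

/* Most nodes in any binary tree of height h: 2^h - 1. */
unsigned long long max_nodes(int h)
{
    assert(h >= 0 && h < 64);
    return (1ULL << h) - 1;
}
```

For example, min_nodes_avl gives 1, 2, 4, 7, and 12 for heights 1 through 5, while max_nodes(32) is 4,294,967,295.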
There are alternatives to AVL trees that share some of their properties.
For instance, skip lists, 2-3 trees, and splay trees all allow O(log(n))
insertion and deletion. The main disadvantage of these methods is that
their operations are not as well documented in the literature.
Threading is a clever method that simplifies binary tree
traversal.
Nodes in an unthreaded binary tree that have zero or one subnodes have
two or one null subnode pointers, respectively. In a threaded binary
tree, a left child pointer that would otherwise be null is used to point
to the node's inorder(1) predecessor, and a right child pointer that
would otherwise be null points to its inorder successor.
In a threaded tree, it is always possible to find the next node and the
previous node of a node, given only a pointer to the node in question.
In an unthreaded tree, it's also necessary to have a list of the nodes
between the node in question and the root of the tree.
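Finding the next node from a node alone might look like the sketch below. The tnode layout and the tag convention (1 for a thread, 0 for an ordinary child) are assumptions made for illustration; libavl's actual node structures are opaque and may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical threaded node.  tag[i] is 1 when link[i] is a thread
   (the inorder predecessor for i == 0, the successor for i == 1)
   and 0 when link[i] is an ordinary child pointer.  The first node's
   left thread and the last node's right thread are null. */
struct tnode {
    int key;
    struct tnode *link[2];
    unsigned char tag[2];
};

/* Returns the inorder successor of n, or NULL if n is the last node. */
struct tnode *tnode_next(const struct tnode *n)
{
    struct tnode *p;

    assert(n != NULL);
    p = n->link[1];
    if (n->tag[1])
        return p;             /* a right thread is the successor itself */

    /* Ordinary right child: the successor is the leftmost node of the
       right subtree, reached by following real left children only. */
    while (p != NULL && !p->tag[0] && p->link[0] != NULL)
        p = p->link[0];
    return p;
}
```

No stack and no parent pointers are involved: each step follows either one thread or a short run of left child pointers.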
Advantages of a threaded tree compared to an unthreaded one include:
-
Faster traversal and less memory usage during traversal, since no stack
need be maintained.
-
Greater generality, since one can go from a node to its successor or
predecessor given only the node, simplifying algorithms that require
moving forward and backward in a tree.
Some disadvantages of threaded trees are:
-
Slower insertion and deletion, since threads need to be maintained. In
some cases, this can be alleviated by constructing the tree as an
unthreaded tree, then threading it with a special libavl function.
-
In theory, threaded trees need two extra bits per node to indicate
whether each child pointer points to an ordinary node or the node's
successor/predecessor node. In libavl, however, these bits are stored
in a byte that is used for structure alignment padding in unthreaded
binary trees, so no extra storage is used.
A right-threaded binary tree is similar to a threaded binary tree,
but threads are only maintained on the right side of each node. This
allows for traversal to the right (toward larger values) but not to the
left (toward smaller values). Right-threaded trees are convenient when
the properties of a threaded tree are desirable, but traversal in
reverse sort order is not necessary. Not threading the left links saves
time in insertions and deletions.
Left-threaded binary trees also exist, but they are not implemented by
libavl. The same effect can be obtained by sorting the tree in the
opposite order.
The following types are defined and used by libavl:
- Data Type: avl_tree
-
- Data Type: avlt_tree
-
- Data Type: avltr_tree
-
- Data Type: rb_tree
-
These are the data types used to represent a tree. Although they are
defined in the libavl header files, it should never be necessary to
access them directly. Instead, all accesses should take place through
libavl functions.
- Data Type: avl_node
-
- Data Type: avlt_node
-
- Data Type: avltr_node
-
- Data Type: rb_node
-
These are the data types used to represent individual nodes in a tree.
Similar cautions apply as with avl_tree structures.
- Data Type: avl_traverser
-
- Data Type: avlt_traverser
-
- Data Type: avltr_traverser
-
- Data Type: rb_traverser
-
These are the data types used by the avl_traverse family of
functions to iterate across the tree. Again, these are opaque
structures.
- Data Type: avl_comparison_func
-
Every tree must have an ordering defined by a function of this type. It
must have the following signature:
int compare (const void *a, const void *b, void *param)
The return value is expected to be like that returned by strcmp
in the standard C library: negative if a < b, zero if
a = b, positive if a > b. param is an
arbitrary value defined by the user when the tree was created.
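As an illustration, a comparison function for items that point to int values might be written as below. The name compare_ints and the use of param as an optional "reverse order" flag are assumptions for this sketch, not something libavl prescribes.

```c
#include <assert.h>
#include <stddef.h>

/* Compares two ints stored behind void pointers.  param, if non-null,
   points to an int; a nonzero value there reverses the ordering. */
int compare_ints(const void *a, const void *b, void *param)
{
    const int *x = a;
    const int *y = b;
    int result;

    assert(a != NULL && b != NULL);
    /* Yields -1, 0, or 1 without the overflow risk of *x - *y. */
    result = (*x > *y) - (*x < *y);
    if (param != NULL && *(int *) param)
        result = -result;
    return result;
}
```

The function must define a consistent total order: if it reports a < b and b < c, it must also report a < c, or searches in the tree will go astray.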
- Data Type: avl_node_func
-
This is a class of function called to perform an operation on a data
item. Functions of this type have the following signature:
void operate (void *data, void *param)
data is the data item and param is an arbitrary user-defined
value set when the tree was created.
- Data Type: avl_copy_func
-
This is a class of function called to make a new copy of a node's data.
Functions of this type have the following signature:
void *copy (void *data, void *param)
The function should return a new copy of data. param is an
arbitrary user-defined value set when the tree was created.
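Functions matching the two callback signatures above could look like the following sketch. The names sum_item and copy_string are hypothetical. How libavl reacts to a copy function that returns a null pointer is not stated here, so a real copy function may prefer to treat allocation failure as fatal.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Node function: adds the int behind data to the long behind param. */
void sum_item(void *data, void *param)
{
    assert(data != NULL && param != NULL);
    *(long *) param += *(int *) data;
}

/* Copy function: returns a freshly allocated duplicate of the C string
   behind data, or a null pointer if allocation fails.  param is unused. */
void *copy_string(void *data, void *param)
{
    size_t n;
    char *dup;

    (void) param;
    assert(data != NULL);
    n = strlen(data) + 1;
    dup = malloc(n);
    if (dup != NULL)
        memcpy(dup, data, n);
    return dup;
}
```

A tree whose items are strings owned by the tree would typically pair a copy function like this with a destroy callback that frees each string.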
- Macro: AVL_MAX_HEIGHT
-
This macro defines the maximum height of an AVL tree that can be handled
by functions that maintain a stack of nodes descended. The default
value is 32, which allows for AVL trees with a maximum number of nodes
between 5,704,880 and 4,294,967,295, depending on order of insertion.
This macro may be defined by the user before including any AVL tree
header file, in which case libavl will honor that value.
- Macro: RB_MAX_HEIGHT
-
This macro defines the maximum height of a red-black tree that can be handled
by functions that maintain a stack of nodes descended. The default
value is 32, which allows for red-black trees with a maximum number of
nodes of at least 65535. This macro may be defined by the user before
including the red-black tree header file, in which case libavl will
honor that value.
libavl is four libraries in one:
-
An unthreaded AVL tree library.
-
A threaded AVL tree library.
-
A right-threaded AVL tree library.
-
A red-black tree library.
Identifiers in these libraries are prefixed by avl_, avlt_, avltr_,
and rb_, with corresponding header
files `avl.h', `avlt.h', `avltr.h', and `rb.h',
respectively. The functions that they declare are defined in the
`.c' files with the same names.
Most tree functions are implemented in all four libraries, but
threading allows more generality of operation. So, the threaded and
right-threaded libraries offer a few additional functions for finding
the next or previous node from a given node. In addition, they offer
functions for converting trees from threaded or right-threaded
representations to unthreaded, and vice versa.(2)
These functions deal with creation and destruction of AVL trees.
- Function: avl_tree * avl_create (avl_comparison_func compare, void *param)
-
- Function: avlt_tree * avlt_create (avl_comparison_func compare, void *param)
-
- Function: avltr_tree * avltr_create (avl_comparison_func compare, void *param)
-
- Function: rb_tree * rb_create (avl_comparison_func compare, void *param)
-
Create a new, empty tree with comparison function compare.
Arbitrary user data param is saved so that it can be passed to
user callback functions.
- Function: void avl_destroy (avl_tree *tree, avl_node_func free)
-
- Function: void avlt_destroy (avlt_tree *tree, avl_node_func free)
-
- Function: void avltr_destroy (avltr_tree *tree, avl_node_func free)
-
- Function: void rb_destroy (rb_tree *tree, avl_node_func free)
-
Destroys tree, releasing all of its storage. If free is
non-null, then it is called for every node in postorder before that node
is freed.
- Function: void avl_free (avl_tree *tree)
-
- Function: void avlt_free (avlt_tree *tree)
-
- Function: void avltr_free (avltr_tree *tree)
-
- Function: void rb_free (rb_tree *tree)
-
Destroys tree, releasing all of its storage. The data in each
node is freed with a call to the standard C library function free.
- Function: avl_tree * avl_copy (const avl_tree *tree, avl_copy_func copy)
-
- Function: avlt_tree * avlt_copy (const avlt_tree *tree, avl_copy_func copy)
-
- Function: avltr_tree * avltr_copy (const avltr_tree *tree, avl_copy_func copy)
-
- Function: rb_tree * rb_copy (const rb_tree *tree, avl_copy_func copy)
-
Copies the contents of tree into a new tree, and returns the new
tree. If copy is non-null, then it is called to make a new copy
of each node's data; otherwise, the node data is copied verbatim into
the new tree.
- Function: int avl_count (const avl_tree *tree)
-
- Function: int avlt_count (const avlt_tree *tree)
-
- Function: int avltr_count (const avltr_tree *tree)
-
- Function: int rb_count (const rb_tree *tree)
-
Returns the number of nodes in tree.
- Function: void * xmalloc (size_t size)
-
This is not a function defined by libavl. Instead, it is a function
that the user program can define. It must allocate size bytes
using malloc and return a pointer to them. It can handle out-of-memory
errors however it chooses, but it may not ever return a null pointer.
If there is an xmalloc function defined for use by libavl, the
source files (`avl.c', `avlt.c', `avltr.c', `rb.c')
must be compiled with HAVE_XMALLOC defined. Otherwise, the
library will use its internal static xmalloc, which handles
out-of-memory errors by printing a message `virtual memory
exhausted' to stderr and terminating the program with exit code
EXIT_FAILURE.
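A user-supplied xmalloc satisfying these requirements might be sketched as follows. The error message and the decision to exit are choices made for this example; a program could just as well release caches and retry, as long as the function never returns a null pointer.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* User-supplied allocator: never returns a null pointer. */
void *xmalloc(size_t size)
{
    void *p;

    if (size == 0)
        size = 1;             /* malloc(0) is allowed to return NULL */
    p = malloc(size);
    if (p == NULL) {
        fprintf(stderr, "out of memory allocating %lu bytes\n",
                (unsigned long) size);
        exit(EXIT_FAILURE);
    }
    return p;
}
```

With a definition like this linked into the program, the libavl source files would be compiled with HAVE_XMALLOC defined, as described above.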
These functions insert nodes, delete nodes, and search for nodes in
trees.
- Function: void ** avl_probe (avl_tree *tree, void *data)
-
- Function: void ** avlt_probe (avlt_tree *tree, void *data)
-
- Function: void ** avltr_probe (avltr_tree *tree, void *data)
-
- Function: void ** rb_probe (rb_tree *tree, void *data)
-
These are the workhorse functions for tree insertion. They search
tree for a node with data matching data. If found, a
pointer to the matching data is returned. Otherwise, a new node is
created for data, and a pointer to that data is returned. In
either case, the data pointer that the returned value points to can be
changed by the user, but the key data used by the tree's comparison
function must not be changed(3).
It is usually easier to use one of the avl_insert or avl_replace
functions instead of avl_probe directly.
Please note: It's not a particularly good idea to insert a null
pointer as a data item into a tree, because several libavl functions
return a null pointer to indicate failure. You can sometimes avoid a
problem by using functions that return a pointer to a pointer instead of
a plain pointer. Also be wary of this when casting an arithmetic type
to a void pointer for insertion--on typical architectures, 0's become
null pointers when this is done.
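One way to sidestep the zero-becomes-null problem when storing small integers directly in the item pointer is to bias the value by one. The helpers below are hypothetical, assume non-negative values, and rely on intptr_t round-tripping through void * as it does on typical platforms; the tree's comparison function would have to decode items the same way.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Encodes a non-negative int as a non-null item pointer by adding one,
   so that 0 is stored as (void *) 1 rather than a null pointer. */
void *int_to_item(int v)
{
    assert(v >= 0 && v < INT_MAX);
    return (void *) (intptr_t) (v + 1);
}

/* Decodes an item produced by int_to_item. */
int item_to_int(const void *item)
{
    assert(item != NULL);
    return (int) ((intptr_t) item - 1);
}
```

With this encoding a null return value can only mean "not found" or "newly inserted", never a stored zero.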
- Function: void * avl_insert (avl_tree *tree, void *data)
-
- Function: void * avlt_insert (avlt_tree *tree, void *data)
-
- Function: void * avltr_insert (avltr_tree *tree, void *data)
-
- Function: void * rb_insert (rb_tree *tree, void *data)
-
If a node with data matching data exists in tree, returns
the matching data item. Otherwise, inserts data into tree
and returns a null pointer.
- Function: void avl_force_insert (avl_tree *tree, void *data)
-
- Function: void avlt_force_insert (avlt_tree *tree, void *data)
-
- Function: void avltr_force_insert (avltr_tree *tree, void *data)
-
- Function: void rb_force_insert (rb_tree *tree, void *data)
-
Inserts data into tree. If a node with data matching
data exists in tree, aborts the program with an assertion
violation. This function is implemented as a macro; if it is used, the
standard C header assert.h must also be included. If macro NDEBUG
is defined when a libavl header is included, these
functions are short-circuited to a direct call to avl_insert,
and no check is performed.
- Function: void * avl_replace (avl_tree *tree, void *data)
-
- Function: void * avlt_replace (avlt_tree *tree, void *data)
-
- Function: void * avltr_replace (avltr_tree *tree, void *data)
-
- Function: void * rb_replace (rb_tree *tree, void *data)
-
If a node with data matching data, such that the comparison
function returns 0, exists in tree, replaces the node's data with
data and returns the node's former contents. Otherwise, inserts
data into tree and returns a null pointer.
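The relationship between the probe, insert, and replace operations can be sketched on a toy unbalanced tree. Everything named toy_ below is hypothetical and written only for this illustration; it is not libavl's implementation, performs no rebalancing, and simply aborts on allocation failure.

```c
#include <assert.h>
#include <stdlib.h>

typedef int toy_compare(const void *a, const void *b, void *param);

struct toy_node {
    void *data;
    struct toy_node *link[2];
};

struct toy_tree {
    struct toy_node *root;
    toy_compare *cmp;
    void *param;
};

/* Example comparison function for items that point to ints. */
int toy_compare_ints(const void *a, const void *b, void *param)
{
    const int *x = a;
    const int *y = b;

    (void) param;
    return (*x > *y) - (*x < *y);
}

/* Returns a pointer to the data slot of the node matching data,
   creating that node first if there was no match. */
void **toy_probe(struct toy_tree *t, void *data)
{
    struct toy_node **p;

    assert(t != NULL && t->cmp != NULL);
    p = &t->root;
    while (*p != NULL) {
        int c = t->cmp(data, (*p)->data, t->param);
        if (c == 0)
            return &(*p)->data;
        p = &(*p)->link[c > 0];
    }
    *p = malloc(sizeof **p);
    if (*p == NULL)
        abort();
    (*p)->data = data;
    (*p)->link[0] = (*p)->link[1] = NULL;
    return &(*p)->data;
}

/* Insert semantics: null if data was added, else the existing item.
   Re-inserting the very same pointer also yields null in this sketch. */
void *toy_insert(struct toy_tree *t, void *data)
{
    void **slot = toy_probe(t, data);
    return *slot == data ? NULL : *slot;
}

/* Replace semantics: null if data was added, else the former item,
   whose slot now holds data. */
void *toy_replace(struct toy_tree *t, void *data)
{
    void **slot = toy_probe(t, data);
    void *old = *slot;

    if (old == data)
        return NULL;
    *slot = data;
    return old;
}
```

Both wrappers do a single search. The only difference is what happens to the slot that the probe returns when an equal item already exists.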
- Function: void * avl_delete (avl_tree *tree, const void *data)
-
- Function: void * avlt_delete (avlt_tree *tree, const void *data)
-
- Function: void * avltr_delete (avltr_tree *tree, const void *data)
-
- Function: void * rb_delete (rb_tree *tree, const void *data)
-
Searches tree for a node with data matching data. If found,
the node is deleted and its data is returned. Otherwise, returns a null
pointer.
- Function: void * avl_force_delete (avl_tree *tree, const void *data)
-
- Function: void * avlt_force_delete (avlt_tree *tree, const void *data)
-
- Function: void * avltr_force_delete (avltr_tree *tree, const void *data)
-
- Function: void * rb_force_delete (rb_tree *tree, const void *data)
-
Deletes a node with data matching data from tree. If no
matching node is found, aborts the program with an assertion violation.
If macro NDEBUG is defined when a libavl header is included,
these functions are short-circuited to a direct call to
avl_delete, and no check is performed.
These functions search a tree for an item without making an insertion or
a deletion.
- Function: void * avl_find (avl_tree *tree, const void *data)
-
- Function: void ** avlt_find (avlt_tree *tree, const void *data)
-
- Function: void ** avltr_find (avltr_tree *tree, const void *data)
-
- Function: void * rb_find (rb_tree *tree, const void *data)
-
Searches tree for a node with data matching data. If found,
returns the node's data (for threaded and right-threaded trees, a
pointer to the node's data). Otherwise, returns a null pointer.
- Function: void * avl_find_close (avl_tree *tree, const void *data)
-
- Function: void ** avlt_find_close (avlt_tree *tree, const void *data)
-
- Function: void ** avltr_find_close (avltr_tree *tree, const void *data)
-
- Function: void * rb_find_close (rb_tree *tree, const void *data)
-
Searches tree for a node with data matching data. If found,
returns the node's data (for threaded and right-threaded trees, a
pointer to the node's data). If no matching item is found, then it
finds a node whose data is "close" to data; either the node
closest in value to data, or the node either before or after the
node with the closest value. Returns a null pointer if the tree does
not contain any nodes.
These functions allow the caller to iterate across the items in a tree.
- Function: void avl_walk (const avl_tree *tree, avl_node_func operate, void *param)
-
- Function: void avlt_walk (const avlt_tree *tree, avl_node_func operate, void *param)
-
- Function: void avltr_walk (const avltr_tree *tree, avl_node_func operate, void *param)
-
- Function: void rb_walk (const rb_tree *tree, avl_node_func operate, void *param)
-
Walks through all the nodes in tree, and calls function
operate for each node in inorder. param overrides the value
passed to avl_create (and family) for this operation only.
operate must not change the key data in the nodes in a way that
would reorder the data values or cause two values to become equal.
- Function: void * avl_traverse (const avl_tree *tree, avl_traverser *trav)
-
- Function: void * avlt_traverse (const avlt_tree *tree, avlt_traverser *trav)
-
- Function: void * avltr_traverse (const avltr_tree *tree, avltr_traverser *trav)
-
- Function: void * rb_traverse (const rb_tree *tree, rb_traverser *trav)
-
Returns each of tree's nodes' data values in sequence, then a null
pointer to indicate the last item. trav must be initialized
before the first call, either in a declaration like that below, or using
one of the functions below.
avl_traverser trav = AVL_TRAVERSER_INIT;
Each avl_traverser (and family) is a separate, independent
iterator.
For threaded and right-threaded trees, avlt_next or
avltr_next, respectively, are faster and more memory-efficient
than avlt_traverse or avltr_traverse.
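The general idea of a traverser that keeps a stack of the nodes descended can be sketched on a toy unthreaded tree. The it_ names and IT_MAX_HEIGHT are hypothetical; IT_MAX_HEIGHT merely plays a role similar to AVL_MAX_HEIGHT, and libavl's real traverser structures are opaque and may be organized differently.

```c
#include <assert.h>
#include <stddef.h>

#define IT_MAX_HEIGHT 32    /* deepest tree this sketch can handle */

struct it_node {
    void *data;
    struct it_node *left, *right;
};

struct it_traverser {
    int init;                                  /* nonzero once started */
    int nstack;                                /* stack entries in use */
    const struct it_node *p;                   /* next subtree to descend */
    const struct it_node *stack[IT_MAX_HEIGHT];
};

#define IT_TRAVERSER_INIT {0, 0, NULL, {NULL}}

/* Returns the next data item in inorder, or NULL after the last one. */
void *it_traverse(const struct it_node *root, struct it_traverser *trav)
{
    const struct it_node *n;

    assert(trav != NULL);
    if (!trav->init) {
        trav->init = 1;
        trav->nstack = 0;
        trav->p = root;
    }

    /* Descend to the leftmost unvisited node, remembering the path. */
    while (trav->p != NULL) {
        assert(trav->nstack < IT_MAX_HEIGHT);
        trav->stack[trav->nstack++] = trav->p;
        trav->p = trav->p->left;
    }
    if (trav->nstack == 0) {
        trav->init = 0;        /* finished; allow the traverser to be reused */
        return NULL;
    }

    n = trav->stack[--trav->nstack];
    trav->p = n->right;        /* visit n's right subtree on the next call */
    return n->data;
}
```

Because all of the state lives in the traverser, several traversers can walk the same tree independently, provided the tree is not modified in the meantime.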
- Function: void * avl_init_traverser (avl_traverser *trav)
-
- Function: void * avlt_init_traverser (avlt_traverser *trav)
-
- Function: void * avltr_init_traverser (avltr_traverser *trav)
-
- Function: void * rb_init_traverser (rb_traverser *trav)
-
Initializes the specified tree traverser structure. After this function
is called, the next call to the corresponding *_traverse function
will return the smallest value in the appropriate tree.
- Function: void ** avlt_next (const avlt_tree *tree, void **data)
-
- Function: void ** avltr_next (const avltr_tree *tree, void **data)
-
data must be a null pointer or a pointer to a data item in AVL
tree tree. Returns a pointer to the next data item after
data in tree in inorder (this is the first item if
data is a null pointer), or a null pointer if data was the
last item in tree.
- Function: void ** avlt_prev (const avlt_tree *tree, void **data)
-
data must be a null pointer or a pointer to a data item in AVL
tree tree. Returns a pointer to the previous data item before
data in tree in inorder (this is the last, or greatest
valued, item if data is a null pointer), or a null pointer if
data was the first item in tree.
- Function: avlt_tree * avlt_thread (avl_tree *tree)
-
- Function: avltr_tree * avltr_thread (avl_tree *tree)
-
Adds symmetric threads or right threads, respectively, to unthreaded AVL
tree tree and returns a pointer to tree cast to the
appropriate type. After one of these functions is called, threaded or
right-threaded functions, as appropriate, must be used with tree;
unthreaded functions may not be used.
- Function: avl_tree * avlt_unthread (avlt_tree *tree)
-
- Function: avl_tree * avltr_unthread (avltr_tree *tree)
-
Cuts all threads in threaded or right-threaded, respectively, AVL tree
tree and returns a pointer to tree cast to avl_tree *.
After one of these functions is called, unthreaded functions must
be used with tree; threaded or right-threaded functions may not be
used.
libavl was written by Ben Pfaff <blp@gnu.org>.
libavl's generic tree algorithms and AVL algorithms are based on those
found in Donald Knuth's venerable Art of Computer Programming
series from Addison-Wesley, primarily Volumes 1 and 3. libavl's
red-black tree algorithms are based on those found in Cormen et al.,
Introduction to Algorithms, 2nd ed., from MIT Press.
Footnotes
(1) In tree traversal, inorder refers
to visiting the nodes in their sorted order from smallest to largest.
(2) In general, you
should build the sort of tree that you need to use, but occasionally it
is useful to convert between tree types.
(3) It
can be changed if this would not change the ordering of the nodes in the
tree; i.e., if this would not cause the data in the node to be less than
or equal to the previous node's data or greater than or equal to the
next node's data.
This document was generated on 6 October 1999 using the
texi2html translator version 1.51a.