A ternary search tree is a type of trie (sometimes called a prefix tree) whose nodes are arranged in a manner similar to a binary search tree, but with up to three children rather than the binary tree’s limit of two. Like other prefix trees, a ternary search tree can be used as an associative map structure that supports incremental string search.

Description

Each node of a ternary search tree stores a single character, an indicator and pointers to its three children, conventionally named equal kid, lo kid and hi kid, which can also be referred to as the middle, lower and higher child respectively. The lists of class tstTree created by the package name these objects ch (character), flag (indicator), C (equal kid), L (lo kid) and R (hi kid), as the str() output in the Functions section shows.

The indicator marks whether or not the node is the end of a word. The lo kid pointer must point to a node whose character is less than that of the current node. The hi kid pointer must point to a node whose character is greater than that of the current node. The equal kid points to the next character in the word. The figure below shows a ternary search tree with the strings “cat”, “bug”, “cats” and “up”:

[Figure: ternary search tree containing “cat”, “bug”, “cats” and “up”]

As with other trie data structures, each node in a ternary search tree represents a prefix of the stored strings. All strings in the middle subtree of a node start with that prefix.
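
The node layout and pointer rules above can be sketched in base R. This is only an illustration under stated assumptions: the helper names (tst_new, tst_insert, tst_search, cmp) are hypothetical and are not the package's functions, and the tree is stored as parallel vectors merely to resemble the ch, flag, L, R and C fields of a tstTree object. Characters are compared by code point so the result does not depend on the locale.

```r
# Hypothetical helpers for illustration; not the package's implementation.
# Compare two single characters by code point: -1, 0 or 1.
cmp <- function(a, b) sign(utf8ToInt(a) - utf8ToInt(b))

# An empty tree: parallel vectors indexed by node number.
tst_new <- function() {
  list(ch = character(0), flag = logical(0),
       L = integer(0), R = integer(0), C = integer(0))
}

# Append a node holding one character; all three children start as NA.
tst_add_node <- function(tree, char) {
  tree$ch   <- c(tree$ch, char)
  tree$flag <- c(tree$flag, FALSE)
  tree$L    <- c(tree$L, NA_integer_)
  tree$R    <- c(tree$R, NA_integer_)
  tree$C    <- c(tree$C, NA_integer_)
  tree
}

# Insert one word and return the updated tree.
tst_insert <- function(tree, word) {
  if (!nzchar(word)) return(tree)            # ignore empty strings
  chars <- strsplit(word, "", fixed = TRUE)[[1]]
  if (length(tree$ch) == 0) tree <- tst_add_node(tree, chars[1])
  node <- 1L
  i <- 1L
  repeat {
    d <- cmp(chars[i], tree$ch[node])
    if (d == 0) {
      if (i == length(chars)) {              # last character: mark end of word
        tree$flag[node] <- TRUE
        return(tree)
      }
      i <- i + 1L                            # move to the next character
      field <- "C"                           # and follow the equal kid
    } else {
      field <- if (d < 0) "L" else "R"       # lo kid or hi kid
    }
    if (is.na(tree[[field]][node])) {        # missing child: create it
      tree <- tst_add_node(tree, chars[i])
      tree[[field]][node] <- length(tree$ch)
    }
    node <- tree[[field]][node]
  }
}

# TRUE only if the exact word was inserted (its last node is flagged).
tst_search <- function(tree, word) {
  if (!nzchar(word) || length(tree$ch) == 0) return(FALSE)
  chars <- strsplit(word, "", fixed = TRUE)[[1]]
  node <- 1L
  i <- 1L
  while (!is.na(node)) {
    d <- cmp(chars[i], tree$ch[node])
    if (d < 0) {
      node <- tree$L[node]
    } else if (d > 0) {
      node <- tree$R[node]
    } else if (i == length(chars)) {
      return(tree$flag[node])
    } else {
      i <- i + 1L
      node <- tree$C[node]
    }
  }
  FALSE
}

tree <- tst_new()
for (w in c("cat", "bug", "cats", "up")) tree <- tst_insert(tree, w)
length(tree$ch)          # 9 nodes: c-a-t, b-u-g, s, u-p
tst_search(tree, "cats") # TRUE
tst_search(tree, "ca")   # FALSE: "ca" is only a prefix, its node is not flagged
```

Note that “cats” adds a single node because it shares the prefix “cat”, which is where the space saving of the structure comes from.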

One advantage of ternary search trees over standard tries is that they are more space efficient: each node holds only three pointers, compared to 26 in a standard trie over the lowercase English alphabet. Further, ternary search trees can be used any time a hashtable would be used to store strings. Ternary search trees are especially space efficient when the strings to be stored share common prefixes.

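Searches in a ternary search tree are more efficient when the strings are inserted in shuffled rather than alphabetical order. Inserting already sorted strings makes the lo kid and hi kid links degenerate into long chains, much as sorted input unbalances a binary search tree. This is why the example in the Functions section shuffles the state names with sample() before building the tree.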
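
The effect of insertion order can be checked with a small base R experiment. This is a sketch under stated assumptions, not the package's code: cmp, tst_insert, build, tst_visits and mean_visits are hypothetical helpers, and the 26 one-letter words are chosen only because the sorted case is then easy to reason about (the tree becomes a single chain of hi kids).

```r
# Hypothetical helpers for illustration; not the package's implementation.
cmp <- function(a, b) sign(utf8ToInt(a) - utf8ToInt(b))

# Insert one word into a tree stored as parallel vectors (ch, flag, L, R, C).
tst_insert <- function(tree, word) {
  chars <- strsplit(word, "", fixed = TRUE)[[1]]
  add <- function(tree, char) {
    tree$ch <- c(tree$ch, char); tree$flag <- c(tree$flag, FALSE)
    tree$L <- c(tree$L, NA_integer_); tree$R <- c(tree$R, NA_integer_)
    tree$C <- c(tree$C, NA_integer_)
    tree
  }
  if (length(tree$ch) == 0) tree <- add(tree, chars[1])
  node <- 1L; i <- 1L
  repeat {
    d <- cmp(chars[i], tree$ch[node])
    if (d == 0) {
      if (i == length(chars)) { tree$flag[node] <- TRUE; return(tree) }
      i <- i + 1L
      field <- "C"
    } else {
      field <- if (d < 0) "L" else "R"
    }
    if (is.na(tree[[field]][node])) {
      tree <- add(tree, chars[i])
      tree[[field]][node] <- length(tree$ch)
    }
    node <- tree[[field]][node]
  }
}

# Build a tree by inserting the words in the order given.
build <- function(words) {
  tree <- list(ch = character(0), flag = logical(0),
               L = integer(0), R = integer(0), C = integer(0))
  for (w in words) tree <- tst_insert(tree, w)
  tree
}

# Number of nodes visited while looking up a word that is in the tree.
tst_visits <- function(tree, word) {
  chars <- strsplit(word, "", fixed = TRUE)[[1]]
  node <- 1L; i <- 1L; visits <- 0L
  while (!is.na(node)) {
    visits <- visits + 1L
    d <- cmp(chars[i], tree$ch[node])
    if (d < 0) {
      node <- tree$L[node]
    } else if (d > 0) {
      node <- tree$R[node]
    } else if (i == length(chars)) {
      break
    } else {
      i <- i + 1L
      node <- tree$C[node]
    }
  }
  visits
}

mean_visits <- function(tree, words) {
  mean(vapply(words, function(w) tst_visits(tree, w), integer(1)))
}

words <- letters                          # "a" ... "z", already sorted
set.seed(1)                               # arbitrary seed, for repeatability
sorted_tree   <- build(words)
shuffled_tree <- build(sample(words))

mean_visits(sorted_tree, words)           # 13.5: finding the k-th letter visits k nodes
mean_visits(shuffled_tree, words)         # lower, since the tree is no longer a chain
```

The same comparison can be run on longer words, for example the built-in state.name vector, which is stored in alphabetical order.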

Functions

The function newTree() creates a new object of class tstTree. It takes as input a character vector or a file (.txt or .csv) with the strings used to construct the tree, where each character becomes a node. After processing all strings, it reports the total number of words and nodes in the tree.

# Create a tree with the names of the US states
states <- sample(state.name)
stateTree <- newTree(states)
#> Tree created with 50 words and 358 nodes

str(stateTree)
#> List of 5
#>  $ ch  : chr [1:358] "W" "a" "s" "h" ...
#>  $ flag: num [1:358] 0 0 0 0 0 0 0 0 0 1 ...
#>  $ L   : num [1:281] 23 NA NA NA NA NA NA NA NA NA ...
#>  $ R   : num [1:299] NA 11 NA NA NA NA NA NA NA NA ...
#>  $ C   : num [1:357] 2 3 4 5 6 7 8 9 10 NA ...
#>  - attr(*, "class")= chr [1:2] "list" "tstTree"

The created tree can then be extended with updateTree(), which adds a batch of words, or with addWord(), which adds a single string. The tree to be modified must be passed as an argument to both functions. updateTree() also reports the number of strings added and the total number of nodes in the modified tree.

Use dimTree() with an object of class tstTree to get the dimensions of the tree. It returns a numeric vector whose first element is the total number of strings and whose second element is the total number of nodes.

# Add some Canadian regions to the previous stateTree
regions  <- c("Quebec", "Ontario", "Manitoba", "Saskatchewan", "Alberta", "British Columbia")
US.CanadaTree <- updateTree(stateTree, regions)
#> Tree updated with 6 words and the total nodes are 408

# Add one more region
US.CanadaTree <- addWord(US.CanadaTree, "Yukon")

# View the final dimensions of the tree
dimTree(US.CanadaTree)
#> [1]  57 413

To check whether a particular string has been added to the tree, use searchWord() with an object of class tstTree and the string to look for. It returns TRUE or FALSE depending on whether or not the string is in the tree.

Another way to search for words is the completeWord() function. It takes an incomplete string as input and returns all strings in the tree that begin with exactly that input string.

# Search a specific state
searchWord(US.CanadaTree, "Alabama")
#> [1] TRUE
searchWord(US.CanadaTree, "Baltimore")
#> [1] FALSE

# Complete strings: States and regions that begin with "A" and "Al"
completeWord(US.CanadaTree, "A")
#> [1] "Alaska"   "Alabama"  "Alberta"  "Arkansas" "Arizona"
completeWord(US.CanadaTree, "Al")
#> [1] "Alaska"  "Alabama" "Alberta"
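
Prefix completion of this kind can be sketched in base R in two steps: walk down the tree along the prefix, then collect every flagged node in the middle subtree below it. The helpers cmp, tst_insert and tst_complete are hypothetical and only illustrate the idea; they are not how completeWord() is implemented. Because the sketch visits lo kid, equal kid and hi kid in that order, it returns matches sorted by code point, which may differ from the order the package returns.

```r
# Hypothetical helpers for illustration; not the package's implementation.
cmp <- function(a, b) sign(utf8ToInt(a) - utf8ToInt(b))

# Insert one word into a tree stored as parallel vectors (ch, flag, L, R, C).
tst_insert <- function(tree, word) {
  chars <- strsplit(word, "", fixed = TRUE)[[1]]
  add <- function(tree, char) {
    tree$ch <- c(tree$ch, char); tree$flag <- c(tree$flag, FALSE)
    tree$L <- c(tree$L, NA_integer_); tree$R <- c(tree$R, NA_integer_)
    tree$C <- c(tree$C, NA_integer_)
    tree
  }
  if (length(tree$ch) == 0) tree <- add(tree, chars[1])
  node <- 1L; i <- 1L
  repeat {
    d <- cmp(chars[i], tree$ch[node])
    if (d == 0) {
      if (i == length(chars)) { tree$flag[node] <- TRUE; return(tree) }
      i <- i + 1L
      field <- "C"
    } else {
      field <- if (d < 0) "L" else "R"
    }
    if (is.na(tree[[field]][node])) {
      tree <- add(tree, chars[i])
      tree[[field]][node] <- length(tree$ch)
    }
    node <- tree[[field]][node]
  }
}

# Return all stored words that start with `prefix`.
tst_complete <- function(tree, prefix) {
  if (!nzchar(prefix) || length(tree$ch) == 0) return(character(0))
  chars <- strsplit(prefix, "", fixed = TRUE)[[1]]
  # Step 1: find the node that matches the last character of the prefix.
  node <- 1L; i <- 1L
  repeat {
    if (is.na(node)) return(character(0))    # prefix is not in the tree
    d <- cmp(chars[i], tree$ch[node])
    if (d < 0) {
      node <- tree$L[node]
    } else if (d > 0) {
      node <- tree$R[node]
    } else if (i == length(chars)) {
      break
    } else {
      i <- i + 1L
      node <- tree$C[node]
    }
  }
  # Step 2: in-order traversal of the middle subtree, rebuilding each word.
  collect <- function(n, sofar) {
    if (is.na(n)) return(character(0))
    word <- paste0(sofar, tree$ch[n])
    c(collect(tree$L[n], sofar),
      if (tree$flag[n]) word,                # NULL (dropped by c()) if not a word end
      collect(tree$C[n], word),
      collect(tree$R[n], sofar))
  }
  c(if (tree$flag[node]) prefix, collect(tree$C[node], prefix))
}

tree <- list(ch = character(0), flag = logical(0),
             L = integer(0), R = integer(0), C = integer(0))
for (w in c("Alaska", "Alabama", "Alberta", "Arkansas", "Arizona", "Ohio")) {
  tree <- tst_insert(tree, w)
}
tst_complete(tree, "Al")  # "Alabama" "Alaska" "Alberta"
tst_complete(tree, "B")   # character(0)
```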

More information about ternary search trees can be found at Wikipedia – Ternary Search Tree.