class SuffixTree

Declaration

class SuffixTree { /* full declaration omitted */ };

Description

A data structure for fast substring queries. Suffix trees represent the suffixes of their input strings in their leaves. A suffix tree is a type of compressed trie structure where each node represents an entire substring rather than a single character. Each leaf of the tree is a suffix. A suffix tree can be seen as a type of state machine where each state is a substring of the full string. The tree is structured so that, for a string of length N, there are exactly N leaves in the tree. This structure allows us to quickly find repeated substrings of the input string. In this implementation, a "string" is a vector of unsigned integers. These integers may result from hashing some data type. A suffix tree can contain 1 or many strings, which can then be queried as one large string. The suffix tree is implemented using Ukkonen's algorithm for linear-time suffix tree construction. Ukkonen's algorithm is explained in more detail in the paper by Esko Ukkonen "On-line construction of suffix trees. The paper is available at https://www.cs.helsinki.fi/u/ukkonen/SuffixT1withFigs.pdf

Declared at: llvm/include/llvm/Support/SuffixTree.h:137

Member Variables

public llvm::ArrayRef<unsigned int> Str
Each element is an integer representing an instruction in the module.
private llvm::SpecificBumpPtrAllocator<SuffixTreeNode> NodeAllocator
Maintains each node in the tree.
private llvm::SuffixTreeNode* Root = nullptr
The root represents the empty string. It is maintained by the\p NodeAllocator like every other node in the tree.
private llvm::BumpPtrAllocator InternalEndIdxAllocator
Each internal node is guaranteed to never have its end index change during the construction algorithm; however, leaves must be updated at every step. Therefore, we need to store leaf end indices by reference to avoid updating O(N) leaves at every step of construction. Thus, every internal node must be allocated its own end index.
private unsigned int LeafEndIdx = -1
The end index of each leaf in the tree.
private llvm::SuffixTree::ActiveState Active
The point the next insertion will take place at in the construction algorithm.

Method Overview

  • public SuffixTree(const std::vector<unsigned int> & Str)
  • public llvm::SuffixTree::iterator begin()
  • public llvm::SuffixTree::iterator end()
  • private unsigned int extend(unsigned int EndIdx, unsigned int SuffixesToAdd)
  • private llvm::SuffixTreeNode * insertInternalNode(llvm::SuffixTreeNode * Parent, unsigned int StartIdx, unsigned int EndIdx, unsigned int Edge)
  • private llvm::SuffixTreeNode * insertLeaf(llvm::SuffixTreeNode & Parent, unsigned int StartIdx, unsigned int Edge)
  • private void setSuffixIndices()

Methods

SuffixTree(const std::vector<unsigned int>& Str)

Description

Construct a suffix tree from a sequence of unsigned integers.

Declared at: llvm/include/llvm/Support/SuffixTree.h:235

Parameters

const std::vector<unsigned int>& Str
The string to construct the suffix tree for.

llvm::SuffixTree::iterator begin()

Declared at: llvm/include/llvm/Support/SuffixTree.h:344

llvm::SuffixTree::iterator end()

Declared at: llvm/include/llvm/Support/SuffixTree.h:345

unsigned int extend(unsigned int EndIdx,
                    unsigned int SuffixesToAdd)

Description

Construct the suffix tree for the prefix of the input ending at\p EndIdx. Used to construct the full suffix tree iteratively. At the end of each step, the constructed suffix tree is either a valid suffix tree, or a suffix tree with implicit suffixes. At the end of the final step, the suffix tree is a valid tree.

Declared at: llvm/include/llvm/Support/SuffixTree.h:229

Parameters

unsigned int EndIdx
The end index of the current prefix in the main string.
unsigned int SuffixesToAdd
The number of suffixes that must be added to complete the suffix tree at the current phase.

Returns

The number of suffixes that have not been added at the end of this step.

llvm::SuffixTreeNode* insertInternalNode(
    llvm::SuffixTreeNode* Parent,
    unsigned int StartIdx,
    unsigned int EndIdx,
    unsigned int Edge)

Description

Allocate an internal node and add it to the tree.

Declared at: llvm/include/llvm/Support/SuffixTree.h:208

Parameters

llvm::SuffixTreeNode* Parent
The parent of this node. Only null when allocating the root.
unsigned int StartIdx
The start index of this node's associated string.
unsigned int EndIdx
The end index of this node's associated string.
unsigned int Edge
The label on the edge leaving \p Parent to this node.

Returns

A pointer to the allocated internal node.

llvm::SuffixTreeNode* insertLeaf(
    llvm::SuffixTreeNode& Parent,
    unsigned int StartIdx,
    unsigned int Edge)

Description

Allocate a leaf node and add it to the tree.

Declared at: llvm/include/llvm/Support/SuffixTree.h:197

Parameters

llvm::SuffixTreeNode& Parent
The parent of this node.
unsigned int StartIdx
The start index of this node's associated string.
unsigned int Edge
The label on the edge leaving \p Parent to this node.

Returns

A pointer to the allocated leaf node.

void setSuffixIndices()

Description

Set the suffix indices of the leaves to the start indices of their respective suffixes.

Declared at: llvm/include/llvm/Support/SuffixTree.h:213