class SuffixTree
Declaration
class SuffixTree { /* full declaration omitted */ };
Description
A data structure for fast substring queries. Suffix trees represent the suffixes of their input strings in their leaves. A suffix tree is a type of compressed trie structure where each node represents an entire substring rather than a single character. Each leaf of the tree is a suffix. A suffix tree can be seen as a type of state machine where each state is a substring of the full string. The tree is structured so that, for a string of length N, there are exactly N leaves in the tree. This structure allows us to quickly find repeated substrings of the input string. In this implementation, a "string" is a vector of unsigned integers. These integers may result from hashing some data type. A suffix tree can contain 1 or many strings, which can then be queried as one large string. The suffix tree is implemented using Ukkonen's algorithm for linear-time suffix tree construction. Ukkonen's algorithm is explained in more detail in the paper by Esko Ukkonen "On-line construction of suffix trees. The paper is available at https://www.cs.helsinki.fi/u/ukkonen/SuffixT1withFigs.pdf
Declared at: llvm/include/llvm/Support/SuffixTree.h:137
Member Variables
- public llvm::ArrayRef<unsigned int> Str
- Each element is an integer representing an instruction in the module.
- private llvm::SpecificBumpPtrAllocator<SuffixTreeNode> NodeAllocator
- Maintains each node in the tree.
- private llvm::SuffixTreeNode* Root = nullptr
- The root represents the empty string. It is maintained by the\p NodeAllocator like every other node in the tree.
- private llvm::BumpPtrAllocator InternalEndIdxAllocator
- Each internal node is guaranteed to never have its end index change during the construction algorithm; however, leaves must be updated at every step. Therefore, we need to store leaf end indices by reference to avoid updating O(N) leaves at every step of construction. Thus, every internal node must be allocated its own end index.
- private unsigned int LeafEndIdx = -1
- The end index of each leaf in the tree.
- private llvm::SuffixTree::ActiveState Active
- The point the next insertion will take place at in the construction algorithm.
Method Overview
- public SuffixTree(const std::vector<unsigned int> & Str)
- public llvm::SuffixTree::iterator begin()
- public llvm::SuffixTree::iterator end()
- private unsigned int extend(unsigned int EndIdx, unsigned int SuffixesToAdd)
- private llvm::SuffixTreeNode * insertInternalNode(llvm::SuffixTreeNode * Parent, unsigned int StartIdx, unsigned int EndIdx, unsigned int Edge)
- private llvm::SuffixTreeNode * insertLeaf(llvm::SuffixTreeNode & Parent, unsigned int StartIdx, unsigned int Edge)
- private void setSuffixIndices()
Methods
¶SuffixTree(const std::vector<unsigned int>& Str)
SuffixTree(const std::vector<unsigned int>& Str)
Description
Construct a suffix tree from a sequence of unsigned integers.
Declared at: llvm/include/llvm/Support/SuffixTree.h:235
Parameters
- const std::vector<unsigned int>& Str
- The string to construct the suffix tree for.
¶llvm::SuffixTree::iterator begin()
llvm::SuffixTree::iterator begin()
Declared at: llvm/include/llvm/Support/SuffixTree.h:344
¶llvm::SuffixTree::iterator end()
llvm::SuffixTree::iterator end()
Declared at: llvm/include/llvm/Support/SuffixTree.h:345
¶unsigned int extend(unsigned int EndIdx,
unsigned int SuffixesToAdd)
unsigned int extend(unsigned int EndIdx,
unsigned int SuffixesToAdd)
Description
Construct the suffix tree for the prefix of the input ending at\p EndIdx. Used to construct the full suffix tree iteratively. At the end of each step, the constructed suffix tree is either a valid suffix tree, or a suffix tree with implicit suffixes. At the end of the final step, the suffix tree is a valid tree.
Declared at: llvm/include/llvm/Support/SuffixTree.h:229
Parameters
- unsigned int EndIdx
- The end index of the current prefix in the main string.
- unsigned int SuffixesToAdd
- The number of suffixes that must be added to complete the suffix tree at the current phase.
Returns
The number of suffixes that have not been added at the end of this step.
¶llvm::SuffixTreeNode* insertInternalNode(
llvm::SuffixTreeNode* Parent,
unsigned int StartIdx,
unsigned int EndIdx,
unsigned int Edge)
llvm::SuffixTreeNode* insertInternalNode(
llvm::SuffixTreeNode* Parent,
unsigned int StartIdx,
unsigned int EndIdx,
unsigned int Edge)
Description
Allocate an internal node and add it to the tree.
Declared at: llvm/include/llvm/Support/SuffixTree.h:208
Parameters
- llvm::SuffixTreeNode* Parent
- The parent of this node. Only null when allocating the root.
- unsigned int StartIdx
- The start index of this node's associated string.
- unsigned int EndIdx
- The end index of this node's associated string.
- unsigned int Edge
- The label on the edge leaving \p Parent to this node.
Returns
A pointer to the allocated internal node.
¶llvm::SuffixTreeNode* insertLeaf(
llvm::SuffixTreeNode& Parent,
unsigned int StartIdx,
unsigned int Edge)
llvm::SuffixTreeNode* insertLeaf(
llvm::SuffixTreeNode& Parent,
unsigned int StartIdx,
unsigned int Edge)
Description
Allocate a leaf node and add it to the tree.
Declared at: llvm/include/llvm/Support/SuffixTree.h:197
Parameters
- llvm::SuffixTreeNode& Parent
- The parent of this node.
- unsigned int StartIdx
- The start index of this node's associated string.
- unsigned int Edge
- The label on the edge leaving \p Parent to this node.
Returns
A pointer to the allocated leaf node.
¶void setSuffixIndices()
void setSuffixIndices()
Description
Set the suffix indices of the leaves to the start indices of their respective suffixes.
Declared at: llvm/include/llvm/Support/SuffixTree.h:213