|
Data Structure - Trie
Introduction
Trie (digital tree, radix tree or prefix tree) is a tree data structure
that is used to store a set of strings or string/value pairs.
Each node of the tree corresponds to a prefix of a string from the set.
The root of the tree represents an empty prefix.
All the descendants of a node share the common prefix associated with that node.
Trie can be used to efficiently perform the following operations:
- Check if a given string belongs to the set.
- Find all strings that start from a given prefix.
- Find all strings that match given pattern (wildcard or regular expression).
- Find all approximate matches from a given string, i.e. implement fuzzy search.
Trie is often used as a data structure to store dictionary of words. It is often found in
the following applications: full-text index, syntax correction programs, auto suggestion lists.
Nodes in the trie can have a lot of children. For a UCS-2 encoded strings, each node
might have up to 65536 children. This is unfortunate and in practical applications strings
are usually encoded using variable length encoding schemes. For example,
Hu-Tucker
encoding.
Ternary search tree is a balance between binary search tree and trie: it is
space efficient and supports the same operations as a trie. Each node in ternary search
tree can have up to three children.
Implementation
Both implemented data structures support fuzzy search based on
Levenshtein distance,
see LevenshteinMatcher class in Trie.cs.
Problems
Links
|
Trie build from words:
|
|