|
Introduction
Simple Full-Text search engine with no external dependencies written in C#.
The aim of this project is to showcase algorithms, data structures and techniques that are used to create full-text search engines.
Source Repository
Query Language
- WORD(apple) - single word
- WILD(app*) - wildcard pattern
- EDIT(apple, 1) - Levenshtein (edit distance, fuzzy search)
Conjunction operators
- OR - boolean or
- AND - boolean and
- SEQ - sequence of words, phrase
Examples of queries:
- AND(WORD(apple), OR(WILD(a*), EDIT(apple, 1)))
- SEQ(WORD(hello), WORD(world))
Data Structures
- Dictionary of the persistent index is implemented using:
- Key-value storage for document metadata is based on persistent B-Tree implementation: B-Tree
- Packed Integers.
- Classic data structures:
Algorithms
- Approximate term matching is based on Levenshtein automaton.
- Query Language parser is implemented using recursive descent parser technique.
- A method for encoding/decoding occurrences uses:
- Classic algorithms:
- Binary search
- Heap sort
- Huffman Encoding
References
|