Extraction Algorithms¶
Binary search extractor (primary)¶
The main extractor infers one character at a time using binary search over printable ASCII bounds.
Core steps¶
- Build a per-position SQL expression for character code.
- Test midpoint predicates (
char_code >= mid). - Use timing significance result to choose upper/lower half.
- Iterate until bounds converge.
- Verify likely candidates and return final character.
Complexity¶
- Character search complexity:
O(log n)for range sizen. - For printable ASCII, this is typically a small bounded number of checks.
Traditional linear extractor (baseline)¶
The baseline implementation checks values sequentially:
- test
char_code == 32, then33, and so on, - stop when a match appears true by threshold rule.
Complexity is O(n) and acts as a comparison baseline in benchmarks.
Parallel extraction strategy¶
ParallelExtractor splits positions into chunks and extracts each chunk concurrently using a ThreadPoolExecutor.
Tradeoff:
- lower wall-clock time potential,
- higher instantaneous load and possible timing instability.
Use moderate worker counts and confirm reliability before scaling.