Deep Dive: Search Algorithms (utils/rag.py)¶

Overview¶

The retriever combines multiple strategies to maximize accuracy and robustness for restaurant queries.

Strategies (Applied in Order)¶

Exact name match (fast path)
Fuzzy name match (SequenceMatcher with threshold ~0.6)
Cuisine/category match
Rating-focused search (prioritize 5.0 when asking for "best" or "maximum rating")
City-based filtering (Odessa vs Midland)
Partial word/substring match
Semantic vector retrieval (FAISS) as needed
Fallback: top-ranked by rank_score

Normalization¶

Lowercasing, punctuation stripping, whitespace collapsing
Typo tolerance via fuzzy similarity

Scoring¶

Base score + boosts:
5.0-star boost for "maximum/best" rating queries
Extra weight for higher review_count

Required Columns¶

The retriever ensures presence of: name, url, rating, review_count, city, address, categories, price, latitude, longitude, id, hours (missing ones filled with defaults).

Prompt Context (for LLM)¶

Candidate cards list with name, rating, price, reviews, address, categories, hours
Optional FAISS passages for citations
System rules: never invent businesses; use only provided info; cite when using passages

Why Multi-Strategy?¶

Real users type typos and partial names
Some queries are categorical ("best sushi in Odessa")
Others are statistical ("how many 5-star?")
Layered approach yields >95% overall accuracy in our tests