Synonymic Query Expansion for Smarter Search
“A user types ‘doctor’, but the data says ‘physician’. Without expansion, it’s a missed connection.”
Let’s Start with the Problem
You’ve got a solid enterprise search system, with indexed records, fast responses, and vector and keyword search blended together. But users still complain:
- “I searched for ‘attorney’ but it didn’t show ‘lawyer’ results.”
- “Why does ‘AI’ return different results than ‘artificial intelligence’?”
That’s the invisible gap: a vocabulary mismatch between what users type and how the data is written.
And that’s where synonymic query expansion steps in.
What Is Synonymic Query Expansion?
It’s the technique of expanding a query with known synonyms before sending it to the search engine. It’s one of the oldest tricks in information retrieval, and one of the most reliable for structured or semi-structured datasets.
For example:
User Query: "software engineer"
Expanded Query: "software engineer" OR "developer" OR "programmer"
You don’t just search for what the user typed. You search for what they might have meant.
How It Works Under the Hood
A simplified flow looks like this:
- User input: “pediatrician”
- Synonym resolver (LLM, lookup table, or hybrid) returns:
["child doctor", "kid's physician", "children's healthcare"]
- Query construction:
("pediatrician" OR "child doctor" OR "kid's physician" OR "children's healthcare")
- Search engine receives the expanded query and matches a broader set of results.
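The flow above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `SYNONYMS` table, `lookup_resolver`, `quote`, and `expand_query` are hypothetical names, and the lookup table stands in for whatever resolver (LLM, dictionary, or hybrid) you actually use.

```python
from typing import Callable, Iterable

# Hypothetical lookup table standing in for the synonym resolver.
SYNONYMS = {
    "pediatrician": ["child doctor", "kid's physician", "children's healthcare"],
}


def lookup_resolver(term: str) -> list[str]:
    """Return known synonyms for a term, or an empty list."""
    return SYNONYMS.get(term.lower(), [])


def quote(term: str) -> str:
    """Wrap a term in double quotes, escaping embedded double quotes."""
    return '"' + term.replace('"', '\\"') + '"'


def expand_query(user_input: str, resolver: Callable[[str], Iterable[str]]) -> str:
    """Build an OR query from the original term plus its synonyms."""
    terms = [user_input.strip()]
    seen = {terms[0].lower()}
    for synonym in resolver(terms[0]):
        cleaned = synonym.strip()
        # Skip blanks and case-insensitive duplicates.
        if cleaned and cleaned.lower() not in seen:
            seen.add(cleaned.lower())
            terms.append(cleaned)
    joined = " OR ".join(quote(t) for t in terms)
    # Parenthesize only when there is more than one alternative.
    return f"({joined})" if len(terms) > 1 else joined


print(expand_query("pediatrician", lookup_resolver))
```

Passing the resolver in as a function keeps the query builder unchanged if you later swap the lookup table for an LLM call or a log-derived source.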
Example with Elasticsearch DSL
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "pediatrician" }},
        { "match": { "title": "child doctor" }},
        { "match": { "title": "kid's physician" }},
        { "match": { "title": "children's healthcare" }}
      ]
    }
  }
}
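In practice you would probably generate that body rather than write it by hand. Here is a rough sketch of one way to do it. The function name `build_expanded_query` and the `original_boost` parameter are assumptions for illustration; boosting the user's own term so that exact matches outrank synonym matches is a design choice added here, not something the query above does.

```python
import json


def build_expanded_query(field: str, original: str, synonyms: list[str],
                         original_boost: float = 2.0) -> dict:
    """Build a bool/should query: one match clause per term."""
    # The user's own term gets a boost so exact matches rank above synonyms.
    clauses = [{"match": {field: {"query": original, "boost": original_boost}}}]
    # Synonyms use the short match form, with the default boost of 1.0.
    clauses += [{"match": {field: synonym}} for synonym in synonyms]
    return {"query": {"bool": {"should": clauses}}}


body = build_expanded_query(
    "title",
    "pediatrician",
    ["child doctor", "kid's physician", "children's healthcare"],
)
print(json.dumps(body, indent=2))
```

The resulting dictionary can be sent as the request body of a search call. If the boost is not wanted, setting `original_boost=1.0` makes all clauses weigh the same.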
Or, with OpenSearch and vector search:
# embed() is a placeholder for whatever embedding function your stack provides.
query_vector = embed("pediatrician")
synonyms = ["child doctor", "kid's physician"]
expanded_vectors = [embed(term) for term in synonyms]
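The snippet stops once `expanded_vectors` is computed, so what happens next is up to you. One option, shown here as an assumption rather than the only approach, is to average the query vector with the synonym vectors and run a single k-NN search on the centroid. The helper names, the `title_vector` field, and the toy 3-dimensional vectors are all hypothetical; real embeddings would come from your model.

```python
def mean_vector(vectors: list[list[float]]) -> list[float]:
    """Average equal-length vectors component by component."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def build_knn_query(field: str, vector: list[float], k: int = 10) -> dict:
    """Build an OpenSearch k-NN query body for a knn_vector field."""
    return {"size": k, "query": {"knn": {field: {"vector": vector, "k": k}}}}


# Toy vectors standing in for embed("pediatrician") and the synonym embeddings.
query_vector = [0.9, 0.1, 0.0]
expanded_vectors = [[0.7, 0.3, 0.0], [0.8, 0.2, 0.0]]

centroid = mean_vector([query_vector] + expanded_vectors)
body = build_knn_query("title_vector", centroid, k=5)
print(body)
```

Averaging keeps the search to one request, at the cost of blurring the original query a little. Running one k-NN search per vector and merging the results is the alternative if you need each synonym to contribute its own neighbors.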
Now, Where Do Synonyms Come From?
You can:
- Use static dictionaries (WordNet, domain glossaries)
- Maintain a manual synonym map in config or SSM
- Use LLMs (e.g. “What are 3 synonyms for ‘surgeon’ in the healthcare domain?”)
- Leverage search logs (top co-clicked queries)
A good system often mixes all of the above.
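Mixing sources mostly comes down to merging candidate lists in a sensible priority order. A minimal sketch, assuming each source has already been reduced to a dictionary with lowercase keys; the three sample dictionaries and the function name `merge_synonym_sources` are made up for illustration.

```python
def merge_synonym_sources(term: str, sources: list[dict[str, list[str]]],
                          limit: int = 5) -> list[str]:
    """Merge synonyms from several sources, highest-priority source first.

    Each source maps a lowercase term to a list of synonyms.
    """
    key = term.strip().lower()
    seen = {key}  # never return the term itself as its own synonym
    merged: list[str] = []
    for source in sources:
        for candidate in source.get(key, []):
            cleaned = candidate.strip()
            # Keep the first spelling seen; drop case-insensitive repeats.
            if cleaned and cleaned.lower() not in seen:
                seen.add(cleaned.lower())
                merged.append(cleaned)
    return merged[:limit]


# Hypothetical sources, listed from most trusted to least trusted.
manual_map = {"developer": ["SDE", "software engineer"]}
glossary = {"developer": ["software engineer", "programmer"]}
log_derived = {"developer": ["backend engineer"]}

print(merge_synonym_sources("developer", [manual_map, glossary, log_derived]))
```

Ordering the sources by trust means a curated entry always survives the `limit` cut before a noisier one from logs or an LLM.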
Real-World Use Cases
- Healthcare search: “heart attack” → “myocardial infarction”
- E-commerce filters: “couch” → “sofa”, “lounge chair”
- Legal tools: “contract breach” → “violation of agreement”
- Resume search: “developer” → “software engineer”, “SDE”, “backend engineer”
⚠️ But Don’t Go Wild
Query expansion has tradeoffs:
- ❌ Expanding too far can reduce precision.
- ❌ Bad synonyms can pollute results.
- ❌ LLM-generated synonyms can be context-blind.
So you want guardrails:
- ✅ Synonym whitelist per domain
- ✅ Max expansion terms per query
- ✅ Confidence thresholds from LLM or logs
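All three guardrails fit in one small filter. The sketch below is an assumption about how they might be combined: the `Candidate` class, the `apply_guardrails` function, the default threshold of 0.7, and the sample scores are invented for illustration, so tune them against your own data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    term: str
    confidence: float  # score in [0, 1] from an LLM or from log statistics


def apply_guardrails(candidates: list[Candidate], whitelist: set[str],
                     max_terms: int = 3, min_confidence: float = 0.7) -> list[str]:
    """Keep whitelisted, confident synonyms, capped at max_terms."""
    allowed = {term.lower() for term in whitelist}
    kept = [
        c for c in candidates
        if c.term.lower() in allowed and c.confidence >= min_confidence
    ]
    # Highest confidence first, so the cap drops the weakest candidates.
    kept.sort(key=lambda c: c.confidence, reverse=True)
    return [c.term for c in kept[:max_terms]]


# Hypothetical candidates for the query "couch".
candidates = [
    Candidate("sofa", 0.95),
    Candidate("lounge chair", 0.72),
    Candidate("settee", 0.80),
    Candidate("bench", 0.40),      # whitelisted, but below the threshold
    Candidate("potato", 0.90),     # confident, but not whitelisted
]
furniture_whitelist = {"sofa", "lounge chair", "settee", "bench"}

print(apply_guardrails(candidates, furniture_whitelist, max_terms=2))
```

With `max_terms=2`, only the two most confident whitelisted terms survive, which is the behavior you want when a resolver returns a long tail of marginal synonyms.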
Bonus: Hybrid Strategy
Can vector similarity fix this problem entirely? Sometimes, yes, especially if you’re using high-quality embeddings that capture semantic closeness. For example, a good embedding model will place “doctor” and “physician” near each other in vector space.
But here’s the catch:
- Vector search is fuzzy: it’s great at semantic proximity but doesn’t guarantee keyword-level coverage.
- You may still want exact matches for filters, sorting, or compliance-heavy use cases.
That’s why smart systems use a hybrid strategy:
- Keyword search + synonym expansion for speed and control
- Vector similarity to capture nuance and meaning
- LLMs for fallback or recovery when both fail
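How the two result lists get blended is left open here. One common option is reciprocal rank fusion (RRF), which scores each document by summing 1 / (k + rank) across the lists it appears in. The sketch below uses RRF as an assumption, not as the method this post prescribes; `keyword_search`, `vector_search`, and `llm_fallback` are hypothetical callables you would supply.

```python
from typing import Callable


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of document IDs using reciprocal rank fusion."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def hybrid_search(query: str,
                  keyword_search: Callable[[str], list[str]],
                  vector_search: Callable[[str], list[str]],
                  llm_fallback: Callable[[str], list[str]]) -> list[str]:
    """Fuse keyword and vector hits; call the fallback only if both are empty."""
    fused = reciprocal_rank_fusion([keyword_search(query), vector_search(query)])
    return fused if fused else llm_fallback(query)


# Toy result lists standing in for real search backends.
keyword_hits = ["doc1", "doc2", "doc3"]
vector_hits = ["doc3", "doc1", "doc4"]
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
```

Documents found by both searches rise to the top, while documents found by only one still appear lower down, so neither method can silently hide the other's matches.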
It’s not about finding all the matches. It’s about not missing the obvious ones.
Closing Thoughts
You want users to find what they mean, not just what they type. Sometimes, the fastest way to improve search is teaching your system to speak the user’s language.
“A good search system doesn’t just understand queries; it empathizes with them.”