Synonymic Query Expansion for Smarter Search

“A user types ‘doctor’, but the data says ‘physician’. Without expansion, it’s a missed connection.”

Let’s Start with the Problem

You’ve got a solid enterprise search system — indexed records, blazing fast, vector and keyword search blended together. But users still complain:

That’s the invisible gap: semantic mismatch between what users type and how data is written.

And that’s where synonymic query expansion steps in.

What Is Synonymic Query Expansion?

It’s the technique of expanding a query with known synonyms before sending it to the search engine. It’s one of the oldest tricks in information retrieval — and one of the most reliable for structured or semi-structured datasets.

For example:

User Query: "software engineer"
Expanded Query: "software engineer" OR "developer" OR "programmer"

You don’t just search for what the user typed — you search for what they might have meant.

How It Works Under the Hood

A simplified flow looks like this:

  1. User input: “pediatrician”
  2. Synonym resolver (LLM, lookup table, or hybrid) returns:
    ["child doctor", "kid's physician", "children's healthcare"]
  3. Query construction:
    ("pediatrician" OR "child doctor" OR "kid's physician" OR "children's healthcare")
  4. Search engine receives the expanded query and matches broader results.
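
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production resolver: the `SYNONYMS` table and the function names `resolve_synonyms` and `expand_query` are made up for this example, and a real resolver might call an LLM or a curated thesaurus instead of a hard-coded dictionary.

```python
# Hypothetical lookup table; a real system might back this with an LLM
# or a curated thesaurus instead.
SYNONYMS = {
    "pediatrician": ["child doctor", "kid's physician", "children's healthcare"],
}

def resolve_synonyms(term):
    """Step 2: return known synonyms for a term, or an empty list."""
    return SYNONYMS.get(term.lower().strip(), [])

def expand_query(term):
    """Step 3: OR the original term together with its synonyms."""
    terms = [term] + resolve_synonyms(term)
    quoted = ['"{}"'.format(t) for t in terms]
    return "(" + " OR ".join(quoted) + ")"

# Step 4 would send this string to the search engine.
print(expand_query("pediatrician"))
```

A term with no known synonyms simply passes through as a single quoted phrase, so the expansion step is safe to run on every query.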

Example with Elasticsearch DSL

{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "pediatrician" }},
        { "match": { "title": "child doctor" }},
        { "match": { "title": "kid's physician" }},
        { "match": { "title": "children's healthcare" }}
      ]
    }
  }
}
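
In practice you would generate that request body rather than write it by hand. The sketch below builds the same `bool`/`should` structure from a list of terms. The function name `build_expanded_query` and the `synonym_boost` parameter are assumptions for illustration; weighting synonyms below the original term is one possible design choice, not something the example above requires.

```python
import json

def build_expanded_query(field, original, synonyms, synonym_boost=0.5):
    """Build an Elasticsearch bool/should body for a term and its synonyms.

    The original term keeps full weight; each synonym gets a lower boost
    (an assumption of this sketch) so exact matches tend to rank first.
    """
    should = [{"match": {field: {"query": original, "boost": 1.0}}}]
    for term in synonyms:
        should.append({"match": {field: {"query": term, "boost": synonym_boost}}})
    # Require at least one of the clauses to match.
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}

body = build_expanded_query(
    "title",
    "pediatrician",
    ["child doctor", "kid's physician", "children's healthcare"],
)
print(json.dumps(body, indent=2))
```

The resulting dictionary can be serialized and sent as the body of a search request.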

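Or, for vector search (with OpenSearch, for example), embed the original term and each synonym:

# embed() is a placeholder for your embedding model's encode call
query_vector = embed("pediatrician")
synonyms = ["child doctor", "kid's physician"]
# One vector per synonym, so each can be searched alongside the original
expanded_vectors = [embed(term) for term in synonyms]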
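
The snippet stops once the vectors are computed, so here is one way they might be used. This sketch scores a document by its best cosine similarity against the original vector and every synonym vector. The max-over-vectors rule and the function name `best_similarity` are assumptions for illustration, and the three-dimensional vectors are toy values standing in for real embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def best_similarity(doc_vector, query_vectors):
    """Score a document by its closest match among the query vectors."""
    return max(cosine(doc_vector, q) for q in query_vectors)

# Toy vectors standing in for embed("pediatrician") and the synonym embeddings.
query_vector = [1.0, 0.0, 0.0]
expanded_vectors = [[0.0, 1.0, 0.0], [0.6, 0.8, 0.0]]
doc_vector = [0.0, 2.0, 0.0]

score = best_similarity(doc_vector, [query_vector] + expanded_vectors)
print(score)
```

Here the document is orthogonal to the original query vector but aligned with one of the synonym vectors, which is exactly the case expansion is meant to catch.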

Now, Where Do Synonyms Come From?

You can:

A good system often mixes all of the above.
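
Mixing sources mostly comes down to merging their output in priority order and dropping duplicates. The sketch below assumes each source is a function that takes a term and returns a list of candidates. The names `merge_synonyms`, `curated_source`, and `generated_source` are hypothetical, and the "generated" source is hard-coded here so the example runs without calling any model.

```python
def merge_synonyms(term, sources, limit=5):
    """Collect synonyms from several sources in priority order, de-duplicated and capped."""
    seen = {term.lower()}  # never return the original term as its own synonym
    merged = []
    for source in sources:
        for candidate in source(term):
            key = candidate.lower()
            if key in seen:
                continue
            seen.add(key)
            merged.append(candidate)
            if len(merged) >= limit:
                return merged
    return merged

def curated_source(term):
    # Hypothetical hand-maintained lookup table.
    table = {"doctor": ["physician"]}
    return table.get(term.lower(), [])

def generated_source(term):
    # Placeholder for an LLM or thesaurus call; canned output for this sketch.
    canned = {"doctor": ["Physician", "medical practitioner", "doctor"]}
    return canned.get(term.lower(), [])

print(merge_synonyms("doctor", [curated_source, generated_source]))
```

Listing the curated source first means hand-vetted synonyms win when the cap is reached, and the `limit` keeps any one query from ballooning.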

Real-World Use Cases

⚠️ But Don’t Go Wild

Query expansion has tradeoffs:

So you want guardrails:

Bonus: Hybrid Strategy

Can vector similarity fix this problem entirely? Sometimes, yes — especially if you’re using high-quality embeddings that understand semantic closeness. For example, a good embedding model will place “doctor” and “physician” near each other in vector space.

But here’s the catch:

That’s why smart systems use a hybrid strategy:

It’s not about finding all the matches — it’s about not missing the obvious ones.

Closing Thoughts

You want users to find what they mean, not just what they type. Sometimes, the fastest way to improve search is teaching your system to speak the user’s language.

“A good search system doesn’t just understand queries — it empathizes with them.”