Three kinds of drift quietly degrade retrieval quality between releases. Here's how to see them coming.
The classic semantic-search failure mode is not catastrophic. Nothing crashes. No alarm fires. The system you shipped six months ago is quietly worse this quarter than it was last quarter, and the only signal is a product manager pulling 200 queries into a spreadsheet on a Friday afternoon and scoring them by hand.
This is drift. It comes in three kinds, all silent and all gradual. The discipline that catches them is wiring up metrics that fire when the shape of the data changes, not when something errors. The full treatment is Chapter 10 of Semantic Search in Production.
First, the corpus drifts. The documents in your index change. New product categories get added; old ones get retired; the writing style shifts because the content team turned over. The embedding model is the same; the queries are the same; the index is full of different stuff.
Symptom: recall@k on existing eval queries stays flat, but real user queries (especially navigational ones for new content) start missing.
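One way to make this visible is to measure how much of the index the eval set no longer represents. The sketch below is illustrative only: it assumes each document carries a `category` label and an `indexed_on` date, and that you know the date the eval set was frozen and which categories its queries target. None of those names come from a particular search stack.

```python
from collections import Counter
from datetime import date


def corpus_drift_report(docs, eval_frozen_on, eval_categories):
    """Summarize how far the index has moved away from the eval set.

    docs: iterable of (category, indexed_on) pairs (hypothetical metadata).
    eval_frozen_on: date the eval queries were last refreshed.
    eval_categories: set of categories the eval queries target.
    """
    docs = list(docs)
    if not docs:
        return {"new_doc_share": 0.0, "uncovered_share": 0.0,
                "uncovered_categories": []}
    total = len(docs)
    # Documents indexed after the eval set was frozen.
    new_docs = sum(1 for _, indexed_on in docs if indexed_on > eval_frozen_on)
    # Documents in categories that no eval query exercises.
    uncovered = Counter(cat for cat, _ in docs if cat not in eval_categories)
    return {
        "new_doc_share": new_docs / total,
        "uncovered_share": sum(uncovered.values()) / total,
        "uncovered_categories": sorted(uncovered),
    }


report = corpus_drift_report(
    docs=[
        ("shoes", date(2024, 1, 1)),
        ("shoes", date(2024, 6, 1)),
        ("drones", date(2024, 7, 1)),
        ("hats", date(2023, 12, 1)),
    ],
    eval_frozen_on=date(2024, 3, 1),
    eval_categories={"shoes", "hats"},
)
```

A rising `uncovered_share` while recall@k stays flat is the pattern described above: the eval set is still passing, but it is grading a corpus that no longer exists.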
Second, the queries drift. Users start searching for things they didn't search for before. New product launches, seasonal patterns, a shifting cultural moment: the queries hitting your system are not the queries you tested against at launch.
Symptom: your eval set looks healthy. Click-through rates on real traffic drop. Support tickets mention "search doesn't find X" where X is something nobody was looking for a year ago.
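A rough way to quantify this is to compare the vocabulary of recent live queries against the vocabulary of the queries you tested with. The sketch below uses Jensen-Shannon divergence over whitespace tokens. That choice is an assumption made for illustration, not the chapter's metric; comparing query embeddings would be a reasonable alternative. Both query lists are assumed to be non-empty.

```python
import math
from collections import Counter


def token_distribution(queries):
    """Relative frequency of each lowercase whitespace token."""
    counts = Counter(tok for q in queries for tok in q.lower().split())
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 = identical, 1 = disjoint."""
    vocab = set(p) | set(q)
    # Midpoint distribution between the two inputs.
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in vocab}

    def kl(a):
        # KL divergence from a to the midpoint, skipping zero-mass tokens.
        return sum(a[t] * math.log2(a[t] / m[t])
                   for t in vocab if a.get(t, 0.0) > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


launch = token_distribution(["red running shoes", "running shoes sale"])
same = token_distribution(["red running shoes", "running shoes sale"])
shifted = token_distribution(["drone battery", "drone propeller"])

no_drift = js_divergence(launch, same)
full_drift = js_divergence(launch, shifted)
```

Computed daily between a fixed launch-time sample and a rolling window of live traffic, this gives one number to trend. It says nothing about whether results are good, only that the questions have changed.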
Third, the model changes underneath the index. You upgrade the embedding model on the write path. Old vectors from the old model are still in the index. The two embedding spaces are not comparable. Recall craters on the half of the corpus that's bilingual. See re-embedding strategy for the migration patterns.
Symptom: a step-function drop in retrieval quality timed to a deploy that "shouldn't have changed anything."
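A cheap guard is to stamp every stored vector with the model that produced it and track the mix. The sketch below assumes each index record exposes a `model_version` field; that field name and the record shape are hypothetical, and most vector stores would need it added as metadata at write time.

```python
from collections import Counter


def model_version_mix(index_records, query_model):
    """Report which embedding models produced the vectors in the index.

    index_records: iterable of dicts with a 'model_version' key
                   (assumed metadata, written alongside each vector).
    query_model: version string of the model embedding live queries.
    """
    counts = Counter(r["model_version"] for r in index_records)
    total = sum(counts.values())
    # Vectors that live in a different space than the query vectors.
    mismatched = total - counts.get(query_model, 0)
    return {
        "versions": dict(counts),
        "mismatched_share": mismatched / total if total else 0.0,
    }


mix = model_version_mix(
    [{"model_version": "v1"}] * 3 + [{"model_version": "v2"}],
    query_model="v2",
)
```

Any non-zero `mismatched_share` means some similarity scores are being computed across two embedding spaces, which is exactly the case a deploy that "shouldn't have changed anything" tends to create.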
The hardest version is when all three are happening at once at small rates. Each contributes 1-2 percentage points of recall loss per month. Six months later you've lost 10+ points. No deploy caused it; no metric crossed a threshold; nobody noticed until the cumulative effect was unignorable.
The mitigation is wiring continuous monitoring that surfaces the gradual case, not just the catastrophic case.
None of these metrics should page on a single bad day. Drift is gradual; page-worthy drift is a sustained trend. The right alarm shape is "metric X has been Y% below its trailing 30-day average for N consecutive days." That catches sustained regression without firing on noise.
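That alarm shape can be sketched in a few lines. The function and parameter names below are illustrative, and the input is assumed to be one metric value per day, oldest first.

```python
def sustained_regression(daily_values, drop_pct, n_days, window=30):
    """True if each of the last n_days values sits at least drop_pct
    percent below the average of the `window` days preceding it."""
    if len(daily_values) < window + n_days:
        return False  # not enough history to judge
    for i in range(len(daily_values) - n_days, len(daily_values)):
        trailing = daily_values[i - window:i]
        baseline = sum(trailing) / window
        # One day above the threshold breaks the streak.
        if daily_values[i] > baseline * (1 - drop_pct / 100):
            return False
    return True


healthy_then_bad = [0.80] * 30 + [0.70] * 5
one_bad_day = [0.80] * 34 + [0.70]

fires = sustained_regression(healthy_then_bad, drop_pct=5, n_days=5)
quiet = sustained_regression(one_bad_day, drop_pct=5, n_days=5)
```

One design caveat: because the baseline is a trailing average, it absorbs the bad days as they accumulate. Keeping N well below the window length stops the alarm from being talked out of firing by the regression it is trying to detect.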
Chapter 10 of Semantic Search in Production covers all three drift modes in depth, along with the monitoring stack with full metric definitions, the alarm-shape patterns, and a runbook for each drift mode with concrete steps.
Published by Yaw Labs.