Every embedding system needs a re-embedding plan before launch. Here are the three shapes and the failure mode that does not throw an error.
Sooner or later, you change the embedding model. A new provider releases a better one. The dimensions you picked at launch are not the dimensions you want now. The model you're on gets deprecated. When that day comes, you need a re-embedding strategy -- and the worst time to design one is in the middle of a migration that's already broken.
The classic failure mode is silent: you switch the model, embed new content with the new model, leave old content embedded with the old model, and end up with a "bilingual" index where half the vectors live in one semantic space and half live in another. Recall craters. Nothing throws an error. Users notice.
This guide covers the three migration shapes and the discipline that lets you change models without surprise downtime. The full chapter is Chapter 10 of Semantic Search in Production.
Two embedding models -- even from the same provider -- produce vectors in different semantic spaces. A vector from text-embedding-ada-002 and a vector from text-embedding-3-small for the same text cannot be meaningfully compared. Both models return 1,536-dimensional vectors by default, so the cosine similarity still computes without complaint; the number it returns is noise. Your index becomes unreliable in proportion to how mixed it is.
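One way to turn that silent failure into a loud one is to store the model identifier next to every vector and refuse to compare across models. The sketch below is a minimal illustration under that assumption; `TaggedVector` and its `model_id` field are hypothetical names, not part of any provider's SDK.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TaggedVector:
    """A vector plus the identifier of the model that produced it (hypothetical type)."""
    model_id: str
    values: Tuple[float, ...]


def cosine(a: TaggedVector, b: TaggedVector) -> float:
    """Cosine similarity that raises instead of returning noise across models."""
    if a.model_id != b.model_id:
        raise ValueError(
            f"cannot compare vectors from {a.model_id!r} and {b.model_id!r}"
        )
    if len(a.values) != len(b.values):
        raise ValueError("dimension mismatch")
    dot = sum(x * y for x, y in zip(a.values, b.values))
    norm_a = math.sqrt(sum(x * x for x in a.values))
    norm_b = math.sqrt(sum(y * y for y in b.values))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Equal dimensions are not enough of a guard on their own, because two different models can share a dimension count. The explicit model tag is what catches the mix.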
Reasons you'll re-embed:
The first shape is a full re-embed: embed everything with the new model, swap atomically to the new index, then decommission the old one.
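A minimal in-memory sketch of that build, swap, and decommission sequence follows. It assumes a hypothetical `IndexRegistry` with a single "live" alias that queries read from; real vector stores expose their own alias or collection-swap mechanisms, so treat the names here as placeholders.

```python
from typing import Callable, Dict, List, Optional

Vector = List[float]
Index = Dict[str, Vector]  # doc_id -> vector; a stand-in for a real vector store


class IndexRegistry:
    """Hypothetical registry: named indexes plus one live alias that queries read."""

    def __init__(self) -> None:
        self._indexes: Dict[str, Index] = {}
        self.live_name: Optional[str] = None

    def create(self, name: str, index: Index) -> None:
        if name in self._indexes:
            raise ValueError(f"index {name!r} already exists")
        self._indexes[name] = index

    def point_alias(self, name: str) -> None:
        if name not in self._indexes:
            raise KeyError(name)
        # One assignment: readers see the old index or the new one, never a mix.
        self.live_name = name

    def live(self) -> Index:
        if self.live_name is None:
            raise RuntimeError("no live index")
        return self._indexes[self.live_name]

    def drop(self, name: str) -> None:
        if name == self.live_name:
            raise ValueError("refusing to drop the live index")
        del self._indexes[name]

    def names(self) -> List[str]:
        return sorted(self._indexes)


def full_reembed(
    registry: IndexRegistry,
    docs: Dict[str, str],
    embed_new: Callable[[str], Vector],
    new_name: str,
) -> None:
    """Build a complete new index, swap the alias, then drop the old index."""
    new_index = {doc_id: embed_new(text) for doc_id, text in docs.items()}
    dims = {len(vec) for vec in new_index.values()}
    if len(dims) > 1:
        raise ValueError(f"inconsistent dimensions in new index: {sorted(dims)}")
    old_name = registry.live_name
    registry.create(new_name, new_index)
    registry.point_alias(new_name)
    if old_name is not None:
        registry.drop(old_name)
```

The old index is dropped only after the alias has moved, so a failure during embedding leaves the live index untouched.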
In the dual-index approach, keep the old index serving traffic and build the new index in parallel. Run both for a period and compare results in shadow mode. Once new beats old on your eval set, cut over.
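The "new beats old on your eval set" gate can be sketched as a recall@k comparison. This is one possible formulation, not the chapter's implementation: the `Search` callable, the eval-set shape (query paired with a set of relevant document IDs), and the `margin` parameter are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Set, Tuple

Search = Callable[[str], List[str]]      # query -> ranked doc IDs (assumed interface)
EvalSet = Sequence[Tuple[str, Set[str]]]  # (query, relevant doc IDs)


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        raise ValueError("each eval query needs at least one relevant document")
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_recall(search: Search, eval_set: EvalSet, k: int) -> float:
    """Average recall@k over every query in the eval set."""
    if not eval_set:
        raise ValueError("eval set is empty")
    total = sum(recall_at_k(search(query), relevant, k) for query, relevant in eval_set)
    return total / len(eval_set)


def should_cut_over(
    search_old: Search,
    search_new: Search,
    eval_set: EvalSet,
    k: int = 10,
    margin: float = 0.0,
) -> bool:
    """True when the new index beats the old one by more than `margin`."""
    return mean_recall(search_new, eval_set, k) > mean_recall(search_old, eval_set, k) + margin
```

In shadow mode, `search_new` would run against the parallel index without its results reaching users. A non-zero `margin` guards against cutting over on a difference that is within the noise of a small eval set.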
With lazy fusion, both old and new indexes live side by side. Query both, fuse the results, and lazily re-embed an old document the first time it appears in results. Eventually everything is migrated, and you never pay a big up-front cost.
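Because scores from the two indexes live in different spaces, the fusion step cannot simply sort by similarity. The sketch below uses reciprocal rank fusion, which relies only on rank positions. That choice is an assumption for illustration; the text above does not say which fusion method to use. The `migrate` callback and the `old_ids` set are hypothetical as well.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set

Search = Callable[[str], List[str]]  # query -> ranked doc IDs (assumed interface)


def rrf_fuse(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Reciprocal rank fusion: score each doc by the sum of 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by doc ID to keep the output stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def lazy_fusion_search(
    query: str,
    search_old: Search,
    search_new: Search,
    old_ids: Set[str],
    migrate: Callable[[str], None],
    top_n: int = 10,
) -> List[str]:
    """Query both indexes, fuse by rank, and migrate old docs that surface."""
    fused = rrf_fuse([search_old(query), search_new(query)])[:top_n]
    for doc_id in fused:
        if doc_id in old_ids:
            migrate(doc_id)          # hypothetical: re-embed and write to the new index
            old_ids.discard(doc_id)  # this doc no longer needs migrating
    return fused
```

Here the migration runs inline for clarity. In a real system it would more likely be pushed onto a queue so that re-embedding does not add latency to the query path.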
The discipline is this: every embedding system in production should have a written re-embedding plan from day one. Not because you're planning to re-embed soon, but because the plan forces design decisions that make re-embedding possible later:
The symptom: recall@k on your eval set drops after a deploy that "shouldn't have changed anything." The cause is almost always that someone re-embedded part of the index with a different model (or different params) and the bilingual state is dragging quality down.
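Both the symptom and the cause described above can be watched for mechanically. The sketch below assumes you record a baseline recall@k and store a model identifier with every vector (with any non-default parameters folded into that identifier). The function names and the 0.02 tolerance are illustrative choices, not values from the chapter.

```python
from collections import Counter
from typing import Dict, Iterable


def recall_regressed(baseline: float, current: float, tolerance: float = 0.02) -> bool:
    """True when current recall@k has fallen more than `tolerance` below baseline."""
    return current < baseline - tolerance


def model_mix(model_ids: Iterable[str]) -> Dict[str, float]:
    """Share of vectors per model identifier, e.g. {'model-a': 0.75, 'model-b': 0.25}."""
    counts = Counter(model_ids)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {model_id: count / total for model_id, count in counts.items()}


def is_bilingual(model_ids: Iterable[str]) -> bool:
    """True when the index holds vectors from more than one model identifier."""
    return len(model_mix(model_ids)) > 1
```

Running the first check after every deploy and the second on a schedule would surface a mixed index before users do.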
Wire two checks:
Chapter 10 of Semantic Search in Production covers all three strategies in depth: the dual-index implementation pattern with code, the lazy fusion question (and why it's harder than it looks), the drift-detection wiring, and a migration plan template you can adapt.
Published by Yaw Labs.