Re-Embedding Strategy: A Practical Guide (2026)

Sooner or later, you change the embedding model. A new provider releases a better one. The dimensions you picked at launch are not the dimensions you want now. The model you're on gets deprecated. When that day comes, you need a re-embedding strategy -- and the worst time to design one is in the middle of a migration that's already broken.

The classic failure mode is silent: you switch the model, embed new content with the new model, leave old content embedded with the old model, and end up with a "bilingual" index where half the vectors live in one semantic space and half live in another. Recall craters. Nothing throws an error. Users notice.

This guide covers the three migration shapes and the discipline that lets you change models without surprise downtime. The full chapter is Chapter 10 of Semantic Search in Production.

Why re-embedding matters more than you think

Two embedding models -- even from the same provider -- produce vectors in different semantic spaces. A vector from text-embedding-ada-002 and a vector from text-embedding-3-small for the same text cannot be meaningfully compared, even though both models return 1536-dimensional vectors by default and an index will accept either without complaint. The cosine similarity between them is noise. Your index becomes unreliable in proportion to how mixed it is.
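To make "the cosine similarity is noise" concrete, here is a minimal sketch using a toy embedder. The function toy_embed is a hypothetical stand-in, not any provider's API: it gives each (model, token) pair its own pseudo-random direction, which is enough to imitate two models that each have a consistent space of their own but share nothing with each other.

```python
import math
import random


def toy_embed(text: str, model: str, dim: int = 256) -> list[float]:
    """Toy stand-in for an embedding model (not a real provider API).

    Each (model, token) pair seeds its own pseudo-random direction, so one
    "model" is consistent with itself while two "models" agree on nothing.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        rng = random.Random(f"{model}:{token}")  # string seeds are deterministic
        for i in range(dim):
            vec[i] += rng.gauss(0.0, 1.0)
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # guard against empty text
    return [x / norm for x in vec]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; inputs are unit length, so this is a dot product."""
    return sum(x * y for x, y in zip(a, b))


query = "reset my password"
doc = "password reset steps"

# Same model on both sides: related texts score well above zero.
same_space = cosine(toy_embed(query, "model-a"), toy_embed(doc, "model-a"))

# Query from one model, document from another: the score carries no signal.
cross_space = cosine(toy_embed(query, "model-a"), toy_embed(doc, "model-b"))

# Even the identical text looks unrelated across the two spaces.
same_text_cross = cosine(toy_embed(query, "model-a"), toy_embed(query, "model-b"))

print(f"same model, related texts:  {same_space:.2f}")
print(f"mixed models, related texts: {cross_space:.2f}")
print(f"mixed models, same text:     {same_text_cross:.2f}")
```

Real models are far richer than this toy, but the failure has the same shape: the arithmetic succeeds, the number comes back, and it means nothing.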

Reasons you'll re-embed:

The model you're on gets deprecated (this has already happened to ada-002 users).
A better model ships and the quality delta justifies the cost.
You change dimensions (smaller for cost, larger for quality).
You move providers entirely.
Your domain changes and a fine-tuned or different base model fits better.

Strategy 1: full rebuild

Embed everything with the new model. Atomic swap to the new index. Decommission the old.

Pros: simple. Clean cutover. No bilingual-index window.
Cons: embedding cost up front (real money on large corpora). Time-to-finish (hours to days). If you discover a problem mid-rebuild you have to start over.
Right for: small-to-medium corpora (under a few million docs). Cases where you have downtime budget. The default unless you have a reason not to.
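A full rebuild can be sketched as "build off to the side, then flip one pointer." The sketch below assumes a hypothetical in-memory IndexRegistry standing in for whatever alias or collection-switch mechanism your vector store offers; the names full_rebuild, decommission, docs-v1, and docs-v2 are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class IndexRegistry:
    """Hypothetical in-memory stand-in for a vector store with a serving alias."""
    indexes: dict[str, dict[str, list[float]]] = field(default_factory=dict)
    live: Optional[str] = None  # name of the index that serves reads


def full_rebuild(
    registry: IndexRegistry,
    docs: dict[str, str],
    embed_new: Callable[[str], list[float]],
    new_name: str,
) -> Optional[str]:
    """Embed every document with the new model, then swap the live pointer.

    Returns the name of the previous live index so the caller can drop it
    once the new one has proven itself.
    """
    # If embed_new raises partway through, nothing below runs and the old
    # index keeps serving: the rebuild simply has to start over.
    staging = {doc_id: embed_new(text) for doc_id, text in docs.items()}

    # Refuse to swap if the new vectors disagree on dimensionality.
    dims = {len(vec) for vec in staging.values()}
    if len(dims) > 1:
        raise ValueError(f"inconsistent dimensions in new index: {sorted(dims)}")

    registry.indexes[new_name] = staging
    previous, registry.live = registry.live, new_name  # the cutover
    return previous


def decommission(registry: IndexRegistry, name: str) -> None:
    """Drop an index that no longer serves traffic."""
    if name == registry.live:
        raise ValueError("refusing to drop the live index")
    registry.indexes.pop(name, None)


# Usage with a fake embedder that returns fixed 3-dimensional vectors.
registry = IndexRegistry(indexes={"docs-v1": {"a": [0.1, 0.2]}}, live="docs-v1")
old_name = full_rebuild(registry, {"a": "hello"}, lambda text: [1.0, 0.0, 0.0], "docs-v2")
```

Keeping the old index around until decommission is called explicitly gives you a short rollback window at the price of temporary extra storage.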

Strategy 2: dual-index (shadow + cutover)

Keep the old index serving traffic. Build the new index in parallel. Run both for a period and compare their results in shadow mode. Once new beats old on your eval set, cut over.

Pros: no downtime. You can validate the new index against real traffic before users see it. Rollback is trivial -- the old index is still warm.
Cons: double the storage cost during the window. Double the index-write cost during the window. More code complexity (two write paths, two read paths).
Right for: large corpora where rebuild downtime is unacceptable. Cases where the new model is unfamiliar and you want to validate quality first.
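The "new beats old on your eval set" gate might look like the sketch below. It assumes an eval set of (query, relevant doc ids) pairs and two search callables that return ranked doc ids; shadow_compare, ready_to_cut_over, and the min_gain margin are illustrative names, not part of any library.

```python
from statistics import mean
from typing import Callable

SearchFn = Callable[[str], list[str]]  # query -> ranked doc ids


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant docs that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def shadow_compare(
    eval_set: list[tuple[str, set[str]]],
    search_old: SearchFn,
    search_new: SearchFn,
    k: int = 10,
) -> tuple[float, float]:
    """Run every eval query against both indexes; return mean recall@k for each.

    The eval set must be non-empty.
    """
    old_scores = [recall_at_k(search_old(q), rel, k) for q, rel in eval_set]
    new_scores = [recall_at_k(search_new(q), rel, k) for q, rel in eval_set]
    return mean(old_scores), mean(new_scores)


def ready_to_cut_over(old_recall: float, new_recall: float, min_gain: float = 0.0) -> bool:
    """Cut over only when the new index beats the old by more than min_gain."""
    return new_recall > old_recall + min_gain


# Usage with canned results standing in for two real indexes.
eval_set = [("reset password", {"d1", "d2"}), ("billing cycle", {"d3"})]
old_results = {"reset password": ["d1", "d9"], "billing cycle": ["d8", "d7"]}
new_results = {"reset password": ["d2", "d1"], "billing cycle": ["d3", "d8"]}

old_recall, new_recall = shadow_compare(
    eval_set, old_results.__getitem__, new_results.__getitem__, k=2
)
cut_over = ready_to_cut_over(old_recall, new_recall)
```

In shadow mode users only ever see the old index's results; the new index's answers are logged and scored, which is what makes rollback trivial.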

Strategy 3: lazy re-embedding

Both old and new indexes live side by side. Query both, fuse results, lazily re-embed an old document the first time it appears in results. Eventually everything is migrated; you never paid a big up-front cost.

Pros: spreads cost over time. No big-bang migration. Survives partial failure -- you can pause and resume.
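Cons: you live in the bilingual state for a long time, possibly forever: a document that never appears in results never gets re-embedded unless you eventually migrate the long tail in bulk. Fusion is hard (the two vectors are not comparable, so you can't just merge scores). Quality during the transition is mixed.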
Right for: truly massive corpora where rebuild is infeasible. Cases where the new and old models are similar enough that bilingual results are tolerable.
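One way to sidestep the score-comparability problem is to fuse on rank rather than score. The sketch below uses reciprocal rank fusion (RRF) with the commonly used constant k=60; that is a technique chosen for this illustration, not necessarily what the book recommends. The dict-based indexes, source_text store, and search callables are all hypothetical stand-ins.

```python
from typing import Callable

SearchFn = Callable[[str], list[str]]  # query -> ranked doc ids


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: combine rankings using positions only.

    Raw similarity scores from the two indexes are never compared.
    Ties are broken by doc id so the output is deterministic.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


def search_and_migrate(
    query: str,
    search_old: SearchFn,
    search_new: SearchFn,
    old_index: dict[str, list[float]],
    new_index: dict[str, list[float]],
    source_text: dict[str, str],
    embed_new: Callable[[str], list[float]],
    top_k: int = 10,
) -> list[str]:
    """Query both indexes, fuse by rank, and migrate any old docs that surfaced."""
    fused = rrf_fuse([search_old(query), search_new(query)])[:top_k]
    for doc_id in fused:
        if doc_id in old_index:
            # First appearance in results: re-embed and move to the new index.
            new_index[doc_id] = embed_new(source_text[doc_id])
            del old_index[doc_id]
    return fused


# Usage with canned rankings and a fake embedder.
old_index = {"a": [0.1], "c": [0.2], "e": [0.3]}
new_index = {"b": [1.0], "d": [2.0]}
source_text = {"a": "alpha", "c": "gamma", "e": "epsilon"}

results = search_and_migrate(
    "any query",
    lambda q: ["a", "c"],
    lambda q: ["b", "d"],
    old_index,
    new_index,
    source_text,
    lambda text: [float(len(text))],
    top_k=3,
)
```

Note that document "e" stays in the old index because no query surfaced it, which is exactly the long-tail problem with this strategy. A production version would probably also move the re-embedding off the query path and onto a queue.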

The migration plan you write before launch

The discipline is this: every embedding system in production should have a written re-embedding plan from day one. Not because you're planning to re-embed soon, but because the plan forces design decisions that make re-embedding possible later:

Store the source text alongside the vector. You will need it to re-embed. If you stored only the vector, there is nothing to feed the new model, and your old corpus cannot be migrated.
Version your vectors. Record which model, which version, and which parameters (such as output dimensions) produced each vector. Without this, you can't tell what's bilingual.
Provision for double storage. Your storage tier should not be a blocker the first time you need a dual-index migration.
Wire metrics that detect drift. Recall@k against your golden set should fire an alarm before users notice, not after.
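The first two decisions in the list above amount to a record shape. Here is a minimal sketch, assuming a hypothetical VectorRecord layout; the field names and the model and version labels are illustrative, and your store's schema will differ.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class VectorRecord:
    """Hypothetical record layout: the vector travels with its provenance."""
    doc_id: str
    source_text: str            # kept so the document can be re-embedded later
    vector: tuple[float, ...]
    model: str                  # which model produced the vector
    model_version: str          # which version or parameter set of that model


def is_stale(record: VectorRecord, target_model: str, target_version: str) -> bool:
    """True when the record was produced by anything other than the target."""
    return (record.model, record.model_version) != (target_model, target_version)


def reembed(
    record: VectorRecord,
    embed_new: Callable[[str], list[float]],
    target_model: str,
    target_version: str,
) -> VectorRecord:
    """Produce a fresh record from the stored source text."""
    return VectorRecord(
        doc_id=record.doc_id,
        source_text=record.source_text,
        vector=tuple(embed_new(record.source_text)),
        model=target_model,
        model_version=target_version,
    )


# Usage: migrate only the records that are not already on the target.
records = [
    VectorRecord("a", "first doc", (0.1, 0.2), "model-a", "v1"),
    VectorRecord("b", "second doc", (0.3, 0.4, 0.5), "model-b", "v1"),
]
migrated = [
    reembed(r, lambda text: [float(len(text)), 0.0, 0.0], "model-b", "v1")
    if is_stale(r, "model-b", "v1")
    else r
    for r in records
]
```

Because every record carries its own provenance, "what still needs migrating" becomes a filter rather than an archaeology project.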

Detecting bilingual-index regressions

The symptom: recall@k on your eval set drops after a deploy that "shouldn't have changed anything." The cause is almost always that someone re-embedded part of the index with a different model (or different params) and the bilingual state is dragging quality down.

Wire two checks:

Every vector record stores its model version.
A nightly job aggregates the distribution of model versions in the index. Any drift from "100% one version" surfaces in a dashboard.
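The nightly aggregation can be very small. This sketch assumes you can stream the stored model-version label of every vector out of your index; version_distribution and bilingual_alerts are illustrative names, and the version labels are made up.

```python
from collections import Counter
from typing import Iterable


def version_distribution(versions: Iterable[str]) -> dict[str, float]:
    """Share of vectors per model version, as fractions that sum to 1."""
    counts = Counter(versions)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {version: count / total for version, count in counts.items()}


def bilingual_alerts(distribution: dict[str, float], expected_version: str) -> list[str]:
    """One message per unexpected version; an empty list means the index is clean."""
    return [
        f"{share:.1%} of vectors are on {version}, expected {expected_version}"
        for version, share in sorted(distribution.items())
        if version != expected_version
    ]


# Usage: three vectors on the expected version, one stray.
distribution = version_distribution(["emb-v1", "emb-v1", "emb-v1", "emb-v2"])
alerts = bilingual_alerts(distribution, expected_version="emb-v1")
for message in alerts:
    print(message)
```

During a planned dual-index or lazy migration the distribution will legitimately be mixed, so the alert is most useful when it fires outside a migration window.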

Want the full chapter?

Semantic Search in Production Chapter 10 covers all three strategies in depth, the dual-index implementation pattern with code, the lazy fusion question (and why it's harder than it looks), the drift-detection wiring, and the migration plan template you can adapt.

Published by Yaw Labs.
