Sooner or later, you change the embedding model. A new provider releases a better one. The dimensions you picked at launch are not the dimensions you want now. The model you're on gets deprecated. When that day comes, you need a re-embedding strategy -- and the worst time to design one is in the middle of a migration that's already broken.

The classic failure mode is silent: you switch the model, embed new content with the new model, leave old content embedded with the old model, and end up with a "bilingual" index where half the vectors live in one semantic space and half live in another. Recall craters. Nothing throws an error. Users notice.

This guide covers the three migration shapes and the discipline that lets you change models without surprise downtime. The full chapter is Chapter 10 of Semantic Search in Production.

Why re-embedding matters more than you think

Two embedding models -- even from the same provider -- produce vectors in different semantic spaces. A vector from text-embedding-ada-002 and a vector from text-embedding-3-small for the same text cannot be meaningfully compared. Both models default to 1,536 dimensions, so the cosine similarity computes without error, but the number is noise. Your index becomes unreliable in proportion to how mixed it is.
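One way to make that failure loud instead of silent is to tag every vector with the model that produced it and refuse cross-model comparisons. The sketch below is a minimal illustration, not a prescribed implementation; `TaggedVector` and `cosine` are hypothetical names introduced here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedVector:
    model: str                  # name of the embedding model that produced the values
    values: tuple[float, ...]   # the embedding itself


def cosine(a: TaggedVector, b: TaggedVector) -> float:
    """Cosine similarity that refuses to compare vectors across models."""
    if a.model != b.model:
        # Matching dimensions would let this compute, but the result is meaningless.
        raise ValueError(f"cannot compare {a.model!r} with {b.model!r}")
    if len(a.values) != len(b.values):
        raise ValueError("dimension mismatch")
    dot = sum(x * y for x, y in zip(a.values, b.values))
    norm_a = math.sqrt(sum(x * x for x in a.values))
    norm_b = math.sqrt(sum(y * y for y in b.values))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

The check costs one string comparison per call and turns a quiet quality regression into an exception you can alert on.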

Reasons you'll re-embed:

Strategy 1: full rebuild

Embed everything with the new model into a fresh index. Atomically swap traffic to the new index. Decommission the old one.
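The atomic swap is usually done through a level of indirection: queries go to a stable alias, and the alias is repointed in one step. Here is a minimal in-memory sketch of that idea. `IndexRegistry` and the index names are hypothetical; many vector stores and search engines offer their own alias mechanism, which you would use instead.

```python
import threading
from typing import Optional


class IndexRegistry:
    """Maps a stable alias (what queries use) to a concrete index name."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._aliases: dict[str, str] = {}

    def resolve(self, alias: str) -> str:
        """Return the concrete index the alias currently points at."""
        with self._lock:
            return self._aliases[alias]

    def swap(self, alias: str, new_index: str) -> Optional[str]:
        """Repoint the alias in one step and return the previous target."""
        with self._lock:
            previous = self._aliases.get(alias)
            self._aliases[alias] = new_index
            return previous


registry = IndexRegistry()
registry.swap("docs", "docs_v1")        # initial launch
# ... build and validate docs_v2 completely before the next line ...
old_index = registry.swap("docs", "docs_v2")
# old_index now names the index that is safe to decommission
```

Because readers only ever see the alias, no query observes a half-built index, and rolling back is a second `swap` call.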

Strategy 2: dual-index (shadow + cutover)

Keep the old index serving traffic. Build the new index in parallel. Run both for a period and compare results in shadow mode, where the new index answers the same queries but its results are never served. Once new beats old on your eval set, cut over.
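The cutover decision can be reduced to a small, testable function. The sketch below assumes an eval set that maps each query to its set of relevant document ids, and two search callables with the signature `(query, k) -> ranked ids`. All names here are illustrative, and recall@k is just one metric you might gate on.

```python
from typing import Callable, Sequence

SearchFn = Callable[[str, int], Sequence[str]]  # (query, k) -> ranked doc ids


def recall_at_k(retrieved: Sequence[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_recall(search: SearchFn, eval_set: dict[str, set[str]], k: int) -> float:
    """Average recall@k of one index over every query in the eval set."""
    if not eval_set:
        raise ValueError("eval set is empty")
    scores = [recall_at_k(search(q, k), rel, k) for q, rel in eval_set.items()]
    return sum(scores) / len(scores)


def ready_to_cut_over(old: SearchFn, new: SearchFn,
                      eval_set: dict[str, set[str]],
                      k: int = 10, margin: float = 0.0) -> bool:
    """True when the new index beats the old one by more than `margin`."""
    return mean_recall(new, eval_set, k) > mean_recall(old, eval_set, k) + margin
```

A non-zero `margin` guards against cutting over on a difference that is within the noise of a small eval set.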

Strategy 3: lazy re-embedding

Both old and new indexes live side by side. Query both, fuse results, lazily re-embed an old document the first time it appears in results. Eventually everything is migrated; you never paid a big up-front cost.
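Fusion is the delicate part, because similarity scores from the two indexes are not on a comparable scale. One common workaround is to fuse by rank instead of score. The sketch below uses reciprocal rank fusion (RRF) with the conventional constant 60; this is a technique chosen here for illustration, not necessarily the one the chapter recommends. `old_search`, `new_search`, and `reembed` are hypothetical callables you would supply.

```python
from typing import Callable, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def lazy_search(query: str,
                old_search: Callable[[str], Sequence[str]],
                new_search: Callable[[str], Sequence[str]],
                migrated: set[str],
                reembed: Callable[[str], None],
                top_n: int = 10) -> list[str]:
    """Query both indexes, fuse by rank, and migrate any old hits we return."""
    fused = rrf_fuse([old_search(query), new_search(query)])[:top_n]
    for doc_id in fused:
        if doc_id not in migrated:
            # Assumption: reembed() writes the new vector and removes the old
            # one, so the document is not counted in both rankings next time.
            reembed(doc_id)
            migrated.add(doc_id)
    return fused
```

Note that documents which never appear in results are never migrated by this loop, so a background sweep is still needed to finish the job before the old index can be retired.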

The migration plan you write before launch

The discipline is this: every embedding system in production should have a written re-embedding plan from day one. Not because you're planning to re-embed soon, but because the plan forces design decisions that make re-embedding possible later:

Detecting bilingual-index regressions

The symptom: recall@k on your eval set drops after a deploy that "shouldn't have changed anything." The cause is almost always that someone re-embedded part of the index with a different model (or different params) and the bilingual state is dragging quality down.

Wire two checks:

  1. Every vector record stores its model version.
  2. A nightly job aggregates the distribution of model versions in the index. Any drift from "100% one version" surfaces in a dashboard.
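Both checks fit in a few lines. The sketch below is one possible shape, assuming records can be iterated in a batch job. `VectorRecord`, its fields, and the version-string format are hypothetical; in practice the version would live in your vector store's metadata.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass
class VectorRecord:
    doc_id: str
    model_version: str   # e.g. model name plus any parameters that change the output
    vector: list[float]


def version_distribution(records: Iterable[VectorRecord]) -> dict[str, float]:
    """Share of the index held by each model version (fractions sum to 1)."""
    counts = Counter(r.model_version for r in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {version: n / total for version, n in counts.items()}


def is_mixed(distribution: dict[str, float]) -> bool:
    """True when more than one model version is present in the index."""
    return len(distribution) > 1
```

The nightly job would call `version_distribution` and publish the result; `is_mixed` is the condition to alert on outside a planned migration window.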

Want the full chapter?

Semantic Search in Production Chapter 10 covers all three strategies in depth, the dual-index implementation pattern with code, the lazy fusion question (and why it's harder than it looks), the drift-detection wiring, and the migration plan template you can adapt.

Semantic Search in Production

The book on hybrid search and RAG retrieval. Twelve chapters. PDF + EPUB. Free updates as the field moves. $39 one-time, secure checkout.


Published by Yaw Labs.
