Semantic Search With Sentence Transformers and a Cross-Encoder Model

by Audrey M. Roy Greenfeld | Tue, Apr 15, 2025

Continuing the Sentence Transformers exploration, I use a cross-encoder model to rank my notebooks by similarity to search queries.


Setup

from fastcore.utils import *
from pathlib import Path
from sentence_transformers import CrossEncoder

Bi-Encoder vs. Cross-Encoder Model

A cross-encoder model takes a pair of texts as input and outputs a single similarity score. That means you can't precompute embeddings as I did with the bi-encoder model; instead, you must run the cross-encoder on every query-document pair at search time.

Aspect             | Bi-Encoder                                     | Cross-Encoder
Input/Output       | Encodes texts separately into embeddings       | Takes a text pair, outputs a similarity score
Accuracy           | Lower, but sufficient for initial retrieval    | Higher, suited to relevance ranking
Computational Cost | More efficient (embeddings can be precomputed) | More expensive (must process each text pair)
Scalability        | Good for large-scale retrieval                 | Poor for large datasets
Use Case           | Initial retrieval from a large corpus          | Re-ranking a small set of candidates
Storage            | Requires storing embeddings                    | No embedding storage needed

Cross-encoders excel at precision, but are typically used only after a bi-encoder has narrowed search results down to 10-100 candidate documents. In my case, I have fewer than 100 notebooks on this site, so I can get away with using just a cross-encoder.
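The scalability gap in the table comes down to how many model forward passes each architecture needs. A rough back-of-the-envelope sketch, assuming one forward pass per encode or per pair and ignoring constant factors:

```python
def bi_encoder_passes(n_docs, n_queries):
    # Each document is encoded once up front; each query needs one encode.
    # Similarity is then a cheap dot product between precomputed vectors.
    return n_docs + n_queries

def cross_encoder_passes(n_docs, n_queries):
    # Every (query, document) pair requires its own forward pass,
    # since the model attends over both texts jointly.
    return n_docs * n_queries

print(bi_encoder_passes(100, 1_000))     # → 1100
print(cross_encoder_passes(100, 1_000))  # → 100000
```

At around 100 documents, paying the per-pair cost for every query is tolerable, which is exactly the situation here.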

Download a Cross-Encoder Model

ce_model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2")
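As I understand the MS MARCO cross-encoders, they output a single raw logit per pair, so scores are unbounded (higher means more relevant) rather than 0-1 similarities. If you want a bounded score for display, a sigmoid is a common monotonic mapping, so the ranking order is unchanged. A small sketch:

```python
import math

def logit_to_prob(score):
    # Map an unbounded relevance logit into (0, 1). The mapping is
    # monotonic, so ranking by it matches ranking by the raw score.
    return 1 / (1 + math.exp(-score))

print(round(logit_to_prob(4.0), 3))   # → 0.982
print(round(logit_to_prob(-1.5), 3))  # → 0.182
```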

Get All Notebook Paths

We put each notebook to be searched into a list.

def get_nb_paths():
    # Search the current directory when running in a notebook, nbs/ otherwise.
    root = Path() if IN_NOTEBOOK else Path("nbs/")
    return L(root.glob("*.ipynb")).sorted(reverse=True)

nb_paths = get_nb_paths()

def read_nb_simple(nb_path):
    # Return the raw .ipynb file contents (JSON) as a string.
    with open(nb_path, 'r', encoding='utf-8') as f:
        return f.read()

nbs = nb_paths.map(read_nb_simple)
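One thing to note: read_nb_simple returns the raw .ipynb file, which is JSON, so the model sees cell metadata and output blobs along with the prose. A hedged alternative (not what I do above) that keeps only the cell sources:

```python
import json

def read_nb_sources(nb_json_text):
    # Keep only the markdown/code cell sources; drop the JSON structure,
    # metadata, and execution outputs before scoring.
    nb = json.loads(nb_json_text)
    return "\n".join("".join(cell.get("source", [])) for cell in nb.get("cells", []))

# A tiny constructed notebook, for illustration only:
sample = json.dumps({"cells": [
    {"cell_type": "markdown", "source": ["# Semantic search notes\n"]},
    {"cell_type": "code", "source": ["print('hello')\n"], "outputs": []},
]})
print(read_nb_sources(sample))
```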

Search for a Test Query String

Let's search my notebooks for a test string.

q = "Web search"
hits = ce_model.rank(q, nbs, return_documents=False)
hits[:10]
def print_search_result(hit): print(f"{hit['score']} {nb_paths[hit['corpus_id']]}")
L(hits[:10]).map(print_search_result)
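For context, rank with return_documents=False yields one dict per document, sorted by descending score, each holding a corpus_id index back into the input list plus its score. A sketch of the printing step with mock hits (the names, ids, and scores here are made up, not real results):

```python
# Hypothetical notebook names and mock rank() output, for illustration only.
nb_paths = ["2025-04-14-example-a.ipynb", "2025-04-12-example-b.ipynb"]
hits = [{"corpus_id": 1, "score": 5.1}, {"corpus_id": 0, "score": -2.3}]

def format_search_result(hit):
    # corpus_id indexes back into the list of documents passed to rank().
    return f"{hit['score']} {nb_paths[hit['corpus_id']]}"

for hit in hits:
    print(format_search_result(hit))
```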

Those results don't seem as good as the bi-encoder's. Let's try another cross-encoder model.

Another Cross-Encoder: ms-marco-MiniLM-L12-v2

ce_model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L12-v2")
hits = ce_model.rank(q, nbs, return_documents=False)
L(hits[:10]).map(print_search_result)

Fascinating how "Web" is emphasized so much, rather than the idea of "Web search".

Another Cross-Encoder: ms-marco-TinyBERT-L2-v2

ce_model = CrossEncoder("cross-encoder/ms-marco-TinyBERT-L2-v2")
hits = ce_model.rank(q, nbs, return_documents=False)
L(hits[:10]).map(print_search_result)

This seems the best! I like this ranking.

Reflection

After experimenting with a few cross-encoder models, I found that the TinyBERT model (cross-encoder/ms-marco-TinyBERT-L2-v2) gave the most intuitive results of all the cross-encoder and bi-encoder models I tried.

It seemed to understand the semantic relationship between "Web search" and my notebooks about search functionality better than the larger models.