What is TurboVec?

TurboVec is a high-performance vector indexing library written in Rust with Python bindings, built on top of Google Research's TurboQuant quantization algorithm. Its core mission is tackling two major pain points in RAG (Retrieval-Augmented Generation) systems: memory usage and search speed.

Why Do We Need TurboVec?

In a traditional vector search setup, a corpus of 10 million documents stored as 768-dimensional float32 vectors needs roughly 31 GB of RAM (10M × 768 × 4 bytes). TurboVec uses data-oblivious quantization to squeeze the same dataset down to about 4 GB, an 87% reduction in memory usage, while matching or beating FAISS on search speed.

TurboVec vs FAISS

| Feature | FAISS | TurboVec |
|---|---|---|
| Memory Usage | High (float32 or PQ quantization) | Ultra-low (TurboQuant quantization) |
| Training Required | ✅ Needs a training phase | ❌ No training, plug-and-play |
| Online Vector Addition | ⚠️ Requires index rebuild | ✅ Real-time addition, no rebuild needed |
| Filtered Search | Post-processing needed | ✅ Kernel-level support, zero performance cost |
| SIMD Optimization | Yes | Hand-written NEON (ARM) + AVX-512BW (x86) |

TurboVec also runs fully locally and ships both Python bindings and a native Rust API.

Core Technology: TurboQuant Algorithm

TurboQuant is a novel quantization algorithm introduced by Google Research in 2025. Its key features include:

  • Data-Oblivious: No need to train a codebook on a specific dataset, eliminating the overhead of traditional PQ (Product Quantization) training
  • Near Shannon's Lower Bound: Distortion comes close to the information-theoretic optimum for a given bit budget
  • Flexible Bit Width: Supports 2-bit, 4-bit, and 8-bit quantization, letting you balance recall and memory usage
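
To make "data-oblivious" concrete, here is a minimal NumPy sketch of the general idea: rotate each vector with a random orthogonal matrix fixed by a seed, then snap every coordinate to a grid that depends only on the dimension and the bit width. Nothing is learned from the data. This is a simplified illustration, not TurboQuant itself; the function names, the uniform grid, and the ±3σ clipping range are assumptions made for the example, and codes are stored one per byte rather than bit-packed.

```python
import numpy as np

def make_rotation(dim: int, seed: int = 0) -> np.ndarray:
    """Random orthogonal matrix that depends only on dim and seed, never on the data."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q

def _grid(dim: int, bits: int):
    """Fixed grid: clip at +/-3 standard deviations of a rotated unit-vector coordinate."""
    limit = 3.0 / np.sqrt(dim)
    step = 2.0 * limit / (2 ** bits)
    return limit, step

def quantize(x: np.ndarray, rotation: np.ndarray, bits: int) -> np.ndarray:
    """Normalize, rotate, then map each coordinate to one of 2**bits fixed cells."""
    limit, step = _grid(x.shape[1], bits)
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    rotated = unit @ rotation
    cells = np.floor((rotated + limit) / step)
    return np.clip(cells, 0, 2 ** bits - 1).astype(np.uint8)

def dequantize(codes: np.ndarray, rotation: np.ndarray, bits: int) -> np.ndarray:
    """Map each code to its cell center and undo the rotation."""
    limit, step = _grid(codes.shape[1], bits)
    rotated = (codes.astype(np.float64) + 0.5) * step - limit
    return rotated @ rotation.T

# Compare reconstruction error across bit widths on random data
rng = np.random.default_rng(1)
x = rng.standard_normal((200, 64))
unit = x / np.linalg.norm(x, axis=1, keepdims=True)
rotation = make_rotation(64)

errors = {}
for bits in (2, 4, 8):
    approx = dequantize(quantize(x, rotation, bits), rotation, bits)
    errors[bits] = float(np.mean(np.sum((unit - approx) ** 2, axis=1)))
    print(f"{bits}-bit mean squared error: {errors[bits]:.5f}")
```

The error shrinks as the bit width grows, which is the recall-versus-memory trade-off the last bullet describes.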

In benchmarks, TurboVec outperforms FAISS IndexPQFastScan by 12-20% on ARM architecture, and matches or slightly edges it out on x86.


Installing TurboVec

TurboVec provides interfaces for both Python and Rust.

Python Installation

pip install turbovec

Need integration with LangChain, LlamaIndex, or other frameworks? Install the corresponding extras:

# LangChain integration
pip install turbovec[langchain]

# LlamaIndex integration
pip install turbovec[llama-index]

# Haystack integration
pip install turbovec[haystack]

# Agno integration
pip install turbovec[agno]

Rust Installation

Add the dependency to your Cargo.toml:

[dependencies]
turbovec = "0.1"

Quick Start: Python Basics

Creating an Index and Adding Vectors

import numpy as np
from turbovec import TurboQuantIndex

# Create an index: 1536 dimensions (OpenAI embedding default), 4-bit quantization
index = TurboQuantIndex(dim=1536, bit_width=4)

# Generate sample vectors (replace with your actual embeddings)
vectors = np.random.rand(10000, 1536).astype(np.float32)

# Add vectors to the index
index.add(vectors)

# Keep adding more vectors — no rebuild needed
more_vectors = np.random.rand(5000, 1536).astype(np.float32)
index.add(more_vectors)

print(f"Index contains {len(index)} vectors")

Searching the Index

# Generate a query vector
query = np.random.rand(1536).astype(np.float32)

# Search for the 10 most similar vectors
scores, indices = index.search(query, k=10)

print("Similarity scores:", scores)
print("Vector indices:", indices)

Persistence: Save and Load

# Save index to disk
index.write("my_index.tq")

# Load index from disk
loaded_index = TurboQuantIndex.load("my_index.tq")

# Verify it works
scores, indices = loaded_index.search(query, k=10)

Advanced Feature 1: External ID Mapping

In real-world apps, you usually need to link vector indices to document IDs in your database. TurboVec provides IdMapIndex for exactly this.

Adding Vectors with IDs

import numpy as np
from turbovec import IdMapIndex

# Create an index that supports external IDs
index = IdMapIndex(dim=1536, bit_width=4)

# Suppose you have 3 vectors with external IDs 1001, 1002, 1003
vectors = np.random.rand(3, 1536).astype(np.float32)
external_ids = np.array([1001, 1002, 1003], dtype=np.uint64)

# Add vectors along with their external IDs
index.add_with_ids(vectors, external_ids)

# Search returns external IDs, not internal indices
query = np.random.rand(1536).astype(np.float32)
scores, ids = index.search(query, k=10)

print("Returned external IDs:", ids)  # [1001, 1003, 1002, ...]

Removing Vectors

IdMapIndex supports O(1) removal by external ID:

# Remove the vector with ID 1002
index.remove(1002)

# Search again — 1002 won't show up
scores, ids = index.search(query, k=10)
print("IDs after removal:", ids)  # 1002 no longer appears
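
If you're curious how removal by external ID can be O(1), one common approach is a hash map plus "swap-remove": the last slot is moved into the hole left by the deleted entry, so nothing else shifts. The sketch below shows that bookkeeping in plain Python. It is an illustration only; the class name is hypothetical, and the source doesn't say whether TurboVec does this internally.

```python
class SwapRemoveIdMap:
    """Toy mapping between external IDs and dense internal slots."""

    def __init__(self):
        self.slot_of = {}  # external ID -> internal slot
        self.id_at = []    # internal slot -> external ID

    def add(self, ext_id: int) -> int:
        if ext_id in self.slot_of:
            raise ValueError(f"duplicate external ID: {ext_id}")
        self.slot_of[ext_id] = len(self.id_at)
        self.id_at.append(ext_id)
        return self.slot_of[ext_id]

    def remove(self, ext_id: int) -> None:
        slot = self.slot_of.pop(ext_id)  # raises KeyError if the ID is unknown
        last_id = self.id_at.pop()
        if last_id != ext_id:
            # Move the last entry into the vacated slot; a real index
            # would move that entry's vector codes along with it.
            self.id_at[slot] = last_id
            self.slot_of[last_id] = slot


id_map = SwapRemoveIdMap()
for ext_id in (1001, 1002, 1003):
    id_map.add(ext_id)
id_map.remove(1002)
print(id_map.id_at)    # [1001, 1003]
print(id_map.slot_of)  # {1001: 0, 1003: 1}
```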

Persisting ID-Mapped Indices

# Save
index.write("my_index.tvim")

# Load
loaded_index = IdMapIndex.load("my_index.tvim")

Advanced Feature 2: Filtered Search

This is one of TurboVec's standout features. In traditional vector databases, if you want results limited to a specific tenant or time range, you typically over-fetch a large pool of candidates and then filter on the application side, which hurts recall and wastes compute.

TurboVec supports filtering at the kernel level via the allowlist parameter. The SIMD kernel skips disallowed slots directly during computation.

Scenario: Multi-Tenant RAG System

Say you're building a multi-tenant RAG system where each tenant can only access their own documents:

import numpy as np
from turbovec import IdMapIndex

# Create the index
idx = IdMapIndex(dim=1536, bit_width=4)

# Say we have 10,000 vectors, each mapped to a document ID
vectors = np.random.rand(10000, 1536).astype(np.float32)
doc_ids = np.arange(1, 10001, dtype=np.uint64)
idx.add_with_ids(vectors, doc_ids)

# Simulate a DB query to get tenant A's document IDs
# In real life, this would come from PostgreSQL / MySQL
tenant_a_docs = np.array([1, 5, 10, 15, 20, 25, 30, 35, 40, 45], dtype=np.uint64)

# Search within tenant A's document scope
query = np.random.rand(1536).astype(np.float32)
scores, ids = idx.search(query, k=5, allowlist=tenant_a_docs)

print("Tenant A's most relevant docs:", ids)
# Output will only include IDs from tenant_a_docs
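
The hard-coded tenant_a_docs array stands in for a database lookup. Here's a rough sketch of that lookup, using the standard library's sqlite3 as a stand-in for PostgreSQL / MySQL. The documents table, its columns, and the tenant_allowlist helper are all hypothetical names for this example.

```python
import sqlite3

import numpy as np

def tenant_allowlist(conn: sqlite3.Connection, tenant_id: str) -> np.ndarray:
    """Fetch one tenant's document IDs as a uint64 array."""
    rows = conn.execute(
        "SELECT doc_id FROM documents WHERE tenant_id = ? ORDER BY doc_id",
        (tenant_id,),
    ).fetchall()
    return np.array([row[0] for row in rows], dtype=np.uint64)

# In-memory demo database (hypothetical schema)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (doc_id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO documents VALUES (?, ?)",
    [(1, "a"), (2, "b"), (5, "a"), (10, "a")],
)

allowlist = tenant_allowlist(conn, "a")
print(allowlist.tolist())  # [1, 5, 10]
```

The resulting array can be passed as the allowlist argument in the search call shown above.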

Performance Benefits

Filtering happens inside the SIMD kernel, which short-circuits at a granularity of 32-vector blocks:

  • If a block has zero allowed slots, the entire block's LUT lookup and scoring are skipped
  • If a block has some allowed slots, only those slots get scored
  • For highly selective filters (a small fraction of total IDs allowed), most SIMD computation is avoided entirely

The output length is min(k, len(allowlist)). When the allowlist is shorter than k, you get exactly len(allowlist) results — no junk padding.
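
The block-skipping behavior is easy to mimic in NumPy. The sketch below is a plain-Python stand-in, not the real kernel: it uses exact dot products instead of LUT scoring, and the filtered_search name and its return values are assumptions for illustration. It counts how many 32-slot blocks actually get scored and caps the output at min(k, number of allowed slots).

```python
import numpy as np

BLOCK = 32  # block granularity described above

def filtered_search(vectors, query, allowed_slots, k):
    """Score only allowed slots, skipping blocks with no allowed slot."""
    n = len(vectors)
    mask = np.zeros(n, dtype=bool)
    mask[np.asarray(allowed_slots, dtype=np.intp)] = True

    scores = np.full(n, -np.inf, dtype=np.float32)
    blocks_scored = 0
    for start in range(0, n, BLOCK):
        block_mask = mask[start:start + BLOCK]
        if not block_mask.any():
            continue  # zero allowed slots: skip the whole block
        blocks_scored += 1
        slots = start + np.flatnonzero(block_mask)
        scores[slots] = vectors[slots] @ query  # score allowed slots only

    k_eff = min(k, int(mask.sum()))  # never pad with disallowed slots
    top = np.argsort(-scores)[:k_eff]
    return scores[top], top, blocks_scored

rng = np.random.default_rng(0)
vectors = rng.standard_normal((128, 8)).astype(np.float32)
query = rng.standard_normal(8).astype(np.float32)

# Slots 3, 40, 41 live in blocks 0 and 1; blocks 2 and 3 are never touched
top_scores, top_slots, blocks_scored = filtered_search(
    vectors, query, [3, 40, 41], k=5
)
print(sorted(top_slots.tolist()), "blocks scored:", blocks_scored)
```

With k=5 and three allowed slots, the function returns three results and scores two of the four blocks.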


Integration with Major RAG Frameworks

TurboVec integrates seamlessly with LangChain, LlamaIndex, Haystack, and Agno — just swap out the import.

LangChain Integration

from langchain_community.vectorstores import TurboVec
from langchain_openai import OpenAIEmbeddings

# Initialize embeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Create a TurboVec vector store
vector_store = TurboVec.from_documents(
    documents=documents,  # Your list of Documents
    embedding=embeddings,
    bit_width=4  # 4-bit quantization
)

# Similarity search
results = vector_store.similarity_search("your question", k=5)

# Persist
vector_store.save_local("turbovec_index")

# Load
loaded_store = TurboVec.load_local("turbovec_index", embeddings)

LlamaIndex Integration

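from llama_index.vector_stores.turbovec import TurboVecVectorStore
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex

# Load documents
documents = SimpleDirectoryReader("./data").load_data()

# Create a TurboVec vector store
vector_store = TurboVecVectorStore(dim=1536, bit_width=4)

# Wrap the store in a StorageContext; from_documents takes the vector
# store through storage_context rather than a vector_store keyword
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Create the index
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context
)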

# Query engine
query_engine = index.as_query_engine()
response = query_engine.query("your question")
print(response)

Haystack Integration

from haystack_integrations.document_stores.turbovec import TurboVecDocumentStore
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack import Pipeline

# Create document store (dim must match the embedder: all-MiniLM-L6-v2 outputs 384 dimensions)
document_store = TurboVecDocumentStore(dim=384, bit_width=4)

# Embedder
embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")

# Build pipeline
pipeline = Pipeline()
pipeline.add_component("embedder", embedder)
# ... add more components

Rust Native Usage

If you're building a high-performance backend service in Rust, you can use TurboVec's native Rust API directly.

Basic Usage

use ndarray::{Array1, Array2};
// RandomExt (from the ndarray-rand crate) adds the `random` constructor
use ndarray_rand::rand_distr::StandardNormal;
use ndarray_rand::RandomExt;
use turbovec::TurboQuantIndex;

fn main() {
    // Create index: 1536 dimensions, 4-bit quantization
    let mut index = TurboQuantIndex::new(1536, 4);

    // Prepare vector data (random example)
    let vectors = Array2::<f32>::random((10000, 1536), StandardNormal);

    // Add vectors
    index.add(&vectors);

    // Prepare query vector
    let query = Array1::<f32>::random(1536, StandardNormal);

    // Search for 10 most similar vectors
    let results = index.search(&query, 10);

    println!("Top 10 results: {:?}", results);

    // Persist
    index.write("index.tv").unwrap();

    // Load
    let loaded = TurboQuantIndex::load("index.tv").unwrap();
}

Index with External IDs

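use ndarray::{Array1, Array2};
// RandomExt (from the ndarray-rand crate) adds the `random` constructor
use ndarray_rand::rand_distr::StandardNormal;
use ndarray_rand::RandomExt;
use turbovec::IdMapIndex;

fn main() {
    let mut index = IdMapIndex::new(1536, 4);

    let vectors = Array2::<f32>::random((100, 1536), StandardNormal);
    let ids = vec![1001u64, 1002, 1003, /* ... */];

    index.add_with_ids(&vectors, &ids);

    let query = Array1::<f32>::random(1536, StandardNormal);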
    let (scores, returned_ids) = index.search(&query, 10);

    println!("Returned external IDs: {:?}", returned_ids);

    // Remove
    index.remove(1002);

    // Persist
    index.write("index.tvim").unwrap();
    let loaded = IdMapIndex::load("index.tvim").unwrap();
}

Performance Benchmarks

According to official benchmarks, here's how TurboVec performs across different datasets and bit widths:

Recall Comparison (TurboQuant vs FAISS IndexPQ)

Test setup: 100K vectors, k=64

| Dataset | Bit Width | TurboVec R@1 | FAISS R@1 | Advantage |
|---|---|---|---|---|
| GloVe d=200 | 4-bit | +0.3 pts | Baseline | TurboVec higher |
| OpenAI d=1536 | 2-bit | +0.4 pts | Baseline | TurboVec higher |
| OpenAI d=1536 | 4-bit | +1.2 pts | Baseline | TurboVec higher |
| OpenAI d=3072 | 4-bit | +3.4 pts | Baseline | TurboVec significantly higher |

Across all tests, both converge to recall 1.0 at k=4.
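
For readers who want to run this kind of comparison on their own data, here's a small helper for the metric. It assumes R@1 means "the fraction of queries whose true nearest neighbor shows up somewhere in the top-k results", which is my reading of the table rather than something the benchmark text spells out. The function name is hypothetical.

```python
def recall_at_1(true_nearest, returned_ids, k):
    """Fraction of queries whose true nearest neighbor is in the top-k results.

    true_nearest: one ground-truth ID per query (e.g. from brute-force search)
    returned_ids: one ranked list of IDs per query (e.g. from an index search)
    """
    hits = sum(
        1
        for truth, ranked in zip(true_nearest, returned_ids)
        if truth in list(ranked)[:k]
    )
    return hits / len(true_nearest)


truth = [7, 2, 9]
ranked = [[7, 1, 3], [5, 2, 8], [4, 6, 0]]
for k in (1, 2, 3):
    print(k, recall_at_1(truth, ranked, k))
```

In this toy example the first query hits at k=1, the second at k=2, and the third never does, so recall climbs from 1/3 to 2/3 and then plateaus.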

Speed Comparison

  • ARM (NEON): TurboVec is 12-20% faster than FAISS IndexPQFastScan
  • x86 (AVX-512BW): TurboVec matches or slightly beats FAISS

Memory Usage

| Vector Count | Dimensions | float32 Memory | TurboVec (4-bit) Memory | Savings |
|---|---|---|---|---|
| 10 million | 768 | ~31 GB | ~4 GB | 87% |
| 1 million | 768 | ~3.1 GB | ~400 MB | 87% |
| 100K | 768 | ~310 MB | ~40 MB | 87% |
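
The arithmetic behind these figures is easy to check yourself. The sketch below counts only the raw vector payload (4 bytes per float32 value, bits/8 bytes per quantized value) and ignores any per-vector or per-index overhead, which is a simplifying assumption. The ~31 GB and ~4 GB headline numbers for 10 million vectors fall out at 768 dimensions; at 1536 dimensions both values double while the savings ratio stays the same.

```python
def float32_bytes(n_vectors: int, dim: int) -> int:
    """Raw float32 storage: 4 bytes per value."""
    return n_vectors * dim * 4

def quantized_bytes(n_vectors: int, dim: int, bits: int) -> int:
    """Packed code storage: `bits` bits per value, overhead ignored."""
    return n_vectors * dim * bits // 8

n, dim = 10_000_000, 768
raw = float32_bytes(n, dim)
packed = quantized_bytes(n, dim, bits=4)
savings = 1 - packed / raw

print(f"float32: {raw / 1e9:.1f} GB")     # float32: 30.7 GB
print(f"4-bit:   {packed / 1e9:.1f} GB")  # 4-bit:   3.8 GB
print(f"savings: {savings:.1%}")          # savings: 87.5%
```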

Hands-On: Building a Local RAG System

Here's a complete example showing how to build a fully local RAG system with TurboVec — no cloud services needed.

Environment Setup

pip install turbovec sentence-transformers langchain-community langchain-text-splitters

Full Code

import numpy as np
from turbovec import IdMapIndex
from sentence_transformers import SentenceTransformer
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# 1. Load embedding model (runs locally, no API key needed)
print("Loading embedding model...")
model = SentenceTransformer('all-MiniLM-L6-v2')  # 384 dimensions
dim = 384

# 2. Load and split documents
print("Loading documents...")
loader = TextLoader('./data/my_documents.txt', encoding='utf-8')
documents = loader.load()

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50
)
chunks = splitter.split_documents(documents)

print(f"Split into {len(chunks)} chunks")

# 3. Generate embeddings and build the index
print("Generating embeddings and building index...")
index = IdMapIndex(dim=dim, bit_width=4)

texts = [chunk.page_content for chunk in chunks]
metadata_list = [chunk.metadata for chunk in chunks]

# Batch generate embeddings
embeddings = model.encode(texts, show_progress_bar=True)

# Add vectors, using each chunk's position in `texts` as its external ID
external_ids = np.arange(len(texts), dtype=np.uint64)
index.add_with_ids(embeddings.astype(np.float32), external_ids)

# Save the index
index.write("rag_index.tvim")
print("Index saved")

# 4. Search function
def search(query: str, k: int = 5):
    """Search for the most relevant chunks"""
    # Load the index
    idx = IdMapIndex.load("rag_index.tvim")

    # Generate query embedding
    query_embedding = model.encode([query])[0].astype(np.float32)

    # Search
    scores, ids = idx.search(query_embedding, k=k)

    # Return results
    results = []
    for score, doc_id in zip(scores, ids):
        doc_id = int(doc_id)
        results.append({
            'text': texts[doc_id],
            'score': float(score),
            'metadata': metadata_list[doc_id]
        })

    return results

# 5. Test search
if __name__ == "__main__":
    query = "What is TurboVec?"
    results = search(query, k=3)

    print("\n=== Search Results ===")
    for i, result in enumerate(results, 1):
        print(f"\n[{i}] Similarity: {result['score']:.4f}")
        print(f"Content: {result['text'][:200]}...")
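
The script above stops at retrieval. To finish the RAG loop you'd feed the retrieved chunks to a language model of your choice. Here's a minimal sketch of the prompt-assembly step, assuming results shaped like the dictionaries that search() returns. The build_prompt helper and the prompt wording are hypothetical, and the model call itself is left out.

```python
def build_prompt(question: str, results: list, max_chars: int = 500) -> str:
    """Turn retrieved chunks into a grounded prompt for a local LLM."""
    context = "\n\n".join(
        f"[{i}] {r['text'][:max_chars]}" for i, r in enumerate(results, 1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Stand-in for the output of search(); real results come from the index
sample_results = [
    {"text": "TurboVec is a vector indexing library.", "score": 0.91, "metadata": {}},
    {"text": "It is written in Rust with Python bindings.", "score": 0.84, "metadata": {}},
]
prompt = build_prompt("What is TurboVec?", sample_results)
print(prompt)
```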

Adding Filter Support

If your documents have category tags, you can filter during search:

def search_with_filter(query: str, category: str, k: int = 5):
    """Search with category filter"""
    idx = IdMapIndex.load("rag_index.tvim")

    # Find document IDs belonging to the specified category
    allowed_ids = np.array([
        i for i, meta in enumerate(metadata_list)
        if meta.get('category') == category
    ], dtype=np.uint64)

    if len(allowed_ids) == 0:
        return []

    # Generate query embedding
    query_embedding = model.encode([query])[0].astype(np.float32)

    # Search with filter
    scores, ids = idx.search(query_embedding, k=k, allowlist=allowed_ids)

    results = []
    for score, doc_id in zip(scores, ids):
        doc_id = int(doc_id)
        results.append({
            'text': texts[doc_id],
            'score': float(score),
            'metadata': metadata_list[doc_id]
        })

    return results

# Search only documents in the "tech" category
results = search_with_filter("How to install?", category="tech", k=3)

FAQ

Q1: What scenarios is TurboVec good for?

  • Memory-constrained environments: store large-scale vector indices in limited RAM
  • Privacy-sensitive applications: data can't leave your local machine or VPC
  • Dynamically growing corpora: frequent vector additions with no tolerance for rebuild overhead
  • Multi-tenant RAG: fine-grained permission filtering during search

Q2: What scenarios is TurboVec not ideal for?

  • Massive distributed search: if you need to shard indices across multiple machines, consider Milvus, Weaviate, or other distributed vector databases
  • Complex metadata filtering: TurboVec currently only supports ID-based filtering; complex metadata queries need application-level handling

Q3: How do I choose the bit width?

  • 4-bit: recommended default — great balance between recall and memory
  • 2-bit: maximum compression, slight recall trade-off, ideal for extremely memory-constrained scenarios
  • 8-bit: highest precision, roughly double the memory of 4-bit, recall close to float32
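
If you'd rather let a memory budget decide, a tiny helper can pick the widest bit width that fits. This is a back-of-the-envelope sketch: it counts only the packed codes (bits/8 bytes per value), ignores index overhead, and pick_bit_width is a hypothetical name.

```python
def pick_bit_width(n_vectors: int, dim: int, budget_bytes: float, choices=(8, 4, 2)):
    """Return the widest bit width whose packed codes fit the budget, or None."""
    for bits in sorted(choices, reverse=True):
        if n_vectors * dim * bits // 8 <= budget_bytes:
            return bits
    return None


# 10 million 768-dimensional vectors under different RAM budgets
for budget_gb in (8, 4, 2, 1):
    bits = pick_bit_width(10_000_000, 768, budget_gb * 1e9)
    print(f"{budget_gb} GB budget -> {bits}")
```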

Q4: Can TurboVec and FAISS be used together?

Absolutely. You could use FAISS for coarse search and TurboVec for re-ranking, or the other way around. Their API design philosophies differ and they complement each other well.
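
Whichever library handles the coarse pass, the second stage boils down to rescoring a shortlist. Here's a minimal sketch of that step using exact float32 inner products in NumPy. It assumes you keep the full-precision vectors somewhere addressable by ID; the rerank_exact name is hypothetical, and the shortlist is hard-coded where a real pipeline would take it from an index search.

```python
import numpy as np

def rerank_exact(query, candidate_ids, full_vectors, k):
    """Rescore a coarse shortlist with exact inner products and keep the top k."""
    cand = np.asarray(candidate_ids, dtype=np.intp)
    exact = full_vectors[cand] @ query
    order = np.argsort(-exact)[:k]
    return exact[order], cand[order]


full_vectors = np.eye(3, dtype=np.float32)  # toy corpus: three unit vectors
query = np.array([1.0, 0.5, 0.0], dtype=np.float32)
shortlist = [2, 0, 1]  # candidate IDs from the coarse stage, in arbitrary order

scores, ids = rerank_exact(query, shortlist, full_vectors, k=2)
print(ids.tolist(), scores.tolist())  # [0, 1] [1.0, 0.5]
```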


Summary

TurboVec is a vector search library worth watching. Powered by the TurboQuant algorithm, it strikes an excellent balance across memory usage, search speed, and ease of use:

  • 87% less memory: 10M vectors go from 31GB down to 4GB
  • Faster than FAISS: 12-20% faster on ARM, neck-and-neck on x86
  • No training needed: plug-and-play with online vector addition
  • Kernel-level filtering: ideal for multi-tenant RAG
  • Ecosystem integration: seamless with LangChain, LlamaIndex, Haystack, and Agno

If your RAG project is hitting a memory bottleneck or needs fully local deployment, give TurboVec a try.

Project repo: https://github.com/RyanCodrai/turbovec

Paper: TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate