Search Engines

Vajra provides three search engine implementations for different use cases.

Engine Comparison

Engine                 Best For                   Features
---------------------  -------------------------  ---------------------------------
VajraSearch            Small corpora (<1K docs)   Pure Python, educational
VajraSearchOptimized   Production use             Sparse matrices, caching, fastest
VajraSearchParallel    Batch processing           Multi-threaded, high throughput

VajraSearch (Base)

The reference implementation using pure categorical abstractions.

from vajra_bm25 import VajraSearch, DocumentCorpus

corpus = DocumentCorpus.load_jsonl("small_corpus.jsonl")
engine = VajraSearch(corpus)

results = engine.search("query terms", top_k=10)
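
Each result exposes its rank and the matched document (the same fields the batch example later on this page uses), so the top hits can be printed directly:

for r in results:
    print(f"{r.rank}. {r.document.title}")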

Characteristics:

  • Pure Python implementation
  • Clear categorical structure
  • Good for learning and small datasets
  • No external dependencies

VajraSearchOptimized

Production-ready engine with vectorized operations.

from vajra_bm25 import VajraSearchOptimized, DocumentCorpus

corpus = DocumentCorpus.load_jsonl("large_corpus.jsonl")

engine = VajraSearchOptimized(
    corpus,
    k1=1.5,           # BM25 k1 parameter
    b=0.75,           # BM25 b parameter
    cache_size=1000,  # LRU cache for repeated queries
    use_eager=True,   # Pre-compute score matrix
)

results = engine.search("query terms", top_k=10)

Characteristics:

  • Sparse matrix operations (CSR format)
  • NumPy/SciPy vectorization
  • Optional Numba JIT compilation
  • LRU query caching (see the timing sketch after this list)
  • Eager score pre-computation
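
A quick way to see the cache at work is to time the same query twice. This sketch assumes the engine built above with cache_size=1000; the second call should be served from the LRU cache and return almost instantly:

import time

for attempt in ("cold", "warm"):
    start = time.perf_counter()
    engine.search("sparse retrieval", top_k=10)
    print(f"{attempt}: {(time.perf_counter() - start) * 1e3:.2f} ms")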

Automatic Mode Selection

The engine automatically picks the most memory-efficient matrix format for the corpus size; the sketch after this list shows the size of the gap:

  • < 10K documents: Dense matrices
  • ≥ 10K documents: Sparse matrices (99%+ memory savings)
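
The savings figure is easy to sanity-check with SciPy alone. This sketch is independent of Vajra: it compares a dense float64 score matrix against a CSR equivalent at an assumed 0.1% nonzero density:

import numpy as np
from scipy import sparse

# 10K x 10K term-document scores at 0.1% nonzeros, stored as CSR
m = sparse.random(10_000, 10_000, density=0.001, format="csr", random_state=0)

dense_bytes = m.shape[0] * m.shape[1] * 8  # float64 dense equivalent
sparse_bytes = m.data.nbytes + m.indices.nbytes + m.indptr.nbytes

print(f"dense:  {dense_bytes / 1e6:.0f} MB")
print(f"sparse: {sparse_bytes / 1e6:.2f} MB "
      f"({100 * (1 - sparse_bytes / dense_bytes):.1f}% saved)")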

Configuration Options

engine = VajraSearchOptimized(
    corpus,
    # BM25 Parameters
    k1=1.5,              # Term frequency saturation (default: 1.5)
    b=0.75,              # Length normalization (default: 0.75)

    # Performance Options
    use_sparse=True,     # Force sparse matrices (auto-detected by default)
    use_eager=True,      # Pre-compute BM25 scores at index time
    cache_size=1000,     # Query result cache size (0 to disable)
)
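
What k1 and b actually control is easiest to see in the scoring function itself. The following is the standard Okapi BM25 per-term score as a reference sketch, not Vajra's internal code: k1 caps how much repeated occurrences of a term can help, and b scales the penalty applied to longer documents.

import math

# Illustrative helper, not part of the vajra_bm25 API
def bm25_term_score(tf, df, n_docs, doc_len, avg_doc_len, k1=1.5, b=0.75):
    # Smoothed inverse document frequency of the term
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    # b interpolates between no length normalization (b=0) and full (b=1)
    norm = 1 - b + b * (doc_len / avg_doc_len)
    # k1 controls how quickly the term-frequency contribution saturates
    return idf * tf * (k1 + 1) / (tf + k1 * norm)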

VajraSearchParallel

For high-throughput batch processing.

from vajra_bm25 import VajraSearchParallel, DocumentCorpus

corpus = DocumentCorpus.load_jsonl("corpus.jsonl")

engine = VajraSearchParallel(
    corpus,
    max_workers=4,  # Number of parallel workers
)

# Process multiple queries in parallel
queries = ["machine learning", "deep learning", "neural networks"]
batch_results = engine.search_batch(queries, top_k=10)

for query, results in zip(queries, batch_results):
    print(f"Query: {query}")
    for r in results:
        print(f"  {r.rank}. {r.document.title}")

Characteristics:

  • Thread-based parallelism
  • Optimized for batch queries
  • Higher throughput, slightly higher latency per query
  • Memory overhead from worker threads

Choosing the Right Engine

┌─────────────────────────────────────────────────────────────┐
│                    Which Engine?                             │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  Documents < 1,000?                                          │
│      └── Yes → VajraSearch (simple, educational)             │
│      └── No  ↓                                               │
│                                                              │
│  Need batch processing?                                      │
│      └── Yes → VajraSearchParallel                           │
│      └── No  ↓                                               │
│                                                              │
│  Otherwise → VajraSearchOptimized (recommended)              │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Performance Comparison

At 500K documents:

Engine                           Single Query   Batch (100 queries)
-------------------------------  -------------  -------------------
VajraSearchOptimized             1.89 ms        189 ms
VajraSearchParallel (4 workers)  2.5 ms         95 ms

Use VajraSearchOptimized for single queries, VajraSearchParallel for batch workloads.
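
These numbers are hardware-dependent, so it is worth reproducing them on your own corpus. A minimal timing harness, assuming the engines from the sections above and using distinct queries so the LRU cache does not skew the batch result:

import time

from vajra_bm25 import DocumentCorpus, VajraSearchOptimized, VajraSearchParallel

corpus = DocumentCorpus.load_jsonl("corpus.jsonl")
single = VajraSearchOptimized(corpus)
batch = VajraSearchParallel(corpus, max_workers=4)
queries = [f"query {i}" for i in range(100)]  # distinct, to avoid cache hits

start = time.perf_counter()
single.search(queries[0], top_k=10)
print(f"single query: {(time.perf_counter() - start) * 1e3:.2f} ms")

start = time.perf_counter()
batch.search_batch(queries, top_k=10)
print(f"batch of {len(queries)}: {(time.perf_counter() - start) * 1e3:.0f} ms")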