SeekStorm

search_benchmark_game

Search engine benchmark: SeekStorm, Tantivy, Lucene, …

Open-source search engine libraries (BM25 lexical search) are compared using the open-source search_benchmark_game developed by Tantivy and Jason Wolfe.

Detailed benchmark results for multi-field search

Detailed benchmark results for single-field search

Benchmark code repository

Benchmark


Many benchmarks focus on search within a single field, but in real-world use cases, we often need to search through multiple fields at the same time.

How much this affects performance depends on the data structures and algorithms the search library uses.

Multi-field search at little additional cost is desirable, but can’t be taken for granted with every search library.

Often, the number of terms in one field (e.g., the title) is negligible compared to another (e.g., the body), yet searching through both fields can be expensive in some search libraries.

Concatenating both fields before indexing is not a viable solution, because it rules out both field filtering and BM25F scoring, where matches in a shorter title field should receive a higher score.
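To illustrate why per-field information matters, here is a minimal sketch of a simplified BM25F term score in Python. It is not the scoring code of SeekStorm, Tantivy, or Lucene; the function name, the field weights, and the parameter defaults (k1 = 1.2, b = 0.75) are assumptions chosen for illustration only.

```python
import math


def bm25f_term_score(tf_by_field, len_by_field, avg_len_by_field,
                     weights, doc_freq, num_docs, k1=1.2, b=0.75):
    """Score one query term against one document with a simplified BM25F.

    All arguments are hypothetical inputs for this sketch:
    tf_by_field      -- term frequency of the term in each field
    len_by_field     -- length (in terms) of each field in this document
    avg_len_by_field -- average length of each field across the corpus
    weights          -- per-field boost (assumed values, not library defaults)
    """
    # Merge per-field frequencies into one pseudo-frequency. Each field is
    # normalized by its own length before its weight is applied.
    pseudo_tf = 0.0
    for field, tf in tf_by_field.items():
        norm = 1.0 - b + b * len_by_field[field] / avg_len_by_field[field]
        pseudo_tf += weights[field] * tf / norm

    # Non-negative IDF variant, saturated by k1 as in BM25.
    idf = math.log(1.0 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    return idf * pseudo_tf / (k1 + pseudo_tf)


# Shared corpus statistics for the two example documents below.
stats = dict(
    len_by_field={"title": 5, "body": 500},
    avg_len_by_field={"title": 5, "body": 500},
    weights={"title": 2.0, "body": 1.0},  # assumed boosts
    doc_freq=100,
    num_docs=10_000,
)

# Same term, one occurrence: once in the title, once in the body.
title_hit = bm25f_term_score({"title": 1, "body": 0}, **stats)
body_hit = bm25f_term_score({"title": 0, "body": 1}, **stats)
```

With these assumed weights, the title match scores higher than the body match. Once both fields are concatenated into one, the per-field frequencies and lengths are gone, so this distinction can no longer be computed.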

Benefits of search_benchmark_game

About the results

Results will vary depending on your hardware, OS, concurrent OS activity, JVM, ambient temperature, query set, document corpus…

Before making any decisions, you should always replicate the tests on your infrastructure for your specific workload.
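As a starting point for such a replication, the following Python sketch times an arbitrary search callable over your own query set and reports mean and percentile latencies. It is not the harness used by search_benchmark_game; `measure_latencies`, `summarize`, and the `search_fn` parameter are hypothetical names, and you would plug in your own library's query call.

```python
import math
import statistics
import time


def measure_latencies(search_fn, queries, warmup_rounds=1, rounds=3):
    """Return one latency sample (in microseconds) per query per round.

    search_fn is a placeholder for whatever call runs a query against
    the library under test.
    """
    # Warm-up passes let caches and JIT compilers settle before timing.
    for _ in range(warmup_rounds):
        for query in queries:
            search_fn(query)

    latencies = []
    for _ in range(rounds):
        for query in queries:
            start = time.perf_counter()
            search_fn(query)
            latencies.append((time.perf_counter() - start) * 1e6)
    return latencies


def summarize(latencies):
    """Summarize samples with the mean and nearest-rank percentiles."""
    ordered = sorted(latencies)

    def percentile(p):
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        return ordered[rank - 1]

    return {
        "mean": statistics.fmean(ordered),
        "p50": percentile(50),
        "p95": percentile(95),
        "p99": percentile(99),
    }
```

Running it with your own queries and corpus, on the machine you intend to deploy on, gives numbers that reflect your workload rather than a published one.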