LongMemEval-S Benchmark

Public Benchmarks

Transparent, reproducible evaluation of Tellodb against other agent memory systems (HydraDB, Zep, and Mem0) on standard agent memory benchmarks.

Overall accuracy: 90.5%
P95 retrieval latency: <100 ms
Overall lead over Mem0: +61.4 points
Overall ranking: #2 of 4 systems (behind HydraDB at 90.8%)

LongMemEval-S Results (%)

Model       Overall  Single Session  Temporal  Preferences  Knowledge Updates  Multi-Session
★ Tellodb   90.5     98.0            88.3      95.2         96.1               74.8
HydraDB     90.8     100.0           91.0      96.7         97.4               76.7
Zep         71.2     92.9            62.4      56.7         83.3               57.9
Mem0        29.1     38.7            25.6      40.0         52.6               20.3
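The headline figures can be rederived from this table. The Rust sketch below hard-codes the table values, ranks systems by Overall score, and computes point gaps. The `Row` struct and the helper names are illustrative only and are not part of the Tellodb harness.

```rust
/// One row of the LongMemEval-S results table (scores in %).
/// Illustrative type, not taken from the Tellodb codebase.
struct Row {
    model: &'static str,
    overall: f64,
    categories: [f64; 5], // same order as CATEGORIES below
}

const CATEGORIES: [&str; 5] = [
    "Single Session", "Temporal", "Preferences", "Knowledge Updates", "Multi-Session",
];

/// Values copied from the results table above.
fn results() -> Vec<Row> {
    vec![
        Row { model: "Tellodb", overall: 90.5, categories: [98.0, 88.3, 95.2, 96.1, 74.8] },
        Row { model: "HydraDB", overall: 90.8, categories: [100.0, 91.0, 96.7, 97.4, 76.7] },
        Row { model: "Zep", overall: 71.2, categories: [92.9, 62.4, 56.7, 83.3, 57.9] },
        Row { model: "Mem0", overall: 29.1, categories: [38.7, 25.6, 40.0, 52.6, 20.3] },
    ]
}

/// 1-based rank of `model` by Overall score (higher is better).
fn rank_of(rows: &[Row], model: &str) -> Option<usize> {
    let mut sorted: Vec<&Row> = rows.iter().collect();
    sorted.sort_by(|a, b| b.overall.partial_cmp(&a.overall).unwrap());
    sorted.iter().position(|r| r.model == model).map(|i| i + 1)
}

/// Difference in percentage points between two models' Overall scores (a minus b).
fn overall_gap(rows: &[Row], a: &str, b: &str) -> Option<f64> {
    let find = |m: &str| rows.iter().find(|r| r.model == m).map(|r| r.overall);
    Some(find(a)? - find(b)?)
}

fn main() {
    let rows = results();
    println!("Tellodb rank: {:?}", rank_of(&rows, "Tellodb"));
    println!("Tellodb vs Mem0: {:+.1} pt", overall_gap(&rows, "Tellodb", "Mem0").unwrap());
    // Per-category difference between the first two rows (Tellodb minus HydraDB).
    let (t, h) = (&rows[0], &rows[1]);
    for (i, name) in CATEGORIES.iter().enumerate() {
        println!("{name}: Tellodb - HydraDB = {:+.1} pt", t.categories[i] - h.categories[i]);
    }
}
```

Run as-is, it places Tellodb second and puts its Overall lead over Mem0 at 61.4 points (90.5 minus 29.1). The per-category loop shows HydraDB ahead in every category column.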

Figure: Overall Score Comparison (overall LongMemEval-S accuracy per system, as in the Overall column of the table).

Methodology

Dataset Specification

The LongMemEval-S benchmark has six question categories. These cover single-session and multi-session recall, temporal reasoning, knowledge updates, and preference (profile) matching.
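Overall accuracy can be aggregated in more than one way. This page does not say which method the harness uses, so the sketch below assumes micro-averaging, where overall accuracy is correct answers divided by all questions rather than the mean of the category scores. The `Graded` record and the `accuracy` function are hypothetical.

```rust
use std::collections::BTreeMap;

/// One graded benchmark question (hypothetical record shape, not the harness's actual type).
struct Graded {
    category: &'static str,
    correct: bool,
}

/// Returns accuracy (%) per category and a micro-averaged overall accuracy (%):
/// total correct answers over total questions, across all categories.
fn accuracy(records: &[Graded]) -> (BTreeMap<&'static str, f64>, f64) {
    // category -> (correct, total)
    let mut tally: BTreeMap<&'static str, (usize, usize)> = BTreeMap::new();
    for r in records {
        let e = tally.entry(r.category).or_insert((0, 0));
        if r.correct {
            e.0 += 1;
        }
        e.1 += 1;
    }
    let per_cat = tally
        .iter()
        .map(|(&c, &(ok, n))| (c, 100.0 * ok as f64 / n as f64))
        .collect();
    let (ok, n) = tally.values().fold((0, 0), |(a, b), &(c, d)| (a + c, b + d));
    let overall = if n == 0 { 0.0 } else { 100.0 * ok as f64 / n as f64 };
    (per_cat, overall)
}

fn main() {
    // Toy records for illustration only.
    let records = vec![
        Graded { category: "Temporal", correct: true },
        Graded { category: "Temporal", correct: false },
        Graded { category: "Multi-Session", correct: true },
        Graded { category: "Multi-Session", correct: true },
        Graded { category: "Multi-Session", correct: true },
        Graded { category: "Multi-Session", correct: false },
    ];
    let (per_cat, overall) = accuracy(&records);
    for (c, a) in &per_cat {
        println!("{c}: {a:.1}%");
    }
    println!("Overall (micro-averaged): {overall:.1}%");
}
```

With these toy records, Temporal scores 50%, Multi-Session scores 75%, and the micro-averaged overall is 66.7%. The unweighted mean of the two categories would be 62.5%. When categories contain different numbers of questions the two methods diverge, so one method should be used consistently when comparing systems.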

Standard Infrastructure

All systems are benchmarked on identical single-node cloud environments (4 vCPU, 16 GB RAM). This isolates latency measurements and keeps them directly comparable.
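The P95 retrieval latency figure is a percentile over per-query retrieval times. The page does not specify the percentile method, so this sketch assumes the nearest-rank definition. `percentile` is an illustrative helper, not a Tellodb API.

```rust
use std::time::Duration;

/// Nearest-rank percentile: the smallest sample such that at least `p` percent
/// of all samples are less than or equal to it.
/// Returns None for an empty slice or for `p` outside (0, 100].
fn percentile(samples: &[Duration], p: f64) -> Option<Duration> {
    if samples.is_empty() || !(p > 0.0 && p <= 100.0) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    // Rank = ceil(p / 100 * n), 1-based; computed as p * n / 100 to avoid rounding error.
    let rank = (p * sorted.len() as f64 / 100.0).ceil() as usize;
    Some(sorted[rank.max(1) - 1])
}

fn main() {
    // Toy latencies of 1..=100 ms; real samples would come from timed retrieval calls.
    let samples: Vec<Duration> = (1..=100).map(Duration::from_millis).collect();
    let p95 = percentile(&samples, 95.0).unwrap();
    println!("P95 = {:?}, under 100 ms: {}", p95, p95 < Duration::from_millis(100));
}
```

For 100 samples of 1 to 100 ms, nearest rank gives a P95 of 95 ms. Interpolating percentile methods can return slightly different values on small sample sets, so the method should be fixed before comparing systems.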

Open Source & Auditable

Evaluation code is open source and reproducible. The harness is available at github.com/sharjeel619/tellodb.

Baseline Sources

Competitor baselines are taken directly from publicly published results and verified on our own local test rig. Last updated May 2026. To run the benchmark locally:

cargo bench --bench longmemeval
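The baseline verification step can be expressed as a tolerance check between a published score and a locally reproduced one. The 1.0-point tolerance, the `Baseline` struct, and the placeholder numbers in `main` are assumptions for illustration. The page does not state what counts as a successful verification.

```rust
/// A published baseline score next to the score reproduced on the local rig.
/// Hypothetical shape, for illustration only.
struct Baseline {
    model: &'static str,
    published: f64,  // % from the competitor's public results
    reproduced: f64, // % measured locally
}

/// Returns the models whose local reproduction differs from the published
/// score by more than `tolerance_pt` percentage points.
fn mismatches(baselines: &[Baseline], tolerance_pt: f64) -> Vec<&'static str> {
    baselines
        .iter()
        .filter(|b| (b.published - b.reproduced).abs() > tolerance_pt)
        .map(|b| b.model)
        .collect()
}

fn main() {
    // Placeholder values for illustration only; these are not measured results.
    let baselines = [
        Baseline { model: "system-a", published: 70.0, reproduced: 70.6 },
        Baseline { model: "system-b", published: 30.0, reproduced: 33.5 },
    ];
    let flagged = mismatches(&baselines, 1.0);
    if flagged.is_empty() {
        println!("All baselines reproduced within tolerance");
    } else {
        println!("Baselines outside tolerance: {:?}", flagged);
    }
}
```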