LongMemEval-S Benchmark
Public Benchmarks
Transparent, reproducible evaluation of Tellodb against industry leaders on standard agent memory benchmarks.
Overall Accuracy: 90.5%
P95 Retrieval Latency: <100ms
vs Mem0 Overall: +61pt
Overall Ranking: #2
LongMemEval-S Results (%)
| System | Overall | Single Session | Temporal | Preferences | Knowledge Updates | Multi-Session |
|---|---|---|---|---|---|---|
| ★ Tellodb | 90.5% | 98.0% | 88.3% | 95.2% | 96.1% | 74.8% |
| HydraDB | 90.8% | 100.0% | 91.0% | 96.7% | 97.4% | 76.7% |
| Zep | 71.2% | 92.9% | 62.4% | 56.7% | 83.3% | 57.9% |
| Mem0 | 29.1% | 38.7% | 25.6% | 40.0% | 52.6% | 20.3% |
Overall Score Comparison
Tellodb 90.5%, HydraDB 90.8%, Zep 71.2%, Mem0 29.1%
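The headline gap versus Mem0 and the overall ranking follow directly from the Overall column. A minimal sketch of that arithmetic, using only the scores listed on this page; the function names `point_gap` and `rank` are illustrative and not part of the Tellodb harness:

```rust
// Overall scores copied from the results table on this page.
const OVERALL: [(&str, f64); 4] = [
    ("Tellodb", 90.5),
    ("HydraDB", 90.8),
    ("Zep", 71.2),
    ("Mem0", 29.1),
];

/// Percentage-point gap between two systems' overall scores (a minus b).
/// Returns None if either name is not in the table.
fn point_gap(a: &str, b: &str) -> Option<f64> {
    let score = |name: &str| OVERALL.iter().find(|(n, _)| *n == name).map(|(_, s)| *s);
    Some(score(a)? - score(b)?)
}

/// 1-based rank by overall score, highest first.
/// A system's rank is one plus the number of systems that score strictly higher.
fn rank(name: &str) -> Option<usize> {
    let target = OVERALL.iter().find(|(n, _)| *n == name)?.1;
    Some(1 + OVERALL.iter().filter(|(_, s)| *s > target).count())
}

fn main() {
    if let (Some(gap), Some(r)) = (point_gap("Tellodb", "Mem0"), rank("Tellodb")) {
        println!("Tellodb vs Mem0: {:+.1} points, rank #{}", gap, r);
    }
}
```

The gap is expressed in percentage points (a difference of two percentages), not as a relative improvement.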
Methodology
Dataset Specification
The LongMemEval-S benchmark contains 6 categories that evaluate single- and multi-session recall, temporal updates, and profile matching.
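Per-category scores can be rolled up into a single overall figure in more than one way. This page does not state which aggregation its Overall column uses, so the sketch below shows two common options: an unweighted mean across categories and a question-weighted mean. The per-category scores are the Tellodb row from the results table; the question counts are hypothetical placeholders, not the real LongMemEval-S category sizes.

```rust
/// Unweighted (macro) mean of per-category accuracies, in percent.
fn macro_average(scores: &[f64]) -> Option<f64> {
    if scores.is_empty() {
        return None;
    }
    Some(scores.iter().sum::<f64>() / scores.len() as f64)
}

/// Question-weighted mean: each category contributes in proportion to
/// the number of questions it contains.
fn weighted_average(scores: &[f64], counts: &[u32]) -> Option<f64> {
    let total: u32 = counts.iter().sum();
    if scores.len() != counts.len() || total == 0 {
        return None;
    }
    let weighted: f64 = scores
        .iter()
        .zip(counts)
        .map(|(s, &c)| s * f64::from(c))
        .sum();
    Some(weighted / f64::from(total))
}

fn main() {
    // Tellodb row: Single Session, Temporal, Preferences, Knowledge Updates, Multi-Session.
    let scores = [98.0, 88.3, 95.2, 96.1, 74.8];
    // HYPOTHETICAL question counts per category, for illustration only.
    let counts = [120, 130, 30, 80, 140];

    if let Some(m) = macro_average(&scores) {
        println!("unweighted mean: {:.2}%", m);
    }
    if let Some(w) = weighted_average(&scores, &counts) {
        println!("question-weighted mean (hypothetical counts): {:.2}%", w);
    }
}
```

The two aggregates differ whenever category sizes are unequal, so a comparison across systems is only meaningful if every Overall score uses the same method.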
Standard Infrastructure
All benchmarks run on identical single-node cloud environments (4 vCPU, 16 GB RAM) so that latency measurements are isolated from hardware differences.
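For readers unfamiliar with the P95 latency metric quoted above, here is a minimal sketch of one common way to compute it, the nearest-rank method. The page does not say which percentile method the harness uses, so this is an assumption; the function name `percentile_nearest_rank` and the sample latencies are hypothetical.

```rust
/// Nearest-rank percentile: the smallest sample such that at least `pct`
/// percent of all samples are less than or equal to it.
/// Returns None for empty input or a percentile outside 0..=100.
fn percentile_nearest_rank(samples_ms: &[f64], pct: f64) -> Option<f64> {
    if samples_ms.is_empty() || !(0.0..=100.0).contains(&pct) {
        return None;
    }
    let mut sorted = samples_ms.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // rank = ceil(pct * n / 100), clamped to at least 1 so pct = 0 maps to the minimum.
    let rank = (pct * sorted.len() as f64 / 100.0).ceil() as usize;
    Some(sorted[rank.max(1) - 1])
}

fn main() {
    // HYPOTHETICAL retrieval latencies in milliseconds, for illustration only.
    let latencies = [12.0, 18.5, 22.1, 35.0, 41.7, 48.2, 55.9, 63.3, 71.0, 96.4];
    if let Some(p95) = percentile_nearest_rank(&latencies, 95.0) {
        println!("p95 = {p95} ms, under 100 ms: {}", p95 < 100.0);
    }
}
```

Other percentile definitions interpolate between neighbouring samples and can give slightly different values on small sample sets.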
Open Source & Auditable
Evaluation code is open and reproducible. Access the harness at github.com/sharjeel619/tellodb
Baseline Sources
Competitor baseline scores are taken directly from publicly published results and verified in our own local test rig. Last updated May 2026. Run locally with: cargo run --release --bench longmemeval