LongMemEval-S Benchmark

Public Benchmarks

Transparent, reproducible evaluation of TelloDB against industry leaders on standard agent memory benchmarks.

90.5% Overall Accuracy

<100 ms P95 Retrieval Latency

+61 pt vs Mem0 Overall

Overall Ranking

LongMemEval-S Results (%)

System	Overall	Single Session	Temporal	Preferences	Knowledge Updates	Multi-Session
★ TelloDB	90.5%	98.0%	88.3%	95.2%	96.1%	74.8%
HydraDB	90.8%	100.0%	91.0%	96.7%	97.4%	76.7%
Zep	71.2%	92.9%	62.4%	56.7%	83.3%	57.9%
Mem0	29.1%	38.7%	25.6%	40.0%	52.6%	20.3%
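The headline gap over Mem0 follows directly from the Overall column: 90.5 − 29.1 = 61.4 percentage points, shown rounded down as +61 pt. The short Rust sketch below repeats that arithmetic for every competitor in the table. The constant and the function name `point_gap` are illustrative and are not taken from the TelloDB harness.

```rust
/// Overall LongMemEval-S scores (%) copied from the results table above.
const OVERALL: [(&str, f64); 4] = [
    ("TelloDB", 90.5),
    ("HydraDB", 90.8),
    ("Zep", 71.2),
    ("Mem0", 29.1),
];

/// Percentage-point gap between two named systems (first minus second).
/// Returns None if either name is missing from the table.
/// Hypothetical helper, not part of the published harness.
fn point_gap(a: &str, b: &str) -> Option<f64> {
    let score = |name: &str| {
        OVERALL
            .iter()
            .find(|(n, _)| *n == name)
            .map(|(_, s)| *s)
    };
    Some(score(a)? - score(b)?)
}

fn main() {
    // Print TelloDB's signed gap against each other system in the table.
    for (name, _) in OVERALL.iter().filter(|(n, _)| *n != "TelloDB") {
        if let Some(gap) = point_gap("TelloDB", name) {
            println!("TelloDB vs {name}: {gap:+.1} pt");
        }
    }
}
```

A negative gap means the other system scores higher overall, which is the case for HydraDB in this table.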

Overall Score Comparison

TelloDB: 90.5%
HydraDB: 90.8%
Zep: 71.2%
Mem0: 29.1%

Methodology

Dataset Specification

The LongMemEval-S benchmark contains 6 question categories that evaluate single-session and multi-session recall, temporal reasoning, knowledge updates, and preference (profile) matching.
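As a rough illustration of how per-category and overall scores relate, the sketch below aggregates graded answers into accuracy percentages. It assumes each question is graded as simply correct or incorrect and that overall accuracy is the fraction of all questions answered correctly. The `Graded` struct, its fields, and the `accuracy` function are hypothetical names, not the harness's actual types.

```rust
use std::collections::BTreeMap;

/// One graded benchmark question. Both fields are hypothetical names,
/// not taken from the TelloDB harness.
struct Graded {
    category: &'static str,
    correct: bool,
}

/// Returns (overall accuracy %, per-category accuracy %).
/// Overall is computed over all questions, so categories with more
/// questions weigh more than in a plain mean of the category scores.
fn accuracy(results: &[Graded]) -> (f64, BTreeMap<&'static str, f64>) {
    // category -> (correct count, total count)
    let mut counts: BTreeMap<&'static str, (u32, u32)> = BTreeMap::new();
    for r in results {
        let entry = counts.entry(r.category).or_insert((0, 0));
        if r.correct {
            entry.0 += 1;
        }
        entry.1 += 1;
    }
    let pct = |c: u32, t: u32| if t == 0 { 0.0 } else { 100.0 * c as f64 / t as f64 };
    let (c_all, t_all) = counts
        .values()
        .fold((0u32, 0u32), |(c, t), &(ci, ti)| (c + ci, t + ti));
    let per_cat: BTreeMap<&'static str, f64> = counts
        .iter()
        .map(|(k, (c, t))| (*k, pct(*c, *t)))
        .collect();
    (pct(c_all, t_all), per_cat)
}

fn main() {
    // Toy data only: a real run would load graded LongMemEval-S answers.
    let results = [
        Graded { category: "single-session", correct: true },
        Graded { category: "single-session", correct: true },
        Graded { category: "multi-session", correct: true },
        Graded { category: "multi-session", correct: false },
    ];
    let (overall, per_cat) = accuracy(&results);
    println!("overall: {overall:.1}%");
    for (cat, acc) in &per_cat {
        println!("{cat}: {acc:.1}%");
    }
}
```

Under this assumption the overall figure need not equal the simple average of the category columns, because categories can contain different numbers of questions.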

Standard Infrastructure

All benchmarks run on identical single-node cloud environments (4 vCPU, 16 GB RAM) to keep latency measurements isolated from hardware differences.
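For readers checking the P95 retrieval latency figure, the sketch below shows one common way to compute a percentile from raw per-query latencies, the nearest-rank method. This is an assumption about the definition. The source does not say which percentile method the harness uses, and `percentile_ms` is a hypothetical helper name.

```rust
/// Nearest-rank percentile: the smallest sample such that at least
/// `p` percent of all samples are less than or equal to it.
/// Returns None for an empty slice or a `p` outside (0, 100].
/// Hypothetical helper; the harness may use a different method.
fn percentile_ms(samples: &[f64], p: f64) -> Option<f64> {
    if samples.is_empty() || !(p > 0.0 && p <= 100.0) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // 1-based rank = ceil(p * n / 100); always in 1..=n for valid p.
    let rank = (p * sorted.len() as f64 / 100.0).ceil() as usize;
    Some(sorted[rank - 1])
}

fn main() {
    // Toy latencies in milliseconds, not measured results.
    let latencies = [12.0, 48.5, 33.1, 95.2, 27.4, 61.0, 18.9, 40.3];
    if let Some(p95) = percentile_ms(&latencies, 95.0) {
        println!("P95 retrieval latency: {p95:.1} ms");
    }
}
```

Nearest-rank always returns an observed sample rather than an interpolated value, which makes the reported figure easy to trace back to a specific query.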

Open Source & Auditable

Evaluation code is open and reproducible. Access the harness at github.com/sharjeel619/tellodb

Baseline Sources

Competitor baseline scores are taken directly from publicly published results and verified on our own local test rig. Last updated May 2026. Run locally with: cargo bench --bench longmemeval