Tiny-ML Leaderboard
Sub-150M parameter language models, same eval harness, transparent methodology.
Detailed Results
| # | Model | Org | Params | Eff. ⚡ | WikiText-2 byte_ppl ↓ | BLiMP ↑ | ARC-Easy ↑ | Training Tokens | Released | Links |
|---|---|---|---|---|---|---|---|---|---|---|
Model Release Timeline
Most recent first
Benchmark Overview
BLiMP ↑
Higher is better
ARC-Easy ↑
Higher is better
WikiText-2 Byte-Level Perplexity ↓
Lower is better · byte_ppl = exp(−loglik / total_bytes) · bubble size = perplexity (smaller bubble = better)
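The byte_ppl formula above can be sketched in a few lines of Python. This is a minimal illustration, not the leaderboard's own code: it assumes `loglik` is the summed natural-log likelihood over the whole evaluation text and that `total_bytes` counts UTF-8 bytes. The function name `byte_perplexity` is hypothetical.

```python
import math


def byte_perplexity(total_loglik: float, text: str) -> float:
    """Compute byte_ppl = exp(-loglik / total_bytes).

    Assumptions (not confirmed by the leaderboard):
    - total_loglik is the summed log-likelihood in nats (natural log)
    - total_bytes is the UTF-8 encoded length of the evaluated text
    """
    total_bytes = len(text.encode("utf-8"))
    if total_bytes == 0:
        raise ValueError("text must contain at least one byte")
    return math.exp(-total_loglik / total_bytes)


if __name__ == "__main__":
    # A model that assigns probability 1/2 to every byte of a 4-byte string
    # has a summed log-likelihood of 4 * ln(1/2).
    print(byte_perplexity(4 * math.log(0.5), "abcd"))
```

Normalizing by bytes rather than tokens keeps the number comparable across models with different tokenizers, which is presumably why the leaderboard reports it this way.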
Model Efficiency
Leaderboard Score vs Parameters
Scatter of each model's overall score against its parameter count. Points above the dashed threshold line score at least 1σ above the trend. The top 3 models are marked.
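One way such a threshold could be computed is sketched below. The page does not say how the trend is fitted or how σ is defined, so this example assumes a least-squares line of score against log10(parameters) and takes σ as the population standard deviation of the residuals. The function name `high_efficiency` and the sample data are hypothetical.

```python
import math
from statistics import mean, pstdev


def high_efficiency(models):
    """Return names of models whose score is >= 1 sigma above the trend.

    models: list of (name, params, score) tuples.
    Assumptions (not confirmed by the leaderboard): the trend is a
    least-squares line over log10(params), and sigma is the population
    standard deviation of the residuals around that line.
    """
    xs = [math.log10(params) for _, params, _ in models]
    ys = [score for _, _, score in models]
    x_bar, y_bar = mean(xs), mean(ys)

    # Ordinary least squares for a single predictor.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = y_bar - slope * x_bar

    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    sigma = pstdev(residuals)
    if sigma == 0:
        return []  # every model sits exactly on the trend line
    return [m[0] for m, r in zip(models, residuals) if r >= sigma]


if __name__ == "__main__":
    # Made-up illustrative data, not leaderboard results.
    sample = [
        ("a", 1e6, 0.40), ("b", 1e6, 0.40),
        ("c", 1e7, 0.50), ("d", 1e7, 0.50),
        ("g", 1e7, 0.80),
        ("e", 1e8, 0.60), ("f", 1e8, 0.60),
    ]
    print(high_efficiency(sample))
```

Fitting against log-parameters rather than raw counts is a common choice because model sizes span orders of magnitude; a different trend model would shift which points clear the threshold.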
Legend: Avg trend · High-efficiency threshold · Outperforming zone
Models Released by Org
Add your model
Open a PR with your model's benchmark results and reproduction steps. We require: parameter count, training data provenance, the eval harness used, and scores on all three benchmarks (WikiText-2 byte_ppl, BLiMP, ARC-Easy) obtained with the lm-eval harness.