Tiny-ML Leaderboard

Sub-150M parameter language models, same eval harness, transparent methodology.

Why this exists. The community deserves a single place to compare tiny LMs fairly. We include every model with verifiable benchmarks — ours, our competitors', yours. Submit a model via PR.
Unknown org? The "Unknown" tag is for model makers submitting their model ahead of official release. If that's you — DM glintresearch on Discord.

Detailed Results

# | Model | Org | Params | Eff. ⚡ | WikiText-2 byte_ppl ↓ | BLiMP ↑ | ARC-Easy ↑ | Training Tokens | Released | Links

Model Release Timeline

Most recent first

Benchmark Overview

BLiMP ↑

Higher is better

ARC-Easy ↑

Higher is better

WikiText-2 Byte-Level Perplexity ↓

Lower is better · byte_ppl = exp(−loglik / total_bytes) · bubble size = perplexity (smaller bubble = better)
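The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the leaderboard's own code: the function name `byte_perplexity` is hypothetical, `total_loglik` is assumed to be the summed log-likelihood of the text in nats (natural log), and bytes are assumed to be counted on the UTF-8 encoding of the evaluated text.

```python
import math


def byte_perplexity(total_loglik: float, text: str) -> float:
    """Return exp(-loglik / total_bytes) for one evaluated text.

    total_loglik: summed log-likelihood of the text in nats (assumed).
    text: the evaluated text; bytes are counted on its UTF-8 encoding (assumed).
    """
    total_bytes = len(text.encode("utf-8"))
    if total_bytes == 0:
        raise ValueError("text must contain at least one byte")
    return math.exp(-total_loglik / total_bytes)


# A model that assigns every byte probability 1/2 has byte_ppl = 2.
example = byte_perplexity(-11 * math.log(2), "hello world")
```

Normalizing by bytes rather than tokens keeps the number comparable across models with different tokenizers, which is presumably why the leaderboard reports it this way.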

Model Efficiency

Parameters vs Leaderboard Score

Scatter of each model's overall score against its parameter count. Points above the dashed threshold line score at least one standard deviation (≥1σ) above the average trend. The top 3 models are marked.

Legend: Avg trend · High-efficiency threshold · Outperforming zone
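One way the threshold in this chart could be computed is sketched below. The page does not say how the trend is fitted or how the overall score is defined, so everything here is an assumption: a least-squares line of score against log10(params), σ taken as the standard deviation of the residuals, and a hypothetical helper name `flag_outperformers`.

```python
import math
from statistics import mean, pstdev


def flag_outperformers(models, k=1.0):
    """Return names of models whose score is >= k sigma above the trend.

    models: list of (name, params, score) tuples.
    Assumed trend: least-squares line of score vs log10(params).
    Assumed sigma: population standard deviation of the residuals.
    """
    xs = [math.log10(params) for _, params, _ in models]
    ys = [score for _, _, score in models]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct parameter counts")
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    # Residual = how far each model sits above (+) or below (-) the trend.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sigma = pstdev(residuals)
    return [name for (name, _, _), r in zip(models, residuals) if r >= k * sigma]


# Made-up scores, for illustration only.
models = [
    ("a", 10_000_000, 0.50),
    ("b", 10_000_000, 0.50),
    ("c", 10_000_000, 0.50),
    ("d", 10_000_000, 0.62),
    ("e", 100_000_000, 0.60),
    ("f", 100_000_000, 0.60),
]
flagged = flag_outperformers(models)
```

Fitting on log10(params) rather than raw parameter counts is a design choice of this sketch; it keeps the smallest models from being crowded into one corner of the fit.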

Models Released by Org

Add your model

Open a PR with your model's benchmark results and reproduction steps. We require the parameter count, training data provenance, the eval harness used, and scores on all three benchmarks (WikiText-2 byte_ppl, BLiMP, ARC-Easy), produced with the lm-eval harness.
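Before opening a PR, a submitter could check that nothing required is missing. The sketch below is illustrative only: the page does not define a submission schema, so the field names (`params`, `training_data_provenance`, `eval_harness`, `scores`) and the helper `missing_fields` are hypothetical.

```python
# Hypothetical field names; the leaderboard does not publish a schema here.
REQUIRED_FIELDS = ("params", "training_data_provenance", "eval_harness")
REQUIRED_SCORES = ("wikitext2_byte_ppl", "blimp", "arc_easy")


def missing_fields(submission: dict) -> list[str]:
    """List required entries that are absent or empty in a submission."""
    missing = [f for f in REQUIRED_FIELDS if submission.get(f) in (None, "")]
    scores = submission.get("scores") or {}
    missing += [f"scores.{s}" for s in REQUIRED_SCORES if scores.get(s) is None]
    return missing


# Made-up submission, for illustration only.
draft = {
    "params": 42_000_000,
    "training_data_provenance": "",
    "eval_harness": "lm-eval",
    "scores": {"wikitext2_byte_ppl": 2.1, "arc_easy": 0.41},
}
problems = missing_fields(draft)
```

An empty list from `missing_fields` means every required entry is present; it does not check that the scores are reproducible, which the reproduction steps in the PR are meant to cover.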