A practical follow-on in my Mojo tooling journey: building mojo-benchsuite to make project-level benchmarking repeatable, comparable, and CI-friendly.
Exec Summary
For technical and business leaders: mojo-benchsuite complements Mojo’s low-level stdlib benchmarking with the suite-level workflow that real projects need.
- Not a replacement for stdlib benchmark: it sits one level higher, orchestrating suites and reports.
- More decision-useful metrics: includes p50/p95/p99 percentiles, total runtime, and loops-per-sample.
- Built for operational use: baseline save/compare, threshold-based regression checks, and JSON summaries for CI.
- Better signal quality: practical, longer-running workloads reduce noise compared with tiny synthetic micro-ops.
The outcome is straightforward: benchmarking becomes part of normal engineering workflow, not a one-off activity.
Context
My earlier Mojo package work focused on core building blocks (mojo-dotenv, mojo-toml, mojo-asciichart).
This time, I focused on the workflow layer around performance measurement.
Mojo already provides strong primitive benchmarking tools. What teams quickly need next is:
- run many benchmarks consistently,
- persist outputs for later comparison,
- detect regressions without overreacting to routine variance,
- and surface results in forms suitable for both humans and automation.
That is exactly the gap mojo-benchsuite addresses.
Why this matters in practice
Once a project has more than a handful of benchmarks, ad-hoc measurement starts to break down.
Typical pain points:
- benchmark files drift without a consistent suite runner;
- “mean-only” numbers hide tail behaviour;
- teams cannot reliably compare today’s run with last month’s baseline;
- CI cannot tell meaningful regressions from random jitter.
mojo-benchsuite targets those problems directly.
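To make the “mean-only” point concrete, here is a minimal Python sketch. It is not mojo-benchsuite code, and the timings are invented for illustration. It shows how two slow outliers barely move the mean while dominating p99.

```python
import math
import statistics


def percentile(samples, q):
    """Nearest-rank percentile: the smallest sample with at least q% of values at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical timings in milliseconds: 98 fast runs and 2 slow outliers.
timings_ms = [1.0] * 98 + [50.0] * 2

mean_ms = statistics.mean(timings_ms)   # pulled up only slightly by the outliers
p50_ms = percentile(timings_ms, 50)     # the typical run
p99_ms = percentile(timings_ms, 99)     # the tail that the mean hides

print(f"mean={mean_ms:.2f} ms  p50={p50_ms:.1f} ms  p99={p99_ms:.1f} ms")
```

With this data the mean sits just under 2 ms and looks harmless, while p99 reports the 50 ms stalls that users would actually notice.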
What was added
1) Improved measurement model
- explicit warmup, calibration, and sampling phases;
- calibrated loops_per_sample for low-noise per-operation timing;
- richer result model: mean, min, max, p50, p95, p99, total runtime.
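The three phases can be sketched in a few lines of Python. This is an illustration of the general technique, not the package's Mojo implementation: the function names, the doubling strategy for calibration, and the default values are all assumptions made for the example.

```python
import time


def calibrate_loops(fn, target_sample_s=0.005, max_loops=1_000_000):
    """Double the loop count until one timed batch lasts at least target_sample_s."""
    loops = 1
    while loops < max_loops:
        start = time.perf_counter()
        for _ in range(loops):
            fn()
        if time.perf_counter() - start >= target_sample_s:
            break
        loops *= 2
    return min(loops, max_loops)


def run_benchmark(fn, warmup_calls=10, samples=10, target_sample_s=0.005):
    # Phase 1: warmup, so caches and lazy initialisation do not pollute the samples.
    for _ in range(warmup_calls):
        fn()

    # Phase 2: calibration, to pick how many calls make up one sample.
    loops_per_sample = calibrate_loops(fn, target_sample_s)

    # Phase 3: sampling, timing whole batches and dividing by the batch size.
    per_op_ns = []
    run_start = time.perf_counter()
    for _ in range(samples):
        start = time.perf_counter_ns()
        for _ in range(loops_per_sample):
            fn()
        per_op_ns.append((time.perf_counter_ns() - start) / loops_per_sample)
    total_runtime_s = time.perf_counter() - run_start

    return {
        "loops_per_sample": loops_per_sample,
        "samples": len(per_op_ns),
        "mean_ns": sum(per_op_ns) / len(per_op_ns),
        "min_ns": min(per_op_ns),
        "max_ns": max(per_op_ns),
        "total_runtime_s": total_runtime_s,
    }


def tiny_workload():
    # Hypothetical workload standing in for a real benchmark body.
    sum(range(100))


result = run_benchmark(tiny_workload)
print(result)
```

Timing a calibrated batch rather than a single call is what keeps timer resolution and call overhead from dominating the per-operation figure for very fast operations.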
2) Reporting for different audiences
- console output for day-to-day development;
- Markdown reports for release notes and reviews;
- CSV reports for further analysis;
- summary JSON for CI and downstream automation.
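As a rough illustration of rendering one result set for several audiences, the Python sketch below produces Markdown, CSV, and a JSON summary from the same data. The benchmark names, field names, and JSON layout are hypothetical and are not mojo-benchsuite's actual output schema.

```python
import csv
import io
import json

# Hypothetical results; names and numbers are placeholders for illustration.
results = [
    {"name": "parse_config", "mean_ns": 1250.0, "p95_ns": 1400.0},
    {"name": "transform_rows", "mean_ns": 84000.0, "p95_ns": 91000.0},
]


def to_markdown(rows):
    """Render a Markdown table suitable for a review or release note."""
    lines = ["| benchmark | mean (ns) | p95 (ns) |", "| --- | ---: | ---: |"]
    for row in rows:
        lines.append(f"| {row['name']} | {row['mean_ns']:.1f} | {row['p95_ns']:.1f} |")
    return "\n".join(lines)


def to_csv(rows):
    """Render CSV text for spreadsheets or further analysis."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "mean_ns", "p95_ns"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_summary_json(rows):
    """Render a compact JSON summary that a CI job could parse."""
    return json.dumps({"benchmark_count": len(rows), "benchmarks": rows}, indent=2)


markdown_report = to_markdown(results)
csv_report = to_csv(results)
summary_json = to_summary_json(results)

print(markdown_report)
```

Keeping the renderers separate from the measurement step means a new output format only needs one more function over the same result rows.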
3) Baseline and regression workflow
- save named baselines;
- compare current run to a baseline;
- apply configurable regression thresholds;
- optionally fail CI on material regressions.
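The baseline workflow boils down to a save, a load, and a thresholded comparison. The Python sketch below shows that shape under stated assumptions: the file name, the use of mean nanoseconds per benchmark, and the 10% default threshold are all hypothetical, and the real package may store and compare different fields.

```python
import json
import os
import tempfile


def save_baseline(path, results):
    """Persist a mapping of benchmark name to mean time (ns) as JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(results, handle, indent=2)


def load_baseline(path):
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def find_regressions(baseline, current, threshold_pct=10.0):
    """Return (name, change_pct) for benchmarks slower than baseline by more than threshold_pct."""
    regressions = []
    for name, baseline_ns in baseline.items():
        if name not in current:
            continue  # benchmark removed or renamed; nothing to compare
        change_pct = (current[name] - baseline_ns) / baseline_ns * 100
        if change_pct > threshold_pct:
            regressions.append((name, change_pct))
    return regressions


# Save a hypothetical named baseline, then load it back as a later run would.
with tempfile.TemporaryDirectory() as tmp_dir:
    baseline_path = os.path.join(tmp_dir, "baseline-main.json")
    save_baseline(baseline_path, {"parse_config": 1000.0, "transform_rows": 2000.0})
    baseline = load_baseline(baseline_path)

# Hypothetical current run: one benchmark within noise, one clearly slower.
current = {"parse_config": 1040.0, "transform_rows": 2500.0}
regressions = find_regressions(baseline, current, threshold_pct=10.0)

# A CI wrapper could pass this value to sys.exit() to fail the job.
exit_code = 1 if regressions else 0
print(regressions, exit_code)
```

Here the 4% slowdown in parse_config stays under the threshold and is treated as routine variance, while the 25% slowdown in transform_rows is flagged. Choosing the threshold is the real design decision: too tight and CI fails on jitter, too loose and genuine regressions slip through.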
4) Practical benchmark suites and examples
Beyond microbenchmarks, the repo now includes more realistic scenarios:
- configuration-style workloads,
- data-transform workloads,
- workloads dominated by startup cost,
- longer-running examples (including a GPU example path).
How to use it quickly
git clone https://github.com/databooth/mojo-benchsuite
cd mojo-benchsuite
pixi install
pixi run bench-all
Save and compare a baseline:
pixi run bench-save-baseline
pixi run bench-compare-baseline
Produce a machine-readable summary:
pixi run bench-summary
Design principles
- Explicit over magical: benchmark registration remains clear in each suite.
- Practical over theoretical: optimise for repeatable engineering workflow.
- Composable outputs: humans read console/Markdown, systems consume JSON/CSV.
- Signal over noise: favour percentiles and realistic workloads over single-number vanity metrics.
What’s next
Two likely directions:
- release/publishing hardening to match other published Mojo packages;
- a potential convenience BenchSuite object for registration ergonomics, while retaining the current explicit API.
Open for comments
This post and package are intentionally shared early for feedback from the Mojo community.
- What benchmark workflows are missing from your current projects?
- Which CI regression strategies have worked (or failed) for your teams?
- Should the next step prioritise packaging/release flow, or API ergonomics first?
If you have views, I’d really value your comments below.
Project links:
Building high-performance data and AI services with Mojo at DataBooth. Questions or want to collaborate? Get in touch.