A practical follow-on in my Mojo tooling journey: building mojo-benchsuite to make project-level benchmarking repeatable, comparable, and CI-friendly.

Exec Summary

For technical and business leaders: mojo-benchsuite complements Mojo’s low-level stdlib benchmarking with the suite-level workflow that real projects need.

  • Not a replacement for stdlib benchmark: it sits one level higher, orchestrating suites and reports.
  • More decision-useful metrics: p50/p95/p99 percentiles, total runtime, and loops-per-sample alongside the mean.
  • Built for operational use: baseline save/compare, threshold-based regression checks, and JSON summaries for CI.
  • Better signal quality: practical, longer-running workloads reduce noise compared with tiny synthetic micro-ops.

The outcome is straightforward: benchmarking becomes part of normal engineering workflow, not a one-off activity.


Context

My earlier Mojo package work focused on core building blocks (mojo-dotenv, mojo-toml, mojo-asciichart).
This time, I focused on the workflow layer around performance measurement.

Mojo already provides strong low-level benchmarking primitives. What teams quickly need next is a way to:

  • run many benchmarks consistently,
  • persist outputs for later comparison,
  • detect regressions without overreacting to routine variance,
  • and surface results in forms suitable for both humans and automation.

That is exactly the gap mojo-benchsuite addresses.

Why this matters in practice

Once a project has more than a handful of benchmarks, ad-hoc measurement starts to break down.

Typical pain points:

  1. benchmark files drift without a consistent suite runner;
  2. “mean-only” numbers hide tail behaviour;
  3. teams cannot reliably compare today’s run with last month’s baseline;
  4. CI cannot tell meaningful regressions from random jitter.
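Point 2 is easy to show with a small example. It is written in Python rather than Mojo and uses made-up timings, not measurements. Two slow outliers in 100 samples move the mean by only 8%, while p99 jumps fivefold. The helper uses the nearest-rank percentile definition, which is one common convention and not necessarily the one mojo-benchsuite implements.

```python
from statistics import mean


def percentile(values: list[float], p: int) -> float:
    """Nearest-rank percentile for an integer p in 1..100."""
    if not values or not 1 <= p <= 100:
        raise ValueError("need non-empty values and 1 <= p <= 100")
    ordered = sorted(values)
    # ceil(p * n / 100) computed with integer arithmetic to avoid float rounding
    rank = (p * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Made-up timings in microseconds: 98 typical runs plus 2 slow outliers.
samples_us = [1000.0] * 98 + [5000.0] * 2

print(f"mean: {mean(samples_us):.0f} us")           # mean: 1080 us
print(f"p50:  {percentile(samples_us, 50):.0f} us")  # p50:  1000 us
print(f"p99:  {percentile(samples_us, 99):.0f} us")  # p99:  5000 us
```

A report showing only the mean would suggest a small slowdown. The percentiles show that a few runs were five times slower than typical.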

mojo-benchsuite targets those problems directly.

What was added

1) Improved measurement model

  • explicit warmup, calibration, and sampling phases;
  • calibrated loops_per_sample for low-noise per-operation timing;
  • richer result model: mean, min, max, p50, p95, p99, total runtime.
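Here is a minimal Python sketch of the three phases, written in Python for readability; mojo-benchsuite itself is written in Mojo. It assumes a doubling calibration strategy and nearest-rank percentiles. `measure`, `warmup_iters`, `target_sample_ns` and the result keys are illustrative names. Only loops_per_sample comes from the package description.

```python
import time
from statistics import mean
from typing import Callable


def _nearest_rank(ordered: list[float], p: int) -> float:
    # ceil(p * n / 100) with integer arithmetic; `ordered` must be sorted
    return ordered[(p * len(ordered) + 99) // 100 - 1]


def measure(
    op: Callable[[], object],
    warmup_iters: int = 100,
    target_sample_ns: int = 1_000_000,
    samples: int = 30,
) -> dict[str, float]:
    # Phase 1: warmup, so caches, allocators and lazy initialisation settle.
    for _ in range(warmup_iters):
        op()

    # Phase 2: calibration. Double loops_per_sample until one batch lasts at
    # least target_sample_ns, so timer resolution and call overhead are amortised.
    loops_per_sample = 1
    while loops_per_sample < (1 << 30):
        start = time.perf_counter_ns()
        for _ in range(loops_per_sample):
            op()
        if time.perf_counter_ns() - start >= target_sample_ns:
            break
        loops_per_sample *= 2

    # Phase 3: sampling. Each sample times a whole batch, then divides by the
    # batch size to get a per-operation time.
    per_op_ns: list[float] = []
    total_start = time.perf_counter_ns()
    for _ in range(samples):
        start = time.perf_counter_ns()
        for _ in range(loops_per_sample):
            op()
        per_op_ns.append((time.perf_counter_ns() - start) / loops_per_sample)
    total_ns = time.perf_counter_ns() - total_start

    ordered = sorted(per_op_ns)
    return {
        "loops_per_sample": loops_per_sample,
        "mean_ns": mean(per_op_ns),
        "min_ns": ordered[0],
        "max_ns": ordered[-1],
        "p50_ns": _nearest_rank(ordered, 50),
        "p95_ns": _nearest_rank(ordered, 95),
        "p99_ns": _nearest_rank(ordered, 99),
        "total_ns": total_ns,
    }
```

For example, `measure(lambda: sorted(range(1000, 0, -1)))` returns one dictionary per benchmark. Timing whole batches rather than single calls keeps very fast operations above the timer's resolution.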

2) Reporting for different audiences

  • console output for day-to-day development;
  • Markdown reports for release notes and reviews;
  • CSV reports for further analysis;
  • summary JSON for CI and downstream automation.
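You can think of "one result set, several audiences" as a set of formatters over the same records. This Python sketch uses only the standard library. The field names, the `benchmark_count` key and the numbers are placeholders, not mojo-benchsuite's real schema or results.

```python
import csv
import io
import json

# Illustrative schema and placeholder numbers (not real measurements or
# mojo-benchsuite's actual field names).
FIELDS = ["name", "mean_ns", "p50_ns", "p95_ns", "p99_ns"]
RESULTS = [
    {"name": "parse_config", "mean_ns": 1520.0, "p50_ns": 1500.0, "p95_ns": 1700.0, "p99_ns": 1900.0},
    {"name": "transform_rows", "mean_ns": 8800.0, "p50_ns": 8600.0, "p95_ns": 9900.0, "p99_ns": 12000.0},
]


def to_console(results: list[dict]) -> str:
    # Fixed-width text for day-to-day development.
    return "\n".join(
        f"{r['name']:<16} p50={r['p50_ns']:>10.0f} ns  p99={r['p99_ns']:>10.0f} ns" for r in results
    )


def to_markdown(results: list[dict]) -> str:
    # Pipe table that renders in release notes and code review tools.
    lines = ["| " + " | ".join(FIELDS) + " |", "|" + " --- |" * len(FIELDS)]
    lines += ["| " + " | ".join(str(r[f]) for f in FIELDS) + " |" for r in results]
    return "\n".join(lines)


def to_csv(results: list[dict]) -> str:
    # Flat rows for spreadsheets, pandas and similar tools.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(results)
    return buf.getvalue()


def to_summary_json(results: list[dict]) -> str:
    # Sorted keys give deterministic output for CI jobs and diffs.
    return json.dumps({"benchmark_count": len(results), "results": results}, indent=2, sort_keys=True)


print(to_console(RESULTS))
```

Every formatter reads the same records, so the different outputs cannot disagree with each other.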

3) Baseline and regression workflow

  • save named baselines;
  • compare current run to a baseline;
  • apply configurable regression thresholds;
  • optionally fail CI on material regressions.
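The baseline workflow fits in a few lines of Python. This is a minimal sketch under stated assumptions, not the package's implementation. It compares a single metric per benchmark (say p50 in ns) and treats a slowdown above `threshold_pct` as a regression. It returns a failing exit code only when `fail_on_regression` is set. All function names and the 10% default are hypothetical placeholders.

```python
import json
from pathlib import Path


def save_baseline(metrics: dict[str, float], path: Path) -> None:
    # One number per benchmark (for example p50 in ns), stored as JSON.
    path.write_text(json.dumps(metrics, indent=2, sort_keys=True))


def load_baseline(path: Path) -> dict[str, float]:
    return json.loads(path.read_text())


def find_regressions(
    current: dict[str, float], baseline: dict[str, float], threshold_pct: float = 10.0
) -> list[tuple[str, float, float, float]]:
    regressions = []
    for name, cur in sorted(current.items()):
        base = baseline.get(name)
        if base is None or base <= 0:
            continue  # new benchmark or unusable baseline value: nothing to compare
        change_pct = (cur - base) / base * 100.0
        if change_pct > threshold_pct:  # slower by more than the tolerated jitter
            regressions.append((name, base, cur, change_pct))
    return regressions


def regression_exit_code(
    current: dict[str, float],
    baseline: dict[str, float],
    threshold_pct: float = 10.0,
    fail_on_regression: bool = False,
) -> int:
    regressions = find_regressions(current, baseline, threshold_pct)
    for name, base, cur, pct in regressions:
        print(f"REGRESSION {name}: {base:.0f} -> {cur:.0f} ns (+{pct:.1f}%)")
    # Report-only by default; opt in to a non-zero exit code to fail the CI job.
    return 1 if regressions and fail_on_regression else 0
```

In a CI step, a call such as `sys.exit(regression_exit_code(current, load_baseline(Path("baseline.json")), fail_on_regression=True))` would turn a material slowdown into a failed job. Shifts within the threshold still pass. The file name is illustrative.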

4) Practical benchmark suites and examples

Beyond microbenchmarks, the repo now includes more realistic scenarios:

  • configuration-style workloads,
  • data-transform workloads,
  • startup-cost-shaped workloads,
  • longer-running examples (including a GPU example path).

How to use it quickly

git clone https://github.com/databooth/mojo-benchsuite
cd mojo-benchsuite
pixi install
pixi run bench-all

Save and compare a baseline:

pixi run bench-save-baseline
pixi run bench-compare-baseline

Produce a machine-readable summary:

pixi run bench-summary

Design principles

  1. Explicit over magical: benchmark registration remains clear in each suite.
  2. Practical over theoretical: optimise for repeatable engineering workflow.
  3. Composable outputs: humans read console/Markdown, systems consume JSON/CSV.
  4. Signal over noise: favour percentiles and realistic workloads over single-number vanity metrics.

What’s next

Two likely directions:

  • hardening the release and publishing flow to match other published Mojo packages;
  • a potential convenience BenchSuite object for registration ergonomics, while retaining the current explicit API.
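To make "convenience without hiding anything" concrete, here is a rough Python analogy. `BenchSuite`, `add` and `run` are hypothetical names, not an existing mojo-benchsuite API. The wrapper builds exactly the same (name, function) list that a suite could write out by hand.

```python
from typing import Callable

Benchmark = tuple[str, Callable[[], None]]


def bench_parse() -> None:
    sum(range(1_000))  # stand-in workload


def bench_transform() -> None:
    sorted(range(1_000, 0, -1))  # stand-in workload


# Explicit style: the suite is a visible list of (name, function) pairs.
EXPLICIT_SUITE: list[Benchmark] = [("parse", bench_parse), ("transform", bench_transform)]


class BenchSuite:
    """Hypothetical convenience wrapper that still builds the explicit list."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.benchmarks: list[Benchmark] = []

    def add(self, name: str, fn: Callable[[], None]) -> "BenchSuite":
        self.benchmarks.append((name, fn))
        return self  # returning self allows chained registration

    def run(self, runner: Callable[[Callable[[], None]], dict]) -> dict[str, dict]:
        # Delegate timing to an existing runner; the suite only orchestrates.
        return {name: runner(fn) for name, fn in self.benchmarks}


suite = BenchSuite("config").add("parse", bench_parse).add("transform", bench_transform)
```

Because `suite.benchmarks` equals `EXPLICIT_SUITE`, the explicit form stays available and inspectable. The object only changes how registration reads.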

Open for comments

This post and package are intentionally shared early for feedback from the Mojo community.

  • What benchmark workflows are missing from your current projects?
  • Which CI regression strategies have worked (or failed) for your teams?
  • Should the next step prioritise packaging/release flow, or API ergonomics first?

If you have views, I’d really value your comments below.


Project links:


Building high-performance data and AI services with Mojo at DataBooth. Questions or want to collaborate? Get in touch.