Benchmarking Python with Cython, C, C++, and Rust: A Practical Approach with Prime Sieves and Numerical Integration

Posted on 2025-08-14 :: 1037 Words :: Tags: Coding, Benchmark, Python, Cython, C/C++, Rust, Numerical computing

Exec Summary

If you’ve ever felt the limits of Python speed for compute-heavy tasks, you’re not alone. While Python is great for productivity and ease of use, sometimes you want to make things faster (without throwing away your entire codebase). In this post, I share a benchmarking project I built to explore exactly that: how much faster can your Python code run if you offload key computational parts to Cython, C, C++, or Rust?

Why Benchmarking Matters

Too often, we hear that “C is faster than Python” or “Rust can match C performance.” But how much faster? And is it worth the integration effort? I wanted a practical setup to compare them on real-world-ish problems — not just small loops, but algorithms you might typically use, however sufficiently simple to not obscure the underlying illustration of performance differences.

The Algorithms

I picked two classic problems with different computational characteristics:

Prime Number Sieve (Sieve of Eratosthenes) — memory- and integer-heavy with heavy loop dependence
Trapezoidal Rule Numerical Integration (fixed to f(x) = x²) — floating-point heavy with lots of arithmetic

Both are easy to code but non-trivial enough to get meaningful benchmark results.

How the Code Works

I wrote implementations in:

Pure Python
Cython (compiled Python with static typing)
C (shared libraries callable via ctypes)
C++ (using pybind11 for Python extension modules)
Rust (using PyO3 for seamless Python modules)

Then, a single benchmarking harness runs each version across increasing problem sizes, times the runs, compares outputs for correctness, and shows easy-to-read tables and plots.

Naive by Design

A key note: the trapezoidal integration is implemented in a deliberately naive way — looping across subintervals, summing squares. While there are closed-form formulas to solve this exact problem optimally, our aim is to compare raw language performance with identical algorithms, not to craft numerical libraries. As such, it’s more a demonstration of integration overhead and language/runtime speed.

This reflects a valuable principle in performance optimsation: Always prefer the correct and optimal algorithm first. If it’s fast enough for your needs, micro-optimising naive implementations may not be worth the effort.

Results and Insights

Running the benchmarks, you’ll see where Cython easily beats Python, where C and Rust are often neck-and-neck, and how C++ with pybind11 simplifies extension building. You’ll also notice how critical it is to balance algorithm choice with implementation efficiency.

What’s Next?

I made a justfile to build all native extensions with a single command, and the harness uses Plotly for interactive charts so you can visually compare run times and scalability.

If you’re interested, the entire codebase and build instructions take a look at my py-num-bench GitHub repository.

For now, consider where your Python code hits bottlenecks — might be time to explore these tools.

If you want to explore this project or get help with your own Python performance challenges, reach out or leave a comment below!

Appendix: Prime Sieve

Certainly! Here's a concise but clear background section on the Sieve of Eratosthenes algorithm you can add as an appendix to your project. It explains the problem, the algorithm's approach, its complexity, and practical relevance.

References

[1] https://www.databooth.com.au/posts/ [2] https://mkgmarketinginc.com/blog/seo/3-blog-content-tones-that-can-set-you-up-for-success/ [3] https://lokalise.com/blog/how-to-adapt-your-tone-of-voice-for-new-markets/ [4] https://www.digital.nsw.gov.au/delivery/digital-service-toolkit/resources/writing-content/content-101/finding-a-tone-of-voice [5] https://storychief.io/blog/types-of-tone-of-voice [6] https://www.brafton.com.au/blog/creation/4-brand-tone-of-voice-examples-to-use-when-building-your-own/ [7] https://blog.copyfol.io/tone-of-voice-examples [8] https://www.databooth.com.au/about/ [9] https://www.reddit.com/r/copywriting/comments/cbzly9/style_guidetone_of_voice_examples_anyone_have/ [10] https://www.fastercapital.com/content/Trade-show-marketing--Booth-Design-Trends--Staying-Ahead--The-Latest-Booth-Design-Trends-for-Trade-Shows.html [11] https://api.drum.lib.umd.edu/server/api/core/bitstreams/c74b0e99-eb2e-4fb6-bca2-20bbe52d902a/content

Appendix: Background on the Sieve of Eratosthenes

The Sieve of Eratosthenes is a classical and ancient algorithm, dating back to around 250 BCE, for efficiently finding all prime numbers up to a given integer $n$.

What is the problem?

A prime number is a natural number greater than 1 that has no positive divisors other than 1 and itself. Finding primes up to a large limit is a fundamental problem in number theory and has many applications in cryptography, hashing, and computational mathematics.

How does the sieve work?

The algorithm iteratively eliminates composite numbers (non-primes) by marking the multiples of each prime starting from 2:

List all integers from 2 up to $n$.
Start with the smallest unmarked number $p$ (initially 2), which is prime.
Mark all multiples of $p$ (i.e., $2p, 3p, 4p, \ldots $) as composite.
Move to the next unmarked number greater than $p$ and repeat step 3.
Stop when $p^2 > n$; all remaining unmarked numbers are primes.

A key optimisation is starting the marking from $p^2$ instead of $2p$, since smaller multiples will have been marked by smaller primes.

Why is it efficient?

The sieve runs in approximately $O(n \log \log n)$ time, which is very efficient for enumerating all primes not exceeding $n$.
It requires $O(n)$ memory to store the boolean array that tracks prime candidates.
It only uses addition and simple marking operations, avoiding expensive divisions or multiplications.

Practical considerations

The sieve can be refined by considering only odd numbers or using "wheel factorisation" to skip multiples of small primes.
For very large ranges, segmented sieves divide the problem into manageable blocks.
The sieve is ideal for generating lists of primes but is not the fastest for testing primality of individual numbers.

[1] https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes [2] https://en.wikipedia.org/wiki/Generation_of_primes [3] https://cp-algorithms.com/algebra/sieve-of-eratosthenes.html [4] https://brilliant.org/wiki/sieve-of-eratosthenes/ [5] https://www.geeksforgeeks.org/dsa/sieve-of-eratosthenes/ [6] https://www.topcoder.com/thrive/articles/sieve-of-eratosthenes-algorithm [7] https://www.youtube.com/watch?v=V08g_lkKj6Q [8] https://www.khanacademy.org/computing/computer-science/cryptography/comp-number-theory/v/sieve-of-eratosthenes-prime-adventure-part-4 [9] https://www.youtube.com/watch?v=eKp56OLhoQs [10] https://byjus.com/maths/sieve-of-eratosthenes/