Be Abstract, Be Fast
2024-10-19
Keywords: Rust, C++, Zero Overhead Principle, Zero Cost Abstractions, F#.
Abstractions are one of the most powerful tools in a programmer's toolkit. They allow us to simplify complex systems, making code more readable and easier to maintain. But with that simplicity often comes a trade-off: performance. In many languages, higher-level abstractions can introduce overhead that slows down your program. However, not all languages follow this pattern. Rust and C++ take a different approach with the "Zero Overhead Principle"—a design philosophy that promises you can write high-level, abstract code without sacrificing performance. But how does this principle hold up in practice? In this post, we’ll dive into Rust’s optimizations and see how even more abstract, declarative code can perform just as well—if not better—than lower-level imperative solutions.
As Bjarne Stroustrup explained on the Lex Fridman Podcast:
"If you have an abstraction, it should not cost anything compared to writing the equivalent code at a lower level. So, if I have, say, a matrix multiplier, it should be written in such a way that you could not drop to the C level of abstraction and use arrays and pointers to run faster."
This is a beautiful idea, so let us see it in action by looking at some simple Rust code.
fn sum_even(vec: &Vec<i32>) -> i32 {
    let mut sum = 0;
    for n in vec {
        if n % 2 == 0 {
            sum += n
        }
    }
    sum
}
The above code is a simple function that takes a vector of i32
as input and returns the sum of all even numbers in it. It is written in a classic imperative style, where we loop through the vector one element at a time and add the even numbers to a mutable accumulator. This is simple, and for many programming languages, it will be the most performant solution. Now, let's try to get a bit more abstract and create a declarative solution.
fn sum_even_2(vec: &Vec<i32>) -> i32 {
    vec.iter()
        .filter(|x| *x % 2 == 0)
        .sum()
}
The above solution does exactly the same thing, but it is written in a declarative way. This would be the idiomatic implementation in a functional language. While some (myself included) might prefer this approach to solving the problem, it often comes with a performance penalty. However, in Rust, this is not the case. Remember our friend, the Zero Overhead Principle. This is an abstraction over the loop-and-accumulate solution, so it shouldn't cost us anything to use this method rather than reverting to the loop and accumulator approach.
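To make the "zero overhead" claim concrete, here is a rough, hand-written picture of what the iterator chain boils down to after the compiler inlines everything. This is a conceptual sketch, not actual compiler output: the `filter` adapter is just a small struct, and driving it with `next()` is an ordinary loop with no allocations and no dynamic dispatch.

```rust
// A hand-desugared view of `vec.iter().filter(...).sum()`:
// just a loop calling `next()` on a stack-allocated adapter.
// (Conceptual sketch only; the real generated code is further optimized.)
fn sum_even_desugared(vec: &Vec<i32>) -> i32 {
    let mut iter = vec.iter().filter(|x| *x % 2 == 0);
    let mut sum = 0;
    while let Some(n) = iter.next() {
        sum += n;
    }
    sum
}
```

Once the closure and `next()` calls are inlined, this collapses into essentially the same machine code as the imperative version, which is exactly what the principle promises.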
So let us see how it performs in practice.
Function      Running time (ns)
sum_even      8,840
sum_even_2    7,349
This test was conducted on a vector of 100,000 32-bit integers. Each function was run 10,000 times and the average running time was calculated; averaging over that many runs smooths out most timer and scheduler noise. The benchmark was performed on an Apple M2 Pro.
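For readers who want to reproduce numbers like these, here is a minimal harness along the lines described above. It is a sketch, not the exact benchmark used for the table; in particular, `std::hint::black_box` is used to stop the optimizer from deleting the work entirely, and a serious benchmark would use a dedicated library instead.

```rust
use std::time::Instant;

fn sum_even_2(vec: &Vec<i32>) -> i32 {
    vec.iter().filter(|x| *x % 2 == 0).sum()
}

fn main() {
    // 100,000 integers, 10,000 runs, average elapsed time per call.
    let data: Vec<i32> = (0..100_000).collect();
    let runs = 10_000;

    let start = Instant::now();
    let mut checksum: i64 = 0;
    for _ in 0..runs {
        // black_box keeps the call from being optimized away.
        checksum += std::hint::black_box(sum_even_2(&data)) as i64;
    }
    let avg_ns = start.elapsed().as_nanos() / runs as u128;
    println!("avg: {} ns (checksum {})", avg_ns, checksum);
}
```

Numbers will of course vary by machine, compiler version, and optimization flags, so always benchmark with `--release`.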
The result might surprise you: the iterator-based approach was faster than our low-level solution. How can this be? It turns out that iterators in Rust are highly optimized: the compiler inlines the closures, unrolls loops, and removes bounds checks. The last part is especially important here. Indexing into a vector requires a bounds check on every access. With iterators, that check can be elided because Rust knows beforehand that the iteration will stay within the bounds of the vector.
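To see where bounds checks actually show up, here is an illustrative indexed variant (not from the benchmark above). Each `vec[i]` is checked at runtime unless the optimizer can prove the index is always in range, which is exactly the cost the iterator version sidesteps.

```rust
// An indexed loop: every `vec[i]` carries a runtime bounds check
// unless LLVM can prove `i < vec.len()` for the whole loop.
// (Illustrative variant for comparison, not part of the benchmark.)
fn sum_even_indexed(vec: &Vec<i32>) -> i32 {
    let mut sum = 0;
    for i in 0..vec.len() {
        if vec[i] % 2 == 0 {
            sum += vec[i];
        }
    }
    sum
}
```

In practice the optimizer often does manage to hoist these checks out of simple loops like this one, but the iterator version guarantees their absence by construction.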
Furthermore, LLVM might be able to optimize some of the instructions into SIMD (Single Instruction, Multiple Data) instructions, which allow us to process multiple integers at once.
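One way to make a loop friendlier to auto-vectorization is to remove the branch entirely. The sketch below, a hypothetical rewrite not taken from the post, replaces the `if` with a 0/1 mask so every element goes through the same arithmetic, which is the shape of code SIMD units handle best. Whether LLVM actually vectorizes any given version depends on the target and compiler flags.

```rust
// A branchless formulation: multiply each element by a 0/1 mask
// instead of branching on it. Uniform, branch-free loop bodies are
// the easiest for LLVM to turn into SIMD instructions.
fn sum_even_branchless(vec: &[i32]) -> i32 {
    vec.iter().map(|&n| n * ((n % 2 == 0) as i32)).sum()
}
```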
The "Zero Overhead Principle" isn’t just a theoretical idea; it’s something you can see in action. As we saw, using iterators and writing more abstract, declarative code not only didn’t hurt performance—it actually ran faster than the more traditional imperative approach.
The lesson here is that in Rust, you can have the best of both worlds: clean, abstract code without paying the usual performance price. So, next time you’re writing Rust, don’t be afraid to embrace these abstractions. Test it out for yourself—try writing more abstract code, run some benchmarks, and see how it performs. You might be surprised by the results!