For work, I was going to store some hashed tokens in a database. I was going to keep it simple and go with HMAC-SHA256, but having recently read Jean-Philippe Aumasson’s book “Serious Cryptography”, I remembered that BLAKE2 should be quicker:
BLAKE2 was designed with the following ideas in mind:
…
It should be faster than all previous hash standards
Cool, I thought, let’s consider BLAKE2 then. First, let’s write a simple benchmark to see just how much faster BLAKE2 would be than HMAC-SHA256. Performance is not important for my use case, as the hashing will almost certainly not be a bottleneck, but I was curious. So I write a benchmark:
import (
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"log"
	"testing"

	"golang.org/x/crypto/blake2b"
)

// BenchmarkHashes compares keyed hashing of a short token with
// HMAC-SHA256 and with BLAKE2b in its keyed (MAC) mode.
func BenchmarkHashes(b *testing.B) {
	token := []byte("some-api-token")
	secretKey := generateSecretKey()
	b.ResetTimer()

	_ = b.Run("HMACSHA256", func(b *testing.B) {
		for b.Loop() {
			_ = HMACSHA256(token, secretKey)
		}
	})

	_ = b.Run("BLAKE2b", func(b *testing.B) {
		for b.Loop() {
			_ = BLAKE2b(token, secretKey)
		}
	})
}

// generateSecretKey returns 32 random bytes from crypto/rand.
func generateSecretKey() []byte {
	key := make([]byte, 32)
	_, err := rand.Read(key)
	if err != nil {
		panic(err)
	}
	return key
}

// HMACSHA256 returns the HMAC-SHA256 of token under secretKey.
func HMACSHA256(token []byte, secretKey []byte) []byte {
	h := hmac.New(sha256.New, secretKey)
	h.Write(token)
	return h.Sum(nil)
}

// BLAKE2b returns the 256-bit BLAKE2b hash of token keyed with secretKey.
func BLAKE2b(token []byte, secretKey []byte) []byte {
	hasher, err := blake2b.New256(secretKey)
	if err != nil {
		log.Fatal(err)
	}
	hasher.Write(token)
	return hasher.Sum(nil)
}
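Since the end goal is storing hashed tokens, here is a minimal sketch of what the store-and-verify side might look like with the HMAC-SHA256 option, using only the standard library. The names hashToken and verifyToken and the hex encoding are assumptions made for illustration; they are not part of the benchmark.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
)

// hashToken returns the hex-encoded HMAC-SHA256 of token under secretKey.
// Hypothetical helper: hex is assumed here as the storage encoding.
func hashToken(token, secretKey []byte) string {
	h := hmac.New(sha256.New, secretKey)
	h.Write(token)
	return hex.EncodeToString(h.Sum(nil))
}

// verifyToken recomputes the MAC of the presented token and compares it
// with the stored hex value in constant time.
func verifyToken(presented, secretKey []byte, storedHex string) bool {
	stored, err := hex.DecodeString(storedHex)
	if err != nil {
		return false // a malformed stored value never matches
	}
	h := hmac.New(sha256.New, secretKey)
	h.Write(presented)
	return hmac.Equal(h.Sum(nil), stored)
}
```

The comparison goes through hmac.Equal rather than bytes.Equal or ==, so that it does not leak timing information about how many leading bytes matched.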
Running the benchmark gives these results:
cpu: Apple M1 Max
BenchmarkHashes/HMACSHA256-10 3442680 341.0 ns/op 512 B/op 6 allocs/op
BenchmarkHashes/BLAKE2b-10 1966382 584.0 ns/op 416 B/op 2 allocs/op
OK… So BLAKE2 is slower than HMAC-SHA256. Yes, we have fewer allocations, which is nice, but each operation takes noticeably longer. My first thought is that it might indeed be faster, but only if the input is way, way longer. So I switch to the following token:
token := []byte(strings.Repeat("some-api-token", 10000))
cpu: Apple M1 Max
BenchmarkHashes/HMACSHA256-10 19536 59946 ns/op 512 B/op 6 allocs/op
BenchmarkHashes/BLAKE2b-10 6452 190926 ns/op 416 B/op 2 allocs/op
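One way to see how input length affects the numbers without editing the token by hand is to run one sub-benchmark per size and let b.SetBytes add an MB/s column to the output. The sketch below does this for HMAC-SHA256 only, to stay within the standard library; the sizes and the names hmacSHA256, inputSizes and BenchmarkHMACSHA256Sizes are arbitrary choices for illustration, and a BLAKE2b variant would follow the same pattern.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
	"testing"
)

// hmacSHA256 mirrors the HMACSHA256 helper used in the benchmark.
func hmacSHA256(msg, key []byte) []byte {
	h := hmac.New(sha256.New, key)
	h.Write(msg)
	return h.Sum(nil)
}

// inputSizes are hypothetical: 14 bytes matches "some-api-token" and
// 140000 bytes matches that token repeated 10000 times.
var inputSizes = []int{14, 1024, 140000}

// BenchmarkHMACSHA256Sizes runs one sub-benchmark per input size.
func BenchmarkHMACSHA256Sizes(b *testing.B) {
	key := make([]byte, 32) // all-zero key, acceptable for timing only
	for _, size := range inputSizes {
		msg := make([]byte, size)
		b.Run(fmt.Sprintf("%dB", size), func(b *testing.B) {
			b.SetBytes(int64(size)) // makes go test report throughput in MB/s
			// Classic b.N loop so the sketch also builds on Go versions
			// that predate b.Loop.
			for i := 0; i < b.N; i++ {
				_ = hmacSHA256(msg, key)
			}
		})
	}
}
```

Placed in a _test.go file, this would be run with go test -bench=Sizes -benchmem, and the per-size throughput makes it easier to tell fixed per-call overhead apart from raw hashing speed.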
Hmmmm… With the longer input, BLAKE2b looks even worse. Now, it could be that we’re dealing with a non-optimal implementation. SHA256 is implemented in Go’s stdlib and is very likely well optimized, whereas the BLAKE2 implementation I’m using comes from the golang.org/x/crypto package. Perhaps we can find a better one. The first search result recommends github.com/minio/blake2b-simd, which has been archived since 2018. Not promising, but let’s give it a shot.
import (
	blakeMinio "github.com/minio/blake2b-simd"
)

func BLAKE2bMinio(token []byte, secretKey []byte) []byte {
	hasher := blakeMinio.NewMAC(32, secretKey)
	hasher.Write(token)
	return hasher.Sum(nil)
}
These are the results with the original token.
cpu: Apple M1 Max
BenchmarkHashes/HMACSHA256-10 3622322 316.7 ns/op 512 B/op 6 allocs/op
BenchmarkHashes/BLAKE2b-10 2881012 415.6 ns/op 416 B/op 2 allocs/op
BenchmarkHashes/BLAKE2b-minio-10 2138151 566.4 ns/op 480 B/op 2 allocs/op
So this is even slower… I quickly glance over the codebase of github.com/minio/blake2b-simd and notice all the architecture-specific files, but there aren’t any for the ARM architecture. Let’s try on a machine with an AMD64 processor.
cpu: Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz
BenchmarkHashes/HMACSHA256-4 761613 1488 ns/op 512 B/op 6 allocs/op
BenchmarkHashes/BLAKE2b-4 2287569 566.9 ns/op 416 B/op 2 allocs/op
BenchmarkHashes/BLAKE2b-minio-4 1541990 765.4 ns/op 480 B/op 2 allocs/op
Right, so it seems that the implementations I’m using are only optimized for AMD64. Or are they? One final test: I spin up an ARM-based VPS on Hetzner to try it out. Note that the CPU line is missing because the benchmark was not able to determine the exact CPU model.
BenchmarkHashes/HMACSHA256-2 1000000 1007 ns/op 512 B/op 6 allocs/op
BenchmarkHashes/BLAKE2b-2 1367085 879.6 ns/op 416 B/op 2 allocs/op
BenchmarkHashes/BLAKE2b-minio-2 1123965 1080 ns/op 480 B/op 2 allocs/op
So with this, it seems that the issue only occurs on Apple Silicon processors. In the benchmarks on the other processors, the golang.org/x/crypto BLAKE2b does win. I’m not entirely sure what causes this, as CPU architectures are not my strong suit. If you know, please let me know by posting a comment.
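To put the short-token results side by side, the ns/op figures can be turned into a single ratio per machine. The sketch below is only arithmetic over the numbers reported above (for the M1 Max it uses the run that also included the minio variant); the type and function names are made up for illustration.

```go
package main

import "fmt"

// result holds ns/op for the short token on one machine.
type result struct {
	machine string
	hmacNs  float64 // HMAC-SHA256
	blakeNs float64 // BLAKE2b from golang.org/x/crypto
}

// shortTokenResults copies the ns/op values from the benchmark output above.
var shortTokenResults = []result{
	{"Apple M1 Max", 316.7, 415.6},
	{"Intel Core i5-7200U", 1488, 566.9},
	{"Hetzner ARM VPS", 1007, 879.6},
}

// blakeSpeedup returns how many times faster BLAKE2b is than HMAC-SHA256.
// Values below 1 mean BLAKE2b is the slower of the two.
func blakeSpeedup(r result) float64 {
	return r.hmacNs / r.blakeNs
}

// report formats one summary line per machine.
func report() []string {
	lines := make([]string, 0, len(shortTokenResults))
	for _, r := range shortTokenResults {
		lines = append(lines, fmt.Sprintf("%s: BLAKE2b at %.2fx the speed of HMAC-SHA256", r.machine, blakeSpeedup(r)))
	}
	return lines
}
```

With these inputs the ratios come out at roughly 0.76x on the M1 Max, 2.62x on the Intel machine and 1.14x on the Hetzner ARM VPS, which matches the picture that only the Apple Silicon run favours HMAC-SHA256.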