Sterling has too many projects Blogging about programming, microcontrollers & electronics, 3D printing, and whatever else...

Golang Set Types Matter (A Little)

| 813 words | 4 minutes | golang sets maps
untitled

This is just to settle a dispute that mostly exists in my own head. It likely matters very, very little. However, it matters enough. I am currently interviewing for new jobs and during one interview, I was asked to design an algorithm for which I needed a set data type. Golang provides no built-in set data type, but it does provide a map, so I defined my set as follows:

type Set map[string]struct{}
var exists = struct{}{}

func (s *Set) Add(item string) {
    s[item] = exists
}

func (s *Set) Contains(item string) bool {
    _, hasItem := s[item]
    return hasItem
}

I actually did not define anything that formally. I mostly just used a map that way and said I was using it as a set as part of the interview. I suggested I could make it into a formal type if the interview wanted to see me do it. The dispute was that he suggested that instead of using struct{} as the value type for the map, it would be faster to use interface{} instead (a.k.a. any in recent versions of Golang). However, I was pretty sure struct{} should be faster.

I believed this for two reasons.

  1. I remember reading it somewhere, probably on StackOverflow or similar.

  2. It makes logical sense that a language as mature as Golang with as much emphasis on compiler optimization as it has, would completely abstract away the struct{} part of map[string]struct{} where the only possibly value is struct{}{}. Whereas map[string]interface{} still must still have a pointer value because interface{} is the opposite of struct{} as the language’s most generic type: values of that type may hold a pointer to literally any type in Golang.

Anyway, this thought has been bugging me, so I did a quick benchmark to verify this empirically. This is not at all exhaustive, but the results are clear and tell the story I expected to be told. Note that I use a map[int]... rather than map[string]..., but that’s purely for the purpose of laziness. It’s easier to create unique integer keys than string keys. (Sadly, Golang lacks the “AA"++ operation that made generating string keys relatively easy in Perl. I couldn’t be bothered to find a library to do that for this. Writing this post took longer than writing the benchmark.)

Here’s the benchmark program:

package main

import "testing"

type InterfaceMap map[int]any

var IExists any = nil

func BenchmarkInterfaceSet(b *testing.B) {
	m := make(InterfaceMap, 10)
	for i := 0; i < b.N; i++ {
		m[i] = IExists
	}
}

func nop() {}

var presetImap = make(InterfaceMap, 10_000)

func init() {
	m := presetImap
	for i := 0; i < 10_000; i++ {
		m[i] = IExists
	}
}

func BenchmarkInterfaceGet(b *testing.B) {
	m := presetImap
	for i := 0; i < b.N; i++ {
		if _, isSet := m[i]; isSet {
			nop()
		}
	}
}

type StructMap map[int]struct{}

var SExists = struct{}{} // H-Huh, H-Huh... He said... sexists. ~~ Butthead

func BenchmarkStructSet(b *testing.B) {
	m := make(StructMap, 10)
	for i := 0; i < b.N; i++ {
		m[i] = SExists
	}
}

var presetSmap = make(StructMap, 10_000)

func init() {
	m := presetSmap
	for i := 0; i < 10_000; i++ {
		m[i] = SExists
	}
}
func BenchmarkStructGet(b *testing.B) {
	m := presetSmap
	for i := 0; i < b.N; i++ {
		if _, isSet := m[i]; isSet {
			nop()
		}
	}
}

And the results of go test -bench=. -count 5 on my laptop:

goos: linux
goarch: amd64
pkg: github.com/zostay/scratch
cpu: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
BenchmarkInterfaceSet-16         7397239               185.9 ns/op
BenchmarkInterfaceSet-16         9243465               200.5 ns/op
BenchmarkInterfaceSet-16         9258003               173.6 ns/op
BenchmarkInterfaceSet-16         9453066               172.6 ns/op
BenchmarkInterfaceSet-16         9301214               175.2 ns/op
BenchmarkInterfaceGet-16        99072818                12.15 ns/op
BenchmarkInterfaceGet-16        100000000               12.14 ns/op
BenchmarkInterfaceGet-16        97749759                11.83 ns/op
BenchmarkInterfaceGet-16        100000000               11.76 ns/op
BenchmarkInterfaceGet-16        100000000               11.64 ns/op
BenchmarkStructSet-16           13275610               121.8 ns/op
BenchmarkStructSet-16           14263959               141.3 ns/op
BenchmarkStructSet-16           14711151               146.4 ns/op
BenchmarkStructSet-16           15161526               151.7 ns/op
BenchmarkStructSet-16           15210423               148.7 ns/op
BenchmarkStructGet-16           100000000               10.02 ns/op
BenchmarkStructGet-16           100000000               10.11 ns/op
BenchmarkStructGet-16           100000000               10.13 ns/op
BenchmarkStructGet-16           100000000               10.03 ns/op
BenchmarkStructGet-16           127146160                9.405 ns/op
PASS
ok      github.com/zostay/scratch     31.801s

I figured the Set operations would perform differently, but I did not really expect the Get operations to be as different as they are. The median Set operation is 20% faster using struct{} and the Get is 18% faster in my little test with no benchmark from struct{} being slower than any equivalent in interface{}. That’s conclusive as far as I’m concerned.

Now, in actuality, this performance difference will matter very little in most applications. However, if performance matters to your application at all, it is clear that using map[int]struct{} will have superior CPU performance to map[int]interface{}. I would expect that improvement to be seen generally anywhere a struct{} will suffice in place of an interface{}. I would expect memory performance to be improved as well, though I leave that hypothesis as an exercise for the reader to test.

Cheers.

The content of this site is licensed under Attribution 4.0 International (CC BY 4.0).