Golang Slices are Dangerous
The headline is a little bit click-bait-y, but I want to highlight one of the easily overlooked risks of using slices in Golang. First of all, there’s nothing really wrong with the way slices work. A trade-off was made. It is not necessarily the same trade-off I would make, but such is engineering.
Consider Arrays
First, let’s map out what a slice is, because it might not be obvious unless you’ve
been writing Golang programs for a while. Let’s start by thinking about an
array. In Golang, an array is a fixed-length sequence of elements of a given type. For
example, we might define an array of int32
with ten elements as follows:
var myArray [10]int32
The compiler keeps track of how many elements are in the array, because the length is part of the array’s type. Any attempt to access an item outside the array with a constant index will result in a compile time error; an out-of-range index that is only known at run time causes a panic instead. The memory used by an array is the number of elements times the size of each element. In the case of myArray in the example above, the size is exactly 40 bytes.
Each element of an array is individually addressable and can be read from and written to using the array syntax that every C-like language typically uses:
fmt.Println(myArray[5])
myArray[0] = 42
The usefulness of an array is pretty limited, though. You have to know at the time you write your program the number of elements you need. Once set, the number of elements cannot be changed at all.
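Here is a minimal sketch of those array facts, just to make them concrete. The helper names arrayBytes and readAt are my own inventions for illustration; the point is only that the size is fixed at 40 bytes and that an index computed at run time is checked at run time.

```go
package main

import (
	"fmt"
	"unsafe"
)

// arrayBytes reports the storage used by a ten-element int32 array.
func arrayBytes() uintptr {
	var myArray [10]int32
	return unsafe.Sizeof(myArray)
}

// readAt (a hypothetical helper) returns arr[i]. The index is only known
// at run time, so an out-of-range value panics; the deferred function
// recovers from that panic and reports ok == false instead.
func readAt(arr [10]int32, i int) (v int32, ok bool) {
	defer func() {
		if recover() != nil {
			v, ok = 0, false
		}
	}()
	return arr[i], true
}

func main() {
	var myArray [10]int32
	myArray[0] = 42
	fmt.Println(len(myArray), arrayBytes()) // 10 40
	fmt.Println(readAt(myArray, 0))         // 42 true
	fmt.Println(readAt(myArray, 10))        // 0 false
}
```

Writing myArray[10] directly with the constant 10 would not even compile, which is the compile time check in action.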
Enter the Slice
A slice, on the other hand, is like an array, but provides one additional layer of abstraction. The underlying data is identical to an array. But the slice itself is just a data structure that looks something like this:
type slice struct {
	ptr      uintptr
	length   int
	capacity int
}
On a 64-bit system, the size of a slice is 24 bytes. Then, the pointer points to
an array that has capacity elements allocated. If the elements are int32
and the capacity is 10, then this will be an additional 40 byte allocation.
As such, if you try to access an item outside the slice, it is always a runtime error, not a compile time error. Your program will need to take care to avoid out of bounds errors or it will panic.
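As a rough sketch of the above: the slice value itself is three machine words, and staying in bounds is your program’s job. The function names sliceHeaderWords and at are hypothetical, made up for this example; I count words rather than bytes so the result does not depend on a 64-bit system.

```go
package main

import (
	"fmt"
	"unsafe"
)

// sliceHeaderWords reports the size of a slice value in machine words:
// one each for the pointer, the length, and the capacity.
func sliceHeaderWords() uintptr {
	var s []int32
	return unsafe.Sizeof(s) / unsafe.Sizeof(uintptr(0))
}

// at is a hypothetical bounds-checked accessor. It checks the index
// against len(s) up front instead of letting s[i] panic.
func at(s []int32, i int) (int32, bool) {
	if i < 0 || i >= len(s) {
		return 0, false
	}
	return s[i], true
}

func main() {
	s := make([]int32, 10)
	s[5] = 42
	fmt.Println(sliceHeaderWords()) // 3, which is 24 bytes with 8-byte words
	fmt.Println(at(s, 5))           // 42 true
	fmt.Println(at(s, 10))          // 0 false
}
```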
Here’s an example of a few different ways to initialize a slice in a program:
var array1 [10]int32
var array2 [20]int32
slice1 := array1[:]
slice2 := make([]int32, 10)
slice3 := array2[:10]
slice4 := make([]int32, 10, 20)
First, we allocate two arrays:
- The array named array1 contains ten elements and is 40 bytes long.
- The array named array2 contains twenty elements and is 80 bytes long.
Then, we allocate four slices. These numbers will assume a 64-bit system, which should be the more usual case in 2023.
- The slice named slice1 is a 24-byte data structure pointing to data in array1, having a length of 10 and a capacity of 10.
- The slice named slice2 is a 24-byte data structure pointing to a new dynamically allocated array of data. The new allocation is 40 bytes long. The length of the slice is 10 and the capacity is 10.
- The slice named slice3 is another 24 bytes of data in a structure pointing to array2 for storage, having a length of 10, but a capacity of 20.
- The slice named slice4 is yet another 24 bytes of data in a structure pointing to a new dynamically allocated array of data. This new array is 80 bytes long. The length of the slice is set to 10 and the capacity is set to 20.
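If you want to check those lengths and capacities yourself, something like the following sketch should do it. The function name fourSlices is just something I made up to bundle the results.

```go
package main

import "fmt"

// fourSlices builds the two arrays and four slices described above and
// returns each slice's {length, capacity} pair, in order.
func fourSlices() [4][2]int {
	var array1 [10]int32
	var array2 [20]int32

	slice1 := array1[:]             // the whole of array1
	slice2 := make([]int32, 10)     // fresh backing array, len == cap
	slice3 := array2[:10]           // first half of array2
	slice4 := make([]int32, 10, 20) // fresh backing array with spare room

	return [4][2]int{
		{len(slice1), cap(slice1)},
		{len(slice2), cap(slice2)},
		{len(slice3), cap(slice3)},
		{len(slice4), cap(slice4)},
	}
}

func main() {
	fmt.Println(fourSlices()) // [[10 10] [10 10] [10 20] [10 20]]
}
```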
This additional layer of abstraction allows us to capture a subset of an underlying array (whether statically or dynamically allocated). This, then, lets us share windows on a single array and also perform operations that let us easily expand the storage of a slice.
Consider these operations on the four slices we created above:
slice1 = append(slice1, 42)
slice2 = append(slice2, 42)
slice3 = append(slice3, 42)
slice4 = append(slice4, 42)
Let’s consider what each of these append operations actually does. In the case of slice1 and slice2, the capacity is equal to the length. This means that adding another element will require a new allocation. I don’t know exactly what allocation strategy Golang uses in every case (it is an implementation detail of the runtime), but in my experience, it doubles the previous allocation for small slices like these. That means that the first two operations will dynamically allocate 80 bytes to hold 20 int32 elements (4 bytes each). Then it will copy the 10 values in the previous data to the first 10 values of the newly allocated array, and then set the eleventh value to 42. The data in array1 will still be available (at least until the name goes out of scope) and won’t have been modified at all.
The last two are a little more interesting. In these cases, the capacity is greater than the length. Therefore, appending 42 will simply increase the length of the slice by one and set the eleventh element to 42. This means, in the case of the operation on slice3, that array2[10] has been set to 42.
Therefore, if you are using append and you have multiple slices that originally referred to the same underlying data, they may or may not point to the same data following the append. Furthermore, if more than one slice refers to the same data, modifying the data in one slice will change the data in all slices that refer to that same data. And it means that an append performed on one slice can result in assignment to an element that another slice or array also sees, and so on…
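Here is a small sketch of both outcomes side by side. The function names are hypothetical; the arrays and slices mirror array1/slice1 and array2/slice3 from the walk-through above.

```go
package main

import "fmt"

// appendToFullSlice appends to a slice with no spare capacity. The append
// has to allocate a new backing array, so later writes through the slice
// no longer reach array1.
func appendToFullSlice() (arrayFirst, sliceFirst int32) {
	var array1 [10]int32
	slice1 := array1[:]         // len 10, cap 10
	slice1 = append(slice1, 42) // no room left: data is copied elsewhere
	slice1[0] = 7               // touches only the new backing array
	return array1[0], slice1[0]
}

// appendWithSpareCapacity appends to a slice that still has room. The
// append writes straight into the shared backing array.
func appendWithSpareCapacity() int32 {
	var array2 [20]int32
	slice3 := array2[:10]       // len 10, cap 20
	slice3 = append(slice3, 42) // fits: this assigns array2[10]
	return array2[10]
}

func main() {
	fmt.Println(appendToFullSlice())       // 0 7
	fmt.Println(appendWithSpareCapacity()) // 42
}
```

Same built-in, same call, and two very different effects on the data you started with.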
And this is where the danger comes in…
Sharing a Slice
Consider the following code, which shares the memory of a slice.
originalSlice := make([]int32, 10)
for i := range originalSlice {
	originalSlice[i] = int32(i) // i is an int, so convert it to int32
}
copySlice := originalSlice
It’s obvious that any change to originalSlice will be reflected in copySlice, right? If I perform originalSlice[0] = 42, then I would expect copySlice[0] to also be 42. And the same is true in reverse. Any modification of copySlice will modify originalSlice. This is because both variables point to the same underlying memory.
However, instead of making a copy, let’s just grab the last 5 elements of
originalSlice
as a sub-slice like this and then change the original:
subSlice := originalSlice[5:]
originalSlice[5] = 42
Pop quiz: what is the value of subSlice[0]?
If you say 42, then you are correct. If you expect it to be 5, then you are mistaken. I just want to hammer it home that taking a sub-slice is literally just getting a pointer to a subset of the original slice’s elements. If you have worked with slices for long, this really ought to be a “duh” but it can still catch me off guard at times.
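If you would rather run the pop quiz than take my word for it, here is a sketch. The function name subSliceQuiz is made up; it returns the value of subSlice[0] before and after the write to the original.

```go
package main

import "fmt"

// subSliceQuiz takes a sub-slice of the last five elements, then writes to
// the original slice and reports what the sub-slice sees before and after.
func subSliceQuiz() (before, after int32) {
	originalSlice := make([]int32, 10)
	for i := range originalSlice {
		originalSlice[i] = int32(i)
	}

	subSlice := originalSlice[5:] // same memory, starting at element 5
	before = subSlice[0]

	originalSlice[5] = 42 // visible through subSlice[0] as well
	after = subSlice[0]
	return before, after
}

func main() {
	fmt.Println(subSliceQuiz()) // 5 42
}
```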
Where my Problems Come From
To be clear, I don’t really have a problem getting tripped up by this in most cases. Generally, I don’t even use overlapping slice references at all and when I do, it is usually easy to be careful.
However, when processing text, things can get pretty complicated. In these cases, I do sometimes get tripped up and make silly mistakes. I think this is a combination of a number of factors.
- In Golang, the string type is somewhat unusual in that you cannot modify a string value. You must transform it instead, which is why if you’re doing more than the simplest forms of string manipulation, you shouldn’t work with strings directly, but work with a strings.Builder or a []byte slice or something similar. A part of my brain, however, still has a tendency to think “strings are immutable so text data is immutable.”
- Some operations like append and []byte() conversion will perform fairly complex allocation and copy operations. I kind of wish Go wouldn’t do that, though it would also make some common operations even more tedious if it did not do that. Anyway, it can be easy to forget that if I perform an append on a slice, I might now have an entirely different underlying array from all the others. Or if I forget that I took a slice from a piece of data that I expect to be unchanged and perform an append on the slice, I will be changing that other data in ways I do not want.
- I also have a long history of writing code in other languages. I’ve been reasonably competent with a dozen or so languages and knowledgeable enough to dink around in at least a couple dozen more. And many of those have been C-ish, like Go. And in C, if you want to perform the operation a sub-slice performs, it looks more like originalSlice+5, not originalSlice[5:]. Similarly, in Perl, the language I’ve spent most of my career writing, the Golang syntax more closely resembles a sub-array copy expression like @original_slice[5..9]. Offhand, I can’t even remember how to share the memory of an array in Perl, it’s such a rare operation. Slicing an array by reference like this is not that common in other languages, certainly not with as much centrality to the language as it has in Go.
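To make the text-processing trap concrete, here is the kind of silly mistake I mean, as a sketch. The function names and the sample text are invented for illustration. The first function shows that converting a string to []byte copies the data; the second shows an append on a sub-slice quietly overwriting the buffer it was taken from.

```go
package main

import "fmt"

// convertCopies shows that []byte(s) allocates and copies, so editing the
// bytes leaves the original string alone.
func convertCopies() (string, string) {
	s := "hello"
	b := []byte(s)
	b[0] = 'j'
	return s, string(b)
}

// clobber takes the first word of a byte buffer and appends to it. The
// sub-slice still has spare capacity inside line, so the append writes
// over the byte that followed the word instead of allocating.
func clobber() string {
	line := []byte("hello world")
	word := line[:5]         // "hello", sharing line's memory
	word = append(word, '!') // overwrites line[5], the space
	_ = word
	return string(line)
}

func main() {
	fmt.Println(convertCopies()) // hello jello
	fmt.Println(clobber())       // hello!world
}
```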
Dealing with it
Let’s wrap up by considering how to cope with how Golang handles slices and shared memory here. If you want a cheap subset of the data, keep using those sub-slices. However, you really need to copy your data:
- If the subset of the original data needs to be preserved separate from the original, or
- If you need to manipulate the subset without risking the original.
It may seem expensive to do this, but the copy built-in is generally pretty fast for most copies unless you start getting into significant megabytes of data, in which case maybe you just need to be really careful or come up with a specialized data structure.
But for the typical case, this will suffice:
copySubSlice := make([]int32, 5)
copy(copySubSlice, originalSlice[5:])
Explicitly allocate your new slice and then use the copy() built-in to perform the copy from one slice to the other. Now, modifications to each will be completely independent of each other. This is very fast in the majority of cases; it just is not quite as automatic as someSlice[5:].
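If you find yourself doing this a lot, you could wrap it up in a small helper. This is only a sketch, and copyTail is a name I made up, not anything from the standard library. It assumes from is between 0 and len(src).

```go
package main

import "fmt"

// copyTail (a hypothetical helper) returns a new slice holding a copy of
// src[from:]. The result has its own backing array, so changes to either
// slice are invisible to the other.
func copyTail(src []int32, from int) []int32 {
	dst := make([]int32, len(src)-from)
	copy(dst, src[from:])
	return dst
}

func main() {
	originalSlice := make([]int32, 10)
	for i := range originalSlice {
		originalSlice[i] = int32(i)
	}

	copySubSlice := copyTail(originalSlice, 5)
	originalSlice[5] = 42 // does not reach the copy
	copySubSlice[1] = 99  // does not reach the original

	fmt.Println(copySubSlice[0], originalSlice[6]) // 5 6
}
```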
Anyway, the trade-off Golang has made is to make getting segments of an array cheap and involve little to no copying in the default case. However, this is not the best in all situations. You should absolutely make those copies when you need them to protect yourself from obscure bugs and data corruption.
Anyway, I wanted to write this up to help me remember for next time and I hope it might come in handy for someone else who is either new-ish to the language or, like me, familiar with enough languages that one sometimes gets a little mixed up because Golang handles this sort of situation a little differently. And it also contains a little primer on arrays and slices, with information I haven’t seen in a lot of places.
I hope it has been useful.
Cheers.