Sterling has too many projects Blogging about programming, microcontrollers & electronics, 3D printing, and whatever else...

Golang Slices are Dangerous


The headline is a little bit click-bait-y, but I want to highlight one of the easily overlooked risks of using slices in Golang. First of all, there’s nothing really wrong with the way slices work. A trade-off was made. It is not necessarily the same trade-off I would make, but such is engineering.

Consider Arrays

First, let’s map out what a slice is, because it might not be obvious unless you’ve been writing Golang programs for a while. Let’s start by thinking about an array. In Golang, an array is a fixed-length sequence of elements of a given type. For example, we might define an array of int32 with ten elements as follows:

var myArray [10]int32

The compiler keeps track of how many elements are in the array, so an attempt to access an item outside the array with a constant index results in a compile-time error. (An out-of-range index that is only known at run time causes a panic instead.) The memory used by an array is the number of elements times the size of each element. In the case of myArray in the example above, the size is exactly 40 bytes.
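To make the size and bounds-checking claims concrete, here is a minimal sketch. The helper names arrayBytes and readAt are made up for this illustration. It measures the array with unsafe.Sizeof and shows that an index the compiler cannot see as a constant is checked at run time, where it panics rather than failing to compile.

```go
package main

import (
	"fmt"
	"unsafe"
)

// arrayBytes reports the memory footprint of a ten-element int32 array.
// (Hypothetical helper, named for this example only.)
func arrayBytes() uintptr {
	var myArray [10]int32
	return unsafe.Sizeof(myArray)
}

// readAt indexes the array with a value that is only known at run time
// and reports whether the access panicked.
func readAt(i int) (value int32, panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	var myArray [10]int32
	myArray[0] = 42
	return myArray[i], false
}

func main() {
	fmt.Println(arrayBytes()) // 40
	fmt.Println(readAt(0))    // 42 false
	fmt.Println(readAt(10))   // 0 true
	// Writing myArray[10] with a constant index would be rejected by the compiler.
}
```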

Each element of an array is individually addressable and can be read from and written to using the array syntax that every C-like language typically uses:

fmt.Println(myArray[5])
myArray[0] = 42

The usefulness of an array is pretty limited, though. You have to know at the time you write your program the number of elements you need. Once set, the number of elements cannot be changed at all.

Enter the Slice

A slice, on the other hand, is like an array, but provides one additional layer of abstraction. The underlying data is identical to an array. But the slice itself is just a data structure that looks something like this:

type slice struct {
    ptr uintptr
    length int
    capacity int
}

On a 64-bit system, the size of a slice is 24 bytes. Then, the pointer points to an array that has capacity elements allocated. If the elements are int32 and the capacity is 10, then this will be an additional 40 byte allocation.

As such, if you try to access an item outside the slice, it is always a runtime error, not a compile time error. Your program will need to take care to avoid out of bounds errors or it will panic.
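A small sketch of both points, with helper names (headerBytes, wordBytes, indexPanics) invented for this example. It checks that the slice value itself is three machine words, which is 24 bytes on a 64-bit system, and that indexing past the length panics at run time even when the index is still within the capacity.

```go
package main

import (
	"fmt"
	"unsafe"
)

// headerBytes is the size of the slice value itself, not its backing array.
func headerBytes() uintptr {
	var s []int32
	return unsafe.Sizeof(s)
}

// wordBytes is the size of one machine word (8 on a 64-bit system).
func wordBytes() uintptr {
	return unsafe.Sizeof(uintptr(0))
}

// indexPanics reports whether reading element i of a slice with
// length 10 and capacity 20 panics.
func indexPanics(i int) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	s := make([]int32, 10, 20)
	_ = s[i] // bounds are checked against the length, not the capacity
	return false
}

func main() {
	fmt.Println(headerBytes() == 3*wordBytes()) // true
	fmt.Println(indexPanics(9))                 // false
	fmt.Println(indexPanics(15))                // true
}
```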

Here’s an example of a few different ways to initialize a slice in a program:

var array1 [10]int32
var array2 [20]int32

slice1 := array1[:]
slice2 := make([]int32, 10)
slice3 := array2[:10]
slice4 := make([]int32, 10, 20)

First, we allocate two arrays:

  1. The array named array1 contains ten elements and is 40 bytes long.
  2. The array named array2 contains twenty elements and is 80 bytes long.

Then, we allocate four slices. These numbers will assume a 64-bit system, which should be the more usual case in 2023.

  1. The slice named slice1 is a 24-byte data structure pointing to data in array1, having a length of 10 and a capacity of 10.
  2. The slice named slice2 is a 24-byte data structure pointing to a new dynamically allocated array of data. The new allocation is 40 bytes long. The length of the slice is 10 and the capacity is 10.
  3. The slice named slice3 is another 24-byte data structure pointing to array2 for storage, having a length of 10 but a capacity of 20.
  4. The slice named slice4 is yet another 24-byte data structure pointing to a new dynamically allocated array of data. This new array is 80 bytes long. The length of the slice is set to 10 and the capacity is set to 20.
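The lengths and capacities listed above can be checked with the len and cap built-ins. This is a sketch only: describe is a hypothetical helper, and the declarations use Go's [N]T array syntax and slice the arrays by name.

```go
package main

import "fmt"

// describe builds the four slices and returns {len, cap} for each one.
func describe() [4][2]int {
	var array1 [10]int32
	var array2 [20]int32

	slice1 := array1[:]             // whole of array1
	slice2 := make([]int32, 10)     // new backing array, len == cap
	slice3 := array2[:10]           // first half of array2
	slice4 := make([]int32, 10, 20) // new backing array with spare room

	return [4][2]int{
		{len(slice1), cap(slice1)},
		{len(slice2), cap(slice2)},
		{len(slice3), cap(slice3)},
		{len(slice4), cap(slice4)},
	}
}

func main() {
	fmt.Println(describe()) // [[10 10] [10 10] [10 20] [10 20]]
}
```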

This additional layer of abstraction allows us to capture a subset of an underlying array (whether statically or dynamically allocated). This then lets us share windows on a single array and also perform operations that easily expand the storage of a slice.

Consider these operations on the four slices we created above:

slice1 = append(slice1, 42)
slice2 = append(slice2, 42)
slice3 = append(slice3, 42)
slice4 = append(slice4, 42)

Let’s consider what each of these append operations actually does. In the case of slice1 and slice2, the capacity is equal to the length. This means that adding another element requires a new allocation. The exact growth strategy is an implementation detail of the Go runtime, but in my experience it doubles the previous allocation for small slices like these. That means that the first two operations will each dynamically allocate 80 bytes to hold 20 int32 elements (4 bytes each), copy the 10 values from the previous data into the first 10 slots of the newly allocated array, and then set the eleventh value to 42. The data in array1 will still be available (at least until the name goes out of scope) and won’t have been modified at all.

The last two are a little more interesting. In these cases, the capacity is greater than the length. Therefore, appending 42 will simply increase the length of the slice by one and set the eleventh element to 42. This means that, in the case of the operation on slice3, array2[10] has been set to 42.

Therefore, if you are using append and you have multiple slices that originally referred to the same underlying data, they may or may not point to the same data following the append. Furthermore, if more than one slice refers to the same data, modifying the data in one slice will change the data in all slices that refer to that same data. And it means that an append performed on one slice can result in assignment to an element that another slice (or the original array) can also see, and so on…
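Here is a sketch of the two append outcomes side by side. The function names are made up for this example, and it deliberately avoids asserting the exact capacity after growth, since that depends on the runtime. After each append it writes through the slice and then looks at the original array to see whether the two still share storage.

```go
package main

import "fmt"

// appendAtCapacity appends to a slice whose length equals its capacity,
// which forces append to allocate a new backing array.
func appendAtCapacity() (arrayFirst, sliceFirst int32, length int) {
	var array1 [10]int32
	slice1 := array1[:]
	slice1 = append(slice1, 42) // no room: data is copied elsewhere
	slice1[0] = 7               // lands in the new array only
	return array1[0], slice1[0], len(slice1)
}

// appendWithRoom appends to a slice that still has spare capacity,
// so append writes straight into the existing backing array.
func appendWithRoom() (arrayFirst, arrayEleventh int32, length int) {
	var array2 [20]int32
	slice3 := array2[:10]
	slice3 = append(slice3, 42) // room available: array2[10] is overwritten
	slice3[0] = 7               // still shared, so array2[0] changes too
	return array2[0], array2[10], len(slice3)
}

func main() {
	fmt.Println(appendAtCapacity()) // 0 7 11
	fmt.Println(appendWithRoom())   // 7 42 11
}
```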

And this is where the danger comes in…

Sharing a Slice

Consider the following code, which shares the memory of a slice.

originalSlice := make([]int32, 10)

for i := range originalSlice {
  originalSlice[i] = int32(i)
}

copySlice := originalSlice

It’s obvious that any change to originalSlice will be reflected in copySlice, right? If I perform originalSlice[0] = 42, then I would expect copySlice[0] to also be 42. And the same is true in reverse. Any modification of copySlice will modify originalSlice. This is because both variables point to the same underlying memory.

However, instead of making a copy, let’s just grab the last 5 elements of originalSlice as a sub-slice like this and then change the original:

subSlice := originalSlice[5:]
originalSlice[5] = 42

Pop quiz: what is the value of subSlice[0]?

If you say 42, then you are correct. If you expect it to be 5, then you are mistaken. I just want to hammer home that taking a sub-slice is literally just getting a pointer to a subset of the original slice’s elements. If you have worked with slices for long, this really ought to be a “duh”, but it can still catch me off guard at times.
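The pop quiz can be run as written. In this sketch the quiz function is a hypothetical wrapper around the snippet above, with the loop index converted to int32 so that it compiles. It also shows the reverse direction: a write through the sub-slice is visible through the original.

```go
package main

import "fmt"

// quiz rebuilds the example and reports what each name sees afterwards.
func quiz() (subFirst int32, subLen, subCap int, originalSeventh int32) {
	originalSlice := make([]int32, 10)
	for i := range originalSlice {
		originalSlice[i] = int32(i)
	}

	subSlice := originalSlice[5:] // shares storage with elements 5..9
	originalSlice[5] = 42         // visible as subSlice[0]
	subSlice[1] = 99              // visible as originalSlice[6]

	return subSlice[0], len(subSlice), cap(subSlice), originalSlice[6]
}

func main() {
	fmt.Println(quiz()) // 42 5 5 99
}
```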

Where my Problems Come From

To be clear, I don’t really have a problem getting tripped up by this in most cases. Generally, I don’t even use overlapping slice references at all and when I do, it is usually easy to be careful.

However, when processing text, things can get pretty complicated. In these cases, I do sometimes get tripped up and make silly mistakes. I think this is a combination of a number of factors.

  1. In Golang, the string type is somewhat unusual in that you cannot modify a string value. You must transform it instead, which is why, if you’re doing more than the simplest forms of string manipulation, you shouldn’t work with strings directly, but with a strings.Builder or a []byte slice or something similar. A part of my brain, however, still has a tendency to think “strings are immutable, so text data is immutable.”

  2. Some operations like append and []byte() conversion will perform fairly complex allocation and copy operations. I kind of wish Go wouldn’t do that, though it would also make some common operations even more tedious if it did not. Anyway, it can be easy to forget that if I perform an append on a slice, I might now have an entirely different underlying array from all the others. Or, if I forget that I took a slice from a piece of data that I expect to be unchanged and perform an append on the slice, I will be changing that other data in ways I do not want.

  3. I also have a long history of writing code in other languages. I’ve been reasonably competent with a dozen or so languages and knowledgeable enough to dink around in at least a couple dozen more. And many of those have been C-ish, like Go. And in C, if you want to perform the operation a sub-slice performs, it looks more like originalSlice+5, not originalSlice[5:]. Similarly, in Perl, the language I’ve spent most of my career writing, the Golang syntax more closely resembles a sub-array copy expression like @original_slice[5..9]. Offhand, I can’t even remember how to share the memory of an array in Perl, it’s such a rare operation. Slicing an array by reference like this is not that common in other languages, certainly not with as much centrality to the language as it has in Go.
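The second item is the one that bites me when processing text, so here is a minimal sketch of it. The function names and the sample text are invented for this example. Appending to a sub-slice that still has spare capacity silently overwrites the data that follows it in the original, while copying first keeps the two apart.

```go
package main

import "fmt"

// clobber takes a sub-slice of some text and appends to it. Because the
// sub-slice has spare capacity, the append overwrites the original text.
func clobber() string {
	line := []byte("hello world")
	word := line[:5]         // "hello", sharing storage with line
	word = append(word, '!') // writes into line[5]
	_ = word
	return string(line)
}

// safe copies the sub-slice before appending, so the original is untouched.
func safe() (original, modified string) {
	line := []byte("hello world")
	word := make([]byte, 5)
	copy(word, line[:5])
	word = append(word, '!') // grows word's own storage
	return string(line), string(word)
}

func main() {
	fmt.Println(clobber()) // hello!world
	fmt.Println(safe())    // hello world hello!
}
```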

Dealing with it

Let’s wrap up by considering how to cope with how Golang handles slices and shared memory here. If you want a cheap subset of the data, keep using those sub-slices. However, you really need to copy your data:

  • If the subset of the original data needs to be preserved separate from the original, or

  • If you need to manipulate the subset without risking the original.

It may seem expensive to do this, but the copy built-in is generally pretty fast for most copies unless you start getting into significant megabytes of data. In which case, maybe you just need to be really careful or come up with a specialized data structure.

But for the typical case, this will suffice:

copySubSlice := make([]int32, 5)
copy(copySubSlice, originalSlice[5:])

Explicitly allocate your new slice and then add that copy() built-in to perform the copy from one slice to the other. Now, modifications to each will be completely independent of each other. This is very fast in the majority of cases; it just is not quite as automatic as someSlice[5:].
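As a quick check that the copy really is independent, here is a sketch that wraps the two lines above in a hypothetical helper named copyTail and then modifies both sides.

```go
package main

import "fmt"

// copyTail returns a freshly allocated copy of src[from:].
// (Hypothetical helper; it is just make plus copy.)
func copyTail(src []int32, from int) []int32 {
	dst := make([]int32, len(src)-from)
	copy(dst, src[from:])
	return dst
}

// independent modifies the original and the copy and reports what each
// one still holds in the element the other side changed.
func independent() (copyFirst, originalSeventh int32) {
	originalSlice := make([]int32, 10)
	for i := range originalSlice {
		originalSlice[i] = int32(i)
	}

	copySubSlice := copyTail(originalSlice, 5)
	originalSlice[5] = 42 // does not reach copySubSlice[0]
	copySubSlice[1] = 99  // does not reach originalSlice[6]

	return copySubSlice[0], originalSlice[6]
}

func main() {
	fmt.Println(independent()) // 5 6
}
```

If a one-liner is preferred, append([]int32(nil), originalSlice[5:]...) is a common idiom that also produces a separate copy.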

Anyway, the trade-off Golang has made is to make getting segments of an array cheap and involve little to no copying in the default case. However, this is not the best in all situations. You should absolutely make those copies when you need them to protect yourself from obscure bugs and data corruption.

Anyway, I wanted to write this up to help me remember for next time, and I hope it might come in handy for someone else who is either new-ish to the language or, like me, familiar with enough languages that one sometimes gets a little mixed up because Golang handles this sort of situation a little differently. It also contains a little primer on arrays and slices, with information I haven’t seen in a lot of places.

I hope it has been useful.

Cheers.

The content of this site is licensed under Attribution 4.0 International (CC BY 4.0).