Go gotchas: rune is the new char

Let’s talk about S#. What, you know only about C#, or maybe F#?

That’s the German double S, ß, which may be called “eszett” or “sharp S”.

If you've been to Munich, you've probably visited Maximilianstraße. It has this “sharp S” near the end, right before the final “e”.

Maximilianstraße is quite a long word. I'm not sure how many characters it has. Let’s count them using Go and, simply for comparison, horrendous JavaScript.

I’ll start with JavaScript, where the length property of the string “Maximilianstraße” comes out as 16.

Now the same with Go:
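A minimal sketch of the count, assuming the obvious approach of calling the built-in `len` on the string. The names `street` and `byteLen` are mine, not necessarily what the original snippet used.

```go
package main

import "fmt"

// street is the word being measured; the constant name is an assumption.
const street = "Maximilianstraße"

// byteLen wraps the built-in len, which reports the size of a string in bytes.
func byteLen(s string) int {
	return len(s)
}

func main() {
	fmt.Println(byteLen(street)) // prints 17
}
```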

That’s strange. Go says 17, while JavaScript counted only 16 characters.

Well, doesn’t matter. Let’s print only our “sharp S”:
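Something along these lines, assuming we trust the count of 16 letters and index the string directly. Index 14 is my assumption for where the 15th letter should live, and `charAt` is a hypothetical helper.

```go
package main

import "fmt"

const street = "Maximilianstraße"

// charAt renders whatever sits at byte index i as a character.
// Indexing a string yields a single byte, not a full letter.
func charAt(s string, i int) string {
	return fmt.Sprintf("%c", s[i])
}

func main() {
	// ß is the 15th letter, so index 14 looks like the right place.
	fmt.Println(charAt(street, 14)) // prints Ã, not ß
}
```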

Wait, what?

Ok, I’ll count it by hand…
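Counting by hand could look like the sketch below: walk the string one index at a time and print what is found there. The helper `byteLines` is my own name for it.

```go
package main

import "fmt"

const street = "Maximilianstraße"

// byteLines renders every byte of s as "index: character", one entry per byte.
func byteLines(s string) []string {
	lines := make([]string, 0, len(s))
	for i := 0; i < len(s); i++ {
		lines = append(lines, fmt.Sprintf("%d: %c", i, s[i]))
	}
	return lines
}

func main() {
	// Index 14 shows Ã and index 15 shows an invisible control character.
	for _, line := range byteLines(street) {
		fmt.Println(line)
	}
}
```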

Where’s my ß?
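One way to get it back, as a sketch: ask for two bytes instead of one by slicing from index 14 up to (not including) 16. The function name `sharpS` is an assumption of mine.

```go
package main

import "fmt"

const street = "Maximilianstraße"

// sharpS slices out bytes 14 and 15 together.
// The bounds are hardcoded for this particular word.
func sharpS(s string) string {
	return s[14:16]
}

func main() {
	fmt.Println(sharpS(street))      // prints ß
	fmt.Println(len(sharpS(street))) // prints 2
}
```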

Oh, here it is. Why does it take two characters, though?

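As with many other matters, Go's support of Unicode is somewhat… lacking. Go is okay with Unicode. Doesn't have anything against it. But it will not vote for it in the next Unicode vs UTF-8 elections (wait, they’re not the same?!). What actually happens is that a Go string is a sequence of bytes, and both len and indexing work on those bytes. In UTF-8, ß is encoded as two bytes, which is where the extra “character” comes from.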
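Here is a small sketch of the gap between the two counts, using the standard `unicode/utf8` package. The helper `counts` is a name I made up for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

const street = "Maximilianstraße"

// counts returns the number of bytes and the number of runes (code points) in s.
func counts(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	fmt.Println([]byte("ß"))       // prints [195 159]
	fmt.Println(utf8.RuneLen('ß')) // prints 2
	fmt.Println(counts(street))    // prints 17 16
}
```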

So, when slicing strings, as we just did, we need to work harder. Consider this:
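A minimal sketch, assuming the fix is a plain `[]rune` conversion before indexing. `runeAt` is a hypothetical helper.

```go
package main

import "fmt"

const street = "Maximilianstraße"

// runeAt converts s to a slice of runes and returns the i-th one as a string.
// The conversion copies the whole string, so it is not free for long inputs.
func runeAt(s string, i int) string {
	return string([]rune(s)[i])
}

func main() {
	runes := []rune(street)
	fmt.Println(len(runes))         // prints 16
	fmt.Println(runeAt(street, 14)) // prints ß
}
```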

Now, to get our letter out, we convert the string to a slice of runes. Runes are Unicode code points, which is close to the “proper characters” as we think of them.

Interestingly, though, the for loop with range is aware of those quirks:
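As a sketch, ranging over the string directly yields the byte index where each rune starts, together with the rune itself. `rangeIndexes` is my own helper, added only to make the indexes easy to inspect.

```go
package main

import "fmt"

const street = "Maximilianstraße"

// rangeIndexes collects the byte index at which each rune of s begins.
func rangeIndexes(s string) []int {
	var idx []int
	for i := range s {
		idx = append(idx, i)
	}
	return idx
}

func main() {
	// Prints 16 lines; the last two are "14: ß" and "16: e".
	for i, r := range street {
		fmt.Printf("%d: %c\n", i, r)
	}
}
```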

Note how index 15 is smartly skipped: the loop steps by runes, not by bytes, so after ß at index 14 the next index is 16.

Remember this the next time you plan to slice strings that may contain non-ASCII characters in Go.

You can see the entire example on my GitHub:

And play with it on Go Playground: https://play.golang.org/p/VfucB9MoKs

Solutions Architect @Depop, author of “Hands-on Design Patterns with Kotlin” book and “Web Development with Kotlin” course