I’m glad this is being added for the ergonomic benefits. But the number of times the article points out cases where conventions will need to be formed by the community makes me fear this will add to the already longer-than-I’d-like list of anti-patterns and footguns that Go’s design and type system make easy to fall into.
FTA: “Maybe you just want to use the range keyword to iterate over every element of your collection. Easy enough.
    func (s Slice) All() func(yield func(i int) bool) {
        return func(yield func(i int) bool) {
            for i := range s {
                if !yield(s[i]) {
                    return
                }
            }
        }
    }
”
So, to write this “easy enough” example correctly, you have to write func five times in order to, if I understand this correctly, write a function returning a function that takes a function as an argument?
    IEnumerable<int> ProduceEvenNumbers(int upto)
    {
        for (int i = 0; i <= upto; i += 2)
        {
            yield return i;
        }
    }
Yes, that introduces “magic” where the runtime figures out that ProduceEvenNumbers won’t continue, but why give such functions the flexibility not to listen to such requests (in the golang version, is forgetting the if and just yielding instead ever useful?)
I don't know why most examples of Go iterators are so unnecessarily convoluted. The authors probably don't know the language as well as they think they do.
Go is far from perfect and certainly has a lot of flaws, but IMO the iterators design fits well into the language and does not deserve the criticism.
The range only needs a function, so what is the point of calling a function that returns a function?
This is enough to make a range iterator, you don't have to call `All` manually:
    func (slice Slice) All(yield func(string) bool) {
        for _, s := range slice {
            if !yield(s) {
                break
            }
        }
    }
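At the call site it looks like this (a minimal sketch, assuming something like `type Slice []string` and Go 1.23+). Note that you range over the method value itself, without calling it:

    s := Slice{"a", "b", "c"}
    for v := range s.All {
        fmt.Println(v)
    }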
In Go, the expression `object.method` already returns the method as a function, with `object` already bound in the scope of the function. The semantics are such that if you have this definition:
    type T string

    func (t T) f(a int) int {
        return a
    }
Then:
- `T.f` returns `func(t T, a int) int`
- `T("whatever").f` returns `func(a int) int` where `t` is already set to `"whatever"`
- `T("whatever").f(42)` calls the function and returns `42`
This is true, you can just return the method without invoking it.
I don't tend to use that convention though, because most of the time when I use this pattern, I'm returning a function that uses a parameter I pass in. You can see this in the predicate example. I prefer keeping my call sites consistent and always using functions that return a function when invoked, rather than sometimes invoking them and sometimes just passing the func reference. YMMV.
That looks a lot better, indeed. I still don’t understand why every implementation has to do that `if !yield(s)` check, though.
Is it ever useful to be able to do something there before returning? If so, wouldn’t it be cleaner to implement this as an interface with two methods, one that simply has to yield for every item produced and an optional one that gets called when the runtime knows it won’t ever run the first one anymore?
Because yield is implemented as a callback, there is no compiler magic to automatically stop the iteration when the caller side uses break or continue, so this condition cannot really be avoided.
Also there are many cases where you might need to have some special logic after the last iteration, so simply killing the underlying function is not an option.
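For example, here's a sketch of an iterator that has to run cleanup after the last iteration regardless of whether the caller breaks early (error handling elided; assumes the usual bufio/os imports):

    func Lines(path string) func(yield func(string) bool) {
        return func(yield func(string) bool) {
            f, err := os.Open(path)
            if err != nil {
                return
            }
            defer f.Close() // runs after the final yield, even if the caller breaks
            sc := bufio.NewScanner(f)
            for sc.Scan() {
                if !yield(sc.Text()) {
                    return // caller broke out of the loop; the defer still fires
                }
            }
        }
    }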
I am all for having them in the language. However, the way they have been designed, or how the new magic fields for structure alignment (in Go 1.23) are being designed, shows how ironic it is to attack other languages as PhD-level complexity and then come out with such special-case designs.
I give it 10 more years of such special-cased improvements for Go to end up no better than the languages the community regularly complains about, while Go remains "perfect".
All this added complexity seems to be just syntactic sugar, too. The examples here clearly show that we could write clean code achieving the same functionality without this feature.
We could also just use an Iterator pattern for such use cases too.
Go used to be such a simple language. I wonder what's driving them to keep adding complexity to the language itself. It makes me respect Harelang's goal of "freeze the language once 1.0 is out" a lot more.
The proposal [0] explains the motivation behind this "syntax sugar":
There is no standard way to iterate over a sequence of values in Go. For lack of any convention, we have ended up with a wide variety of approaches. Each implementation has done what made the most sense in that context, but decisions made in isolation have resulted in confusion for users.
In the standard library alone, we have archive/tar.Reader.Next, bufio.Reader.ReadByte, bufio.Scanner.Scan, container/ring.Ring.Do, database/sql.Rows, expvar.Do, flag.Visit, go/token.FileSet.Iterate, path/filepath.Walk, runtime.Frames.Next, and sync.Map.Range, hardly any of which agree on the exact details of iteration. Even the functions that agree on the signature don’t always agree about the semantics. For example, most iteration functions that return (T, bool) follow the usual Go convention of having the bool indicate whether the T is valid. In contrast, the bool returned from runtime.Frames.Next indicates whether the next call will return something valid.
When you want to iterate over something, you first have to learn how the specific code you are calling handles iteration. This lack of uniformity hinders Go’s goal of making it easy to move around in a large code base. People often mention as a strength that all Go code looks about the same. That’s simply not true for code with custom iteration.
Wouldn't that always require you to have a dedicated type?
I don't know D but I'm imagining what python has.
The nice thing about this Go approach is that you can have just functions, as shown in the examples. One practical use is functions building range functions on top of existing interfaces. Or, as the filter examples show, you can create filter generators on regular slices/maps.
Creating a small wrapper type over a slice/map is trivial in Go (`type X[T any] []T`), and then you can define the range functions as methods on that slice type. If they allowed generic instance methods it would be even simpler.
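For instance (a sketch, with made-up names):

    type List[T any] []T

    // All has the right shape to be used directly as a range function.
    func (l List[T]) All(yield func(T) bool) {
        for _, v := range l {
            if !yield(v) {
                return
            }
        }
    }

    // for v := range List[int]{1, 2, 3}.All { ... }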
That's not the only point of methods, even if it's the only one the designers of Go envisioned. Another very relevant purpose is method chaining syntax. That is, with instance methods you can write a.b().c(), with functions you have to write c(b(a)). This turns out to be extremely relevant for longer chains.
Of course, other than generic methods, this could also be supported by just supporting universal function call syntax. That is, the compiler could simply take f(x, a) and x.f(a) to be perfectly equivalent, regardless of whether f is a method of x's type or a free-floating function. There is some minor complication because of backwards compatibility, but that's easily fixed (the syntax you use can prefer the function f vs the method f if there is any ambiguity).
On the other hand, generic methods can be extremely useful in their own right, for other reasons as well. Having generic methods in an interface, such that a type has to have a generic method to implement that interface, is perfectly reasonable as a feature request - it wouldn't contradict anything in the spirit of Go. Of course, the implementation can have problems and trade-offs, I'm not claiming this is an easy feature to implement. But I don't think it's excluded.
> Creating a small wrapper type over a slice/map is trivial in Go
And yet it's specifically one thing rsc did not want. Further issues described in the rangefunc proposals:
- it would require the desugaring to run off of method-set analysis of userland types, something which does not currently exist
- it severely complicates resource management around the iterator, as you need a 3-step iteration for resource-bearing iterators (acquire iterator, defer cleanup, perform iteration)
And one not actually listed explicitly: for the limited amount of optimisations the Go compiler does, internal iteration is a lot easier to optimise as it pretty much inlines down to a `for` loop, the termination of which is much easier to analyse than bouncing through a bunch of pull calls.
Not only that, but `for range` works off of underlying type, so this is already valid Go:
import "fmt"
type Foo []int
func main() {
f := Foo([]int{1, 2, 3})
for _, v := range f {
fmt.Println(v)
}
}
One could approach it the other way: once more projects adopt all kinds of wrapper functions and types, the deficiency of Go will become more widespread knowledge, as the compiler gets progressively less able to cope with the added abstractions.
Hopefully it will put the common fallacy of "Go or Rust" to rest, as the weight classes and capabilities are on opposite ends of the spectrum. A much closer comparison is "Rust or C# or Swift or Kotlin", if one is looking for a Rust alternative that reduces decision fatigue by not forcing many small decisions, while conceding to a reasonable extent certain areas where Rust excels.
In any case, for its touted simplicity Go sure doesn't look like a simple and straightforward to follow language anymore.
The 3-step approach to iteration is also well solved[0] in C#, and works even in rather complex cases like `File.ReadLines(...)` where the line iterator internally handles IO, file handle acquisition and disposal. Just `foreach (var msg in File.ReadLines("messages.jsonl"))` and you won't be able to make a footgun out of it.
This also applies to the usage chained with filter/map/etc.
    var messages = File
        .ReadLines("messages.jsonl")
        .Select(line => JsonSerializer.Deserialize<Message>(line))
        .ToArray();
Any new feature requires something that currently doesn't exist in the compiler. Just because the change would be larger doesn't make it worse, or at least this can't be the only argument.
And if you do implement interface-based iteration with an Iterator interface, it's not hard to also add a ClosableIterator interface and have the loop handle the auto-close as well.
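Something like this hypothetical pair; to be clear, this is the rejected interface-based design, not anything in the language:

    // Hypothetical: an external-iteration interface Go did not adopt.
    type Iterator[T any] interface {
        Next() (T, bool)
    }

    // Hypothetical: the loop would call Close automatically on break/return.
    type ClosableIterator[T any] interface {
        Iterator[T]
        Close() error
    }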
"Community" is kinda the key word there, because when someone is "along for the ride" over a period of many years then a ratcheting of complexity is easier to cope with than it is to arrive cold in a big/complex language.
I agree with the idea, as I imagine someone landing today on C++23, C23, C# 12, or Java 22 will have a much harder time than those of us who have known them pretty much since their first baby steps as programming languages.
As per my own experience since the mid 1980's.
However, exactly because Go had such a history of language evolution to learn from (everything since FORTRAN came to be in 1958), maybe some of the early decisions could have been made better, instead of the Apple style of "we are not doing X", followed years later by "X is actually something we want".
This is why we need to keep reinventing languages to grow another generation of juniors into seniors, because they are not intelligent enough to correctly use abstraction.
`yield` being a function that is passed into the iterator seems like suboptimal design to me. Questions like "What happens if I store `yield` somewhere and call it long after the loop ended?" and "What happens if I call `yield` from another thread during the loop?" naturally arise. None of this can happen with a `yield` keyword like in JavaScript or C#. So why did the Go-lang people go for this design?
> Questions like "What happens if I store `yield` somewhere and call it long after the loop ended?" and "What happens if I call `yield` from another thread during the loop?" naturally arise.
Nothing special. `yield` is just a normal function. Once you realize this, it actually is very easy to reason about. I just think the naming is confusing. I think about it as `body`.
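Concretely, ranging over a function is just calling it with the loop body as the callback. Reusing the `All` method from upthread, these two are equivalent:

    for s := range slice.All {
        fmt.Println(s)
    }

    slice.All(func(s string) bool {
        fmt.Println(s)
        return true // returning false would be a break
    })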
> Questions like "What happens if I store `yield` somewhere and call it long after the loop ended?" and "What happens if I call `yield` from another thread during the loop?" naturally arise.
The fact that you can store `yield` somewhere allows for more flexible design of iterator functions, e.g. (written in a hurry as a proof of concept so will panic at the end): https://go.dev/play/p/QpVYmmC6g5b?v=gotip
Those hairy details may be hard to remember (or even decide), but they won't matter for most users: most users will just use `yield` in the simplest way, without storing it or calling it from another goroutine.
It is an explicit goal of the Go team to minimize the number of keywords in the language. Simple languages have fewer keywords, so Go must have few keywords. https://go.dev/ref/spec#Keywords
Look how simple that is.
This is why things like ‘close(channel)’ are magic builtin functions, not keywords (more complicated) or a method like ‘channel.Close’ (works with interfaces and consistent with files and such, so not simple).
Languages where ‘yield’ is a keyword use a fundamentally different design (external vs internal iteration). I don’t think it’s plausible that the Go team rejected this design because it would require another keyword. They presumably rejected it because of the additional complexity (you either need some form of coroutines or the compiler needs to convert the iterator code to a state machine).
> It is an explicit goal of the go team to minimize the number of keywords in the language.
It's understandable - because unfortunately people judge languages by very shallow metrics. Several times I've seen people use "number of keywords" as a proxy for language complexity.
However, that's completely misguided. `static` in C++ (and, IMO, `for` in Go) demonstrate that overloading a keyword to mean multiple things is harder to understand than having a larger number of more meaningful keywords.
From my vantage point, pointers without arithmetic are typically called references, as opposed to "true" pointers. I did not mean it in a derogatory way; both have their place, and Go is a great language even without (or maybe despite lacking) what I would call "true" pointers.
The fact that, in Go, you can have pointers to pointers, and reassign pointer variables like any other, would imply, IMO, that pointers are first-class values, and so they are true pointers, even without being able to do pointer arithmetic on them.
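For illustration:

    x, y := 1, 2
    p := &x         // a pointer is an ordinary value
    pp := &p        // ...so you can take a pointer to it
    *pp = &y        // and reassign p through pp
    fmt.Println(*p) // prints 2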
That's cheating for dev-rel marketing. And it's contradictory, because many more keywords could be (magical or normal) functions, like in some other languages.
    // Only care about the iteration count
    for range aContainer { ... }

    // Just the values
    for v := range myChannel { ... }

    // Indexes and values (or keys and values for a map)
    for i, v := range mySlice { ... }
What's the rationale behind Go choosing internal iteration (the iterator calls a function for each value, like Ruby) over external (the iterator returns each value, like Python and C#)?
My understanding is that internal iteration makes it easier to write iterators (producers) but harder to write the consuming code. That's why Go needs to re-write the body of each `for` loop as a function body, including special handling for `break`, `return` etc.
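Roughly, a sketch of that rewrite (simplified; the real desugaring also handles goto, defer, and panics, and `seq` and `use` here are just placeholders):

    // What you write:
    for v := range seq {
        if v > 3 {
            break
        }
        use(v)
    }

    // Approximately what the compiler emits:
    seq(func(v int) bool {
        if v > 3 {
            return false // break terminates the iteration
        }
        use(v)
        return true // falling off the end means continue
    })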
External iteration OTOH makes it harder to write producers but easier to write consumers. Python and C# therefore allow external iterators to be written via coroutines/generators.
Wouldn't Go's goroutines make the coroutine approach to external iterators straightforward? Whereas the re-writing necessary for internal iterators seems convoluted?
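As it happens, Go 1.23's iter package ships a pull adapter built on exactly that idea: iter.Pull turns a push iterator into a pair of next/stop functions (implemented, as I understand it, with a lightweight runtime coroutine rather than a full goroutine, to keep the switch cheap). A sketch, with `seq` assumed to be an iter.Seq[int] and `use` a placeholder:

    next, stop := iter.Pull(seq)
    defer stop() // releases the underlying coroutine
    for {
        v, ok := next()
        if !ok {
            break
        }
        use(v)
    }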
Isn’t the only real difference that the yield function is being passed into the iterator instead of being a reserved word? I don’t think it’s clunky, although it took a few minutes for me to get it.
No. Before C# got generators, some of the machinery had to be implemented manually; with yield, the compiler generates the necessary IEnumerable implementation.
I was hoping the blog author would have revealed some plans for supporting a new iteration API in dolt. The range over func API is particularly useful if you need to iterate and compute over something that doesn’t all fit in memory (as is necessary for a slice and map).
The syntax just doesn’t sit right with me due to some reason. It gives me the same heebie jeebies as Python’s decorators. Not a fan of either. Maybe I need to get used to them.
Defining a Go iterator is similar to defining a Python decorator that takes an argument. Both involve defining a function that returns another function, with the inner function having another function as its parameter.
This isn’t a particularly difficult concept, but manipulating functions in this way does feel unusual in such heavily-procedural languages. This has two downsides:
* Programmers who only use those languages are unfamiliar with the concept, and
* The languages’ syntaxes aren’t designed to make it particularly clear.
This feels like it's undermining channels, the feature go really wants you to use in other places too (but people tend to still use mutexes). Channels aren't quite as lazy (if you use an unbuffered one you supply one element in advance), but they're close.
Channels require a different goroutine to send values while also receiving them (which is how you’d have two loops communicating, essentially what you get from range funcs).
There’s nothing stopping you from doing this but it does mean you are introducing the requirement of thread safety in your code, in the case where the iterator is stateful.
I would argue anything that needs a range func beyond the simple functional things like filters is probably a stateful iterator (or generator if you’d like), and as such having range funcs is a great way to write code that doesn’t go wrong due to parallelism.
Now you could add two way communication to your channel iterator (or any other locking mechanism) for safety but honestly I think range funcs perfectly solve this use case, and have already used them to keep my code more readable and correct.
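A tiny example of the stateful case, where all iteration state lives in locals rather than in an object that would need locking (a minimal sketch):

    func Fib() func(yield func(int) bool) {
        return func(yield func(int) bool) {
            a, b := 0, 1
            for yield(a) { // keep producing until the caller breaks
                a, b = b, a+b
            }
        }
    }

    // for n := range Fib() { if n > 100 { break }; fmt.Println(n) }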
All this said, while I’m still a fan of Go and have used it regularly since 0.9 as well as contributed to the language, I will agree with the other comments that sometimes the language design bends over backward to be purist at the cost of having to add more footguns in user land.
For this use case, not only do channels add an insane amount of overhead, they're also broken in all sorts of ways, e.g. there is no way to properly clean up resources around the iteration, and they add more opportunities for race conditions since the object under iteration has to be shared with the channel's producing goroutine.
Here's a rough implementation of iterators using channels that I was experimenting with. It needs some syntactic sugar to do proper cleanup of the goroutine if the loop exits early; probably the iterator function needs to return a func that closes the channel, to be called after the loop, plus some error handling/closed-channel checking.
The yield functions only exist as syntactic sugar to make it look like iterators in other languages and to make it clear where the value emission point is (I had mentioned this in tweets and skeets when I was originally working on this, if it didn't make it into the gist).
An unbuffered channel is really a scheduler abstraction. Consuming from an unbuffered channel blocks; the thread can then enter and immediately begin executing the goroutine that was blocked on producing. The goroutine is acting like a closure around the channel state.
I had some further experiments interleaving these iterators, but didn't clean it up at the time before I had sufficiently convinced myself it was possible and I got distracted with other things.
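The shape is roughly this (a paraphrased sketch with assumed names, needing "sync" imported): the iterator hands back the channel plus a stop func the caller must run if it exits the loop early, otherwise the producing goroutine leaks.

    func Iterate(s []int) (<-chan int, func()) {
        ch := make(chan int)
        done := make(chan struct{})
        go func() {
            defer close(ch)
            for _, v := range s {
                select {
                case ch <- v: // hand the next value to the consumer
                case <-done: // consumer gave up early
                    return
                }
            }
        }()
        var once sync.Once
        stop := func() { once.Do(func() { close(done) }) }
        return ch, stop
    }

    // ch, stop := Iterate(xs)
    // defer stop()
    // for v := range ch { ... }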
If you have shipped some task to a channel, or are waiting for some work to complete on a channel, there is no native way to propagate the error that your task may have failed with. Also, if an error did happen while processing the task you put on the channel, the stack trace suddenly is not the whole story anymore. Channels also have no way to make sure the context.Context is reasonably propagated.