Zip it

Zip traversables with an index

I have bumped my head into this a bit too often. Time to solve it for once and for all.

Zipping with an index

There are times where you want to traverse a collection, and do something with the value and the index. This is is the Java version of it:

ArrayList<String> lst = ...;
for (i = 0; i <  lst.size(); i++) {
  fn(lst.get(i), i);
}

Scala has something better for it: collections have a zipWithIndex operation. It would make your code look like this:

val  lst = List(“a”, “b”, ...)
for ((value, index) <- lst.zipWithIndex) fn(value, index)

Just to make sure you understand what it does, it turns List(“a”, “b”, ...) into

List((“a”, 0), (“b”, 1), ...)

Not all collections are equal

Clearly, not all collection share the same properties, so you cannot implement all operations on all collections, or at least not in an efficient way. I get that. But I also use Traversable a lot. And it turns out Traversable does not have a zipWithIndex operation. What!?

The topic has come up often on StackOverflow and the mailing lists, like here. One of the arguments here is that:

... traversable does not guarantee the order in which the elements will be visited

I guess what they mean to say is that Traversable does not guarantee that the order is constant. There obviously is an order. There is just no guarantee that the order will be the same every time you visit the elements of the collection. Iterator has an easy way out: it guarantees that the order is the same for every time you traverse the collection, as long as you stay don’t visit it more than once; it simply doesn’t support traversing it multiple times.

However, Iterable does allow you to traverse the elements in the collections a number of times, and I haven’t been able to find anything suggesting that the order is guaranteed to be the same with every attempt. In fact, I can easily implement an Iterable that traverses the collection in a completely different order with every attempt. And, behold, Iterable does have a zipWithIndex operation.

An easy workaround?

So, Traversable doesn’t have zipWithIndex. Isn’t there an easy way to turn the Traversable into something that implements zipWithIndex, such as Iterator?

Sure there is. You can just do this:

traversable.toIterator.zipWithIndex

Easy enough, right? Until you find out there is a not so desirable side effect. Calling toIterator does turn it into an Iterator. In fact, it’s a very specific one...

def toIterator: Iterator[A] = toStream.iterator

This little line here has a HUGE impact. I use Traversable a lot for reading data from a file. I want to process these files in a streaming way, and not load everything in memory. However, that’s exactly what is happening here. So in trying to solve a little problem, by turning it into an Iterator I introduced a much bigger problem.

Restoring the balance

There is a way out of here. This version might be a bit too simple for some people’s taste, but it works fine in the situations where I need it:

case class ExtendedTraversable[T](traversable: Traversable[T]) {
  def zipWithIndex() = new Traversable[(T, Int)] {
    def foreach[U](f: ((T, Int)) => U) {
      var pos = 0
      traversable.foreach {
        t =>
          f(t, pos)
          pos += 1
      }
    }
  }
}

implicit def traversableToExtendedTraversable[T](traversable: Traversable[T]):
  ExtendedTraversable[T] = ExtendedTraversable(traversable)

With this, I can call zipWithIndex on all Traversables. The Traversable getting returned hasn’t actually visited all elements of the underlying collection just yet. It will only start producing the pairs once another method is getting called on it.

Now, obviously, the Traversable interface does not guarantee a constant order . As a consequence, you cannot expect that the version that zips the elements with an index will always result in the same pairs. There might be situations in which it won’t. But I would expect that you normally know if the input collection has a fixed order. And if the input collection has a fixed order, then all elements will always have the same index associated with it.

(That, by the way, also means that if the input collection is a parallel Traversable, this solution won’t work. But if the input collection is not parallel, then making the zipped with index version a parallel collection will guarantee that the same index is always associated to the same element.)

Conclusion

I see no real reason why Traversable couldn’t have a zipWithIndex method.

For processing lines of an input file, Traversable is a much better interface than Iterator. Scala’s Source has a getLines that returns an Iterator, but neither Source nor Iterator are reusable, and for Source, you always need to make sure to close it. Traversable is much better since (1) resource management (opening and closing the file) can be done inside the class, and (2) you can reuse it as many times as you want.

If Traversable is used for traversing the lines inside a file, not having zipWithIndex is really unfortunate. You would typically use it for error reporting, but now you can’t.

So, I vote for having it in. Until it’s in, I will continue to use the extension outlined above.