Jez Higgins

Freelance software generalist
software created
extended or repaired


Older posts are available in the archive or through tags.

Feed

Follow me on Twitter
My code on GitHub

Contact
About

Saturday 09 February 2008 A filtering iterator in C++

Back in December I shipped myself over to Cambridge to do a technical presentation on the wonders of iteration. It went pretty well although I still think I'm not quite getting my real point over, possibly because I haven't fully worked out what my real point is. I know it's in there, I need to surface it better.

The code I present is in Java and Python, primarily for my convenience. I'd written the original talk using Java, and the examples I'd lifted were in Python. Further, most people can read Java and Python even if they don't use either of them, and you can generally fit a reasonably complete example on a single slide without an excess of punctation or other distractions. Consequently they're good from a pedagogic standpoint. The audience that evening was largely composed of guys who worked in C++, and I asserted throughout that the ideas I was presenting could be implemented in C++ without any particular difficulty. I was picked up on that by someone (who's name I unfortunately didn't catch) at the end, who argued that the typing would get in the way. Specifically, he thought that a C++ version of this Java iterator would be awkward.

public class FilterIterator implements Iterator
{
  public FilterIterator(Iterator iterator, Predicate predicate)
  {
    iter_ = iterator;
    pred_ = predicate;
    findNext();
  }

  public boolean hasNext()
  {
    return next_ != null;
  }

  public Object next()
  {
    Object current = next_;
    findNext();
    return current;
  }

  public void findNext()
  {
    next_ = null;
    while(iter_.hasNext() && next_ == null)
    {
      Object candidate = iter_.next();
      if(pred_.test(candidate))
        next_ = candidate;
    }
  }

  ...
}

Writing new iterators in Java is very easy, because all you have to do it fulfil the java.util.Iterator interface. With that done, you can throw your new iterator around and it'll drop right in thanks to that common interface. Iterators in C++ can be more awkward, it's true. There is no common base class, and iterators are typically tied to the type of whatever they iterate over. A std::vector<string>'s iterators are of type std::vector<string>::iterator, a std::deque<string>'s iterators are std::deque<string>::iterator, and the two are not interchangable, at runtime at least, despite both having essentially identical characteristics. Generally this isn't a problem but for the situations I describe in the presentation, using iterators to abstract data sources or to build a view of something, it appears to present something of an obstacle.

I continued to assert that the obstacle was, in fact, illusory, and that a little bit of scaffolding would take you a long way. He disagreed. There was a bit of back and forth around the room (including a suggestion that this was the Sapir-Whorf hypothesis in action), but I didn't win him over and short of starting to write code on the whiteboard it didn't look like I was going to. So we all went to the pub.

In the following few days I did write the code, and I think I would have brought him round in the end.

A naive C++ filtering iterator

template<typename iterator_type, typename predicate_type>
class filter_iterator
{
public:
  typedef typename iterator_type::value_type value_type;

  filter_iterator(const iterator_type& begin, const iterator_type& end, const predicate_type& pred):
    current_(begin), end_(end), pred_(pred)
  {
    while((current_ != end_) && (!pred_(*current_)))
      ++current_;
  } // filter_iterator

  value_type& operator*() const { return *current_; }
  value_type& operator->() const { return *current_; }

  filter_iterator& operator++() { advance(); return *this; }

  bool operator==(const filter_iterator& rhs) const { return current_ == rhs.current_; }
  bool operator!=(const filter_iterator& rhs) const { return !(operator==(rhs)); }

private:
  void advance()
  {
    do
    {
      ++current_;
    }
    while((current_ != end_) && (!pred_(*current_)));
  } // advance

  iterator_type current_;
  iterator_type end_;
  predicate_type pred_;
};

There are some obvious differences from the Java version, primarily due to differences in language idiom. Java iterators know the range they traverse, while C++ uses a pair of iterators to denote a range.

This iterator does indeed filter, and its use is straightforward :

class Even
{
public:
  bool operator()(int& i) { return i%2 == 0; }
};

...

std::vector<int> vec;

... populate the vector ...

filter_iterator<std::vector<int>::iterator, Even> fb(vec.begin(), vec.end(), Even());
filter_iterator<std::vector<int>::iterator, Even> fe(vec.end(), vec.end(), Even());

for( ; fb != fe; ++fb)
{
  std::cout << *fb << std::endl;

  ... do something else with *fb ...
} // for ...

Job done, right?

Well, no. It works, but the usage is cumbersome at best. The type of the instantiated filter_iterator is related not only both to the type of iterator it wraps, and also the type of the predicate. Not only is filter_iterator<std::vector<int>::iterator, Even> not substitutable for a std::vector<int>::iterator, it's not even a substitute for filter_iterator<std::vector<int>::iterator, Odd>. It's not exactly adding flexibility, is it?.

But what to do?

Simplification through type erasure

Huge chunks of the C++ Standard Library are built around generic programming techniques, most notably the containers, iterators, and algorithms of the STL. The approaches I describe in the talk are, quite plainly, object-oriented techniques. The proliferation of types in the code above arise from the tension between the generic and object-oriented programming.

Type erasure gives us a way to relieve that tension. In our filter_iterator, we don't actually care about the actual type of iterator being wrapped. It isn't exposed through the public interface, and so our client code doesn't care about it either. In the public interface, we only care that it returns an int (or whatever) when dereferenced. Internally, the implementation only requires that wrapped iterator can be advanced by calling operator++ and compared for equality. If only there was a common base class along the lines of

class iterator_base
{ virtual int& get(); // aka operator*() virtual void advance(); // aka operator++() virtual bool equal(const iterator_base& rhs); // aka operator==() }

If you squint a bit, looks almost like the java.util.Iterator, doesn't it? Perhaps Josh Bloch was on to something after all. Anyway ... We can effectively introduce such a base class through the use of an adaptor or wrapper. If we using a template class with a non-template base class, we create a type specific wrapper to swaddle around the iterator, which we can then manipulate through the base class. Because the base class isn't parameterised, it doesn't expose anything about the wrapped iterator. By the classical computer science technique of an extra layer of indirection, we've slipped in a common base class down the side of our existing iterators.

template<typename value_type>
class iterator_holder
{ public: template<typename iterator_type> iterator_holder(const iterator_type& iter) : iter_(new holder<iterator_type>(iter)) { } ~iterator_holder() { delete iter_; } ... value_type& get() const { return iter_->get(); } void advance() { return iter_->advance(); } private: class holder_base { public: virtual ~holder_base() { } virtual holder_base* clone() const = 0; bool compare(holder_base* rhs) { return (type() == rhs->type()) && (up_compare(rhs)); } // compare virtual const std::type_info& type() const = 0; virtual void advance() = 0; virtual value_type& get() const = 0; protected: virtual bool up_compare(holder_base* rhs) const = 0; }; // class holder_base template<typename iterator_type> class holder : public holder_base { public: holder(const iterator_type& iter) : iter_(iter) { } virtual holder_base* clone() const { return new holder(iter_); } virtual const std::type_info& type() const { return typeid(iter_); } // type virtual void advance() { ++iter_; } virtual value_type& get() const { return *iter_; } protected: virtual bool up_compare(holder_base* rhs) const { holder* r = dynamic_cast<holder*>(rhs); return iter_ == r->iter_; } // up_compare private: iterator_type iter_; }; // class holder holder_base* iter_; }; // class iterator_holder

That's enough code, and I won't dwell on the details. The main thing to note that outmost class iterator_holder is parameterised on the value type, on what the held iterator points to, rather than the type of the iterator itself. The holder and holder_base are the template class with non-template base described above.

We can rewrite filter_iterator to use iterator_holder, and our example becomes

filter_iterator<int, Even> fb(vec.begin(), vec.end(), Even());
filter_iterator<int, Even> fe(vec.end(), vec.end(), Even());

for( ; fb != fe; ++fb)
{
  std::cout << *fb << std::endl;

  ... do something else with *fb ...
} // for ...

But look at this. We can also write this

std::deque<int> deq;
...

filter_iterator<int, Even> fb(deq.begin(), deq.end(), Even());
filter_iterator<int, Even> fe(deq.end(), deq.end(), Even());

for( ; fb != fe; ++fb)
{
  std::cout << *fb << std::endl;

  ... do something else with *fb ...
} // for ...

Exchanging the vector for a deque doesn't require any other change. That's rather a useful result.

One down, one to go

We can perform a similar exercise with the predicate type. So long as it has some function that takes a value_type and returns bool we really don't care about the precise wheres and why fors. Writing a similar predicate_holder, the filter_iterator reduces to

template<typename value_type>
class filter_iterator
{
public:
  template<typename iterator_type, typename predicate_type>
  filter_iterator(const iterator_type& begin,
                  const iterator_type& end,
                  const predicate_type& pred):
    current_(begin), end_(end), pred_(pred)
  {
    while((current_ != end_) && (!pred_.test(current_.get())))
      current_.advance();
  } // filter_iterator

  filter_iterator(const filter_iterator& rhs) :
    current_(rhs.current_), end_(rhs.end_), pred_(rhs.pred_) { }

  filter_iterator& operator=(const filter_iterator& rhs)
  {
    current_ = rhs.current_;
    end_ = rhs.end_;
    pred_ = rhs.pred_;
    return *this;
  } // operator=

  bool operator==(const filter_iterator& rhs) const { return current_ == rhs.current_; }
  bool operator!=(const filter_iterator& rhs) const { return !(operator==(rhs)); }

  value_type& operator*() const { return current_.get(); }
  value_type& operator->() const { return current_.get(); }

  filter_iterator& operator++() { advance(); return *this; }
  filter_iterator operator++(int)
  {
    filter_iterator c(*this);
    advance();
    return c;
  } // operator++

private:
  void advance()
  {
    do
    {
      current_.advance();
    }
    while((current_ != end_) && (!pred_.test(current_.get())));
  } // advance

  iterator_holder<value_type> current_;
  iterator_holder<value_type> end_;
  predicate_holder<value_type> pred_;
}; // class filter_iterator

It doesn't look hugely different from the first version, but the type signature is much more straightforward. Not only is it much easier to work with, it's significantly more flexible.

filter_iterator<int> fb(vec.begin(), vec.end(), Even());
filter_iterator<int> fe(vec.end(), vec.end(), Even());

for( ; fb != fe; ++fb)
{
  ... do something with *fb ...
} // for ...

// and now the odds
fb = filter_iterator<int>(vec.begin(), vec.end(), Odd());
fe = filter_iterator<int>(vec.end(), vec.end(), Odd());

for( ; fb != fe; ++fb)
{
  ... do something with *fb ...
} // for ...

Victory is mine, I gloated to myself before realising I'd rather exceeded my brief. The iterator_holder class, far from being an implementation detail for the filter_iterator, is a useful piece of kit in its own right, providing a runtime polymorphic type-safe wrapper for arbitrary iterators. It's the thing I should have been aiming for in the first place. Everything else - filtering iterators, transformers, or whatever - could be built on and with it. Ah well - maybe next time.

Further reading

External Polymorphism - An Object Structural Pattern for Transparently Extending C++ Concrete Data Types. What we've just done has a name.

Googling around I found A Fistful Of Idioms - Giving STL Iterators a Base Class, an article from 2000 in which my chum Steve Love develops an any_iterator wrapper class. I must have read it at the time, but can't explicitly recall doing so. Towards the end of last year, Thomas Becker published On the Tension Between Object-Oriented and Generic Programming in C++ and What Type Erasure Can Do About It in which he developed any_iterator, a type-safe, heterogeneous C++ iterator. The Adobe Source Libraries also have an any_iterator. All of those are rather better developed that what we stumbled over writing filter_iterator. Perhaps I should pre-emptively google next time?

Boost.Iterator includes a filter_iterator. It also includes several other useful iterator adaptors, including a transform_iterator and a zip_iterator. Matthew Wilson describes a bidirectional filter_iterator in an extract from his Extended STL book. Here's another. All are essentially equivalent to the naive version presented above, if rather more complete and, in the Boost case, rather more formally specified.

On a slightly different track, Mr Edd has an opaque_iterator, designed to reduce compile-time dependencies. I do like the look of that - it's jolly clever.

Kevlin Henney's article in the August 2000 C++ Report, Valued Conversion, covers the same techniques describing a class which can hold any value. That code grew up to become Boost.Any.


Tagged c++, code, and iterators


Jez Higgins

Freelance software generalist
software created
extended or repaired

Older posts are available in the archive or through tags.

Feed

Follow me on Twitter
My code on GitHub

Contact
About