C++ iterator wrapping a stream not 1-1

Series: Iterator, Iterator Wrapper, Non-1-1 Wrapper

Sometimes we want to write an iterator that consumes items from some underlying iterator but produces its own items slower than the items it consumes, like this:

ColonSep items("aa:foo::x");
// Prints "aa, foo, , x"
for(auto s : items)
{
    std::cout << s << ", ";
}

When we pass a 9-character string (i.e. an iterator that yields 9 items) to ColonSep, above, we only repeat 4 times in our loop, because ColonSep provides an iterable range that yields one value for each whole word it finds in the underlying iterator of 9 characters.

To do something like this I'd recommend consuming the items of the underlying iterator early, so it is ready when requested with operator*. We also need our iterator class to hold on to the end of the underlying iterator as well as the current position.

First we need a small container to hold the next item we will provide:

struct maybestring
{
    std::string value_;
    bool at_end_;

    explicit maybestring(const std::string value)
    : value_(value)
    , at_end_(false)
    {}

    explicit maybestring()
    : value_("--error-past-end--")
    , at_end_(true)
    {}
};

A maybestring either holds the next item we will provide, or at_end_ is true, meaning we have reached the end of the underlying iterator and we will report that we are at the end ourself when asked.

Like the simpler iterators we have looked at, we still need a little container to return from the postincrement operator:

class stringholder
{
    const std::string value_;
public:
    stringholder(const std::string value) : value_(value) {}
    std::string operator*() const { return value_; }
};

Now we are ready to write our iterator class, which always has the next value ready in its next_ member, and holds on to the current and end positions of the underlying iterator in wrapped_ and wrapped_end_:

class myit
{
private:
    typedef std::string::const_iterator wrapped_t;
    wrapped_t wrapped_;
    wrapped_t wrapped_end_;
    maybestring next_;

The constructor holds on the underlying iterator pointers, and immediately fills next_ with the next value by calling next_item passing in true to indicate that this is the first item:

public:
    myit(wrapped_t wrapped, wrapped_t wrapped_end)
    : wrapped_(wrapped)
    , wrapped_end_(wrapped_end)
    , next_(next_item(true))
    {
    }

    // Previously provided by std::iterator
    typedef int                     value_type;
    typedef std::ptrdiff_t          difference_type;
    typedef int*                    pointer;
    typedef int&                    reference;
    typedef std::input_iterator_tag iterator_category;

next_item looks like this:

private:
    maybestring next_item(bool first_time)
    {
        if (wrapped_ == wrapped_end_)
        {
            return maybestring();  // We are at the end
        }
        else
        {
            if (!first_time)
            {
                ++wrapped_;
            }
            return read_item();
        }
    }

next_item recognises whether we've reached the end of the underlying iterator and saves the empty maybstring if so. Otherwise, it skips forward once (unless we are on the first element) and then calls read_item:

    maybestring read_item()
    {
        std::string ret = "";
        for (; wrapped_ != wrapped_end_; ++wrapped_)
        {
            char c = *wrapped_;
            if (c == ':')
            {
                break;
            }
            ret += c;
        }
        return maybestring(ret);
    }

read_item implements the real logic of looping through the underlying iterator and combining those values together to create the next item to provide.

The hard part of the iterator class is done, leaving only the more normal functions we must provide:

public:
    value_type operator*() const
    {
        assert(!next_.at_end_);
        return next_.value_;
    }

    bool operator==(const myit& other) const
    {
        // We only care about whether we are at the end
        return next_.at_end_ == other.next_.at_end_;
    }

    bool operator!=(const myit& other) const { return !(*this == other); }

    stringholder operator++(int)
    {
        assert(!next_.at_end_);
        stringholder ret(next_.value_);
        next_ = next_item(false);
        return ret;
    }

    myit& operator++()
    {
        assert(!next_.at_end_);
        next_ = next_item(false);
        return *this;
    }
}

Note that operator== is only concerned with whether or not we are an end iterator or not. Nothing else matters for providing correct iteration.

Our final bit of bookkeeping is the range class that allows our new iterator to be used in a for loop:

class ColonSep
{
private:
    const std::string str_;
public:
    ColonSep(const std::string str) : str_(str) {}
    myit begin() { return myit(std::begin(str_), std::end(str_)); }
    myit end()   { return myit(std::end(str_),   std::end(str_)); }
};

A lot of the code above is needed for all code that does this kind of job. Next time we'll look at how to use templates to make it useable in the general case.

C++ iterator wrapper/adaptor example

Series: Iterator, Iterator Wrapper, Non-1-1 Wrapper

If you want to wrap an iterable range with another that transforms the underlying iterators in some way and allows looping or constructing other objects:

for (auto ch : Upper("abcdef"))
{
    // Prints "ABCDEF"
    std::cout << ch;
}
Upper up(std::string("fOo"));
std::string newfoo(std::begin(up), std::end(up));
assert(newfoo == "FOO");

then, similar to an ordinary iterable range you will need to make a range class and a iterator class:

class myit
{
private:
    std::string::const_iterator wrapped_;
    class charholder
    {
        const char value_;
    public:
        charholder(const char value) : value_(value) {}
        char operator*() const { return value_; }
    };
public:
    // Previously provided by std::iterator
    typedef int                     value_type;
    typedef std::ptrdiff_t          difference_type;
    typedef int*                    pointer;
    typedef int&                    reference;
    typedef std::input_iterator_tag iterator_category;

    explicit myit(std::string::const_iterator wrapped) : wrapped_(wrapped) {}
    value_type operator*() const { return std::toupper(*wrapped_); }
    bool operator==(const myit& other) const { return wrapped_ == other.wrapped_; }
    bool operator!=(const myit& other) const { return !(*this == other); }
    charholder operator++(int)
    {
        charholder ret(std::toupper(*wrapped_));
        ++wrapped_;
        return ret;
    }
    myit& operator++()
    {
        ++wrapped_;
        return *this;
    }
};


class Upper
{
private:
    const std::string str_;
public:
    Upper(const std::string str) : str_(str) {}
    myit begin() { return myit(std::begin(str_)); }
    myit end()   { return myit(std::end(str_)); }
};

Notice the need to call the transforming/adapting function
std::toupper in two places.

Update: std::iterator is deprecated in C++17, so removed.

C++ iterator example (and an iterable range)

Series: Iterator, Iterator Wrapper, Non-1-1 Wrapper

To make your own iterable range in C++ that you can loop over or make a vector out of (mine is called Numbers):

// Prints:
// 3,4,
for (auto n : Numbers(3, 5))
{
    std::cout << n << ",";
}

// Fills vec with 7, 8, 9
Numbers nums(7, 10);
std::vector vec{std::begin(nums), std::end(nums)};

you need to write a class that is an Input Iterator, and provide an instance as the begin and end of your range:

class Numbers
{
private:
    const int start_;
    const int end_;
public:
    Numbers(int start, int end) : start_(start) , end_(end) {}
    myit begin() { return myit(start_); }
    myit end()   { return myit(end_); }
};

The hard bit is the Input Iterator:

#include <iterator>

class myit
{
private:
    int value_;
    class intholder
    {
        int value_;
    public:
        intholder(int value): value_(value) {}
        int operator*() { return value_; }
    };
public:
    // Previously provided by std::iterator - see update below
    typedef int                     value_type;
    typedef std::ptrdiff_t          difference_type;
    typedef int*                    pointer;
    typedef int&                    reference;
    typedef std::input_iterator_tag iterator_category;

    explicit myit(int value) : value_(value) {}
    int operator*() const { return value_; }
    bool operator==(const myit& other) const { return value_ == other.value_; }
    bool operator!=(const myit& other) const { return !(*this == other); }
    intholder operator++(int)
    {
        intholder ret(value_);
        ++*this;
        return ret;
    }
    myit& operator++()
    {
        ++value_;
        return *this;
    }
};

Update: thanks to Anthony Williams for the correction on the postincrement operator – see Generating Sequences for more. Note the need to return a fiddly object that holds the answer we will give when the return value is dereferenced.

Using std::iterator as a base class actually only gives you some typedefs for free, but it’s worth it to get the standard names, and to be clear what we are trying to do. Update: std::iterator is deprecated in C++17 – just add the 5 typedefs yourself.

I suspect I might need to add some extra &s to make this good C++11 style?

Absolute Truth in programming languages

Is enforcing truthfulness the opposite of beauty?

Can 2 + 2 = 5?

Improvements, corrections, further contributions are welcome.

$ cat five.cpp 
#include <iostream>
int operator+( int x, int y ) { return 5; }
int main() {
    std::cout << 2 + 2 << std::endl;
}
$ g++ five.cpp 
five.cpp:2:29: error: ‘int operator+(int, int)’ must have an argument of class or enumerated type
$ python
>>> int.__add__ = lambda y: 5
TypeError: can't set attributes of built-in/extension type 'int'
$ cat five.hs
import Prelude hiding ((+))
x + y = 5
main = print ( 2 + 2 )
$ ghc five.hs && ./five
5
$ cat five.rb
class Fixnum
    def +(y)
        5
    end
end
print 2 + 2
$ ruby five.rb
5
$ mzscheme 
> (define (+ x y) 5)
> (+ 2 2)
5