Chris Riesbeck
Last updated: July 3, 2009
Iterators are central to the generality and efficiency of the generic algorithms in the STL. All the STL containers (but not the adapters) define
(Reverse iterators, returned by rbegin() and rend(), are beyond the scope of this discussion.)
Containers that don't define the above are second-class citizens. They can't be used with the generic algorithms. Defining iterator types and methods for your containers, whether they're generic templated containers or not, is often a very useful thing to do.
What follows is a simplified introduction in how to define iterators for your own containers. I wish it was shorter, but C++ requires a number of little details to be taken care of, even when ignoring many important factors.
Fortunately, the most common way to define the iterator types and begin/end methods is also the simplest. If your container uses an STL container in an internal data member to hold its elements, all you need to do is delegate the types and methods to the underlying container.
For example, suppose you have a class CourseList that is a list of the students taking a course, and internally you use a vector of Student to store the course list. Then all you need to do is:
class CourseList { private: typedef vector<Student> StudentList; StudentList students; public: typedef StudentList::iterator iterator; typedef StudentList::const_iterator const_iterator; iterator begin() { return students.begin(); } iterator end() { return students.end(); } ... };
In other words, all we do is define
This is a common programming pattern called delegation, because we delegate the work to another class.
The StudentList typedef is optional, but it makes it easy to switch containers with just one line change.
C pointers are legal iterators, so if your internal container is a simple C array, then all you need to do is return the pointers. Thus if we used an array for our StudentList, the code would look like this:
class CourseList { private: typedef Student StudentList[100]; StudentList students; public: typedef Student * iterator; typedef const Student * const_iterator; iterator begin() { return &students[0]; } iterator end() { return &students[100]; } ... };
These two cases, iterators from internal STL containers, and iterators from pointers, cover 90% of the cases you need to worry about.
Sometimes you want a variation on an iterator for a normal container, e.g., you want an iterator that counts or one that checks to make sure it's within a legal range.
For example, Stroustrup's The C++ Programming Language, 3rd ed., gives an example of a range-checking iterator. Musser and Saini's STL Reference and Tutorial Guide gives an example of a counting iterator.
Defining this kind of iterator uses delegation a lot. Basically, the constructor for the new iterator takes some kind of existing iterator, plus whatever other kind of information it needs, and saves the existing iterator in a private data member. Then operator++(), operator*() and so on are defined mostly as calls to the same operators on the stored iterator, with whatever extra functionality you need.
Unfortunately, while conceptually simple, this kind of iterator definition turns out to require a lot of special C++ incantations, because it can be hard for the compiler to know when something's a type name. If you're trying to do this, definitely get a copy of Stroustrup or some other reference with a detailed example.
Our last case of when you want to define an iterator is when you have a new kind of container and it needs an iterator.
The STL does not use class hierarchies and inheritance. Therefore, when defining an iterator, there is no iterator superclass to start from. Something qualifies as an iterator as long as it defines some or all of the operators described on the iterators page.
For this example, we're only going to describe and define the most basic operations.
For our example container, we're going to implement a ring queue.
A ring queue is a finite-sized queue that never gets full and never grows past a certain size. Instead, new elements replace the oldest elements. A ring queue lets you keep track of the most recent values in some stream of numbers, automatically throwing away older ones as you add new ones. For example, you might use a ring queue to get the average of the N most recently processed numbers.
You create a ring queue like this:
RingQueue< int, 5 > ringQueue;
This constructs an empty 5-element ring queue.
Like a regular queue, you can push on the back and pop off the front but unlike a regular queue:
A common way to implement ring queues is as a simple array with two numbers:
Initially myBegin is 0, the beginning of the array. The end of the ring queue -- we'll call it myEnd but it's something we'll calculate as needed -- is given by the simple formula
myEnd = (myBegin + mySize) % N
We can define the following accessors:
The rules for changing myBegin and mySize (and hence myEnd) are simple but slightly surprising:
The table shows what happens when we add and remove some elements to a 4 element queue.
Action | myBuffer | myBegin | mySize | myEnd |
---|---|---|---|---|
[ ][ ][ ][ ] | 0 | 0 | 0 | |
push 1 | [1][ ][ ][ ] | 0 | 1 | 1 |
push 2 | [1][2][ ][ ] | 0 | 2 | 2 |
pop => 1 | [1][2][ ][ ] | 1 | 1 | 2 |
push 3 | [1][2][3][ ] | 1 | 2 | 3 |
push 4 | [1][2][3][4] | 1 | 3 | 0 |
push 5 | [5][2][3][4] | 1 | 4 | 1 |
push 6 | [5][6][3][4] | 2 | 4 | 2 |
pop => 3 | [5][6][3][4] | 3 | 3 | 2 |
Note that no data is ever shifted in the array. A push_back() when the queue is full simply replaces the oldest element with the newest. That's what makes this a ring queue.
Given our definitions of front() and back() above, you might think we could also define
You can do this and it will compile, but then you try the following code:
RingQueue<int, 4> rq; for ( int i = 0; i < 10 ++i ) rq.push_back( i * i ); cout << "There are " << rq.size() << " elements." << endl; cout << "Here they are:" << endl; copy( rq.begin(), rq.end(), ostream_iterator<int>( cout, "\n" ) ); cout << "Done" << endl;
This produces the following output:
There are 4 elements. Here they are: Done
What happened? Clearly the ring queue thinks it has the full 5 elements, but copy() doesn't seem to find them.
Look in the table above at what happens when the ring queue is full. When the queue is full, i.e., mySize is equal to 4, then myBegin is equal to myEnd! That's because when mySize is N, the following equation is true:
myEnd = (myBegin + mySize) % N = myBegin
That means that the end of the queue is equal to the beginning! Therefore, any loop that starts with
while ( rq.begin() != rq.end() ) ...
on a full ring queue will stop immediately, because it will look like the queue were empty.
Therefore, we need some other way to indicate where we are in the queue. An obvious solution is keep a simple offset from myBegin:
Our ring queue iterator therefore will need two things:
Each of the operators we need to define becomes simple, at least conceptually:
Conceptually, that's that. C++ being C++, there's a lot of little details that need to be taken care of, to make the compiler happy, but the basic ideas are what have just been described.
Container and iterator classes typically need to be closely linked. In particular
This kind of mutual referencing implies that we need to define the container before the iterator and vice versa. Furthermore the iterator typically needs access to private container methods and data members in order to do its job.
For this reason, the typical pattern for defining containers and their iterators is this:
Therefore, we have this code
// forward declare the iterator template < class T, int N > class RingIter; // define the container, make the iterator a friend template < class T, int N > class RingQueue { friend class RingIter< T, N >; ... }; // define the iterator template < class T, int N > class RingIter { ... };
Notice that the ring queue and its iterator have the same template parameters. This is typical.
The int N template argument used above is a special feature of C++ that lets us write
RingQueue< int, 5 > ringQueue;
to make a 5-element ring queue of integers. The N parameter is "hard-wired" to 5 at compile-time, just like the T type parameter is hard-wired to int. This lets us declare the array in the ring queue at compile-time instead of dynamically allocating it. Of course, this also means you can't make a ring queue of some variable size N.
Some of the STL functions, e.g., back_inserter(), need to know a few key pieces of information about a container in order to construct iterators. This information is communicated by a standardized set of typedef's at the start of the container. For example, a container should define value_type and iterator so that other functions can tell what kind of thing this container can hold and what kind of iterator it uses.
Fortunately, these type definitions are all pretty simple and usually identical from container to container. For our ring queue, we need
typedef RingIter<T, N> iterator; typedef ptrdiff_t difference_type; typedef size_t size_type; typedef T value_type; typedef T * pointer; typedef T & reference;
The only thing specific to our ring queue is the iterator type definition.
The main functions in a container that return iterators are begin() and end() (we're ignoring reverse and constant iterators here).
Typically, these simply call the constructor for the iterator class. There are usually at least two arguments:
In our case, the something else is the offset, so our definitions are simply
iterator begin() { return iterator( *this, 0 ); } iterator end() { return iterator( *this, mySize ); }
Notice that this is both readable and fairly safe from minor implementation changes, because we used the iterator typename instead of RingIter<T, N>.
Constructing the iterator is relatively straightforward except for one detail. We have to make sure that the iterator contains a reference to the actual container, not a copy. In C++, it's very easy to inadvertently get a copy. If the constructor was defined with
RingIter( RingQueue rq, int size ) { ... }
we'd get a copy of the ring queue because rq is passed by value.
We can fix this with
RingIter( RingQueue & rq;, int size ) { ... }
but suppose the whole class looks like this:
template < class T, int N > class RingIter { private: RingQueue<T, N> myRingQueue; int mySize; public: RingIter( RingQueue & rq, int size ) { myRingQueue = rq; ...
In this case, we will still get a copy because myRingQueue = rq will make a copy of rq.
Aha, we say, let's make myRingQueue a reference variable, i.e.,
template < class T, int N > class RingIter { private: RingQueue<T, N> & myRingQueue; int mySize; public: RingIter( RingQueue & rq, int size ) { myRingQueue = rq; ...
This won't compile. Reference variables must be initialized to the thing they reference. E.g.,
int x = 10, &y = x; // fine, y references x y = 4; // sets x to 4 int a = 10, &b; // illegal, b has nothing to reference
Fortunately, there is a solution. There are two places where variables can be initialized:
Thus, the following code works fine:
template < class T, int N > class RingIter { private: RingQueue<T, N> & myRingQueue; int mySize; public: RingIter( RingQueue & rq, int size ) : myRingQueue( rq ), mySize ( size ) {}
Initialization clauses are not assignment. They're initializations and work just like initializations in declarations.
The above code makes myRingQueue a reference to whatever rq is. Since rq is in turn a reference to some container, myRingQueue is a reference to the container. Anything we do with or to myRingQueue is applied directly to the original container, not a copy.
The inequality operator is usually trivial. If your iterator class just contains pointers and/or integer indices, then the default equality operator that C++ defines will work just fine.
We also need to define
The code for both is straightforward.
Consider this simple piece of C++ code using an iterator:
*dest = *source;
In this statement,
*dest is said to return an l-value. The "l" stands for "left" as in "left-hand sides of an assignment statement." An l-value is something that can be used to store a value.
What all this means is that when we define operator*() for an iterator class, we have to make sure that it returns a reference to a location. As long as we do that, the right thing will happen no matter which side of the assignment statement the dereferencing occurs.
In our code, all we need to do is return the array reference, but we have to make sure that the return type of operator*() is a reference.
T & operator*() { return myRingQueue[ myIncrement ]; }
We defined operator[] in RingQueue to make the above code simple.
Warning: watch your ampersands! The following won't compile:
T & operator*() { return & myRingQueue[ myIncrement ]; }The first ampersand says "return the address of the value specified by any return statements. The second ampersand says "return the address of myRingQueue[ myIncrement ]." So we end up trying to return the address of the address of myRingQueue[ myIncrement ], which the compiler will correctly say is not the address of an object of type T.
The increment operator in C++ can be called in two ways:
The prefix case is simple. We just increment the iterator's internal data members appropriately and return the modified iterator. In our case, that means
iterator & operator++() { ++myOffset; return *this; }
The return *this; is standard in iterator definitions.
The postfix case is a little more confusing, for two reasons:
C++ has a kludge for the first problem. When operator++() is used postfix, the compiler generates a call with a dummy integer. This integer has no useful value. It's simply there so that you can define operator++( int ) to handle the postfix case, and keep it separate from the code for the prefix case.
To handle the second problem, we need to create a copy of the iterator and return it.
iterator operator++ ( int ) { RingIter<T, N> clone( *this ); ++myOffset; return clone; }
Warning: Hackers will notice that two copies actually get created: one when clone is built, and another when clone is returned by value. Do not try to fix this by returning a reference to the clone! This will crash your code, because it will return the address of an object that doesn't exist after this code exits.
For our purposes in this class, the above should suffice for defining iterators for new containers. There are however many other details, including:
typedef RingIter<T, N> iterator; typedef RingCIter<T, N> const_iterator; ... iterator begin() { return iterator( *this, 0 ); } iterator begin() const { return const_iterator( *this, 0 ); }
For this and other issues in defining iterators, see an advanced book on C++ programming, such as Stroustrup's The C++ Programming Language, 3rd ed.
Comments? Let me know!