Iterators for Linked Lists

Background reading

Before reading the rest of these notes, you should read sections 3.1, 3.2, and 3.7 in the text. Those sections cover the basics of linked lists and also introduce you to the standard template library list class, which is the STL's implementation of a doubly linked list.

Improving on the author's classes

The singly and doubly linked list classes the author introduces in chapter 3 illustrate the basic ideas behind those classes. However, there are several problems with the author's implementation.

Both classes include methods for adding and removing items from the ends of the list, but lack general methods for inserting and removing items at arbitrary locations in the list. Since this capability is one of the major selling points for the linked list structure this is an important oversight.
The author has not provided iterator classes to go with the linked list classes. Iterators are an important component of modern C++ programming, and really should be included with any container class.
The author's classes do not follow the naming conventions used for most STL container classes.

In these notes I am going to fix all of these problems for the singly linked list class by almost completely rewriting the class. At the end of these notes you will find a programming assignment that asks you to rewrite the author's doubly linked list class in a similar way.

Getting started - the iterator

Since the major new element I am going to introduce in my singly linked list class is an iterator, we need to begin by thinking about how such an iterator would work.

For our first guess we place an iterator class inside the list class. The iterator is implemented as a pointer to a node, and contains operator overloads for the four usual iterator operations of dereference, increment, comparison, and assignment.

Almost as soon as we start to write the code for this class, we will discover that the iterator class is seriously broken. The main problem is that we will eventually also want to implement a method

iterator insert(iterator position,const T& value)

in the list class that can be used to insert new data items at arbitrary locations in the list. Given an iterator position, insert is supposed to place a new node containing the data value before the node that position points to. The problem we are going to run into in the course of making that insertion is that we will need a pointer to the node before the node that position points to, so we can hook that previous node to the new node we are going to create. Since we are dealing with a singly linked list class, there is no easy way to obtain that pointer.

Similarly, we would also like to implement an erase() method in our list class:

iterator erase(iterator position)

This method takes as its parameter an interator that points to a node that we would like to erase. In the course of removing that node we will need to make the node before the targeted node point to the node after the node we are removing. Since there is no easy way to the obtain a pointer to the node before the node that position points to, we are stuck once again.

A better approach

We can solve the problems with both insert and erase by doing something slightly odd with our iterator class. Instead of making the iterator class contain a pointer to a node, we make the iterator point to the node in front of the node we want to point to. This immediately solves the problem with insert. Assuming that the iterator contains a pointer named nodePtr, this is code for the insert method that is both simple and correct:

iterator insert(iterator position,const T& value) {
  node<T>* newNode = new node<T>(value, position.nodePtr->next);
  if (position.nodePtr == tail) tail = newNode;
  position.nodePtr->next = newNode;
  return position;
}

As happens so often in programming, this solution creates another problem. That problem is what to do about begin(). The begin() method is supposed to return an iterator that points to the beginning of the list. Since our iterator class is going to contain a pointer to the node before the node we want to point to, what should we put in the iterator that begin() returns?

The solution to this problem is to deploy another trick. We equip our list class with a dummy head node. This is an extra node that we slip into the list right from the very start. The dummy head node is a node that appears before the first actual node containing data. By making the head pointer in the list class point to the dummy head node and defining begin() to do this

iterator begin() const {
  return iterator(head);
}

we will have solved our problem.

The increment operators

One of the operations that our iterator needs to support is the increment operation. C++ actually has two increment operators, the preincrement operator, which is written

++itr;

and the postincrement operator, which is written

itr++;

When used alone in a statement these two operators have exactly the same effect. The difference in the two operators becomes apparant when we use these operators in combination with other operations in the same same statement: the preincrement operation is applied first before any other operations in a statement, while the postincrement operation is applied after all other operations in a statement.

Here is one of the more famous pieces of code in C, which makes use of the postincrement operator with a pair of pointers. This code is an implementation of the C standard library strcpy function, which copies characters from a C string pointed to by a pointer src to a C string pointed to by a pointer dest:

void strcpy(char* dest,const char* src) {
  while(*dest++ = *src++) ;
}

The rather peculiar while loop in this function does all of the useful work in the test expression and has an empty body. The test expression is a compound expression that does three things at the same time:

*dest = *src copies a single character from source array to the destination array.
The assignment has a side effect, it returns the character copied. That character is a numeric code that C interprets as a true/false value for use by the loop test. The null character, '\0', which marks the end of the source string, is interpreted as false, which causes the loop to terminate at the right time. All other characters are interpreted as true, which causes the loop to continue.
The increment operations, dest++ and src++ happen last in the test expression, which advances the two pointers to the next pair of characters on each round. Note that for the loop logic to work correctly, we have to do the character assignment before advancing the two pointers to the next location.

Because there are two different increment operators, C++ has had to adopt a peculiar convention to differentiate them. To overload the preincrement operator we construct a member function in our iterator class that takes the form

iterator& operator++()

The overload for this operator is supposed to advance the iterator to the next location and then return a reference to itself. That is usually done by putting the statement

return *this;

at the end of the method. The operator has to return this reference so the increment can be used in a compound expression.

To overload the postincrement operator we construct a member function in our iterator class that takes the form

iterator operator++(int)

The overload needs the int parameter because the rules for operator overloading in C++ stipulate that both overloads of ++ have to have the same name, operator++, and because two overloaded functions are not allowed to differ purely in their return types. To differentiate the two versions of operator++ from each other, C++ uses the convention of forcing the postincrement version to take a single integer parameter. All implementations of this operator will simply ignore that parameter, as it is used purely to differentiate the two versions of ++ from each other.

The postincrement overload is also peculiar in that it has to return a copy of the iterator before advancing the iterator. Remember, in postincrement the increment happens at the end of the expression, so we are forced to return a copy of the iterator before the increment to get the right behavior.

The full class

Here now is the full source code for our complete singly linked list class. This class features the use of a dummy head node, so that even empty lists will contain at least one node. The iterator for the list class is defined as an inner class in the list class, and stores a pointer to the node before the node that we want the iterator to point to.

// The node class for our linked list
template <typename T>
class node {
  public:
    T data;
    node<T> *next;
    
    node() : next(nullptr) {}
    node(const T& item, node<T> *ptr = nullptr) :
      data(item), next(ptr) {}
};

// Linked list class
template <typename T>
class list {
  public:
    list() {
      // Create the dummy head node
      head = tail = new node<T>();
    }
    list(const list<T>& other) = delete;
    list(list<T>&& other) = delete;
    ~list() {
      while (head->next != nullptr) {
        node<T>* temp = head;
        head = head->next;
        delete temp;
      }
      delete head;
    }
    
    list<T>& operator=(const list<T>& other) = delete;
    list<T>& operator=(list<T>&& other) = delete;
    
    // Inner class iterator
    class iterator {
    friend class list;
    private:
      node<T> *nodePtr;
      // The constructor is private, so only our friends
      // can create instances of iterators.
      iterator(node<T> *newPtr) : nodePtr(newPtr) {}
    public:
      iterator() : nodePtr(nullptr) {} 

      // Overload for the comparison operator !=
      bool operator!=(const iterator& itr) const {
        return nodePtr != itr.nodePtr;
      } 

      // Overload for the dereference operator *
      T& operator*() const {
        return nodePtr->next->data;
      }

      // Overload for the postincrement operator ++
      iterator operator++(int) {
        iterator temp = *this;
        nodePtr = nodePtr->next;
        return temp;
      }
    }; // End of inner class iterator
    
    iterator begin() const {
      return iterator(head);
    }
    
    iterator end() const {
      return iterator(tail);
    }
    
    iterator insert(iterator position,const T& value) {
      node<T>* newNode = new node<T>(value, position.nodePtr->next);
      if (position.nodePtr == tail) tail = newNode;
      position.nodePtr->next = newNode;
      return position;
    }

    iterator erase(iterator position) {
      node<T> *toDelete = position.nodePtr->next;
      position.nodePtr->next = position.nodePtr->next->next;
      if (toDelete == tail) tail = position.nodePtr;
      delete toDelete;
      return position;
    }
  private:
    node<T>* head;
    node<T>* tail;
};

To prevent problems caused by mixing inner classes with templates, I have taken the approach here of writing the code for all of the member functions in the class declarations, instead of designing this as a class declaration followed by separate method implementations.

The two most important new methods in the list class itself are the insert() and erase() methods. These methods replace the author's original methods for adding and removing items from the end of the list. Both method follow the conventions for insert() and erase() methods in the STL.

The insert() method takes as its first parameter an interator that points to a location in the list. The insert() method will create a new node that contains the data given in the second parameter and insert that new node in the list in the location before the node that the first parameter points to. After inserting the new node, insert() returns an interator that points to the new node. insert() can be used to add new nodes at any location in the list, even at the end of the list. By passing the iterator returned by the list's end() method as the first parameter to insert(), we can add a new item to the end of the list. You may want to take a careful look at the code for the insert() method and convince yourself that it will do the right thing in all cases, including adding new items at the front and back of the list.

The erase() method removes a node from the location pointed to by the iterator in its parameter, and returns an interator that points to the node after the node that was removed.

To test our new class here is a simple test program

#include <iostream>
#include "list.h"

int main(int argc,const char* argv[])
{
  list<int> v;
  
  v.insert(v.end(),2);
  v.insert(v.end(),4);
  v.insert(v.end(),5);
  
  auto iter = v.begin();
  iter = v.insert(iter, 1); // Insert 1 before 2
  iter++; // Points to 2
  v.insert(iter++,3); // Insert 3 before 2, advance to 4
  iter++; // Points to 5
  v.insert(iter,10); // Insert 10 before 5
    
  iter = v.begin();
  iter++; // Points to 3
  v.erase(iter); // Erase 3
  
  for(auto itr = v.begin();itr != v.end();itr++)
    std::cout << *itr << std::endl;
    
  return 0;
}

Compiling and running this test program confirms that our list class is working the way that it should.

Programming Assignment

Here is the code for the author's doubly linked list class. Modify this class to do the following things:

Introduce an inner iterator class. The iterator class will contain a single pointer to a node, and will point directly to the node that you want the iterator to refer to.
Get rid of the author's methods addToDLLTail, deleteFromDLLTail, addToDLLHead and deleteFromDLLHead and replace them with insert and erase methods as I did in the example above.
You can also get rid of the author's code for firstEl, find, and operator<<, since these all implement things that are more properly done with iterators.
Provide begin() and end() methods. begin() should return an iterator that points to the head node, and end() should return an interator with a pointer value of nullptr to indicate that the iterator points past the end of the list.
Since we will not be using a dummy head node with our doubly linked list class, the author's original constructor and destructor methods will continue to work just fine - do not modify them.

After modifying the author's class as indicated above, modify the simple test program I showed in the lecture notes above to work with your doubly linked list class. Run your test program and verify that produces the correct results.