Evaluating Infix Expressions

Tokens and the Tokenizer

The purpose of this lecture is to develop a program that can correctly evaluate complex arithmetic expressions like

(12 - 1)*52 - 473/(4 + 2^3)

The code I am going to show below is based on code found in Data Structures and Problem Solving Using C++, Second Edition by Mark Allen Weiss.

The first step in writing such a program is breaking up the input into a sequence of tokens, where each token represents either an arithmetic operation, a number, or parentheses. Weiss has written a token class to represent individual tokens and a tokenizer class that takes a line of characters from some input stream and breaks it up into a stream of tokens. Here is the code for these two classes.

enum TokenType { EOL, VALUE, OPAREN, CPAREN, EXP, MULT, DIV, PLUS, MINUS };

template <typename NumericType>
class Token
{
  public:
    Token( TokenType tt = EOL, const NumericType & nt = 0 )
      : theType( tt ), theValue( nt ) { }

    TokenType getType( ) const
      { return theType; }
    const NumericType & getValue( ) const
      { return theValue; }

  private:
    TokenType   theType;
    NumericType theValue;
};

template <typename NumericType>
class Tokenizer
{
  public:
    Tokenizer( istream & is ) : in( is ) { }

    Token<NumericType> getToken( );

  private:
    istream & in;
};

// Find the next token, skipping blanks, and return it.
// Print error message if input is unrecognized.
template <class NumericType>
Token<NumericType> Tokenizer<NumericType>::getToken( )
{
    char ch;
    NumericType theValue;

        // Skip blanks
    while( in.get( ch ) && ch == ' ' )
        ;

    if( in.good( ) && ch != '\n' && ch != '\0' )
    {
        switch( ch )
        {
          case '^': return EXP;
          case '/': return DIV;
          case '*': return MULT;
          case '(': return OPAREN;
          case ')': return CPAREN;
          case '+': return PLUS;
          case '-': return MINUS;

          default:
            in.putback( ch );
            if( !( in >> theValue ) )
            {
                cerr << "Parse error" << endl;
                return EOL;
            }
            return Token<NumericType>( VALUE, theValue );
        }
    }

    return EOL;
}

The getToken() method makes use of a slick trick to read numbers. When it sees something that looks like the beginning of a number it puts that character back on the stream and then asks to read a number using the usual >> notation.

Balancing Parentheses

A very simple application of the stack class and the tokenizer is to read an arithmetic expression and determine whether or not the parentheses in the expression are correctly matched. Here is an algorithm for solving this problem.

  1. Scan the tokens from left to right ignoring everything but open parenthesis and close parenthesis tokens.
  2. Whenever you encounter an open parenthesis token, push it on the stack.
  3. Whenever you encounter a close parenthesis token, pop the stack. If the stack is empty, output an error message.
  4. When you have read all the tokens, check to see whether or not the stack is empty. If the stack is empty, declare success. If the stack still has something on it you have an unbalanced open parenthesis in your expression.

Here is code for a program that implements this algorithm.

#include <stack>
#include <iostream>

#include "token.h"

using namespace std;

typedef Token<int> tokenType;

int main()
{
  stack<tokenType> s;
  
  Tokenizer<int> t(cin);

  tokenType curToken = t.getToken();
  while(curToken.getType() != EOL)
  {
    // Process the current token
    if(curToken.getType() == OPAREN)
      s.push(curToken);
    else if(curToken.getType() == CPAREN)
    {
      if(s.isEmpty())
      {
        cout << "Unbalanced close parenthesis." << endl;
        break;
      }
      else
        s.pop();
    }
    curToken = t.getToken();
  }

  if(!s.isEmpty())
    cout << "Unbalanced open parenthesis" << endl;
  else
    cout << "Expression balances." << endl;

  return 0;
}

Evaluating Postfix Expressions

Arithmetic expression in infix form have operators written between the numbers they operate on. An alternative form is postfix form, which writes the operator after the two things it operates on. For example, the postfix equivalent of

1 + 2

is

1 2 +

and the postfix equivalent of

1 + 2 * 3

is

1 2 3 * +

One useful feature of postfix notation is that it never requires parentheses. For example, the postfix equivalent of

(1 + 2)*3

is

1 2 + 3 *

There is a very simple algorithm for evaluating postfix expressions.

  1. Scan the expression from left to right.
  2. Push any numbers you encounter on a value stack.
  3. Any time you encounter an operator, pop the top two numbers off the value stack, apply the operation to them, and push the result back on the stack.
  4. The number that appears at the top of the stack at the end of the scan is the result of the calculation.

If you try this algorithm with any of the examples shown here you will see that it produces the right results.

Here is the code for a simple program that carries out this postfix evaluation algorithm.

#include <stack>
#include <iostream>
#include "token.h"

using namespace std;

typedef Token<int> tokenType;

int pow(int x,int y)
{
  int result = 1;
  while(y > 0)
  {
    result *= x;
    y--;
  }
  return result;
}

int main()
{
  stack<tokenType> s;
  
  Tokenizer<int> t(cin);

  tokenType curToken = t.getToken();
  while(curToken.getType() != EOL)
  {
    // Process the current token
    switch(curToken.getType())
    {
    case EOL:
      break;
    case VALUE:
      s.push(curToken);
      break;
    case OPAREN:
    case CPAREN:
      break;
    case EXP:
      {
      tokenType op1 = s.top();
      s.pop();
      tokenType op2 = s.top();
      s.pop();
      int result = pow(op2.getValue(),op1.getValue());
      s.push(Token<int>(VALUE,result));
      break;
      }
    case MULT:
      {
      tokenType op1 = s.top();
      s.pop();
      tokenType op2 = s.top();
      s.pop();
      int result = op2.getValue()*op1.getValue();
      s.push(Token<int>(VALUE,result));
      break;
      }
    case DIV:
      {
      tokenType op1 = s.top();
      s.pop();
      tokenType op2 = s.top();
      s.pop();
      int result = op2.getValue()/op1.getValue();
      s.push(Token<int>(VALUE,result));
      break;
      }
    case PLUS:
      {
      tokenType op1 = s.top();
      s.pop();
      tokenType op2 = s.top();
      s.pop();
      int result = op2.getValue()+op1.getValue();
      s.push(Token<int>(VALUE,result));
      break;
      }
    case MINUS:
      {
      tokenType op1 = s.top();
      s.pop();
      tokenType op2 = s.top();
      s.pop();
      int result = op2.getValue()-op1.getValue();
      s.push(Token<int>(VALUE,result));
      break;
      }
    }
    curToken = t.getToken();
  }

  tokenType last = s.top();
  cout << "Result: " << last.getValue() << endl;

  return 0;
}

Infix to Postfix Conversion

The last ingredient we need in order to be able to evaluate arbitrary arithmetic expressions is an algorithm to translate infix expression to postfix expressions.

Here is the beginning of an outline of that algorithm. The algorithm uses a queue to store the results and a stack to store operators while the algorithm runs. Each operator has a precedence associated with it, as shown in the table.

operatorprecedence
+1
-1
*3
/3
^6

Here is the outline of the algorithm.

  1. Scan the tokens from left to right.
  2. If you encounter a number token, move it immediately to the result queue.
  3. If you encounter an operator and the operator stack is empty, push the operator on the stack.
  4. If you encounter an operator whose precedence is greater than that of the operator at the top of the stack, push the new operator on the stack.
  5. If you encounter an operator whose precedence is less than or equal to the precedence of the operator at the top of the stack, pop the stack and move the operator from the stack to the output queue. Repeat this step until either the stack empties or an operator appears at the top of the stack whose precedence is smaller than the precedence of the current operator. Push the new operator on the stack.
  6. When you reach the end of the input, move any remaining operators from the stack to the result queue.

Handling Parentheses

The algorithm is relatively easy to modify to handle parentheses. Whenever you encounter an open parenthesis, push it on the stack. When you encounter a close parenthesis, pop operators off the stack and move them to the output queue until you encounter the matching open parenthesis.

Left Associative vs. Right Associative Operators

There is one correction we will have to make to rule 5 above. This correction applies to multiple appearances of the same operator. The rule as it is written does the correct thing with the example

2*3*4

The rule effectively treats this expression as if it were written

(2*3)*4

and translates it to

2 3 * 4 *

That is the correct postfix equivalent. Next, consider this example.

2^3^4

The correct way to evaluate this expression is to treat it as if it were written

2^(3^4)

and translate it as

2 3 4 ^ ^

In this case, rule 5 above does the wrong thing.

What's going on here? Most arithmetic operators are left associative, which means that multiple repetitions are grouped from left to right. The exponentiation operator on the other hand is right associative, which means it groups from right to left.

How do we modify rule 5 to do the right thing with both left associative operators and right associative operators? The solution that Weiss uses is to give operators one precedence when they appear at the top of the operator stack and a slightly different precedence when they appear as a new operator. Furthermore, you can also extend this rule to handle parentheses as a special case of operators. Here is a more complete operator precedence table.

operatorstack precedenceinput precedence
+21
-21
*43
/43
^56
(0100
)990

A simple optimization

The algorithm above describes how to correctly convert an expression in infix form into an expression in postfix form. The algorithm uses a queue designed to receive the tokens that make up the postfix expression as we build it from left to right and a stack that holds operator tokens. The stack simply helps to re-order the operators relative to the operands so we can do the conversion. When the conversion to postfix is complete, we could then read the contents of the queue from left to right by removing successive items from the front of the queue. We would then use a second stack to hold the operands as if we were running the usual postfix evaluation algorithm.

A simple optimization that can be applied to this process is to effectively eliminate the queue. What we do instead is transfer numbers directly from the input to the postfix stack, bypassing the queue. Instead of pushing operators on the queue only to remove them later, we can use up our operators right away by operating on the postfix stack. This is the optimization that the code below will use to simultaneously do the infix to postfix transformation and the postfix evaluation. This optimization allows us to optimize away the queue that would normally store the postfix expression before evaluation. Hence, you will see two stacks in the code below, but no queue.

Source code

We are now finally ready to see some code. The code below uses an Evaluator class to hold the major methods that implement the algorithm.

// PREC_TABLE matches order of Token enumeration
struct Precedence
{
    int inputSymbol;
    int topOfStack;
} PREC_TABLE [ ] =
{
    { 0, -1 }, { 0, 0 },       // EOL, VALUE
    { 100, 0 }, { 0, 99 },     // OPAREN, CPAREN
    { 6, 5 },                  // EXP
    { 3, 4 }, { 3, 4 },        // MULT, DIV
    { 1, 2 }, { 1, 2 }         // PLUS, MINUS
};


template <typename NumericType>
class Evaluator
{
public:
  Evaluator( const string & s ) : str( s )
    { opStack.push_back( EOL ); }

      // The only publicly visible routine
  NumericType getValue( );          // Do the evaluation

private:
  stack<TokenType>   opStack;      // Operator stack for conversion
  stack<NumericType> postFixStack; // Stack for postfix machine

  istringstream str;                // String stream

      // Internal routines
  NumericType getTop( );                // Get top of postfix stack
  void binaryOp( TokenType topOp );     // Process an operator
  void processToken(const Token<NumericType> &lastToken);
   // Handle LastToken
};


// Public routine that performs the evaluation.
// Examines the postfix stack to see if a single result
// is left and if so, returns it; otherwise prints error.
template <class NumericType>
NumericType Evaluator<NumericType>::getValue( )
{
    Tokenizer<NumericType> tok( str );
    Token<NumericType> lastToken;

    do
    {
        lastToken = tok.getToken( );
        processToken( lastToken );
    } while( lastToken.getType( ) != EOL );

    if( postFixStack.isEmpty( ) )
    {
        cerr << "Missing operand!" << endl;
        return 0;
    }

    NumericType theResult = postFixStack.top( );
    postFixStack.pop( );
    if( !postFixStack.isEmpty( ) )
        cout << "Warning: missing operators!" << endl;

    return theResult;
}

// After token is read, use operator precedence parsing
// algorithm to process it; missing opening parentheses
// are detected here.
template <class NumericType>
void Evaluator<NumericType>::processToken(const Token<NumericType> & lastToken )
{
    TokenType topOp;
    TokenType lastType = lastToken.getType( );

    switch( lastType )
    {
      case VALUE:
        postFixStack.push( lastToken.getValue( ) );
        return;

      case CPAREN:
        while( ( topOp = opStack.top( ) ) != OPAREN && topOp != EOL )
            binaryOp( topOp );
        if( topOp == OPAREN )
            opStack.pop( );  // Get rid of opening parentheseis
        else
            cout << "Missing open parenthesis" << endl;
        break;

      default:    // General operator case
        while( PREC_TABLE[ lastType ].inputSymbol <=
               PREC_TABLE[ topOp = opStack.top( ) ].topOfStack )
            binaryOp( topOp );
        if( lastType != EOL )
            opStack.push( lastType );
        break;
    }
}

// top and pop the postfix machine stack; return the result.
// If the stack is empty, print an error message.
template <class NumericType>
NumericType Evaluator<NumericType>::getTop( )
{
    if( postFixStack.isEmpty( ) )
    {
        cout << "Missing operand" << endl;
        return 0;
    }

    NumericType tmp = postFixStack.top( );
    postFixStack.pop( );
    return tmp;
}

// Process an operator by taking two items off the postfix
// stack, applying the operator, and pushing the result.
// Print error if missing closing parenthesis or division by 0.
template <class NumericType>
void Evaluator<NumericType>::binaryOp( TokenType topOp )
{
    if( topOp == OPAREN )
    {
        cout << "Unbalanced parentheses" << endl;
        opStack.pop( );
        return;
    }
    NumericType rhs = getTop( );
    NumericType lhs = getTop( );

    if( topOp == EXP )
        postFixStack.push( pow( lhs, rhs ) );
    else if( topOp == PLUS )
        postFixStack.push( lhs + rhs );
    else if( topOp == MINUS )
        postFixStack.push( lhs - rhs );
    else if( topOp == MULT )
        postFixStack.push( lhs * rhs );
    else if( topOp == DIV )
        if( rhs != 0 )
            postFixStack.push( lhs / rhs );
        else
        {
            cout << "Division by zero" << endl;
            postFixStack.push( lhs );
        }

    opStack.pop( );
}

// A simple main to exercise Evaluator class.
int main( )
{
    string str;

    while( getline( cin, str ) )
    {
        Evaluator<int> e( str );
        cout << e.getValue( ) << endl;
    }

    return 0;
}