Logo  

CS456 - Systems Programming

The Shunting-yard Algorithm:

The shunting yard algorithm essentially reads a infix notation expression and converts it to an RPN expression, which can be evaluated as the algorithm runs. To do this it adds an operator stack in addition to a number stack. The operator stack must record additional information, so might be a stack of the following structures:

    struct oper {
      token_t op;       // The operator token
      int prec;         // The operator precedence
      int dir;          // The associativity (0 = left-to-right, 1=right-to-left)
      int unary;        // Flag if the operator is the unary version.
    };


The attributes that go with the operator are taken from a set of tables for each operator, such as:

//               +,  -,  *,  /, **,  (,  )      Used when unary = 
int  prec[] = { 60, 60, 65, 65, 70,  0,  0};    //  false
int uprec[] = { 75, 75, -1, -1, -1, -1, -1};    //  true
int   dir[] = {  0,  0,  0,  0,  0,  0,  0};    //  false
int  udir[] = {  1,  1,  0,  0,  0,  0,  0};    //  true


The uprec/udir entries are the precedence and direction for the unary versions of the operator. A -1 indicates that the operator cannot be unary. A unary operator is one that takes only one operand and typically precede their operand, such as the minus sign that makes a number negative.

Higher precedence operators are removed from the operator stack before a lower precedence operator can be pushed onto it. ()'s are handled explicitly.

The direction of evaluation (0 = left to right, 1 = right to left) or associativity determines if operators of the same precedence are also removed (as is the case in left-associative operators) from the op stack before this operator is pushed.

Theory of operation:

  • A simple infix expression is an alternating sequence of numbers and operators beginning and ending with a number. i.e. n = number, o = operator:
  n
  n o n
  n o n o n
  n o n o n o n ...

or generalized to: n (o n)*

  • A Boolean flag is used to keep track of when to to expect a number or an operator.
    typedef enum { NUMBER, OPERATOR } flag_t;

  • Numbers when encountered are pushed onto the number stack and the flag is then switched to expect an operator.

  • If a operator is encountered when a number is expected, then it is considered a unary operator and the flag is not updated (i.e. a number is still expected.) It should still be considered an error to have multiple consecutive unary ops.

  • An opening parenthesis resets the flag back to expecting a number. Everything inside of ()'s should collapse to a single number in the number stack when the closing ) is encountered (i.e. all the operators are pop'ed off the op stack until the matching '(' is pop'ed.)

  • Before pushing an operator to the op stack, all higher precedence operators at the top of the stack are processed (i.e. the operator is pop'ed off, one or two numbers are pop'ed, the operation is performed with the numbers and the result is finally pushed onto the number stack.) If the operator to be pushed is left-associative (like most operators) then any operator of equal precedence at the top of the stack must also be processed.

  • When the end of the expression is reached, all the remaining operators on the op stack are processed. The number stack should have one value on it which is the result of the expression.

    << Work through example: 5 + 2 * 3 + 6 >>

Implementation

The below sets up the beginning of the implementation. We re-use the lexer built for the RPN calculator, we also re-use the number stack. This sets up the precedence and direction "table" and the operator stack and the functions to push, pop and peek at the operator stack.

#include "lex.h"

typedef enum { NUMBER, OPERATOR } flag_t;

void die(char *why) {
  fprintf(stderr,"%s\n", why);
  exit(1);
}

//               +,  -,  *,  /, **,  (,  )
int  prec[] = { 60, 60, 65, 65, 70,  0,  0};
int uprec[] = { 75, 75, -1, -1, -1, -1, -1};
int   dir[] = {  0,  0,  0,  0,  0,  0,  0};
int  udir[] = {  1,  1,  0,  0,  0,  0,  0};

// Number stack:
int numstack[K];
int nsp = 0;

void push_num(int n) {
  numstack[nsp++] = n;
}
int pop_num(void) {
  if (nsp <= 0) die("Stack underflow");
  return numstack[--nsp];
}

The operator stack

// Operator stack:
struct op {
  token_t op;
  int unary, prec, dir;
} opstack[K];
int osp = 0;

void push_op(token_t op, int unary, int prec, int dir) {
  opstack[osp++] = (struct op){.op=op, .unary=unary, .prec=prec, .dir=dir};
}

struct op pop_op(void) {
  if (osp <= 0) die("Op stack underflow");
  return opstack[--osp];
}
/**
 * Function to "peek" at the top of the op stack, T_UNKNOWN means the stack
 * is empty.
 */
token_t peek_op(void) {
  if (osp <= 0) return T_UNKNOWN;
  return opstack[osp-1].op;
}
/**
 * Peak at the just the precedence of operator at the top of the op stack:
 */
int peek_prec(void) {
  if (osp <= 0) return -1;
  return opstack[osp-1].prec;
}

The action() function.

The action() function performs an "action" -- a single operation. It pops an operator off the op stack then pops one or two numbers off the number stack (based on the unary flag for that operation) then performs the operation. Finally the result is then pushed onto the number stack.

The first value pop'ed off the number stack should be considered the 'right' value, and the second value the 'left' value. If no 'left' value is pop'ed then use a default value of 0 for the left side. i.e. v = l - r; if - is the unary minus, then v = 0 - r; would still give the correct result.

If the unary flag is set, but the unary precedence of the operator is -1, then the expression is malformed and an error should printed.

void action(void)
{
  struct op op;
  int l, r, v;

  // Get the operator:
  op = pop_op();

  // Right side is at the top of the stack, left would be underneath it,
  // unless it's a unary op
  r = pop_num();
  if (op.unary == FALSE) l = pop_num();
  else l = 0;
  if (uprec[op.op] == -1 && op.unary)
    die("Malformed unary expression in action.");

  // Perform the operation:
  switch(op.op) {
    case T_PLUS:  v = l + r; break;
    case T_MINUS: v = l - r; break;
    case T_MULT:  v = l * r; break;
    case T_DIV:   v = l / r; break;
    default:
      die("Illegal operation.");
  }
  // Push the result onto the stack:
  push_num(v);
}

The main loop:

  • Loop until a EOE (end of expression) token is read. For following tokens, do the following:

    • Numbers:

      • push the value onto the number stack, set the flag to expect an operator
    • Open parenthesis:

      • push the open parenthesis onto the op stack, set the flag to expect a number. Precedence, direction and unary flags do not matter for parenthesis.
    • Close parenthesis:

      • make a loop to call action() until the top operation on the op stack is a open parenthesis. If the op stack is completely drained, print the error "Mismatched parenthesis." and exit.
      • pop the open parenthesis off, set the flag to expect an operator.
    • For operators: +,-,*,/:

      • Determine the current tokens precedence (p) and direction (d) based on the value of the flag and the tables provided above.

      • If flag is set to NUMBER, then it is a unary operator.

      • "drain" the op stack of any operators that have a higher precedence (or the same and higher if the associativity is left to right (d==0)) than the current operation.

      • Push the new operation to the op stack

      • Set the flag to expect a number.

  • After all tokens have been processed, call the action() function until there are no more remaining operators on the op stack.

  • Pop the top number off the number stack, this is should be the result of the expression.

  ...
  startlex(somestring);
  token_t tok;
  flag_t flag = NUMBER;
  int num, p, d;

  while ((tok = lex(&num)) != T_EOE) {
    switch(tok) {
      case T_NUMBER:
        if (flag != NUMBER) die("Syntax error");
        push_num(num);
        flag = OPERATOR;
        break;
      case T_OPAREN:
        push_op(T_OPAREN, 0, 0, 0);
        flag = NUMBER;
        break;
      case T_CPAREN:
        // Drain the op stack until it reaches it's matching open paren:
        while ((tok = peek_op()) != T_UNKNOWN) {
          if (tok == T_OPAREN) break;
          action();
        }
        if (tok == T_UNKNOWN) die("Mismatched parenthesis.");
        // Removes the opening paren:
        pop_op();
        flag = OPERATOR;
        break;
      case T_PLUS:
      case T_MINUS:
      case T_MULT:
      case T_DIV:
      case T_EXP:
        // Get the precedence and direction of the current operator:
        p = (flag == NUMBER)? uprec[tok] : prec[tok];
        if (p == -1) die("Malformed expression, operator not a valid unary operator.");
        d = (flag == NUMBER)? udir[tok] : dir[tok];

        // Drain the op stack of operators of higher precedence (and equal to if
        // this op is left-assoc):
        if (d == 1) {   // Right associative
          while (peek_prec() > p)
            action();
        } else {        // Left associative
          while (peek_prec() >= p)
            action();
        }
        // Finally push the operator to the op stack:
        push_op(tok, flag==NUMBER, p, d);
        flag = NUMBER;
        break;
      default:
        die("Syntax error");
    }
  }
  // Drain the operator stack until it's empty:
  while(peek_op() != T_UNKNOWN)
    action();

  // The result of the entire expression should be the last and only value left:
  printf("%d\n", pop_num());