Our first shunting yard example is an example of an interpreter, i.e. it interprets the expressions given it without generating machine code. In computer language we would encounter the same expression repeatedly if it were the test expression of a loop, so re-lexing and re-evaluating it each time it is encountered would be inefficient, so we would like to lex and parse the expression once while maintaining it's form for repeated evaluation during run-time. To do this we want to transform the expression into an expression tree.
Consider this structure for a "node" in the expression tree:
struct enode { token_t op; int64_t value; struct enode *left, *right; };
Then given an expression such as show below, the resulting trees would look like:
1 + 2 - 3 │ 1 + 2 * 3 │ (1 / 2) + (3 * 4) │ a = b = 5 │ │ │ - │ + │ + │ = ╱ ╲ │ ╱ ╲ │ ╱ ╲ │ ╱ ╲ + 3 │ 1 * │ / * │ a = ╱ ╲ │ ╱ ╲ │ ╱ ╲ ╱ ╲ │ ╱ ╲ 1 2 │ 2 3 │ 1 2 3 4 │ b 5
As you walk the tree starting from the left, you perform the operation after you have visited both the left and right sub-tree, so the algorithm for evaluating the tree would be:
int64_t eval(struct enode *e) { if (e == NULL) return 0; int64_t l = eval(e->left); int64_t r = eval(e->right); // Perform the 'e->op' with the values l and r return result; }
Handing the function the top node of the tree will then return the result. Parsing an expression into a tree like this also has other benefits, one of which is a simple means to optimize out constant expressions, called constant folding. If both sides of an operation are constant values, then we can perform the operation during the parse phase and replace the operation with its result, so given:
(2 + 3) * var * * ╱ ╲ ╱ ╲ + var ──▶ 5 var ╱ ╲ 2 3
There are other benefits to using an expression tree, even when generating code in the form of a compiler, is that we may want to make more than one pass over the code to resolve the names of functions and variables to addresses and so forth without having to completely re-parse the program from the beginning.
To make the shunting yard program generate an expression tree, we change our number stack to an expression node stack. Numeric constants are transformed into node leaves and operations pop nodes for their left and right side and then push the resulting tree back onto the node stack.
Consider the expression: 1 + 2 * 3 and the state of stacks after processing all the tokens will be:
1 + 2 * 3
1 + 2 * 3 ^ node op d p u │───────│ │───│───│───│───│ │ 3 │ │ │ │ │ │ │───────│ │───│───│───│───│ │ 2 │ │ * │ 0 │65 │ 0 │ │───────│ │───│───│───│───│ │ 1 │ │ + │ 0 │60 │ 0 │ └───────┘ └───┴───┴───┴───┘
The operator stack now will be drained (we've reached the end of the expression and so operators are poped off the operator stack until it is empty.) The first operator is the *, which pops the 3 and 2 which creates the tree:
* / \ 2 3
which is pushed back onto the node stack:
node op d p u │───────│ │───│───│───│───│ │ * │ │ │ │ │ │ │ / \ │ │───│───│───│───│ │ 2 3 │ │ │ │ │ │ │───────│ │───│───│───│───│ │ 1 │ │ + │ 0 │60 │ 0 │ └───────┘ └───┴───┴───┴───┘
Then we pop the + operator and create the following tree:
+ / \ 1 * / \ 2 3
With the entire * tree on the right because it was above the 1 node in the node stack. The resulting tree is then pushed back on the expression stack and is also the final "value" and is thus the expression tree.
To do the constant folding during the parse phase, in the action() routine we check if both sides of the operation are constant values, if so, we go ahead and perform the operation for real and instead of pushing a tree, we push a leaf with the computed result of the operation instead, thus eliminating the operation. It cannot completely eliminate operations, such as var * 2 * 2 would resist constant folding even though it could be optimized to var * 4, it lacks any sub-tree with constants on both sides, since the expression is essentially (var*2) * 2 and opportunistic constant folding cannot be employed.
action()
var * 2 * 2
var * 4
(var*2) * 2