The hardware stack is a section of memory that typically is controlled by a
stack register (in our case rsp and rbp) and often has special instructions
for pushing and popping data on and off of it (push, pop, etc.)
In a Linux process on program start the stack contains the count of command
line arguments (argc), the list of strings of the command line arguments (argv)
and the list of strings of the environment. The stack grows downward in memory
on the Intel architecture, however some stacks may grow upwards in memory (and
it would probably be better if they did.)
The initial stack organization of the Linux stack appears to look something
like the following:
Top of stack (highest address)
│ Environment strings │
│ ... │
│ Command line strings │
│ ... │
│ empty space? │ │
│ ... │ │ Lower addresses
│ envp array pointers │ ∇
│ argv array pointers │
│ argc │ ◀─── rsp
Linux allows/allocates about 8MB of stack space per process and is available to
our programs immediately.
Up until now we have used almost exclusively global data. For simple programs
this might be sufficient, however for more complicated functions we need to
have local state per function call. Consider the following C function:
// Recursively computes: n! (n factorial): n * n-1 * n-2 * ... * 2 * 1
int factorial(int n)
if (n <= 1) return 1;
return n * factorial(n-1);
In this function n is a local variable. If n starts at 4 (i.e.
? = factorial(4);) then the call stack would look something like:
? = factorial(4);
= 4 * factorial(3)
= 3 * factorial(2)
= 2 * factorial(1)
In order to keep n in a register, we would need four registers, one for
each of the calls. We can use a register for the return value of the
functions, as as we come back up from the recursions, we can re-use it
for the parents return value. We can also use a register as the parameter
to the function, however we still need to save that value somewhere while
the child call is made, for that we can use the stack.
The stack pointer normally points to the last item pushed onto the stack. It
is usually manipulated with the following instructions:
; is equivalent to:
SUB rsp, 8
MOV QWORD [rsp], rax
; is equivalent to:
MOV rax, QWORD [rsp]
ADD rsp, 8
The rsp register can be used as a regular register, however if you want
to keep access to the stack, you need to remember where the stack is, so should
save rsp somewhere if you want to maintain access to the stack.
We can implement our recursive function using push/pop like so:
; Uses rax as both its parameter and return value register
cmp rax, 1
mov rax, 1
push rax ; If we saved rax into a register here, it would
dec rax ; likely be over-written during the next call to
call factorial ; factorial at this point.
If we need more space for more than a few local variables, then pushing and popping
value may be too cumbersome. We can allocate the space we want from the stack
simply by subtracting from the stack pointer the size of all the local variables
we want to have and move into the stack the values we want there, which can be done
with a fast memory copy operation. To delete the local variables after the function
is complete, simply requires putting the rsp register back to the same value it
had when we entered the function, to remember where that is, we use rbp, the
base pointer register.
The area of the stack that is used for a single function calls local variables is
called the stack frame, the top of which is maintained by the rbp register. A
function can create and tear-down its stack frame by manipulating rbp and rsp.
│ Callers (parent) stack frame │ △
│ ... │ │
│ Any pushed parameters for │ │
│ the called function │ │ Higher addresses
│ │ │
│ Return Address │ │
│ old rbp │ ◁─ current rbp (bottom of current stack frame)
│ ... local variables ... │
│ │ ◁─ rsp (top of stack)
Given that rsp may change it's location during the run-time of the
function, we switch to using the rbp register to access variables within the
stack frame as rbp does not change. Because it is "above" all the values,
To create and tear-down the stack frame, we can use the enter and leave
The enter instruction is used to allocate a stack frame for local variable space
for functions. It should be the first instruction upon entering the function.
Most compilers implement the equivalent to enter using push/mov/sub since
it appears to be faster to do it that way, however for now, we will use enter.
ENTER imm16, imm8
ENTER imm16, imm8
The first parameter (imm16) is an amount of stack to allocate and the second
(imm8) is a nesting level for nested functions, for now always use 0 as the
Assuming the second parameter (the nesting level for enter) is 0, the enter
instruction is equivalent to:
push rbp ; Save old base pointer.
mov rbp, rsp ; make current stack pointer the new base pointer.
sub rsp, imm16 ; Allocates imm16 bytes for the new stack frame.
Leave should be the last instruction before the ret instruction, which restores
the parent callers stack frame.
mov rsp, rbp ; Restore stack pointer back to where we pushed the old base pointer
pop rbp ; Restore the old base pointer.
Modifies the printnum function to allocate its number buffer on the stack and
use that instead of the global _numbuf buffer.
enter 32, 0 ; Sets aside 32 bytes of space on the stack
lea rsi, [rbp-1] ; The space is immediately below rbp
; At this point we can use rsi in the same manner we use it in the original:
mov BYTE [rsi], 0 ; (*rsi) = '\0';
; Rest of the code for printnum is the same.
; Restore the stack fame before returning from the function:
Since the stack grows downward in memory and buffers that are allocated on the
stack are often written to upwards toward higher addresses, if the code allows
writes to continue past the amount of memory allocated to the buffer, then
Most of the time a buffer overflow of this type simply causes the program
to crash (usually by accessing memory not allowed to it resulting in a
segmentation fault.) However if a malicious attacker understands the nature
of the overflow and control it, they can craft a stack smash attack, which
writes a small assembly program called shell code (which launches a unix
shell,) into the buffer and enough overflow to overwrite the return address to
point at the buffer containing the shell code, causing the return from the
function to execute the code.
If the process is privileged (i.e. running as the super-user,) then such an
attack can gain for the attacker the privileges of program being exploited.
A by no means exhaustive list of mitigations of such attacks may take the form
Address Space Layout Randomization (ASLR) - Among other things it randomizes
the location of the stack in memory, preventing an attacker from easily
guessing what the return address should be set to. Fairly easy to get around
on a 32 bit machine however.
A "no execute" stack - Modern CPUs can prevent execution of code in specific
areas of memory and making the stack "no execute" prevents code on the stack
from being executed. This still does not prevent the program from being
made to crash, or being able to modify its behavior.
Stack "canaries" - From the proverbial "canary in a coal mine", a stack
canary (a random value that is determined at run-time,) is pushed onto the
stack at the start of the function. As the function prepares to return the
stack canary is inspected to see if it has been modified and if so, the
program exits immediately. If the canary is still intact, then the return
value is probably safe to use.