The hardware stack is a section of memory that typically is controlled by a stack register (in our case rsp and rbp) and often has special instructions for pushing and popping data on and off of it (push, pop, etc.)
In a Linux process on program start the stack contains the count of command line arguments (argc), the list of strings of the command line arguments (argv) and the list of strings of the environment. The stack grows downward in memory on the Intel architecture, however some stacks may grow upwards in memory (and it would probably be better if they did.)
The initial stack organization of the Linux stack appears to look something like the following:
Top of stack (highest address) ┌──────────────────────┐ │ Environment strings │ │ ... │ ├──────────────────────┤ │ Command line strings │ │ ... │ ├──────────────────────┤ │ │ empty space? │ │ │ ... │ │ Lower addresses ├──────────────────────┤ │ │ envp array pointers │ ∇ ├──────────────────────┤ │ argv array pointers │ ├──────────────────────┤ │ argc │ ◀─── rsp └──────────────────────┘
Linux allows/allocates about 8MB of stack space per process and is available to our programs immediately.
Up until now we have used almost exclusively global data. For simple programs this might be sufficient, however for more complicated functions we need to have local state per function call. Consider the following C function:
// Recursively computes: n! (n factorial): n * n-1 * n-2 * ... * 2 * 1 int factorial(int n) { if (n <= 1) return 1; return n * factorial(n-1); }
In this function n is a local variable. If n starts at 4 (i.e. ? = factorial(4);) then the call stack would look something like:
n
4
? = factorial(4);
factorial(4) = 4 * factorial(3) factorial(3) = 3 * factorial(2) factorial(2) = 2 * factorial(1) factorial(1) = 1
In order to keep n in a register, we would need four registers, one for each of the calls. We can use a register for the return value of the functions, as as we come back up from the recursions, we can re-use it for the parents return value. We can also use a register as the parameter to the function, however we still need to save that value somewhere while the child call is made, for that we can use the stack.
rsp
The stack pointer normally points to the last item pushed onto the stack. It is usually manipulated with the following instructions:
PUSH rax ; is equivalent to: SUB rsp, 8 MOV QWORD [rsp], rax
POP rax ; is equivalent to: MOV rax, QWORD [rsp] ADD rsp, 8
The rsp register can be used as a regular register, however if you want to keep access to the stack, you need to remember where the stack is, so should save rsp somewhere if you want to maintain access to the stack.
We can implement our recursive function using push/pop like so:
; Uses rax as both its parameter and return value register factorial: cmp rax, 1 jg .cont mov rax, 1 ret .cont: push rax ; If we saved rax into a register here, it would dec rax ; likely be over-written during the next call to call factorial ; factorial at this point. pop rbx mul rbx ret
If we need more space for more than a few local variables, then pushing and popping value may be too cumbersome. We can allocate the space we want from the stack simply by subtracting from the stack pointer the size of all the local variables we want to have and move into the stack the values we want there, which can be done with a fast memory copy operation. To delete the local variables after the function is complete, simply requires putting the rsp register back to the same value it had when we entered the function, to remember where that is, we use rbp, the base pointer register.
rbp
The area of the stack that is used for a single function calls local variables is called the stack frame, the top of which is maintained by the rbp register. A function can create and tear-down its stack frame by manipulating rbp and rsp.
│ Callers (parent) stack frame │ △ │ ... │ │ │ Any pushed parameters for │ │ │ the called function │ │ Higher addresses │ │ │ │ Return Address │ │ ├──────────────────────────────┤ │ old rbp │ ◁─ current rbp (bottom of current stack frame) │ │ │ ... local variables ... │ │ │ │ │ ◁─ rsp (top of stack)
Given that rsp may change it's location during the run-time of the function, we switch to using the rbp register to access variables within the stack frame as rbp does not change. Because it is "above" all the values,
To create and tear-down the stack frame, we can use the enter and leave instructions.
enter
leave
The enter instruction is used to allocate a stack frame for local variable space for functions. It should be the first instruction upon entering the function. Most compilers implement the equivalent to enter using push/mov/sub since it appears to be faster to do it that way, however for now, we will use enter.
push
mov
sub
ENTER imm16, imm8
ENTER
The first parameter (imm16) is an amount of stack to allocate and the second (imm8) is a nesting level for nested functions, for now always use 0 as the nesting level.
Assuming the second parameter (the nesting level for enter) is 0, the enter instruction is equivalent to:
push rbp ; Save old base pointer. mov rbp, rsp ; make current stack pointer the new base pointer. sub rsp, imm16 ; Allocates imm16 bytes for the new stack frame.
Leave should be the last instruction before the ret instruction, which restores the parent callers stack frame.
ret
LEAVE
mov rsp, rbp ; Restore stack pointer back to where we pushed the old base pointer pop rbp ; Restore the old base pointer.
Modifies the printnum function to allocate its number buffer on the stack and use that instead of the global _numbuf buffer.
printnum: enter 32, 0 ; Sets aside 32 bytes of space on the stack lea rsi, [rbp-1] ; The space is immediately below rbp ; At this point we can use rsi in the same manner we use it in the original: mov BYTE [rsi], 0 ; (*rsi) = '\0'; ; Rest of the code for printnum is the same. ; Restore the stack fame before returning from the function: leave ret
Since the stack grows downward in memory and buffers that are allocated on the stack are often written to upwards toward higher addresses, if the code allows writes to continue past the amount of memory allocated to the buffer, then writes may:
Most of the time a buffer overflow of this type simply causes the program to crash (usually by accessing memory not allowed to it resulting in a segmentation fault.) However if a malicious attacker understands the nature of the overflow and control it, they can craft a stack smash attack, which writes a small assembly program called shell code (which launches a unix shell,) into the buffer and enough overflow to overwrite the return address to point at the buffer containing the shell code, causing the return from the function to execute the code.
If the process is privileged (i.e. running as the super-user,) then such an attack can gain for the attacker the privileges of program being exploited.
A by no means exhaustive list of mitigations of such attacks may take the form of:
Address Space Layout Randomization (ASLR) - Among other things it randomizes the location of the stack in memory, preventing an attacker from easily guessing what the return address should be set to. Fairly easy to get around on a 32 bit machine however.
A "no execute" stack - Modern CPUs can prevent execution of code in specific areas of memory and making the stack "no execute" prevents code on the stack from being executed. This still does not prevent the program from being made to crash, or being able to modify its behavior.
Stack "canaries" - From the proverbial "canary in a coal mine", a stack canary (a random value that is determined at run-time,) is pushed onto the stack at the start of the function. As the function prepares to return the stack canary is inspected to see if it has been modified and if so, the program exits immediately. If the canary is still intact, then the return value is probably safe to use.