CS456 - Systems Programming

The Central Processing Unit (CPU)

Somewhere in every computing device, such as a desktop computer, laptop or phone, there is a CPU, which controls everything that happens in the device. CPUs today are von Neumann architecture devices, i.e. they have:

An arithmetic logic unit (ALU) and internal registers
An instruction register and program counter
Access to memory that stores both data and instructions
External storage
Input and output mechanisms

An early competing design was the Harvard architecture, which used separate memories for instructions and data, each with its own memory bus. Modern systems incorporate some Harvard ideas and are effectively a modified Harvard architecture: they have separate instruction and data caches (at the level 1 cache), they control which memory areas may contain instructions, and they allow such memory regions to be read-only.

Early CPUs were connected to memory via the motherboard and operated directly on the motherboard's memory. To improve the speed of the CPU, it became necessary to keep data as close to the CPU as possible (i.e. inside it), because the round-trip time for data held in memory on the motherboard is quite high.

An aside about CPU caches

A cache memory is a temporary holding spot for instructions or data fetched from main memory or destined for main memory. While a CPU is working on some data, the cache greatly reduces the time (latency) needed to access that data. Modern CPUs have three levels of cache:

L3 - The outermost and largest cache is the Level 3 (L3) cache. The L3 cache tends to be shared among all processor cores and appeared with the advent of larger multi-core CPUs.
L2 - The next closest cache is the Level 2 (L2) cache, which is internal to a processor core but may be shared between two or more cores.
L1 - The closest cache is the Level 1 (L1) cache, which is typically split into two caches, one for instructions and another for data.

A modern CPU (such as a Ryzen 3900X 12-core processor) may have an L3 cache of 64MB shared among all its cores that operates at up to 1.1TB/s, an L2 cache of 512KB/core operating at upwards of 1.6TB/s and an L1 cache of 64KB/core, typically split 32KB/32KB between instruction and data caches, that operates at up to 3TB/s. Main memory for the Ryzen operates at a theoretical maximum of 60GB/s, though actual throughput in all cases is likely closer to 1/3 of the above.
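
To put the figures above in perspective, the following minimal sketch computes how many times faster each cache level's quoted peak bandwidth is than main memory. It uses only the numbers quoted above and assumes decimal units (1TB/s = 1000GB/s); the names `PEAK_GB_PER_S` and `speedup_over_main_memory` are made up for this illustration.

```python
# Peak bandwidths quoted in the text, in GB/s (assuming 1 TB/s = 1000 GB/s).
PEAK_GB_PER_S = {
    "L1": 3000.0,
    "L2": 1600.0,
    "L3": 1100.0,
    "main memory": 60.0,
}


def speedup_over_main_memory(level):
    """Return how many times faster `level` is than main memory (peak figures only)."""
    return PEAK_GB_PER_S[level] / PEAK_GB_PER_S["main memory"]


if __name__ == "__main__":
    for level in ("L1", "L2", "L3"):
        ratio = speedup_over_main_memory(level)
        print(f"{level}: about {ratio:.0f}x the peak bandwidth of main memory")
```

Since real throughput is likely lower at every level, the ratios are more meaningful than the absolute numbers: even the slowest cache is more than an order of magnitude faster than main memory.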

Registers

A register is memory that is internal to the CPU and is used to hold the data while the CPU performs computations with it. Registers are typically used in the following manner:

1. Data is fetched from memory and stored in one or more registers.
2. Some operation is performed on the data in the register(s), such as adding or multiplying them. The result of the operation is stored back into one of the registers.
3. Step 2 is performed repeatedly on intermediate results using the registers until the final result is computed.
4. The final result is moved from its register back to main memory for longer-term storage and to free registers for other computation.
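
The pattern above can be sketched with a toy model in which memory is a dictionary of named locations and the CPU has two registers. This is only an illustration: the register names `r0` and `r1`, the memory keys and the function name are hypothetical, and the computation chosen, (a + b) * c, is arbitrary.

```python
def compute_sum_times_scale(memory):
    """Compute (a + b) * c using the fetch / operate / store pattern."""
    regs = {"r0": 0, "r1": 0}  # two hypothetical registers

    # Fetch: copy the operands from memory into registers.
    regs["r0"] = memory["a"]
    regs["r1"] = memory["b"]

    # Operate: add the registers; the result lands back in a register.
    regs["r0"] = regs["r0"] + regs["r1"]

    # Repeat: operate again on the intermediate result held in r0.
    regs["r1"] = memory["c"]
    regs["r0"] = regs["r0"] * regs["r1"]

    # Store: move the final result back to memory, freeing the registers.
    memory["result"] = regs["r0"]
    return memory["result"]


if __name__ == "__main__":
    mem = {"a": 2, "b": 3, "c": 4, "result": 0}
    print(compute_sum_times_scale(mem))
```

Note that the intermediate sum never touches memory; only the operands and the final result do.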

Consider the 6502 CPU and its register set:

Register	Purpose
A	The accumulator, used to store the result of most arithmetic operations
X	The X index register, used to offset memory accesses, or as a counter
Y	The Y index register, much like the X register
S	Stack pointer, points to next available location on the stack
PC	Program counter, the only 16-bit register; holds the memory location of the current instruction
P	CPU Status Register, whose bits are modified by some instructions and control the actions of others

The X86_64 architecture, which is used in most PC systems, has the following register set:

Register	Purpose
`rax`	accumulator, extended (multiply/divide, string load & store)
`rbx`	base, extended (base register for memory addressing)
`rcx`	counter, extended (count for string operations & shifts)
`rdx`	data, extended (port address for IN and OUT)
`rbp`	base pointer (base of the current stack frame)
`rsp`	stack pointer (current top of the stack, which grows downwards)
`rsi`	source index (source for data copies)
`rdi`	destination index (destination for data copies)

`r8`-`r15`	Registers added for 64-bit mode

`rip`	Instruction pointer register: the address of the next instruction to be fetched from memory and executed by the CPU. It is incremented by the size of each instruction (to point at the next one) or set to a new address by a jump instruction.
`rflags`	The CPU flag bits, which are set/cleared by specific instructions and control the behavior of other instructions, such as the jump instructions.

The main differences between the 6502 and X86_64 in operation are the increased number of registers and their increased size (64 bits vs. 8). The X86_64 registers also tend to be much more general purpose, in that it does not matter much which register you use as an index register or to accumulate arithmetic results. The list above also omits the registers of the math co-processor, which has been integrated into CPUs for a long time now. Some research suggests that there are over 557 registers in a modern X86_64 CPU.

Large register sets on modern CPUs have a downside. In a modern multi-processing operating system, the register set must be saved and later restored by the operating system each time the CPU switches between user space and the kernel (system call, context switch, etc.). The larger the register set, the more time and memory bandwidth is spent saving and restoring each process's registers.

CPU Status (flags) Register

As an instruction performs its work, it may modify one or more flags (i.e. bits) in the RFLAGS register, usually depending on the result of the operation. We use these flags to control the flow of the assembly program, usually via conditional jump instructions.

Commonly used FLAGS:

Flag	Purpose
`CF`	Carry Flag: an unsigned arithmetic overflow (carry) or underflow (borrow) was generated in the most significant bit.
`PF`	Parity Flag: indicates whether the number of set bits in the low byte of the result is odd (0) or even (1).
`AF`	Adjust Flag: carry or borrow out of the low 4 bits of the result, used for BCD arithmetic.
`ZF`	Zero Flag: Set if a result is zero, cleared otherwise.
`SF`	Sign Flag (Negative): Set if result is negative, cleared otherwise
`DF`	Direction Flag: Direction of byte (string) copies:
	0 = lowest to highest address (`CLD` instruction)
	1 = highest to lowest (`STD` instruction)
`OF`	Overflow Flag: signed arithmetic overflow has occurred (i.e. the result has the wrong sign, such as two positive numbers adding up to a negative one)
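
As an illustration of how the arithmetic flags in the table relate to a result, here is a minimal Python sketch that models a 64-bit ADD. It is a simplified model rather than a description of the hardware: the function name `add_flags` and the dictionary layout are assumptions made for this example, and DF is left out because arithmetic instructions do not change it.

```python
MASK64 = (1 << 64) - 1   # all 64 bits set
SIGN_BIT = 1 << 63       # most significant bit of a 64-bit value


def add_flags(a, b):
    """Model a 64-bit ADD of a and b; return (result, flags)."""
    a &= MASK64
    b &= MASK64
    full = a + b                 # unbounded Python integer
    result = full & MASK64       # what fits in a 64-bit register
    flags = {
        # Carry out of the most significant bit (unsigned overflow).
        "CF": int(full > MASK64),
        # 1 if the low byte of the result has an even number of set bits.
        "PF": int(bin(result & 0xFF).count("1") % 2 == 0),
        # Carry out of the low 4 bits.
        "AF": int((a & 0xF) + (b & 0xF) > 0xF),
        "ZF": int(result == 0),
        # The top bit of the result is its sign when read as signed.
        "SF": int(bool(result & SIGN_BIT)),
        # Signed overflow: operands share a sign, the result's sign differs.
        "OF": int(bool(~(a ^ b) & (a ^ result) & SIGN_BIT)),
    }
    return result, flags


if __name__ == "__main__":
    # Largest unsigned value + 1 wraps to zero: a carry, but no signed overflow.
    print(add_flags(MASK64, 1))
    # Largest signed value + 1 becomes negative: signed overflow, but no carry.
    print(add_flags(SIGN_BIT - 1, 1))
```

The two calls show why CF and OF are separate flags: the first wraps as an unsigned number, the second as a signed number.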

Consider the following comparison instruction:

CMP rax, 15

This instruction "compares" the register rax against 15 by subtracting 15 from the value stored in rax (i.e. rax - 15). CMP modifies the same flags as the SUB (subtract) instruction, but it does not save the result.

If the result of that subtraction is 0, then the ZF flag is set (i.e. set to 1); otherwise it is cleared (i.e. set to 0).
If the result is negative, then the SF flag is set; otherwise it is cleared.
Other flags may be set, but as long as the subtraction does not overflow, those two are sufficient to determine whether rax is greater than, equal to or less than 15 (the signed conditional jumps also consult OF to cover the overflow case):

SF	ZF	Result is:
0	0	rax > 15
1	0	rax < 15
0	1	rax == 15
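
The table can be reproduced with a short Python model of CMP. This is a sketch under the assumption that the subtraction does not overflow as a signed 64-bit value; `cmp_flags` and `relation` are hypothetical helper names, not part of any assembler or library.

```python
MASK64 = (1 << 64) - 1   # keep results to 64 bits, like a register


def cmp_flags(left, right):
    """Model `CMP left, right`: subtract, keep only the SF and ZF flags."""
    diff = (left - right) & MASK64   # CMP discards this value
    zf = int(diff == 0)              # set when the operands are equal
    sf = diff >> 63                  # top bit of the 64-bit difference
    return sf, zf


def relation(sf, zf):
    """Translate (SF, ZF) into the relation shown in the table."""
    if zf:
        return "=="
    return "<" if sf else ">"


if __name__ == "__main__":
    for rax in (20, 10, 15):
        sf, zf = cmp_flags(rax, 15)
        print(f"rax = {rax}: SF={sf} ZF={zf} -> rax {relation(sf, zf)} 15")
```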

The flags then are used by various jump instructions to determine if they should jump to a new location or not. If so the machine begins executing code at the new location and if not, it merely continues executing code after the jump instruction as if nothing had happened.
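
To make the jump behavior concrete, the following toy interpreter runs a short counting loop. It is a deliberately simplified model: instructions are tuples addressed by their list index rather than by byte address, only ZF is tracked, and the function `run` and the program encoding are invented for this example. `jne` (jump if ZF is clear) and `je` (jump if ZF is set) mirror the x86 conditional jumps of the same names.

```python
def run(program):
    """Execute a list of (op, *args) tuples and return the final registers."""
    regs = {"rax": 0}
    zf = 0
    ip = 0                              # index of the next instruction
    while ip < len(program):
        op, *args = program[ip]
        ip += 1                         # default: continue with the next instruction
        if op == "mov":
            regs[args[0]] = args[1]
        elif op == "add":
            regs[args[0]] += args[1]
            zf = int(regs[args[0]] == 0)        # arithmetic also updates ZF
        elif op == "cmp":
            zf = int(regs[args[0]] - args[1] == 0)
        elif op == "jne":
            if zf == 0:                 # operands differed: take the jump
                ip = args[0]
        elif op == "je":
            if zf == 1:                 # operands were equal: take the jump
                ip = args[0]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return regs


# Count rax from 0 up to 15; the comment on each line is its index.
COUNT_TO_15 = [
    ("mov", "rax", 0),    # 0
    ("add", "rax", 1),    # 1: top of the loop
    ("cmp", "rax", 15),   # 2
    ("jne", 1),           # 3: go back to index 1 while rax != 15
]

if __name__ == "__main__":
    print(run(COUNT_TO_15))
```

When the jump is not taken, `ip` has already been advanced, so execution simply falls through to whatever follows the jump.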

Most of the time we don't care about the flags, but we should be aware that instructions may or may not modify them and that could influence the way the program operates. Most of the time as long as we perform a comparison of some sort immediately before one or more conditional jumps, everything should work as expected.

Intro to assembly instructions

Instructions are simple commands that usually do one thing, such as move data from one place to another, add this to that, compare this to that, jump to another location in the program, etc. They usually consist of an instruction name and 0-3 comma-separated operands.

The MOV (move) instruction copies data from the source to the destination. The source may be a memory location (usually bracketed, i.e. [address]), an immediate value (i.e. a constant value like 1, 0xFF, etc.) or a register name (i.e. rax, r13, etc.). The destination may be a register or a memory location, but the source and destination may not both be memory locations.

MOV dst, src

example:

      mov  rax, 1       ; set the rax register to the value 1, i.e.: rax = 1
      add  rax, 10      ; Add 10 to rax, i.e.: rax += 10
      mov  [var],rax    ; move value in rax to memory pointed to by the address
                        ; given by the 'label' var.  i.e. *var = rax

Addresses and Values

A register may hold either an integer value or an address (an address is just an integer value specifying a memory location). It is up to the programmer to keep track of whether a register holds a value or an address. In nasm (the Netwide Assembler), to fetch or store a value at a specific address, the register, label (almost all labels are addresses) or expression (such as a label + register offset) is surrounded by square brackets.

Example	Meaning in C
`MOV rax, 1`	`rax = 1`
`MOV qword [rax], 1`	`*rax = 1` or `memory[rax] = 1` (nasm needs a size such as `qword` here, since neither operand is a register)
`MOV rbx, rax`	`rbx = rax`
`MOV rbx, [rax]`	`rbx = *rax` or `rbx = memory[rax]`

where memory in the above refers to the program's entire memory address space as if it were an array of values.
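
The array view of memory can be tried out directly. In the sketch below, memory is a small Python list and the registers are dictionary entries; the values 4 and 99 are chosen only so that the address and the stored value are easy to tell apart. This is a simplification: real memory is addressed in bytes and a 64-bit value occupies eight of them, whereas here each list element holds a whole value. The function name is hypothetical.

```python
def address_vs_value():
    """Walk through the four MOV forms, using a list as the address space."""
    memory = [0] * 8                    # toy address space, addresses 0..7
    regs = {"rax": 0, "rbx": 0}

    regs["rax"] = 4                     # MOV rax, 4          rax = 4
    memory[regs["rax"]] = 99            # MOV qword [rax], 99 memory[rax] = 99

    regs["rbx"] = regs["rax"]           # MOV rbx, rax        copies the address
    copied_address = regs["rbx"]

    regs["rbx"] = memory[regs["rax"]]   # MOV rbx, [rax]      copies the value stored there
    loaded_value = regs["rbx"]

    return copied_address, loaded_value, memory


if __name__ == "__main__":
    address, value, memory = address_vs_value()
    print("rbx after MOV rbx, rax:  ", address)
    print("rbx after MOV rbx, [rax]:", value)
    print("memory:", memory)
```

The same register contents (4) are treated as a plain number in one instruction and as a location in the next; the brackets alone make the difference.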