Somewhere in a processing device, such as a computer, laptop, or phone, there is a CPU, which controls everything that happens in the device. CPUs today are von Neumann architecture devices, i.e. they keep instructions and data in the same memory and fetch both over the same path.
An early competing CPU design was the Harvard architecture, which used separate memories for instructions and data, including a separate memory bus for each. Modern systems incorporate some Harvard ideas and are effectively a modified Harvard architecture: they have separate instruction and data caches (at the level 1 cache), control which memory areas may contain instructions, and allow such memory regions to be read-only.
Early CPUs were connected to memory via the motherboard and operated directly on the motherboard's memory. To improve the speed of the CPU, it became necessary to keep data as close to the CPU as possible (i.e. inside it), because the round-trip time for data held in memory on the motherboard is quite high.
A cache memory is a temporary holding spot for instructions or data fetched from main memory or destined for main memory. While the CPU is working on some data, the cache greatly reduces the time (latency) needed to access that data. Modern CPUs have three levels of cache:
L3 - The outermost and largest cache is the Level 3 (L3) cache. The L3 cache tends to be shared among all processor cores and appeared with the advent of larger multi-core CPUs.
L2 - The next closest cache is the Level 2 (L2) cache, which is internal to a processor core, though it may be shared between two or more cores.
L1 - The closest cache is the Level 1 (L1) cache, which is typically split into two caches, one for instructions and another for data.
A modern CPU (such as the 12-core Ryzen 3900X) may have an L3 cache of 64 MB shared among all its cores that operates at up to 1.1 TB/s, an L2 cache of 512 KB per core operating at upwards of 1.6 TB/s, and an L1 cache of 64 KB per core, typically split into 32 KB instruction and 32 KB data caches, that operates at up to 3 TB/s. Main memory for the Ryzen operates at a theoretical maximum of 60 GB/s, though actual throughput in all cases is likely closer to 1/3 of the figures above.
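To put those bandwidth figures side by side, here is a small sketch that computes how long an ideal transfer would take at each level. It uses only the peak numbers quoted above; the 32 KB buffer size is an arbitrary choice for illustration, and the model ignores latency entirely, so treat the output as a rough comparison rather than a measurement.

```python
# Peak bandwidth figures quoted in the text, in bytes per second.
BANDWIDTH = {
    "L1 cache": 3.0e12,
    "L2 cache": 1.6e12,
    "L3 cache": 1.1e12,
    "main memory": 60e9,
}


def transfer_time_us(n_bytes, level):
    """Ideal time in microseconds to move n_bytes at the level's peak bandwidth."""
    return n_bytes / BANDWIDTH[level] * 1e6


def speedup_over_main_memory(level):
    """How many times faster a level is than main memory, at peak bandwidth."""
    return BANDWIDTH[level] / BANDWIDTH["main memory"]


# Hypothetical buffer size, chosen only for this illustration.
BUFFER_BYTES = 32 * 1024

for level in BANDWIDTH:
    print(
        f"{level:12s} {transfer_time_us(BUFFER_BYTES, level):8.4f} us for 32 KB, "
        f"{speedup_over_main_memory(level):5.1f}x main memory"
    )
```

At peak rates the L1 cache comes out 50 times faster than main memory, which is why keeping working data in cache matters so much.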
A register is memory that is internal to the CPU and is used to hold the data while the CPU performs computations with it. Registers are typically used in the following manner:
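Consider the 6502 CPU and its register set:
A - accumulator (8 bits), which receives arithmetic and logic results
X - index register (8 bits)
Y - index register (8 bits)
SP - stack pointer (8 bits)
PC - program counter (16 bits)
P - processor status flags (8 bits)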
The X86_64 architecture, which is used in most PC systems, has the following register set:
rax
rbx
rcx
rdx
rbp
rsp
rsi
rdi
r8 through r15
rip
rflags
The main differences between the 6502 and X86_64 in operation are the increased number of registers and their increased size (64 bits vs. 8). The X86_64 registers also tend to be much more general purpose, in that it does not matter much which register you use as an index register or to accumulate arithmetic results. This list also does not include the registers of the math co-processor, which has been integrated into CPUs for a long time now. Some research suggests that there are over 557 registers in a modern X86_64 CPU.
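The practical effect of register size is easy to see with a little arithmetic. The sketch below models an addition that wraps around at the register width; the function name add_wrapping is made up for this illustration and does not correspond to any real instruction.

```python
def add_wrapping(a, b, bits):
    """Add two unsigned values and keep only the low `bits` bits of the sum,
    as a register of that width would."""
    mask = (1 << bits) - 1
    return (a + b) & mask


# An 8-bit register (6502) can hold 0..255, so 250 + 10 wraps around.
print(add_wrapping(250, 10, 8))    # 4

# A 64-bit register (X86_64) has plenty of room for the same sum.
print(add_wrapping(250, 10, 64))   # 260
```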
There is a downside to large register sets on modern CPUs: in modern multi-processing operating systems, the register set needs to be saved and restored by the operating system each time the CPU switches between user space and the kernel (system call, context switch, etc.). The larger the process's register set, the more time and memory bandwidth this saving and restoring takes.
As an instruction performs its work, it may modify one or more flags (i.e. bits) in the RFLAGS register, usually depending on the result of the operation. We use these flags to control the flow of the assembly program, usually via conditional jump instructions.
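CF - carry flag
PF - parity flag
AF - auxiliary carry (adjust) flag
ZF - zero flag
SF - sign flag
DF - direction flag, cleared by the CLD instruction and set by the STD instruction
OF - overflow flag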
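Each of these flags is a single bit at a fixed position in RFLAGS. As a rough sketch, the following decodes a raw RFLAGS value (such as one shown by a debugger) into the names of the flags that are set. The bit positions are the ones defined for x86; the function name decode_rflags and the sample value are just for illustration.

```python
# Bit position of each flag within RFLAGS.
FLAG_BITS = {"CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7, "DF": 10, "OF": 11}


def decode_rflags(value):
    """Return the names of the listed flags that are set in a raw RFLAGS value."""
    return [name for name, bit in FLAG_BITS.items() if (value >> bit) & 1]


# 0x246 has bits 1, 2, 6 and 9 set. Bits 1 and 9 are not among the flags
# listed above, so only PF and ZF are reported.
print(decode_rflags(0x246))   # ['PF', 'ZF']
```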
Consider the following comparison instruction:
CMP rax, 15
This instruction "compares" the register rax against 15 by actually subtracting 15 from the value stored in rax (i.e. rax - 15). The SUB (subtract) instruction modifies the same flags; the difference is that CMP does not save the result.
SF  ZF  Result is:
0   0   rax > 15
1   0   rax < 15
0   1   rax == 15
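The SF and ZF values above can be reproduced with a small model of the comparison. This is a sketch, not how the hardware is built: it performs the 64-bit subtraction, throws the result away, and reports only the two flags. The function name cmp_flags is made up for this example. The simple reading of SF used here holds as long as the signed subtraction does not overflow; when it does, the OF flag has to be taken into account as well.

```python
MASK64 = (1 << 64) - 1


def cmp_flags(left, right):
    """Model CMP left, right: compute left - right in 64 bits, discard the
    result, and return (SF, ZF)."""
    result = (left - right) & MASK64
    zf = int(result == 0)       # zero flag: result was zero
    sf = (result >> 63) & 1     # sign flag: top bit of the result
    return sf, zf


for rax in (20, 10, 15):
    sf, zf = cmp_flags(rax, 15)
    print(f"rax = {rax:2d}: SF={sf} ZF={zf}")
```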
The flags are then used by various jump instructions to decide whether to jump to a new location. If so, the machine begins executing code at the new location; if not, it simply continues executing the code after the jump instruction as if nothing had happened.
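To make the "jump or fall through" behaviour concrete, here is a toy interpreter for a five-instruction program. It is only a sketch: the tuple encoding, the run function, and the use of list positions as addresses are all inventions for this example, and only the zero flag is modelled. The program adds 5 to rax until rax equals 15, using a compare followed by JNE (jump if not equal).

```python
MASK64 = (1 << 64) - 1

# Each tuple is one instruction; list positions stand in for addresses.
PROGRAM = [
    ("mov", "rax", 0),    # 0: rax = 0
    ("add", "rax", 5),    # 1: rax += 5
    ("cmp", "rax", 15),   # 2: ZF = 1 if rax == 15, else 0
    ("jne", 1),           # 3: jump back to 1 while ZF is clear
    ("halt",),            # 4: stop
]


def run(program):
    """Execute the toy program; return the registers and the addresses visited."""
    regs = {"rax": 0}
    zf = 0
    pc = 0
    trace = []
    while program[pc][0] != "halt":
        op = program[pc]
        trace.append(pc)
        pc += 1  # default: continue with the instruction after this one
        if op[0] == "mov":
            regs[op[1]] = op[2] & MASK64
        elif op[0] == "add":
            regs[op[1]] = (regs[op[1]] + op[2]) & MASK64
            zf = int(regs[op[1]] == 0)
        elif op[0] == "cmp":
            zf = int(((regs[op[1]] - op[2]) & MASK64) == 0)
        elif op[0] == "jne":
            if zf == 0:
                pc = op[1]  # condition met: execution resumes at the target
    return regs, trace


regs, trace = run(PROGRAM)
print(regs["rax"])   # 15
print(trace)         # [0, 1, 2, 3, 1, 2, 3, 1, 2, 3]
```

The trace shows the jump being taken twice and then falling through to the halt once the comparison finds rax equal to 15.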
Most of the time we don't care about the flags, but we should be aware that instructions may or may not modify them, which can influence the way the program operates. As long as we perform a comparison of some sort immediately before one or more conditional jumps, everything should generally work as expected.
"Instructions" are simple commands that usually do one thing, such as move data from one place to another, add this to that, compare this to that, jump to another location in the program, etc. They usually consist of an instruction name and zero to three comma-separated parameters.
The MOV (move) instruction copies data from the source to the destination. The source may be a memory location (usually bracketed, i.e. [address]), an immediate value (i.e. a constant value like 1, 0xFF, etc.), or a register name (i.e. rax, r13, etc.). The destination may be a register or a memory location.
MOV dst, src
mov rax, 1       ; set the rax register to the value 1, i.e. rax = 1
add rax, 10      ; add 10 to rax, i.e. rax += 10
mov [var], rax   ; move the value in rax to the memory pointed to by the address
                 ; given by the label 'var', i.e. *var = rax
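The same three steps can be followed in a small Python model. This is only a sketch under simplifying assumptions: registers live in a dictionary, the label var is treated as a named 64-bit slot rather than a real address, and the helper names are invented for the example.

```python
MASK64 = (1 << 64) - 1

regs = {"rax": 0}
memory = {"var": 0}   # label -> stored 64-bit value (simplified)


def mov_reg_imm(reg, value):
    """mov reg, imm"""
    regs[reg] = value & MASK64


def add_reg_imm(reg, value):
    """add reg, imm"""
    regs[reg] = (regs[reg] + value) & MASK64


def mov_mem_reg(label, reg):
    """mov [label], reg"""
    memory[label] = regs[reg]


mov_reg_imm("rax", 1)       # mov rax, 1
add_reg_imm("rax", 10)      # add rax, 10
mov_mem_reg("var", "rax")   # mov [var], rax

print(regs["rax"], memory["var"])   # 11 11
```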
A register may hold both integer values and addresses (an address is just an integer value specifying a memory location). It is up to the programmer to keep track of whether a register holds a value or an address. In nasm (the Netwide Assembler), to fetch or store a value at a specific address, the register, label (almost all labels are addresses), or expression (such as a label plus a register offset) is surrounded by square brackets.
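Instruction           Meaning
MOV rax, 1            rax = 1
MOV qword [rax], 1    *rax = 1, i.e. memory[rax] = 1
MOV rbx, rax          rbx = rax
MOV rbx, [rax]        rbx = *rax, i.e. rbx = memory[rax]

where memory refers to the program's entire memory address space, treated as if it were an array of values. In nasm, storing an immediate to memory needs a size keyword such as qword, because neither operand implies a size.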
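The "memory as an array" view translates almost directly into Python. The sketch below uses a small bytearray as a stand-in for the address space and two invented helpers, store64 and load64, that write and read 64-bit values in little-endian byte order (the order x86_64 uses). The address 16 is an arbitrary choice for the example.

```python
MASK64 = (1 << 64) - 1

# A tiny stand-in for the program's address space: 64 bytes, all zero.
memory = bytearray(64)


def store64(address, value):
    """Model mov qword [address], value."""
    memory[address:address + 8] = (value & MASK64).to_bytes(8, "little")


def load64(address):
    """Model mov reg, [address]."""
    return int.from_bytes(memory[address:address + 8], "little")


rax = 16             # mov rax, 16        rax holds an address
store64(rax, 1)      # mov qword [rax], 1 writes to memory, rax is unchanged
rbx = rax            # mov rbx, rax       copies the address itself
rcx = load64(rax)    # mov rcx, [rax]     copies the value stored at that address

print(rax, rbx, rcx)   # 16 16 1
```

Without the brackets the register's own contents are copied; with them, the register is used as an index into memory.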