Somewhere in a processing device, such as a computer, laptop or phone, there is
a CPU, which controls everything that happens in the device. CPUs today are
von Neumann architecture devices, i.e. they keep both instructions and data in
the same memory and fetch both over the same bus.
An early competing CPU design was the Harvard architecture, which used
separate memories for instructions and data, with a separate memory bus
for each. Modern systems incorporate some Harvard ideas and are effectively a
modified Harvard architecture: they have separate instruction and data caches
(at the level 1 cache), they control which memory regions may contain
instructions, and they allow such regions to be made read-only.
Early CPUs were connected to memory via the motherboard and operated directly
on the motherboard's memory. To improve the speed of the CPU it becomes
necessary to keep data as close to the CPU (i.e. inside of it) as possible,
because the round trip time of data to memory on the motherboard is
comparatively long.
A cache memory is a temporary holding spot for instructions or data fetched
from main memory or destined for main memory. While the CPU is working on
some data, the cache greatly reduces the time (latency) needed to access that
data. Modern CPUs have three levels of cache:
L3 - The outer-most layer and largest cache is the Level 3 (L3) cache.
The L3 cache tends to be shared among all processor cores and appeared
with the advent of larger multiple core CPUs.
L2 - The next closest cache is the Level 2 (L2) cache which is internal
to a processor core, but may be shared with 2 or more processor cores.
L1 - The closest cache is the Level 1 (L1) cache, which is typically
split into two caches, one for instructions and another for data.
A modern CPU (such as a 12-core Ryzen 3900X) may have an L3 cache of 64MB
shared among all its cores that operates at up to 1.1TB/s, an L2 cache of
512K/core operating at upwards of 1.6TB/s and an L1 cache of 64K/core,
typically split 32K/32K between instructions and data, that operates at up
to 3TB/s. Main memory for the Ryzen operates at a theoretical maximum of
60GB/s, though actual throughput in all cases is likely closer to 1/3
of the above.
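
To put these figures side by side, here is a small Python sketch that turns
the quoted peak bandwidths into the time needed to move one 64-byte block and
into a ratio against main memory. The 64-byte block size and all names are
assumptions made for illustration, and the one-third factor is the rough
estimate from the text, not a measurement. Bandwidth alone says nothing about
latency, so treat the results as order-of-magnitude only.

```python
# Peak bandwidths quoted in the text, in bytes per second (decimal units).
PEAK_BANDWIDTH = {
    "L1": 3.0e12,
    "L2": 1.6e12,
    "L3": 1.1e12,
    "main memory": 60e9,
}

LINE_BYTES = 64  # assumed transfer size, chosen only for illustration


def transfer_time_ns(level, size_bytes=LINE_BYTES, derate=1.0):
    """Nanoseconds to move size_bytes at the level's peak bandwidth * derate."""
    return size_bytes / (PEAK_BANDWIDTH[level] * derate) * 1e9


def speedup_over_main_memory(level):
    """How many times faster the level's peak bandwidth is than main memory."""
    return PEAK_BANDWIDTH[level] / PEAK_BANDWIDTH["main memory"]


if __name__ == "__main__":
    for level in PEAK_BANDWIDTH:
        print(f"{level:12s} "
              f"{transfer_time_ns(level):8.4f} ns at peak, "
              f"{transfer_time_ns(level, derate=1 / 3):8.4f} ns at 1/3, "
              f"{speedup_over_main_memory(level):5.1f}x main memory")
```

Because the same one-third factor is applied to every level, the ratios
between levels do not change; only the absolute times do.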
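A register is memory that is internal to the CPU and is used to hold
data while the CPU performs computations with it. Registers are typically
used in the following manner: a value is loaded from memory into a register,
the CPU operates on it there, and the result is stored back to memory.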
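Consider the 6502 CPU and its register set: an 8-bit accumulator (A), two
8-bit index registers (X and Y), an 8-bit stack pointer (S), an 8-bit
processor status register (P) and a 16-bit program counter (PC).
The X86_64 architecture, which is used for most PC systems, has the following
general purpose registers: rax, rbx, rcx, rdx, rsi, rdi, rbp, rsp and r8
through r15, all 64 bits wide, along with the instruction pointer (rip) and
the flags register (rflags).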
The main differences between a 6502 and X86_64 in operation are the increased
number of registers and their increased size (64 bits vs 8). The X86_64
registers also tend to be much more general purpose, in that it does not matter
too much which register you use as an index register or to accumulate
arithmetic results into. This also does not include the registers found in the
math co-processor, which has been integrated into CPUs for a long time now.
Some research suggests that there are over 557 registers in a modern X86_64 CPU.
There is a downside to large register sets on modern CPUs: in a modern
multi-processing operating system the entire register set needs to be saved
and restored by the operating system each time the CPU switches between
user-space and the kernel (system call / context switch / etc.), and the
overhead of saving and restoring a process's register set takes more and
more time as the register set grows.
As an instruction performs its work, it may modify one or more flags (i.e. bits)
in the RFLAGS register, usually depending on the result of the operation. We use
these flags to control the flow of the assembly program, usually via
conditional jump instructions.
Consider the following comparison instruction:
CMP rax, 15
This instruction "compares" the register rax against 15 by actually subtracting
15 from the value stored in rax (i.e. rax - 15). The same flags are modified by
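the SUB (subtract) instruction, however the CMP instruction does not save
the result of the subtraction; it only updates the flags. The flags then
reflect one of three outcomes:
rax > 15     (unsigned: ZF = 0 and CF = 0)
rax < 15     (unsigned: CF = 1)
rax == 15    (ZF = 1)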
The flags then are used by various jump instructions to determine if they should
jump to a new location or not. If so the machine begins executing code at the
new location and if not, it merely continues executing code after the jump
instruction as if nothing had happened.
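
The relationship between CMP, the flags and conditional jumps can be modelled
in a few lines of Python. This is only a sketch: `cmp_flags` and `jump_taken`
are hypothetical helper names, only four of the RFLAGS bits are modelled, and
only six of the conditional jump instructions are covered. The jump conditions
themselves (for example, `ja` requiring CF = 0 and ZF = 0) follow the
documented x86 definitions.

```python
MASK64 = (1 << 64) - 1  # registers are 64 bits wide


def sign(value):
    """Top bit of a 64-bit value (1 means negative when read as signed)."""
    return (value >> 63) & 1


def cmp_flags(dst, src):
    """Model the ZF, CF, SF and OF results of `CMP dst, src`."""
    dst &= MASK64
    src &= MASK64
    result = (dst - src) & MASK64  # the difference is computed, then discarded
    return {
        "ZF": int(result == 0),    # zero flag: the operands were equal
        "CF": int(dst < src),      # carry flag: an unsigned borrow occurred
        "SF": sign(result),        # sign flag: top bit of the difference
        # overflow flag: the operands had different signs and the sign of
        # the difference does not match the sign of dst
        "OF": int(sign(dst) != sign(src) and sign(result) != sign(dst)),
    }


def jump_taken(mnemonic, flags):
    """Would this conditional jump be taken with the given flags?"""
    conditions = {
        "je": flags["ZF"] == 1,                                  # equal
        "jne": flags["ZF"] == 0,                                 # not equal
        "ja": flags["CF"] == 0 and flags["ZF"] == 0,             # unsigned >
        "jb": flags["CF"] == 1,                                  # unsigned <
        "jg": flags["ZF"] == 0 and flags["SF"] == flags["OF"],   # signed >
        "jl": flags["SF"] != flags["OF"],                        # signed <
    }
    return conditions[mnemonic]


if __name__ == "__main__":
    for rax in (20, 15, 3, -1):
        flags = cmp_flags(rax, 15)  # CMP rax, 15
        taken = [m for m in ("je", "jne", "ja", "jb", "jg", "jl")
                 if jump_taken(m, flags)]
        print(rax, flags, taken)
```

The last test value, -1, shows why there are separate signed and unsigned
jumps: as an unsigned 64-bit number it is far above 15, so `ja` is taken, but
as a signed number it is below 15, so `jl` is taken as well.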
Most of the time we don't care about the flags, but we should be aware that
instructions may or may not modify them and that could influence the way the
program operates. Most of the time, as long as we perform a comparison of some
sort immediately before one or more conditional jumps, everything should work
as expected.
Instructions are simple commands that usually do one thing, such as move
data from one place to another, add this to that, compare this to that, jump
to another location in the program, etc. They usually consist of an
instruction name and 0-3 comma-separated parameters.
The MOV (move) instruction copies data from the source to the destination. The
source may be a memory location (usually bracketed, i.e. [address]), an
immediate value (i.e. a constant value like 1, 0xFF, etc.) or a register
name (e.g. rax, r13, etc.). The destination may be a register or a memory
location.
MOV dst, src
mov rax, 1 ; set the rax register to the value 1, i.e.: rax = 1
add rax, 10 ; Add 10 to rax, i.e.: rax += 10
mov [var],rax ; move value in rax to memory pointed to by the address
; given by the 'label' var. i.e. *var = rax
A register may hold both integer values and addresses (an address is just an
integer value specifying a memory location). It is up to the programmer to
keep track of whether a register holds a value or an address. In nasm (the
Netwide Assembler), to fetch or store a value at a specific address, the
register, label (almost all labels are addresses) or expression (such as
a label + register offset) is surrounded by square brackets.
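Instruction          Pointer notation    Array notation
MOV rax, 1           rax = 1
MOV qword [rax], 1   *rax = 1            memory[rax] = 1
MOV rbx, rax         rbx = rax
MOV rbx, [rax]       rbx = *rax          rbx = memory[rax]
where memory in the above refers to the program's entire memory address space
as if it were an array of values. The qword size keyword is needed when an
immediate is stored to memory, because neither operand tells nasm how many
bytes to write.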
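
The array view of memory described above can be tried out directly. The
following Python sketch treats a `bytearray` as the address space and a
dictionary as the register file. Everything here is an assumption made for
illustration: the 64-byte memory size, the address 16, and the helper names
`store_qword` and `load_qword`. Stores and loads are modelled as 8-byte
(qword), little-endian accesses, which is how X86_64 lays out a 64-bit value
in memory.

```python
import struct

memory = bytearray(64)        # stand-in for the program's address space
regs = {"rax": 0, "rbx": 0}   # two of the general purpose registers


def store_qword(address, value):
    """Model a store to [address]: write 8 bytes, little-endian."""
    struct.pack_into("<Q", memory, address, value & 0xFFFFFFFFFFFFFFFF)


def load_qword(address):
    """Model a load from [address]: read 8 bytes, little-endian."""
    return struct.unpack_from("<Q", memory, address)[0]


def demo():
    regs["rax"] = 16                       # MOV rax, 16         rax = 16
    store_qword(regs["rax"], 1)            # MOV qword [rax], 1  *rax = 1
    regs["rbx"] = regs["rax"]              # MOV rbx, rax        rbx = rax
    copied_address = regs["rbx"]           # rbx now holds the address itself
    regs["rbx"] = load_qword(regs["rax"])  # MOV rbx, [rax]      rbx = *rax
    return copied_address, regs["rbx"]


if __name__ == "__main__":
    address, value = demo()
    print("without brackets rbx received the address:", address)
    print("with brackets rbx received the stored value:", value)
```

The two assignments to rbx differ only in the brackets, which is the whole
point: without them the register's contents are copied, with them the
contents are used as an address and the value stored there is copied.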