CS456 - Systems Programming

The RAX register decoded:

Bits: 63            ..             31               15      7       0
      ┌────────────────────────────┬────────────────┬───────┬───────┐
      │                            │                │  AH   │   AL  │
      └────────────────────────────┴────────────────┴───────┴───────┘
                                                    └      AX       ┘
                                   └               EAX              ┘

Some registers have specific purposes (rip, rsp, rbp), and some instructions require specific registers to do their work (such as DIV, whose 64-bit form takes its dividend from rdx:rax).

Bits  Registers
8     AL/AH CL/CH DL/DH BL/BH SPL BPL SIL DIL R8B-R15B
16    AX    CX    DX    BX    SP  BP  SI  DI  R8W-R15W
32    EAX   ECX   EDX   EBX   ESP EBP ESI EDI R8D-R15D
64    RAX   RCX   RDX   RBX   RSP RBP RSI RDI R8-R15

Memory layout of a Linux process

┌──────────────────────────┐ 0xFFFFFFFFFFFFFFFF
│ Kernel mode space        │
├──────────────────────────┤
│           ...            │
├──────────────────────────┤
│ Stack (Grows down)       │
├──────────────────────────┤
│           ...            │
├──────────────────────────┤
│ Memory mapping segment │ │
│ File mappings(dyn libs)│ │
│ Anon mappings          ▽ │
├──────────────────────────┤
│           ...            │
├──────────────────────────┤
│        △                 │
│        │ Heap            │ C malloc'ed memory
├──────────────────────────┤
│           ...            │
├──────────────────────────┤
│ BSS Segment (uninitial-  │
│ ized static vars) Zero-  │ section .bss
│ Filled                   │
├──────────────────────────┤
│ Data Segment (static     │ 
│ variables initialized by │ section .data
│ programmer)              │ 
├──────────────────────────┤
│ Text Segment (ELF)       │ section .text
│ Binary image of program  │
├──────────────────────────┤ (start of program)
└──────────────────────────┘ 0x0000000000000000

... = unmapped address space (i.e. no memory is mapped in these regions)

When writing assembly programs, we choose where data and program code are placed by naming the section that the following elements belong to, using the SECTION assembler directive described further below.

Register usage during syscall/function call

  • Reading: man 2 syscall
  1. The rax register holds the system call # (documented in /usr/include/asm/unistd_64.h)

  2. Parameters 1-6 are stored in rdi, rsi, rdx, r10, r8, r9, in that order: the first parameter must be in rdi, the second (if any) in rsi, and so on. (Ordinary function calls use rcx instead of r10 for the fourth parameter.)

  3. For function calls, any remaining parameters are pushed on the stack. Linux system calls take at most six parameters, so this is never necessary for a syscall.

  4. The return value of the system call is placed in rax upon return from the call.

  5. The called routine is expected to preserve rsp, rbp, rbx, r12, r13, r14, and r15, but may trample any other registers (the syscall instruction itself overwrites rcx and r11). Thus the only registers that are safe for your program to rely on across system calls are rbx and r12-r15. Avoid repurposing rbp and rsp if you use any stack instructions (i.e. PUSH/POP).

  6. Consider the following "mnemonic":

    rax = rax(rdi, rsi, rdx, r10, r8, r9)

Example:

; The write system call is Syscall #1:
%define SYS_write       1
; Descriptor #1 is the standard output:
%define STDOUT_FILENO   1

;    w = write(1, buf, r);  would be converted to:
;  rax = rax  (rdi, rsi, rdx);  converting to mnemonic:
;  [w] = 1    (1,   buf, [r])   values to be put into the above registers
;  Note that 'buf' is already an address, so may be copied into rsi as-is.

SECTION .bss
    buf: resb 64        ; Reserve a 64-byte buffer at address 'buf' (size assumed)
    r:  resq 1          ; Reserve one quad-word at address 'r'
    w:  resq 1          ;   "      "      "     "     "    'w'
    ; NOTE: buf, r and w are "addresses", not values.  Most labels are
    ; addresses (like a pointer).

SECTION .text
        ;...
        mov rax, SYS_write      ; Load rax with the syscall number (i.e. 1)
        mov rdi, STDOUT_FILENO  ; Load rdi with the file descriptor number (i.e. 1)
        mov rsi, buf            ; Load rsi with the buffer address
        mov rdx, [r]            ; Load rdx with the value at address r
        ; If the system call needs more parameters it would use r10, r8, etc.
        ; If it needs fewer then use fewer, but always in the same order,
        ; i.e. rdi is the first parameter, rsi the second and so on.
        syscall                 ; Perform the system call
        mov [w], rax            ; Return value is in rax, so store it @ w

NASM - The Netwide Assembler

Layout of a NASM source line:

label:     instruction     operands     ; comment

x86 Intel Syntax instruction conventions:

Operands  Format
0         INSTR
1         INSTR arg
2         INSTR dst, src
3         INSTR dst, src, aux

Intel syntax does not require instruction suffixes to specify the size of data. For example, where AT&T syntax writes:

movl $0x000F, %eax ; Store hex 0xF (32 bits) in eax, AT&T syntax

Intel syntax instead uses:

MOV eax, 0x000F ; register naming and/or immediate value indicates size (usually.)

Size specifiers:

The size of the data (in comparisons and other operations) sometimes needs to be specified explicitly. To do this, put a size specifier before one of the arguments:

size   BYTE   WORD   DWORD   QWORD   TWORD   OWORD   YWORD   ZWORD
bits   8      16     32      64      80      128     256     512
bytes  1      2      4       8       10      16      32      64

Example:

CMP BYTE [rax+rbx], 0
or:
CMP [rax+rbx], BYTE 0

  • In the above case the amount of data to be compared is ambiguous without the specifier, since neither the memory operand nor the immediate 0 implies a size.

Operand types:

  • (R) register

  • (I) immediate - i.e. constants:

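    • 0x or $ prefix or a H/h suffix denotes hex: 0xFF, $0FF, 0FFh (with the $ prefix or the h suffix the number must begin with a digit, otherwise it is read as an identifier)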
    • 0o prefix or Q/q/o suffix denotes octal: 0o777, 777o
    • 0b prefix or B/b suffix denotes binary: 0b110101, 01010011b
    • Character constants are C-format: 'a', 'abcd'
    • Single/double quoted strings: "abc",'abc'
      • Cannot contain escapes.
    • Back-ticked strings: `abc\n`
      • Can contain C escapes
  • (M) effective addresses (memory):

[label]                          data located at the address of label
[label+1]                        data at label + some constant offset
[label+register]                 data at label offset by the amount in register
[label+register*scale]           also [register*scale]; scale is 1, 2, 4 or 8
[label+register*scale+constant]  also [register*scale + constant]
[register:label+register]        the leading register is a segment register override
  • In 64 bit mode, immediates and offsets are generally only 32 bits wide. Only MOV supports a 64 bit immediate value.

Examples

  • assume the address of "char array[]" is stored in "rsi" (addresses are 64 bits wide in 64 bit mode):
Assembly                          C equivalent
mov al, BYTE [rsi]                al = array[0]
mov al, [rsi + 10]                al = array[10]
mov al, [rsi + rcx]               al = array[rcx]
mov al, [rsi + rcx*8 + 100]       al = array[rcx*8 + 100]

labels:

A label refers to the address of the code or data on the line it occurs on.

  • A label may start with a letter, . (dot, with special meaning), _ (underscore) or ?, and may contain letters, numbers, _, $, #, @, ~, . and ?.

  • May be up to 4095 characters in length

A label that starts with a . (dot) is a "local" label and is associated with the previous non-local label. It may be prefixed with its associated non-local label to be accessed from outside of its local scope.

  func1:    ...
  .loop:    ...

  func2:    ...
  .loop:    ...
        jmp func1.loop  ; go to the .loop inside of func1.

Assembler directives:

SECTION section name

  • Defines the section to assemble into. Some sections are pre-defined, including:
Section  Contents
.bss     statically allocated objects that are un-initialized (usually zeroed)
.data    statically allocated objects that are pre-initialized (read-only data normally goes in .rodata instead)
.text    the program code

GLOBAL symbol

  • Exports a symbol globally

EXTERN symbol

  • Imports a symbol from an external source

X86 nasm pseudo-instructions:

RESB, RESW, RESD, RESQ

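  • Reserve uninitialized space in units of 1, 2, 4 or 8 bytes; the operand gives the number of units (e.g. resq 1 reserves one 8-byte quad-word).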

TIMES #

  • prefix indicating repetition of data or reserved space.
    label: times 10 resd 10 ; set aside 10x10 dwords (400 bytes) of memory
    label: times 8 db 'abcd' ; repeats "abcd" 8 times.
    buf: times 64 db 0 ; Makes a zero'ed 64 byte buffer

DB, DW, DD and DQ

  • Defines data that is 1,2,4 or 8 bytes in size

INCBIN "file"[,skip[,amount]]

  • Include a binary file as data, optionally skipping the first skip bytes and optionally including at most amount bytes of data.

EQU constant

  • Define a constant value i.e.:
    x: equ 10
    makes x equal to the constant 10 rather than to an address

Macros:

%include "file"

  • Includes another file into your source.

%define define

  • Works like C's #define, e.g. the line %define SYS_write 1 used in the syscall example above.