CS471/571 - Operating Systems

Lesson 1

The Unix C programming and build environment typically consists of at least the following tools:

A C compiler, such as gcc (GNU C Compiler) or clang (LLVM C compiler)
A debugger, such as gdb (GNU DeBugger)
A Makefile describing how to build your program, processed by a make program such as GNU Make
An editor, preferably one that can run the compiler from inside the editor, such as kate, emacs, jove, vim, etc.

Makefiles

A file called 'makefile' or 'Makefile' in a source directory is used by the make command. It describes the relationships between files and the commands needed to update or create those files.

These relationships are defined as targets, each of which has prerequisites and a recipe to build it. When make builds a target, it looks at the target's prerequisites: if any prerequisite is newer than the target, or the target is missing, the target is re-made using the recipe.
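The rebuild test described above can be sketched in C with stat(2). This is a simplified illustration, not GNU make's code: it assumes the prerequisites have already been brought up to date, it compares whole-second modification times (st_mtime), and the function name needs_rebuild is hypothetical.

```c
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical sketch of make's out-of-date test.
   Returns 1 if target must be rebuilt, 0 if it is up to date,
   and -1 if a prerequisite does not exist. */
int needs_rebuild(const char *target, const char *const prereqs[], size_t n)
{
    struct stat t, p;
    int target_exists = (stat(target, &t) == 0);

    for (size_t i = 0; i < n; i++) {
        if (stat(prereqs[i], &p) != 0)
            return -1;              /* make would report "No rule to make target" */
        if (target_exists && p.st_mtime > t.st_mtime)
            return 1;               /* a prerequisite is newer than the target */
    }
    return target_exists ? 0 : 1;   /* a missing target is always (re)made */
}
```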

Format of a makefile:

Comments

Start with # and extend to the end of the line. Use \# for a literal # sign.

Variable definitions

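Roughly equivalent to sh/bash variable definitions, e.g.:

name=value
or
name := value

With =, the value is expanded every time the variable is used (a recursively expanded variable); with :=, it is expanded once, when the definition is read (a simply expanded variable).

Variables are referenced as $(name), both in recipes and elsewhere in the makefile.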

Functions

Wherever variables can be used, you may also use "functions", usually in the form $(func-name arg1,arg2,...), with the arguments separated by commas.

Example:

$(sort $(objs)) ──► Emits the words of $(objs) in sorted order, with duplicates removed

Directives:

These are special commands, such as:

include file

reads the "file" as a sub-makefile.

conditional directives:

conditional
statements
[ else
statements ]
endif

The conditional is true if:

ifeq (arg1,arg2)	arg1 and arg2 are the same
ifeq (arg1,)	arg1 is empty
ifneq (arg1,arg2)	arg1 and arg2 are not the same
ifdef var	var is defined (has a non-empty value)
ifndef var	var is not defined (or has an empty value)

Explicit rules:

target(s): prerequisites
[recipe]

targets	One or more files to make, separated by spaces. You may use wild-cards in the file names. If a target already exists but a prerequisite has a newer timestamp than the target, the target is re-made so that it is up to date.

prerequisites	Files the target requires in order to be made: usually source/header files, or generated source output from a code generation tool such as lex/flex or yacc/bison.

recipe	How to make the target. Often make already knows how to do this through its built-in implicit rules and does not need an explicit recipe. Each recipe line must be prefixed by a tab character and is run as an sh shell command, each line in its own shell. If a single command needs to span more than one line, escape the newline with a backslash (\).
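To see why each recipe line behaves as a separate command, here is a minimal sketch of running one line the way make does by default: the line is handed to /bin/sh -c in a new child process. The helper run_recipe_line is hypothetical and ignores details such as echoing the command or make's special line prefixes.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sketch: run one recipe line as "/bin/sh -c LINE" in a child.
   Returns the shell's exit status, or -1 if the child could not be
   created or did not exit normally. */
int run_recipe_line(const char *line)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execl("/bin/sh", "sh", "-c", line, (char *)NULL);
        _exit(127);                 /* only reached if exec failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because every line gets a fresh shell, state such as a shell variable or the effect of `cd dir` does not carry over to the next recipe line; joining lines with a backslash (or with `;` / `&&` on one line) keeps the commands in a single shell.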

Examples:

# target foo.o requires both foo.c and bar.h (bar.h is included by foo.c)
# '$(CC) -c foo.c' is the recipe to make the target foo.o

foo.o:  foo.c bar.h
	$(CC) -c foo.c

# Makes more than one program, each with its own C source file(s). Assumes
# you don't want to build any .o files; really only suitable for small programs

all: prog1 prog2

prog1:  prog1.c prog1.h
	$(CC) $(CFLAGS) -o prog1 prog1.c $(LDFLAGS)

prog2:  prog2.c foo.c prog2.h
	$(CC) $(CFLAGS) -o prog2 prog2.c foo.c $(LDFLAGS)

The make command:

make [-C dir] [-f makefile] [-j #jobs] [ target ]

-C dir	Change to directory dir, then begin making

-f makefile	Use makefile instead of the default

-j #jobs	Run up to #jobs recipes in parallel

target	The target within the makefile to build. If not specified, the first target found in the makefile is used.
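The effect of -j can be pictured as a loop that keeps up to #jobs child processes running and starts a new one whenever one finishes. The sketch below assumes the commands are independent of each other; real make must also respect the dependencies between targets (and GNU make coordinates recursive makes through a jobserver). run_parallel is a hypothetical name.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sketch: run each command via /bin/sh -c, keeping at most
   max_jobs children running at once. Returns how many commands failed. */
int run_parallel(const char *const cmds[], int n, int max_jobs)
{
    int running = 0, failures = 0, next = 0;

    while (next < n || running > 0) {
        /* Start new jobs until the limit is reached or no commands remain. */
        while (next < n && running < max_jobs) {
            pid_t pid = fork();
            if (pid < 0) {              /* cannot fork: count the rest as failed */
                failures += n - next;
                next = n;
                break;
            }
            if (pid == 0) {
                execl("/bin/sh", "sh", "-c", cmds[next], (char *)NULL);
                _exit(127);
            }
            running++;
            next++;
        }
        if (running == 0)
            break;
        /* Wait for any one child to finish, freeing a job slot. */
        int status;
        if (wait(&status) < 0)
            break;
        running--;
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failures++;
    }
    return failures;
}
```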

Environment:

MAKEFLAGS

Sets default options for make to use, e.g.:
(tcsh) > setenv MAKEFLAGS '-j32'
(bash) > export MAKEFLAGS='-j32'

Examples:

# A "variable" defines the CC variable to be gcc (the C compiler):
CC=gcc

# Rule to make the executable hello from hello.o; hello.o will be made first:
hello: hello.o
	$(CC) -o hello hello.o

# Rule makes hello.o from hello.c:
hello.o: hello.c
	$(CC) -c hello.c

# A second, separate example makefile:
CC=gcc -std=c11
CFLAGS=-ggdb -Wall
LIBS=-lcurses

# Usually the first rule defines what should be done when just 'make'
# is invoked:
all:    c1 c2 c3 c4 f1 f2 f3

util.o: util.c
	$(CC) $(CFLAGS) -c util.c

# % in a target/prerequisite is like a wild-card: the part of the name it
# matches (the "stem") is substituted for % in the prerequisites.
# $@ -> The filename of the target
# $< -> The name of the first prerequisite

c%:     c%.c util.o
	$(CC) $(CFLAGS) -o $@ util.o $< $(LIBS)

f%:     f%.c
	$(CC) $(CFLAGS) -o $@ $<

# Additional rules to clean things up, e.g. 'make clean' / 'make cleanall'
clean:
	rm -f *.o

cleanall:       clean
	rm -f c1 c2 c3 c4 f1 f2 f3
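Applying a pattern rule such as c%: c%.c util.o to the target c1 can be sketched as two string operations: find the stem that % matched in the target, then substitute it for % in each prerequisite. The helper names match_pattern and expand_pattern are hypothetical, and the sketch handles only a single % in names without directory parts.

```c
#include <stdio.h>
#include <string.h>

/* If name matches pattern (which contains one '%'), copy the stem (the
   non-empty part '%' matched) into stem and return 1; otherwise return 0. */
int match_pattern(const char *pattern, const char *name, char *stem, size_t size)
{
    const char *pct = strchr(pattern, '%');
    if (pct == NULL)
        return 0;
    size_t pre = (size_t)(pct - pattern);   /* length of text before '%' */
    size_t suf = strlen(pct + 1);           /* length of text after '%' */
    size_t len = strlen(name);
    if (len < pre + suf + 1)                /* the stem must be non-empty */
        return 0;
    if (strncmp(name, pattern, pre) != 0 || strcmp(name + len - suf, pct + 1) != 0)
        return 0;
    size_t stem_len = len - pre - suf;
    if (stem_len + 1 > size)
        return 0;
    memcpy(stem, name + pre, stem_len);
    stem[stem_len] = '\0';
    return 1;
}

/* Replace the '%' in pattern with stem, e.g. ("c%.c", "1") -> "c1.c".
   A prerequisite without '%' (such as util.o) is copied unchanged.
   Returns 1 on success, 0 if out is too small. */
int expand_pattern(const char *pattern, const char *stem, char *out, size_t size)
{
    const char *pct = strchr(pattern, '%');
    int n = (pct == NULL)
        ? snprintf(out, size, "%s", pattern)
        : snprintf(out, size, "%.*s%s%s", (int)(pct - pattern), pattern, stem, pct + 1);
    return n >= 0 && (size_t)n < size;
}
```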

GDB - The GNU Debugger

GDB TUI mode (Text User Interface)

TUI mode provides a curses-based interface to gdb.

Compile your programs with debugging information:

Tool	Options
GNU C	`gcc -ggdb` ...
clang	`clang -gdwarf` ...
nasm	`nasm -g -F dwarf -felf64 prog.s` ... `ld -o prog prog.o`

Then issue the command: gdb prog

Then inside of GDB issue the following commands:

tui enable
tui reg general
break _start
run [<params>]

Then use 'step', 'next', 'si', 'ni', etc. as normal, but with the curses windows displayed. The source window should be the selected one; it responds to arrow keys and scroll-wheel events to move around. Pressing Enter repeats the last command. (For a C program, `break main` is usually a more useful first breakpoint than `_start`.)

Because the curses display of the TUI hides the program's output, you may want to direct that output to a different terminal window: run "tty" in the terminal that should receive the output, then in gdb use "tty <device>" with the device it printed, such as "tty /dev/pts/4", and then "run" the program (or re-run it if it is already running).

gdb program

-p pid - Attach to and debug the already running process with PID 'pid'
-c core - Use the core file as the process's memory (e.g. to inspect a program that crashed).

Use:

unlimit coredumpsize (tcsh)
ulimit -c unlimited (bash)

to enable core dumps.
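A program can also raise its own core-file limit, which is what `ulimit -c` does for the shell, through getrlimit/setrlimit with RLIMIT_CORE. In this sketch (enable_core_dumps is a hypothetical name) the soft limit can only be raised as far as the hard limit, and whether a core file actually appears also depends on the system's /proc/sys/kernel/core_pattern setting; on many distributions cores are handed to a service such as systemd-coredump instead of being written to the current directory.

```c
#include <sys/resource.h>

/* Hypothetical sketch: raise this process's soft core-file size limit to
   its hard limit (unlimited only if the hard limit is RLIM_INFINITY).
   Children created afterwards inherit the new limit.
   Returns 0 on success, -1 on error. */
int enable_core_dumps(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}
```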

Common commands:

Command	What it does
`run` args...	Runs the program
`c` / `continue`	Continue running the program
`next`	Step one source line (steps over function calls)
`step`	Step one source line (descends into function calls)

`list` [file`:`]func	List the source around where the program is stopped, or around func
`break` location	Set a breakpoint at location
`catch` event	Catch an event such as "fork", "signal", etc.
`watch` expr	Break whenever the value of expr changes (is written).
`awatch` expr	Break whenever the data location of expr is read or written.

`print` expr	Print the value of expr
`where`/`bt`	Print a stack trace of where the program currently is
`up`	Go up one stack frame (toward the caller)
`down`	Go down one stack frame (toward the callee)

`help`	GDB help
`quit`	Exit gdb

Assembly debugging:

Command	What it does
`info registers` `i r`	Dumps all the registers and their values
`info frame` `i f`	Dump information about the current stack frame
`print $reg` `p $reg`	Print a specific register and its value
`print (char *)($rsp+8)`	Apply a C type to an address expression: here $rsp+8 is treated as a char * and gdb also prints the string stored there.
`display /3i $pc`	Display the next 3 instructions at the program counter every time the program stops. *You probably always want to do this before stepping*
`ni`	Go to next instruction (stepping over calls)
`si`	Step to next instruction (stepping into calls)
`x /8g $rsp`	Examine 8 giant (8-byte) words of memory starting at $rsp

Memory:

In a computer, memory is a sequence of bytes, each with a numeric address, exactly analogous to an array of bytes. The index into the array is the address of a byte:

char memory[size-of-total-memory];

memory[address] = value
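The array analogy can be made concrete with a tiny simulated memory. In this sketch memory, MEM_SIZE, store32 and load32 are made up for illustration; it shows that a multi-byte value occupies consecutive addresses, stored here in little-endian order (least significant byte at the lowest address) as on x86.

```c
#include <stddef.h>
#include <stdint.h>

#define MEM_SIZE 64                     /* a tiny, made-up memory */
static unsigned char memory[MEM_SIZE];  /* memory[address] holds one byte */

/* Store a 32-bit value at address, least significant byte first.
   Returns 0 on success, -1 if the value would run off the end. */
int store32(size_t address, uint32_t value)
{
    if (address > MEM_SIZE - 4)
        return -1;
    for (int i = 0; i < 4; i++)
        memory[address + i] = (unsigned char)(value >> (8 * i));
    return 0;
}

/* Reassemble the 32-bit value from the four bytes at address.
   Returns 0 for an out-of-range address. */
uint32_t load32(size_t address)
{
    uint32_t v = 0;
    if (address > MEM_SIZE - 4)
        return 0;
    for (int i = 0; i < 4; i++)
        v |= (uint32_t)memory[address + i] << (8 * i);
    return v;
}
```

For example, after store32(8, 0x12345678) the bytes at addresses 8, 9, 10 and 11 are 0x78, 0x56, 0x34 and 0x12.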

In Linux, a process's memory is laid out like this:

32 bit machine:

┌─────────────────────────┐ 0xFFFFFFFF
│ Kernel mode space (1GB) │ <- not directly accessible by a user-space process.
├─────────────────────────┤ 0xC0000000
│                         │
│ User mode space (3GB)   │
│                         │
└─────────────────────────┘ 0x00000000

The 64-bit virtual address space layout (48-bit virtual addresses on current x86-64 processors with 4-level paging) is similar, with kernel space in the top 128TB and user space in the bottom 128TB. Address bits 48-63 must be copies of bit 47 (i.e. all ones or all zeros); otherwise the address is non-canonical and cannot be used.

┌─────────────────────────┐ 0xFFFFFFFF FFFFFFFF
│                         │
│ Kernel mode space       │  (128TB)
│                         │
├─────────────────────────┤ 0xFFFF8000 00000000
│                         │
│ Unused space            │  
│                         │
├─────────────────────────┤ 0x00007FFF FFFFFFFF
│                         │
│ User mode space         │  (128TB)
│                         │
└─────────────────────────┘ 0x00000000 00000000
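The canonical-address rule reduces to a check on the top 17 bits. A sketch assuming 4-level paging with 48-bit virtual addresses (with 5-level paging the boundary moves up to bit 56); is_canonical48 is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* An address is canonical when bits 63..47 are all equal, i.e. the top
   17 bits are all 0 (user half) or all 1 (kernel half). */
bool is_canonical48(uint64_t addr)
{
    uint64_t top = addr >> 47;          /* bits 63..47, 17 bits in total */
    return top == 0 || top == 0x1FFFF;
}
```

Both ends of the unused gap in the diagram are the boundaries: 0x00007FFFFFFFFFFF and 0xFFFF800000000000 are canonical, while 0x0000800000000000 is not.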

Memory for a (32 Bit) Linux process:

┌──────────────────────────┐ 0xFFFFFFFF
│ Kernel mode space (1GB)  │
├──────────────────────────┤ 0xC0000000
│ Random Stack offset      │
├──────────────────────────┤
│ Stack (Grows down)       │ RLIMIT_STACK (8MB)
├──────────────────────────┤
│                          │ Random MMap offset
├──────────────────────────┤
│ Memory mapping segment │ │
│ File mappings(dyn libs)│ │
│ Anon mappings          ▽ │
├──────────────────────────┤
│                          │ program break
├──────────────────────────┤ brk
│        △                 │
│        │ Heap            │
├──────────────────────────┤ start_brk
│                          │ Random brk offset
├──────────────────────────┤
│ BSS Segment (uninitial-  │
│ ized static vars) Zero-  │
│ Filled                   │
├──────────────────────────┤
│ Data Segment (static     │ end_data
│ Variables initialized by │
│ programmer.)             │ start_data
├──────────────────────────┤
│ Text Segment (ELF)       │ end_code
│ Binary image of program  │
├──────────────────────────┤ 0x08048000 (start of program)
└──────────────────────────┘ 0x00000000
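One way to see this layout for yourself is to take the address of something in each region. The sketch below (all names are hypothetical) targets a typical 64-bit Linux system, where the actual addresses differ from the 32-bit diagram but the ordering is the same: text below data, data below BSS, and the heap below the stack. That a small malloc comes from the brk heap is a glibc allocator detail, not a guarantee.

```c
#include <stdint.h>
#include <stdlib.h>

int initialized_global = 42;   /* data segment: initialized by the programmer */
int uninitialized_global;      /* BSS segment: zero-filled */

struct layout {
    uintptr_t text, data, bss, heap, stack;
};

/* Record one address from each region of the process image. */
void sample_layout(struct layout *out)
{
    int local = 0;                          /* lives on the stack */
    void *block = malloc(16);               /* small allocation from the heap */

    out->text  = (uintptr_t)&sample_layout; /* machine code: text segment */
    out->data  = (uintptr_t)&initialized_global;
    out->bss   = (uintptr_t)&uninitialized_global;
    out->heap  = (uintptr_t)block;
    out->stack = (uintptr_t)&local;
    free(block);
}
```

Printing these values from two runs of the same program normally shows different heap and stack addresses, because of the random offsets (ASLR) in the diagram.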

Distinction between kernel and user-space

The distinction is mostly down to which privilege level (called a ring) the code runs at:

Ring 0 (kernel)

Can do anything, sees all of memory.

Rings 1,2

Rarely used (Linux does not use them). Code here can access privileged memory but is not allowed to execute some privileged instructions. These rings were meant for separating device drivers from the kernel proper, and are sometimes used by VMs such as VirtualBox.

Ring 3 (userspace)

Cannot change its own ring (obviously.)
Cannot modify its own page tables (i.e. how it sees memory)
Cannot register interrupt handlers.
Cannot execute I/O instructions like IN or OUT (unless the kernel grants permission)
Must basically let the kernel manage things for it. It "communicates" with the kernel by raising an interrupt or executing a syscall instruction, which switches to ring 0 and jumps to a specific entry point in kernel space. The kernel figures out what the user-space process wants it to do from the values in specific registers (on x86-64 Linux, the system call number in rax and the arguments in rdi, rsi, rdx, r10, r8 and r9).
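The syscall(2) function in glibc lets a program make a system call by number, which exposes the mechanism behind wrappers like getpid() and write(). A sketch assuming Linux and glibc; raw_getpid and raw_write are hypothetical names, and the register names in the comments apply to x86-64.

```c
#define _GNU_SOURCE          /* declares syscall() in glibc's <unistd.h> */
#include <sys/syscall.h>
#include <unistd.h>

/* Ask the kernel for our process ID by system call number (SYS_getpid is
   placed in rax), bypassing the C library's getpid() wrapper. */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}

/* write(2) through the raw interface: fd, buf and len are passed to the
   kernel in rdi, rsi and rdx, with SYS_write in rax. Returns bytes written
   or -1 on error. */
long raw_write(int fd, const void *buf, size_t len)
{
    return syscall(SYS_write, fd, buf, len);
}
```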

Misc. programs:

> size binary

Print sizes of program sections.

> readelf -a binary

Print the headers, sections, symbols, etc. of an ELF binary.

> objdump -d binary

Disassemble a program.
-S	Interleave the source code with the disassembly (needs debugging information)

> cat /proc/*/maps

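Display memory mappings (text, data, heap, stack, shared libraries, etc.) of processes; use /proc/<pid>/maps for a single process, or /proc/self/maps for the process doing the reading.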
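A process can read its own mappings through /proc/self/maps. A minimal sketch (maps_contains is a hypothetical helper) that scans the file for a substring such as [stack] or part of a shared library's name.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if some line of /proc/self/maps contains text, 0 if none does,
   and -1 if the file cannot be opened. Lines longer than the buffer are
   read in pieces, so a match split across a piece boundary is missed. */
int maps_contains(const char *text)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL)
        return -1;
    char line[512];
    int found = 0;
    while (!found && fgets(line, sizeof line, f) != NULL)
        if (strstr(line, text) != NULL)
            found = 1;
    fclose(f);
    return found;
}
```

In a dynamically linked program, maps_contains("libc") would typically return 1, since the C library is one of the file mappings in the memory mapping segment.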