CS471/571 - Operating Systems

Lesson 11

Libraries

In an effort to not re-invent the wheel, libraries were developed to allow for code reuse. The first libraries were static libraries: code compiled into one or more .o (object) files, each containing a set of library functions and/or global variables, which are then collected together into a .a archive file. Archives can be manipulated with the ar command (see: man ar). For example, to list the object files inside the zlib static library:

ar t /usr/lib64/libz.a

At the link stage of compilation, the linker then searches the collection of object files in the .a file and statically links into your program the ones that define symbols your program uses, in the same manner your own .o object files are linked together.

The downsides of static libraries are:

The program must be re-linked (in practice, rebuilt) in order to use a newer version of the library. This matters when the library is updated to fix security issues.
They make programs much larger. Each program carries its own copy of the library code, so on-disk images of programs grow by the size of the libraries they use (or at least the parts of the libraries that are used), and in memory the library code cannot be shared between the processes that use it.

Static libraries still exist, and a program can be linked against them by compiling it with the -static option (provided static versions of the libraries are installed). Static programs have the advantage that they will likely work as long as a specific minimum kernel version is met, so a distributed static binary is much more likely to work across all distributions of Linux.

Shared libraries

To address these two downsides of static libraries, shared libraries were developed. The main problem with a shared library is knowing where in memory it and its functions/data reside, so that a caller can call the functions or access the data. In a program or library, the names of functions and global data are called symbols. Each symbol has an address and a type, and a table of them (the symbol table) is kept in the program itself after compilation. The symbols of a program or library can be inspected with the following commands (assuming the symbols have not been removed with the strip command):

readelf -s program|library

nm program

Initially in Linux, prior to kernel version 1.2, when the a.out program executable format was used, each library was compiled so that it would load at a specific address. This let a caller know where in memory a function or data item would be found, so the addresses of particular functions (or the address of a jump table) could be hard-coded into the program. The downside of this scheme was that the limited 32-bit address space had to have a reservation for each library, limiting the number and locations of libraries in the system. Libraries needed to be registered to prevent them from conflicting in memory with other libraries.

To make a shared library, write a C source file (a main function is optional) and compile it with the options -shared and -fPIC. -fPIC generates Position Independent Code, which works no matter where in memory the library ends up being mapped.

Example:

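#include <stdio.h>

/* Exported function: callers locate it by its symbol name "foo" */
int foo(int n)
{
  return printf("Foo %d\n", n);
}

/* Optional in a shared library; here it is just another exported symbol */
int main(void)
{
  return foo(10);
}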

gcc -shared -fPIC -o libfoo.so foo.c

The Dynamic Runtime linker (ld.so)

When Linux switched from a.out (Assembler OUTput) executable format to ELF (Executable and Linking Format, also used by

The program /lib64/ld-linux-x86-64.so.2 is the runtime dynamic linker (also known as ld.so). When a program is linked against a shared library, the dynamic linker is recorded in the program as its interpreter. When such a program is execve()'ed, the kernel maps the program and the dynamic linker into memory and starts the dynamic linker, which then:

Searches for the shared libraries the program requests (the NEEDED entries in its dynamic section) and maps them into memory in the memory mapping segment.
Looks through the unresolved (U) symbols of the program and of each library, and matches them up with the symbols defined in the libraries that are loaded.
Fills in the jump tables (the Global Offset Table, which the Procedure Linkage Table uses for function calls) with the resolved addresses, since position-independent code cannot hard-code addresses that are only known at run time. With lazy binding, a function's entry is filled in on its first call.
Transfers execution to the program.

dlopen() and dlsym()

We can play the part of the dynamic linker in our own programs by using the C library function dlopen() to open a dynamic shared object (.so file) and then dlsym() to get the addresses of symbols within it.

#include <dlfcn.h>

void *dlopen(const char *filename, int flags);

int dlclose(void *handle);

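dlopen() returns a non-NULL (void *) handle if it successfully opens the file. On failure it returns NULL, and dlerror() returns a string describing the error (dlopen() does not set errno). flags must include either RTLD_LAZY (resolve function symbols only when they are first called; references to variables are always resolved at load time) or RTLD_NOW (resolve all symbols before dlopen() returns), bitwise OR'ed with optional flags such as RTLD_GLOBAL or RTLD_LOCAL (which decide whether the library's symbols are made available to subsequently loaded libraries). dlclose() releases a handle; the library is unloaded once nothing else still needs it.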

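When opening a library, be specific about its location unless you want to use an established library installed in the normal /lib* directories. A filename containing no slash is looked up along the usual library search path (LD_LIBRARY_PATH, the /etc/ld.so.cache cache, /lib and /usr/lib, among others). If you want to use a common library, you can include <gnu/lib-names.h> and use defines such as LIBM_SO (the math library).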

dlsym()

void *dlsym(void *handle, const char *symbol);

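Looks up the symbol named symbol in the library referred to by handle (and the libraries it depends on). The return value is the address of the symbol, or NULL if it cannot be found. Because a symbol's value can legitimately be NULL, the reliable way to detect an error is to call dlerror() to clear any old error, call dlsym(), and then call dlerror() again and check for a non-NULL result.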

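Example (with glibc versions before 2.34, link with -ldl):

#include <stdio.h>
#include <dlfcn.h>

int main(void)
{
  /* The "./" prefix makes dlopen() use this exact path instead of searching */
  void *handle = dlopen("./libfoo.so", RTLD_LAZY);
  if (handle == NULL) {
    /* dlopen() does not set errno, so report dlerror() instead of perror() */
    fprintf(stderr, "dlopen: %s\n", dlerror());
    return 1;
  }

  /* Cast the void * returned by dlsym() to each function's real type */
  int (*foo)(int) = (int (*)(int))dlsym(handle, "foo");
  int (*foo_main)(void) = (int (*)(void))dlsym(handle, "main");
  if (foo == NULL || foo_main == NULL) {
    fprintf(stderr, "dlsym: %s\n", dlerror());
    dlclose(handle);
    return 1;
  }

  foo(5);        /* prints "Foo 5" */
  foo_main();    /* the library's main() calls foo(10): prints "Foo 10" */

  dlclose(handle);
  return 0;
}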