CS456 - Systems Programming

Lesson 2

In Unix, everything is a file

In modern operating systems, the operating system controls access to all devices and therefore has to handle every input/output operation that a program wishes to perform. Unix and Linux are no exception. I/O under Unix/Linux is performed by making system calls, that is, by asking the kernel to fetch or store data via the devices it controls.

To track the state of I/O operations, the kernel maintains a table of file descriptors for each process: small integers starting at 0 and increasing up to some limit. (Linux processes typically start with a soft limit of 1024 open descriptors, which can be raised up to the hard limit with ulimit -n or setrlimit().) Each descriptor represents an open file, though a "file" under Unix/Linux may actually be many things: a device, a memory object, a network connection, or just about anything that data can be read from and/or written to. Hence the notion that everything is a file.

Normally each process starts with three open descriptors, inherited from the shell that was used to find and run the program. These are:

Descriptor #   C name          What it usually represents
0              STDIN_FILENO    Input from the terminal keyboard
1              STDOUT_FILENO   Output to the terminal (line buffered when used via stdout)
2              STDERR_FILENO   Output to the terminal (unbuffered when used via stderr)

File descriptors are not to be confused with the FILE * streams that C provides, which are a higher-level abstraction of I/O built on top of descriptors. In this class we will be dealing mostly with the I/O provided directly by the kernel via system calls.

All system calls are documented in section 2 of the online manual (i.e. via the man command). Most system calls return an integer, and a negative value (normally -1) indicates that an error has occurred. When that happens, the error number is stored in the global variable errno, and the perror() function may be used to print a human-readable description of the error.
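As a rough illustration of this error convention, the sketch below attempts a system call (open(), described further down) and reports a failure. The helper name try_open() is made up for this example and is not part of any library.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

// Hypothetical helper: tries to open path read-only.
// Returns 0 on success, or the errno value describing the failure.
int try_open(const char *path) {
  int fd = open(path, O_RDONLY);
  if (fd < 0) {
    int saved = errno;   // Save errno first: later calls may overwrite it.
    perror(path);        // Prints the path followed by the error description.
    return saved;
  }
  close(fd);
  return 0;
}
```

Calling try_open() on a path that does not exist should return ENOENT and print a "No such file or directory" message on standard error.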

The C standard library provides wrapper functions for most of the kernel's system calls. The following are a few of the more important ones.

open()

  • Reading: man 2 open
   #include <sys/types.h>
   #include <sys/stat.h>
   #include <fcntl.h>

   int open(const char *pathname, int flags);
   int open(const char *pathname, int flags, mode_t mode);

open() asks the kernel to create a file descriptor referencing a particular file on the file-system and, if successful, returns the file descriptor as its value. The mode argument is only required if the open() could potentially create a new file (i.e. the O_CREAT flag is provided). The mode should normally be 0666 for regular files. The mode is then modified by the process's umask value: every permission bit set in the umask is cleared from mode (mode & ~umask) to yield the final mode value. With the common umask of 022, for example, 0666 becomes 0644.
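The effect of the umask on mode can be sketched as a one-line calculation. The helper effective_mode() below is hypothetical and only models the arithmetic; the kernel applies the mask itself when a file is created. For common umask values the result looks like a subtraction, but the operation is really a bitwise mask.

```c
#include <sys/types.h>
#include <sys/stat.h>

// Hypothetical helper: the permission bits a new file ends up with when it
// is created with the requested mode under the given umask.
mode_t effective_mode(mode_t requested, mode_t mask) {
  return requested & ~mask;   // Clear every bit that is set in the umask.
}
```

For example, effective_mode(0666, 0022) gives 0644, while effective_mode(0666, 0111) stays 0666 because none of the masked bits were requested in the first place.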

flags      What it represents
O_RDONLY   Open file for reading
O_WRONLY   Open file for writing
O_RDWR     Open file for reading and writing

Additional flags such as the following may be bitwise OR'ed (|) with the above:

Useful flags   What it represents
O_APPEND       Append data to the end of the file on every write
O_CREAT        Create the file if it does not exist
O_TRUNC        Delete all data in the file upon opening

Once a file has been opened successfully the file descriptor value returned should be saved so that it may be used for subsequent calls to read/write/etc to fetch or store data in the file or device.


  // Open foo.txt for reading only:
  int foo_fd = open("foo.txt", O_RDONLY);

  // Open bar.txt for writing, truncating its contents or creating it if it
  // does not exist:
  int bar_fd = open("bar.txt", O_WRONLY | O_TRUNC | O_CREAT, 0666);

  // Example of testing if bar.txt was successfully opened:
  if (bar_fd < 0) {
    perror("open bar.txt");
    // Maybe a good idea to exit the program here (exit() needs <stdlib.h>):
    exit(EXIT_FAILURE);
  }

close()

  • Reading: man 2 close
   #include <unistd.h>

   int close(int fd);

close() releases an open file descriptor, allowing the descriptor number to be re-used for a different file. Note that the kernel always allocates the lowest available file descriptor on open(). One way to replace one of the default descriptors with a file is to close the original descriptor and immediately follow that with a call to open the replacement file:


   // Free descriptor 0 so that the next open() will re-use it:
   close(STDIN_FILENO);
   open("foo.txt", O_RDONLY);
   // The program should now think that foo.txt is its standard input.

read() / write()

  • Reading: man 2 read
   #include <unistd.h>

   ssize_t read(int fd, void *buf, size_t count);
   ssize_t write(int fd, const void *buf, size_t count);

read() and write() have the same calling semantics; only the direction the data travels differs, which is why the buffer passed to read() cannot be read-only. Each function returns the number of bytes actually read or written, which may be fewer than count, or a negative value to indicate an error. A return value of zero from read() indicates that the end of file has been reached and reading should stop. If the descriptor has non-blocking semantics enabled (i.e. O_NONBLOCK, which has no effect on regular files), a read() with no data available does not wait: it returns -1 immediately with errno set to EAGAIN, and this should not be mistaken for end of file.

In a standard read-write loop that reads from one descriptor and writes to another, ensure that you only write as much data as was read by the preceding read(). Attempting to write the full buffer will likely write garbage data left over from a previous read.

Example read-write loop:

  // Assumes input and output are valid file descriptors defined elsewhere,
  // and that K is a buffer size constant (e.g. #define K 1024):
  char data[K];
  ssize_t r;

  while ( (r = read(input, data, K)) > 0 ) {
    write(output, data, r);
  }

In the example, note that the assignment to r is fully enclosed in parentheses. This prevents a common error: without them, the statement parses as 'r = (read() > 0)', which would make r 1 or 0 depending on whether or not the read returned any data.

Also note the use of r for the amount of data to write in the write() call.
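Because write() may also accept fewer bytes than requested, a more defensive version of the loop retries until everything read has been written. The following is a minimal sketch of that idea; the helper names write_all() and copy_fd() and the 4096-byte buffer are choices made for this example, not part of the original notes.

```c
#include <errno.h>
#include <unistd.h>

// Hypothetical helper: keeps calling write() until all len bytes are written.
// Returns 0 on success, -1 on error (errno is set by write()).
int write_all(int fd, const char *buf, size_t len) {
  while (len > 0) {
    ssize_t w = write(fd, buf, len);
    if (w < 0) {
      if (errno == EINTR)
        continue;         // Interrupted by a signal before writing: retry.
      return -1;
    }
    buf += w;             // Skip past the bytes that were written.
    len -= (size_t)w;
  }
  return 0;
}

// Hypothetical helper: copies everything from input to output.
// Returns 0 once end of file is reached, -1 on a read or write error.
int copy_fd(int input, int output) {
  char data[4096];
  ssize_t r;

  while ((r = read(input, data, sizeof data)) > 0) {
    // Only the r bytes just read are valid, so only those are written.
    if (write_all(output, data, (size_t)r) < 0)
      return -1;
  }
  return r < 0 ? -1 : 0;
}
```

A call such as copy_fd(STDIN_FILENO, STDOUT_FILENO) would behave roughly like a very small cat.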

lseek()

  • Reading: man 2 lseek
   #include <sys/types.h>
   #include <unistd.h>

   off_t lseek(int fd, off_t offset, int whence);

whence     What it represents
SEEK_SET   The file offset is set to offset bytes.
SEEK_CUR   The file offset is set to its current location plus offset bytes.
SEEK_END   The file offset is set to the size of the file plus offset bytes.

lseek() sets the file offset of a descriptor that refers to a seekable object, such as a regular file or a block device like a disk. The offset is a byte offset interpreted relative to the position selected by whence. Typically negative values of offset are used with SEEK_END, but positive values can be used as well. Writing data after seeking past the end of a file creates a hole in the file where no data actually exists (it reads back as zeroed data if a read is attempted in that region).
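Two common uses of lseek() can be sketched as follows: finding the size of a file by seeking to its end, and creating a hole by writing past the current end. Both helper names, file_size() and make_hole(), are hypothetical and exist only for this illustration.

```c
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: returns the size of the file behind fd, or -1 on error.
// The current offset is saved first and restored afterwards.
off_t file_size(int fd) {
  off_t saved = lseek(fd, 0, SEEK_CUR);   // Offset 0 from "here": no movement.
  if (saved < 0)
    return -1;                            // Fails on pipes, for example.
  off_t size = lseek(fd, 0, SEEK_END);    // Offset 0 from the end: file size.
  if (size < 0)
    return -1;
  if (lseek(fd, saved, SEEK_SET) < 0)     // Put the offset back.
    return -1;
  return size;
}

// Hypothetical demo: writes one byte, skips 98 bytes, then writes another.
// The skipped range becomes a hole that reads back as zero bytes.
// Returns the resulting file size (100 on success), or -1 on error.
off_t make_hole(const char *path) {
  int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0666);
  if (fd < 0)
    return -1;
  off_t size = -1;
  if (write(fd, "A", 1) == 1 &&
      lseek(fd, 98, SEEK_CUR) == 99 &&    // Offset is now 1 + 98 = 99.
      write(fd, "B", 1) == 1)
    size = file_size(fd);                 // Last byte is at offset 99.
  close(fd);
  return size;
}
```

Note that file_size() fails on descriptors that cannot seek, such as pipes, where lseek() returns -1.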

pipe()

  • Reading: man 2 pipe
   #include <unistd.h>

   int pipe(int pipefd[2]);

A pipe in Unix is a uni-directional data stream. The pipe() system call fills a two-element integer array with two file descriptors that represent the two ends of the data stream.

pipefd[0] represents the read end of the data stream and pipefd[1] the write end. Reads will normally block until data is written to pipefd[1], at which point it becomes available to be read from pipefd[0].

A pipe is what connects the output of one process to the input of another. To do this the pipe must be created prior to a fork(), after which the same pipe exists in both processes (one of the few examples of a truly shared resource between processes after a fork). What is written to the write end of the pipe in one process can then be read from the read end in the other.

Note that because a pipe is a uni-directional data stream, it cannot be used to both send and receive data between processes, unless the processes take turns reading and writing. For two-way communication one needs either two pipes or a socketpair(), whose descriptors can each be both read from and written to.
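To illustrate the difference, the sketch below creates a socketpair() and sends one byte in each direction over the same pair of descriptors. It stays within a single process purely to keep the example short; the function name socketpair_roundtrip() is made up, and AF_UNIX with SOCK_STREAM is one reasonable choice of socket type rather than the only one.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical demo: sends one byte in each direction over a socketpair.
// Returns 0 if both bytes arrive intact, -1 otherwise.
int socketpair_roundtrip(void) {
  int sv[2];
  char c = 0;
  int ok = -1;

  if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
    return -1;

  // First direction: sv[0] -> sv[1].
  if (write(sv[0], "x", 1) == 1 && read(sv[1], &c, 1) == 1 && c == 'x' &&
      // Reverse direction on the same pair: sv[1] -> sv[0].
      write(sv[1], "y", 1) == 1 && read(sv[0], &c, 1) == 1 && c == 'y')
    ok = 0;

  close(sv[0]);
  close(sv[1]);
  return ok;
}
```

With a pipe, the second half of this exchange would need a second pipe, since data only flows from the write end to the read end.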

EOF on a pipe

Normally End Of File (EOF) on a pipe descriptor (i.e. read() returns 0 bytes read) is only signaled when the write end of the pipe has been closed (i.e. there can never be any more data to read). A gotcha arises after a fork, however: there are now two sets of pipe descriptors, as both processes have both a write end and a read end. Since an EOF can only occur once every copy of the write end has been closed, it must be closed in both processes. It is therefore important to close the write end in the process that will be reading from the pipe. The steps necessary for process A to write to process B through a pipe are thus:

Process A (parent)         Process B (child)
1) Create the pipe         -
2) Fork the new process    -
3) Close read end          1) Close write end
4) Write to write end ──▷  2) Read from read end
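The steps in the table can be sketched in code as follows. This is only an illustration under a few assumptions: the function name send_to_child() is made up, the child reports how many bytes it read through its exit status (so the message must be shorter than 255 bytes), and the labels A1 to A4 and B1 to B2 in the comments refer to the numbered steps above.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical demo: parent (A) writes len bytes to a child (B) through a
// pipe. The child counts bytes until EOF and reports the count as its exit
// status. Returns that count, or -1 on error.
int send_to_child(const char *msg, size_t len) {
  int pipefd[2];
  if (pipe(pipefd) < 0)                     // A1) Create the pipe
    return -1;

  pid_t pid = fork();                       // A2) Fork the new process
  if (pid < 0) {
    close(pipefd[0]);
    close(pipefd[1]);
    return -1;
  }

  if (pid == 0) {
    close(pipefd[1]);                       // B1) Close write end
    char buf[64];
    ssize_t r;
    int total = 0;
    while ((r = read(pipefd[0], buf, sizeof buf)) > 0)   // B2) Read
      total += (int)r;
    _exit(r < 0 ? 255 : total);             // r == 0 means EOF was seen.
  }

  close(pipefd[0]);                         // A3) Close read end
  ssize_t w = write(pipefd[1], msg, len);   // A4) Write to write end
  close(pipefd[1]);                         // Last write end closed: EOF.

  int status;
  if (waitpid(pid, &status, 0) < 0 || w != (ssize_t)len)
    return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

If step B1 were left out, the child would still hold a write end of its own pipe, its read() would never return 0, and the parent would wait forever in waitpid().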

dup() / dup2()

  • Reading: man 2 dup
   #include <unistd.h>

   int dup(int oldfd);
   int dup2(int oldfd, int newfd);

To connect the output of one process (its stdout, i.e. descriptor 1) to the input (its stdin, i.e. descriptor 0) of another process, it is necessary to replace each process's normal descriptors (normally attached to the TTY) with the appropriate pipe descriptors.

The dup() and dup2() system calls allow us to duplicate an existing descriptor. dup() always returns the lowest available descriptor number, so the descriptor to be replaced must be closed prior to calling dup() to make its slot available for the duplicate of the pipe descriptor (oldfd). This can be cumbersome, so dup2() takes a newfd parameter: it ensures that the descriptor to be replaced (newfd) is closed if necessary and then replaced with a duplicate of the pipe descriptor (oldfd).
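The same calls work with any descriptor, not just pipes. As a small sketch, the hypothetical helper below uses dup() to keep a copy of the real standard output, dup2() to point descriptor 1 at a file, and dup2() again to restore it. It uses write() rather than printf() to stay clear of stdio buffering.

```c
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: temporarily points STDOUT_FILENO at the file path,
// writes msg there with write(), then restores the original stdout.
// Returns 0 on success, -1 on error.
int write_via_stdout(const char *path, const char *msg, size_t len) {
  int saved = dup(STDOUT_FILENO);          // Keep a copy of the real stdout.
  if (saved < 0)
    return -1;

  int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
  if (fd < 0) {
    close(saved);
    return -1;
  }

  int rc = -1;
  if (dup2(fd, STDOUT_FILENO) >= 0) {      // Descriptor 1 now refers to path.
    if (write(STDOUT_FILENO, msg, len) == (ssize_t)len)
      rc = 0;
    if (dup2(saved, STDOUT_FILENO) < 0)    // Put the real stdout back.
      rc = -1;
  }
  close(fd);      // No longer needed once descriptor 1 has been restored.
  close(saved);
  return rc;
}
```

After the call returns, descriptor 1 refers to the original standard output again and the message is found in the file instead of on the terminal.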

Pipe example:

The following implements A | B where the standard output of the parent (A) is sent to the standard input of the child (B):

int pipefd[2];

// Pipe must exist before forking if it is to be shared between the processes.
// (Error checking is omitted here to keep the example short.)
pipe(pipefd);

pid_t pid = fork();

if (pid > 0) {
  // Parent (A):
  // Duplicate write end as our stdout:
  dup2(pipefd[1], STDOUT_FILENO);
  // Closing the read end in parent:
  close(pipefd[0]);
  // Close the now duplicated and extraneous write end as well:
  close(pipefd[1]);

  printf("This will be sent to the child process.\n");
  // stdout is now a pipe rather than a terminal, so flush it explicitly:
  fflush(stdout);
} else {
  // Child (B):
  // Duplicate the read end as our stdin:
  dup2(pipefd[0], STDIN_FILENO);
  // Closing the write end in child:
  close(pipefd[1]);
  // Close the now duplicated read end as well:
  close(pipefd[0]);

  // This will read the string printed above:
  char buf[K];
  fgets(buf, K, stdin);

  printf("The message from the parent: %s\n", buf);
}