CS456 - Systems Programming

I/O Redirection:

I/O redirection in the context of a shell is the process of redirecting the standard input or output of a command, which is normally connected to the terminal, from or to a file.

Structure for a command:

The grammar for a simple command (i.e. a single command, not a pipeline) in most shells is a sequence of space-separated words, of which the first is the command and any subsequent words are the parameters to the command. The redirection operators "<", ">" and ">>", each followed by a file-path, are removed from the word sequence and are not passed to the execve function.

word = normal_word
     | "<" path
     | ">" path
     | ">>" path
     ;

command = word { word } ;


The redirections do need to be remembered and stored with the command to be executed. To this end, we add some additional structure to a command:

typedef struct command {
  char **argv;
  char *input;
  char *output;
  int append;
} cmd_t;


The argv array of strings is filled in with the words of the command and must end with a NULL pointer, as execvp expects. If a "<" is encountered, input is set to the file-path that follows; it is left NULL if no input redirection is seen. Likewise, output is set if a ">" or ">>" is seen and is left NULL otherwise. append is a Boolean that is set to true if a ">>" is seen rather than a ">", and it controls how the output file is opened.

If, as we scan the command, we see two or more "<" operators or two or more ">"/">>" operators, the redirection is "ambiguous" and the command should be aborted. It is also ambiguous to have an output redirection on the left side of a pipe or an input redirection on the right side.
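The scanning rules above can be sketched as a small parser. This is only an illustration under some assumptions: parse_command is a hypothetical name, the words have already been split on spaces so that each operator is its own word, and the caller supplies an argv array large enough to hold every word plus the terminating NULL. The pipe-related ambiguity check is not covered here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct command {
  char **argv;
  char *input;
  char *output;
  int append;
} cmd_t;

/* Hypothetical helper: fill *cmd from a NULL-terminated array of words.
 * cmd->argv must have room for every word plus the terminating NULL.
 * Returns 0 on success, -1 on a missing path or an ambiguous redirection. */
int parse_command(char **words, cmd_t *cmd)
{
  assert(words != NULL && cmd != NULL && cmd->argv != NULL);
  int argc = 0;
  cmd->input = cmd->output = NULL;
  cmd->append = 0;

  for (int i = 0; words[i] != NULL; i++) {
    int is_in  = strcmp(words[i], "<") == 0;
    int is_app = strcmp(words[i], ">>") == 0;
    int is_out = is_app || strcmp(words[i], ">") == 0;

    if (!is_in && !is_out) {
      cmd->argv[argc++] = words[i];        /* normal word */
      continue;
    }
    char *path = words[++i];               /* word after the operator */
    if (path == NULL) return -1;           /* operator with no path */
    if (is_in) {
      if (cmd->input != NULL) return -1;   /* second "<": ambiguous */
      cmd->input = path;
    } else {
      if (cmd->output != NULL) return -1;  /* second ">" or ">>": ambiguous */
      cmd->output = path;
      cmd->append = is_app;
    }
  }
  cmd->argv[argc] = NULL;                  /* execvp needs a NULL terminator */
  return 0;
}
```

On a return of -1 the shell would report the error and skip the command rather than fork.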

After we have parsed the command and forked a new child to run it, but before we call execve(), we perform the I/O redirections by replacing STDIN_FILENO (descriptor 0) and/or STDOUT_FILENO (descriptor 1) with file descriptors for the given files. We can do this in either of two ways:

  1. Close the descriptor to be replaced, then open the given file. This method relies on the kernel always using the lowest available file descriptor for a newly opened file.

  2. Open the given file, replace the STD*_FILENO descriptor with dup2() and close the now unnecessary opened file descriptor.

Option 2 is more work, and POSIX guarantees that open() returns the lowest-numbered descriptor that is not currently open in the process, so we will use method 1 in the following code:

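#include <fcntl.h>      // open(), O_* flags
#include <stdio.h>      // perror()
#include <stdlib.h>     // exit()
#include <unistd.h>     // close(), fork(), execvp(), STD*_FILENO

// Replace descriptor nfd with the file `name`, opened with `flags` (method 1).
void redir(int nfd, char *name, int flags)
{
  close(nfd);
  if (open(name, flags, 0666) != nfd) {
    perror(name);
    exit(1);
  }
}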

pid_t run(cmd_t *cmd)
{
  pid_t pid = fork();
  if (pid != 0) return pid;

  if (cmd->input != NULL)
    redir(STDIN_FILENO, cmd->input, O_RDONLY);
  if (cmd->output != NULL)
    redir(STDOUT_FILENO, cmd->output, O_WRONLY | O_CREAT | (cmd->append? O_APPEND: O_TRUNC));

  execvp(cmd->argv[0], cmd->argv);
  perror("exec");
  exit(1);
}
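
For comparison, method 2 from the list above might look like the following. This is a sketch only: redir_dup2 is a hypothetical name, and unlike redir() it returns an error code instead of exiting, which is a design choice made here for illustration. It does not depend on which descriptor open() happens to return, and if open() fails the original descriptor is left untouched.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical counterpart to redir() using method 2 (open, dup2, close).
 * Returns 0 on success, -1 on failure with errno set by the failing call. */
int redir_dup2(int nfd, const char *name, int flags)
{
  assert(name != NULL && nfd >= 0);

  int fd = open(name, flags, 0666);
  if (fd < 0) return -1;            /* nfd is still intact here */
  if (fd == nfd) return 0;          /* open() already landed on nfd */

  if (dup2(fd, nfd) < 0) {
    int saved = errno;              /* keep dup2's error across close() */
    close(fd);
    errno = saved;
    return -1;
  }
  close(fd);                        /* nfd now refers to the file */
  return 0;
}
```

In the child, a call such as redir_dup2(STDIN_FILENO, cmd->input, O_RDONLY) would be followed by perror() and exit(1) when it returns -1.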

Constructing Pipelines

Suppose we have the following pipeline:

cmd_a | cmd_b | cmd_c

Each command is described by a cmd_t structure holding its parameters; cmd_a, cmd_b and cmd_c below are pointers to these structures. To enable pipelines in the run() routine, we modify it accordingly:

pid_t run(cmd_t *c, int *outfd, int inpipe)
{
  int pfd[2];

  if (inpipe) {
    if (pipe(pfd) < 0) {
      perror("pipe");
      return -1;
    }
  }

  pid_t pid = fork();
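  if (pid < 0) {
    perror("fork");
    if (inpipe) {          // do not leak the pipe we just created
      close(pfd[0]);
      close(pfd[1]);
    }
    return -1;
  }
  if (pid > 0) {
    // Parent code: the child holds its own copy of the old write side.
    if (*outfd >= 0) close(*outfd);
    *outfd = -1;           // never leave a closed descriptor in *outfd
    if (inpipe) {
      close(pfd[0]);
      *outfd = pfd[1];
    }
    return pid;
  }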

  // Child code:
  if (c->input  != NULL) redir( STDIN_FILENO, c->input,  O_RDONLY);
  if (c->output != NULL) redir(STDOUT_FILENO, c->output, O_WRONLY|O_CREAT| (c->append? O_APPEND: O_TRUNC));
  if (*outfd >= 0) {
    dup2(*outfd, STDOUT_FILENO);
    close(*outfd);
  }
  if (inpipe) {
    close(pfd[1]);
    dup2(pfd[0], STDIN_FILENO);
    close(pfd[0]);
  }
  execvp(c->argv[0], c->argv);
  perror("exec");
  exit(1);
}


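The run() function above would then be called as follows, starting with the last command in the pipeline and working towards the first (TRUE and FALSE are assumed to be defined as 1 and 0):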

int outfd = -1;
pid_t pid_a, pid_b, pid_c;

// cmd_a | cmd_b | cmd_c
pid_c = run(cmd_c, &outfd, TRUE);
pid_b = run(cmd_b, &outfd, TRUE);
pid_a = run(cmd_a, &outfd, FALSE);

// Now wait for processes pid_a, pid_b and pid_c to complete.
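
The waiting step mentioned in the final comment could be sketched like this. wait_all is a hypothetical helper, and returning the exit status of the last entry is an assumption modelled on how shells usually report a pipeline's status; the notes do not specify it.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: wait for every child in pids[0..n-1].
 * Entries <= 0 (failed run() calls) are skipped.
 * Returns the exit status of the last entry, or -1 if that process could
 * not be waited for or did not exit normally. */
int wait_all(const pid_t *pids, int n)
{
  assert(pids != NULL && n > 0);
  int last = -1;

  for (int i = 0; i < n; i++) {
    int status;
    pid_t r;

    if (pids[i] <= 0) continue;
    do {                              /* retry if a signal interrupts us */
      r = waitpid(pids[i], &status, 0);
    } while (r < 0 && errno == EINTR);
    if (r < 0) continue;

    if (i == n - 1)
      last = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
  }
  return last;
}
```

With the calls above, the array would be built in pipeline order, for example pid_t pids[] = { pid_a, pid_b, pid_c }; followed by wait_all(pids, 3), so that the returned status belongs to cmd_c.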


How this works:

  1. outfd represents the write side of a pipe. If inpipe is TRUE, then a new pipe is created which will be the input to the child and the write side will be passed back to the parent process in outfd.

  2. In the first invocation, outfd is -1 and inpipe is TRUE. inpipe indicates that a new pipe should be created for the child to read from; because outfd is -1, the child's standard output is left alone and it writes to wherever the shell writes, normally the terminal. In the parent, outfd is replaced with the write side of the pipe created in this invocation.

  3. In the second invocation, outfd is the write side of the pipe created in the first invocation and inpipe remains TRUE, indicating that we will create another pipe (to join cmd_a and cmd_b together). In the parent, the old outfd is closed and replaced with the write side of the new pipe, whereas in the child outfd replaces the standard output and the read side of the new pipe replaces its standard input.

  4. In the final invocation, outfd is the write side of the pipe created in the second invocation and inpipe is now FALSE, indicating we need no more pipes created. outfd in the child is replaced