C Tutorial: Playing with processes
Waiting for a process to die
We can use fork to create a new process. That child process can then replace itself with a new program via execve. The parent process can use the wait system call to put itself to sleep until a child process has terminated.
In its most basic form, wait takes a parameter that is a pointer to an integer that will contain the exit status of that program when wait returns. It returns the process ID of the child that terminated.
One thing to be aware of is that wait will return whenever any child of that process has terminated or when the process has received a signal. This means that sometimes you may wake up from wait for some other reason. Hence, wait is generally used in a while loop, where you go back to sleep until the termination of the process you are waiting for is detected.
Example
This is a variation of the fork and exec program. The parent forks and then blocks (goes to sleep) to wait for the child process to terminate. In this example, we have only a single child but, in the general case, we can wait for any number of child processes. When the operating system wakes the parent process because it is no longer blocked on wait, the process prints the exit status of the child process. The exit status is an eight-bit value that the child program passed as a parameter to the exit call. It can be obtained with a call to the WEXITSTATUS macro: WEXITSTATUS(status). To ensure the value is legitimate, we can also test the exit status value to see if the child process exited normally (e.g., by calling exit) or if it was terminated via some signal.
- WIFEXITED(status): is true if the process exited normally via a call to exit.
- WIFSIGNALED(status): is true if the process exited because of the receipt of some signal.
- WIFSTOPPED(status): is true if the process has been stopped and can be restarted.
Depending on how the program exited, we can obtain the following values:
- WEXITSTATUS(status): If WIFEXITED is true, this returns the value passed to exit by the child.
- WTERMSIG(status): If WIFSIGNALED is true, this returns the signal number that terminated the child process.
- WCOREDUMP(status): If WIFSIGNALED is true, this tells us if the process generated a core file containing an image of the process.
- WSTOPSIG(status): If WIFSTOPPED is true, this tells us the number of the signal that caused the process to stop.
The child process, meanwhile, runs the "ls -aF /" program. Note that we loop in wait to ignore any wake-ups that do not correspond to the death of our child process (any signal to the process can get it to return from wait).
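As a rough sketch of the program described above, the parent can fork, let the child call execve, and then loop on wait until the child it started is reported. The helper name run_and_wait and its return convention are made up for this illustration; the real listing may be organized differently.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper (not from the original listing): fork, run the
 * program at `path` with arguments `argv` in the child, and wait for it.
 * Returns the child's exit status, or -1 if it did not exit normally. */
int run_and_wait(const char *path, char *const argv[])
{
    int status;
    pid_t w, pid = fork();

    if (pid == -1) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {                 /* child: become the new program */
        execve(path, argv, environ);
        perror("execve");           /* reached only if execve failed */
        _exit(127);
    }

    /* parent: go back to sleep until wait reports *our* child */
    for (;;) {
        w = wait(&status);
        if (w == pid)
            break;
        if (w == -1 && errno != EINTR) {
            perror("wait");
            return -1;
        }
    }

    if (WIFEXITED(status)) {
        printf("child %d exited with status %d\n",
               (int)pid, WEXITSTATUS(status));
        return WEXITSTATUS(status);
    }
    if (WIFSIGNALED(status))
        printf("child %d was terminated by signal %d\n",
               (int)pid, WTERMSIG(status));
    return -1;
}
```

To get the behavior described here, a main function would build the argument list {"ls", "-aF", "/", NULL} and call run_and_wait("/bin/ls", args). The path /bin/ls is an assumption about where ls lives on your system.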
Save this file by control-clicking or right clicking the download link and then saving it as wait.c.
Compile this program via: gcc -o wait wait.c
If you don't have gcc, you may need to substitute the gcc command with cc or whatever your compiler is named.
Run the program: ./wait
Simply waiting for all child processes to exit
If we are not interested in identifying the various termination conditions of child processes but just want to have the parent wait for all processes to exit, we can simply loop through wait:
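A minimal sketch of such a loop might look like this. The helper names wait_for_all and spawn_children, and the count that wait_for_all returns, are made up for illustration. The loop stops at the first -1 from wait, which normally means that no children are left.

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: fork n children that exit right away.
 * Returns how many children were actually started. */
int spawn_children(int n)
{
    int started = 0;

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(i);               /* child: terminate immediately */
        if (pid > 0)
            started++;
    }
    return started;
}

/* Hypothetical helper: wait until every child has terminated.
 * Returns the number of children that were reaped. */
int wait_for_all(void)
{
    pid_t pid;
    int status, count = 0;

    /* wait returns -1 once there are no children left (errno is ECHILD);
     * an interruption by a signal would also end this simple loop */
    while ((pid = wait(&status)) != -1) {
        printf("process %d terminated\n", (int)pid);
        count++;
    }
    return count;
}
```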
The printf statement, along with pid, can, of course, be omitted.
Note that wait will only report deaths of immediate children. If a child process forks another process and exits without waiting for its child to exit, then that orphaned process will be inherited by init, the first process, and not by the grandparent. It's good programming practice to wait for dead children.
Waiting for a specific process: waitpid and wait4
This discussion of wait covered the oldest and most basic form of the system call. More calls have been added since to do things such as allow you to wait for a specific process (see wait3, wait4, and waitpid). See the man page on your system for some variations. Let's look at two specific variations.
The waitpid call allows us to wait for a specific child process or class of processes to die. Its usage is: pid_t waitpid(pid_t pid, int *status, int options);
The first parameter allows you to tell the call which process ID to wait for. If a value of -1 is given, then waitpid waits for any child process to exit. If a value of 0 is given, then the call waits for any child process in the same process group as the caller to die (a child may place itself into a new process group; see the man page for setpgrp or setpgid). A value less than -1 is a way of waiting for a child in a specific process group to die: the absolute value of the parameter is the number of the process group. You'll probably never use process groups.
The second parameter contains a pointer to the exit status and is the same as used by wait. The last parameter, options, is a bit mask that is either 0 or one of two values (which may be combined with a bitwise OR). WNOHANG tells the call not to block if there are no processes that are ready to report their status. WUNTRACED will report the status of children that are stopped due to SIGTTIN, SIGTTOU, SIGTSTP, or SIGSTOP signals. You'll probably never use this.
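To make the options parameter a bit more concrete, here is a small sketch that uses WNOHANG to check on one child without blocking. The helper name child_finished and its 1/0/-1 return convention are made up for this illustration.

```c
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: check on child `pid` without going to sleep.
 * Returns 1 if the child has terminated (its status is in *status),
 * 0 if it is still running, and -1 on error (for example, no such child). */
int child_finished(pid_t pid, int *status)
{
    pid_t w = waitpid(pid, status, WNOHANG);

    if (w == 0)
        return 0;       /* nothing to report yet */
    if (w == pid)
        return 1;       /* the child was reaped */
    return -1;
}
```

A parent could call this from time to time while doing other work, instead of blocking in wait.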
You might still have to loop
On some operating systems, notably Solaris (SunOS), waitpid and the other system calls in the wait family may return early because the call was interrupted when the process received a signal. We can detect this by checking if the call returns a value of -1 and the external int errno is set to the value EINTR (defined via #include <errno.h>). Any other return of -1 indicates an error, with errno containing the error code.
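One way to write that check, as a sketch: wrap waitpid in a loop that simply tries again when the call was interrupted. The wrapper name waitpid_retry is made up for this illustration.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical wrapper: behaves like waitpid, but calls it again
 * whenever it fails only because a signal interrupted it. */
pid_t waitpid_retry(pid_t pid, int *status, int options)
{
    pid_t w;

    do {
        w = waitpid(pid, status, options);
    } while (w == -1 && errno == EINTR);

    return w;   /* -1 here is a real error; errno holds the error code */
}
```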
If you want to avoid having to loop on wait, then you need to use the sigaction system call to install the signal's handler with the SA_RESTART flag, which is a directive to restart the interrupted system call.
Here's an example of waitpid where we just wait for a specific process to exit. It's the exact same program as the previous example of using wait but with waitpid instead of wait.
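A sketch of such a program follows. As before, the helper name run_and_waitpid and its return convention are assumptions made for this illustration. Because waitpid is given the child's process ID, it reports only that child, so the loop is needed only to retry after an interruption by a signal.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: fork, run the program at `path` in the child,
 * and use waitpid to wait for that specific child.
 * Returns the child's exit status, or -1 if it did not exit normally. */
int run_and_waitpid(const char *path, char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid == -1) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {                 /* child: become the new program */
        execve(path, argv, environ);
        perror("execve");           /* reached only if execve failed */
        _exit(127);
    }

    /* parent: wait for this child only; retry if a signal interrupts us */
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR) {
            perror("waitpid");
            return -1;
        }
    }

    if (WIFEXITED(status)) {
        printf("child %d exited with status %d\n",
               (int)pid, WEXITSTATUS(status));
        return WEXITSTATUS(status);
    }
    if (WIFSIGNALED(status))
        printf("child %d was terminated by signal %d\n",
               (int)pid, WTERMSIG(status));
    return -1;
}
```

A main function that calls run_and_waitpid("/bin/ls", args) with args set to {"ls", "-aF", "/", NULL} completes the program; the path /bin/ls is an assumption about your system.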
Save this file by control-clicking or right clicking the download link and then saving it as waitpid.c.
Compile this program via: gcc -o waitpid waitpid.c
If you don't have gcc, you may need to substitute the gcc command with cc or whatever your compiler is named.
Run the program: ./waitpid
Getting resource usage of a child process
The wait4 system call is just like waitpid except that it takes an extra pointer as a parameter so that the parent may obtain a summary of resources used by the child process and all of its children.
To use this call, you need to define an rusage structure and pass a pointer to it in wait4:
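Here is a sketch of that pattern. The names burn_cpu and run_with_usage are made up for this illustration, and burn_cpu is just a stand-in workload so that the child has some CPU time to report. Note that wait4 is not part of POSIX; some systems may need a feature macro such as _DEFAULT_SOURCE or _BSD_SOURCE for it to be declared.

```c
#include <unistd.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>

/* Hypothetical workload for the child: use up a little CPU time. */
void burn_cpu(void)
{
    volatile unsigned long n = 0;

    while (n < 50000000UL)
        n++;
}

/* Hypothetical helper: run work() in a child process, wait for it with
 * wait4, and store the child's resource usage in *usage.
 * Returns the child's exit status, or -1 on failure. */
int run_with_usage(void (*work)(void), struct rusage *usage)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {                 /* child */
        work();
        _exit(0);
    }

    /* parent: like waitpid(pid, &status, 0), plus the rusage pointer */
    if (wait4(pid, &status, 0, usage) != pid)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```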
If you're interested in this information, please see the man page for getrusage for information on the contents that are reported. getrusage is a system call that returns resource usage for the calling process or for its terminated child processes; it reports the same kind of information that wait4 returns in the rusage structure.
You'll see that the first two values are time values stored in a timeval structure. Look at the man page for gettimeofday for information on using this structure. Library functions for performing operations on timeval values are defined in timercmp(3). The timeval structs in rusage represent elapsed time, and the second and microsecond components are broken into two elements in the structure: tv_sec and tv_usec. The seconds value is of type time_t and the microseconds value is of type suseconds_t. Both are typically of type long int. If you're not worried about overflow or loss of precision, the elapsed user time in seconds is usage.ru_utime.tv_sec+(usage.ru_utime.tv_usec/1000000.0). If you just want to print the time values, you can do so via:
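The following sketch shows one way to do the conversion and the printing. The function names timeval_seconds and print_cpu_times are made up for this illustration, and the casts to long assume that both fields fit in a long on your system.

```c
#include <stdio.h>
#include <sys/time.h>
#include <sys/resource.h>

/* Hypothetical helper: convert a timeval to seconds as a double. */
double timeval_seconds(struct timeval tv)
{
    return tv.tv_sec + tv.tv_usec / 1000000.0;
}

/* Hypothetical helper: print the user and system times from an rusage
 * structure as seconds.microseconds. */
void print_cpu_times(const struct rusage *usage)
{
    printf("user time:   %ld.%06ld seconds\n",
           (long)usage->ru_utime.tv_sec, (long)usage->ru_utime.tv_usec);
    printf("system time: %ld.%06ld seconds\n",
           (long)usage->ru_stime.tv_sec, (long)usage->ru_stime.tv_usec);
}
```

The %06ld format pads the microseconds with leading zeros, so that 5000 microseconds prints as .005000 rather than .5000.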
An important distinction between the getrusage system call and the rusage data returned from wait4 is that getrusage is cumulative. If you request getrusage for your child processes via getrusage(RUSAGE_CHILDREN, &usage), you will get cumulative results for all of the process' children. The results from wait4 apply only to the child returned by the call.
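As a small sketch of the cumulative form, the helper below (the name children_user_seconds is made up for this illustration) asks getrusage for the combined user time of the children. Calling it after each child has been waited for should show the total growing, whereas the rusage from wait4 is reported separately for each call.

```c
#include <sys/time.h>
#include <sys/resource.h>

/* Hypothetical helper: total user CPU time, in seconds, of the children
 * that have terminated and been waited for. Returns -1.0 on error. */
double children_user_seconds(void)
{
    struct rusage usage;

    if (getrusage(RUSAGE_CHILDREN, &usage) == -1)
        return -1.0;

    return usage.ru_utime.tv_sec + usage.ru_utime.tv_usec / 1000000.0;
}
```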
Another important point about the usage information returned by wait4 and getrusage is that any fields beyond the time values may be machine and operating system dependent. Moreover, the values may not be properly populated or populated at all by the operating system. The SunOS man page, for example, specifically states that "Only the timeval member of struct rusage are supported in this implementation." Other data is an approximation at best.