3. Multi-programming

3.1 Basic Concepts of Processes

A process is an executing instance of a program in an operating system. It has its own execution state, data space, and system resources (such as CPU time, memory, file descriptors, etc.). Each process is uniquely identified and managed through a Process Identifier (PID).
Processes can run independently or collaborate with other processes. They can also use signals for communication and synchronization with other processes. Processes are the fundamental execution units in an operating system, and the operating system utilizes process scheduling algorithms to allocate CPU time and system resources, enabling multitasking and concurrent execution. Multiple processes can run simultaneously on the operating system, each having its own address space and execution environment, unaffected by others. The operating system employs process management mechanisms to ensure security and stability among processes, preventing interference and maintaining system stability.
Programs are sets of instructions or code used to perform specific tasks or complete particular work. Typically, programs are written in source code and need to be compiled or interpreted to produce executable code that can run on a computer.
A process, by contrast, is a program in execution. It comprises the program's code, its data, and its current execution state, and the operating system's scheduler decides when each process gets to run so that multiple processes can execute concurrently.
In simple terms, programs are static—they are just sets of instructions, whereas processes are dynamic—they are instances of running programs. Programs become processes only when loaded into memory and executed.

3.2 Understanding Processes with Examples

In Windows systems, we can manage processes using the Task Manager, which monitors the running status of system processes, allowing administrators to terminate out-of-control processes. In Linux systems, although process management is primarily done through commands, the main purpose remains the same—to view running programs and processes, assess server health, and forcefully terminate unnecessary processes.
A common real-life example is opening an application on a computer, like double-clicking the text editor icon to launch it. When you open the text editor, a process is created. This process is responsible for executing the code of the text editor program and consumes some system resources (such as memory, CPU time, etc.). When you type and save text in the text editor, the process writes the data to a file on the disk.

3.3 Process States

In a single-tasking operating system, only one process can exist at a time. A new program cannot start until the current one finishes.
In a multi-tasking operating system, several processes are in progress at once. Each CPU core still executes only one process at any instant, but the scheduler gives each process a short time slice and switches between processes through context switching, so they appear to run simultaneously. While one process runs, the others wait in the ready queue for their next time slice. In a multi-tasking operating system, a process may pass through the following states during its execution:

Start State: When a process is created, it enters the Start state. At this point, the operating system allocates necessary resources and space for the process.
Ready State: When a process has acquired all the required resources and is waiting for CPU time, it is in the Ready state. The process is prepared to run and only needs to be scheduled by the CPU to execute.
Running State: When the CPU executes a process, it is in the Running state. In this state, the process uses CPU time to execute its instructions and can access its allocated resources.
Blocked State: When a process is waiting for an event to occur during its execution, such as waiting for user input or waiting for a file to be read, it is in the Blocked state. In this state, the process does not occupy CPU time but cannot continue execution until the event occurs and the process returns to the Ready state.
Terminated State: When a process completes its execution or is terminated for some reason, it is in the Terminated state. At this point, the process is removed, and its allocated resources are released.
The above five states constitute the lifecycle of a process, and a process transitions between these states until it reaches the Terminated state.

3.4 Process Control Block

A process consists of three parts: the Process Control Block (PCB), related program segment, and data set for operations. The Process Control Block (PCB) is one of the key data structures used by the operating system to implement process management. It contains all the information necessary for the operating system to control and manage processes. The operating system uses information in the PCB to perform functions such as process scheduling, synchronization, and communication.

PCBs are usually stored in the kernel of the operating system. The operating system utilizes the PCBs to maintain the state of processes and control their execution. The information in each process's PCB is unique, enabling the operating system to identify and manage each process correctly.
When creating a process, the system first creates its PCB and then implements effective management and control based on the information in the PCB. When a process completes its function, the system releases the PCB, and the process ceases to exist.
The information typically needed by the operating system to handle processes includes the following:
- Process Identifier (PID): Each process has a unique PID used to distinguish different processes.
- Process State: Represents the current state of the process, such as Ready, Running, Blocked, etc.
- Registers: Store the values of registers during the process's execution, including general-purpose registers, program counter, stack pointer, etc.
- Process Priority: Determines the priority of the process in the Ready queue, allowing the operating system to schedule processes based on their priority.
- Process Scheduling Information: Includes information about the process's time slice size and the number of time slices used.
- Process Wait Queue Pointer: Points to the next process's PCB in the wait queue.
- Process Open File List: Records the files opened by the process and their file descriptors.
- Process Memory Management Information: Contains information about the memory space used by the process, including its starting address and size.
- Process Resources: Records the various resources used by the process, such as opened files, memory space, I/O devices, etc.
- Process Communication Information: Records information about the process's communication with other processes, such as message queues, pipes, shared memory, etc.
- Parent and Child Process Relationship: Records the relationship between the process's parent and child processes, enabling inter-process communication and coordination.

3.5 Process Identifiers

A process identifier is a unique identifier used in an operating system to identify each process and is commonly known as PID (Process ID). The PID is an integer value that is unique within the operating system.

Typically, the operating system assigns a unique PID to each process according to certain rules. The PID remains unchanged while the process is running, and once the process terminates, the PID is reclaimed by the system and later reassigned to new running processes.

3.5.1 ps

In a Linux system, the ps command is used to list the currently running processes.
Commonly used commands:
```
ps aux  # Displays all processes in the system
```
```
ps -le  # Displays all processes and also shows the PID of the parent process and process priority
```
```
ps -ef
```
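- a: Lifts the "only your own processes" restriction and lists the processes of all users that have a terminal (BSD style; combined with x it selects every process)
- u: Uses the user-oriented output format, which shows the owning user and the CPU and memory usage of each process
- x: Also lists processes without a controlling terminal
- -l: Shows detailed information in long format
- -e: Selects all processes
- -f: Uses the full-format listing, which adds columns such as UID, PPID, C, and STIME. The BSD-style f (without a dash) instead draws the process hierarchy as an ASCII tree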

Meaning of the ps command output:

USER/UID: The user who owns the process.
PID: The process ID.
PPID: The parent process ID.
C: The percentage of CPU utilization during the process's lifetime.
STIME: The system time when the process started.
%CPU: The percentage of CPU resources used by the process.
%MEM: The percentage of physical memory used by the process.
VSZ: The virtual memory size occupied by the process in KB.
RSS: The actual physical memory size occupied by the process in KB.
TTY: The terminal where the process is running.

STAT: The process status. Common state codes include:
    - D: Uninterruptible sleep (usually waiting for I/O).
    - R: Running or runnable (on the run queue).
    - S: Interruptible sleep (waiting for an event to complete).
    - T: Stopped by a job control signal.
    - t: Stopped by a debugger during tracing.
    - W: Paging (not valid since the 2.6 kernel).
    - Z: Zombie process. The process has terminated, but its parent has not yet collected its exit status.
    - <: High priority.
    - N: Low priority.
    - L: Process has pages locked into memory.
    - s: Session leader.
    - l: Multi-threaded (lowercase L).
    - +: Process is in the foreground process group.

START: The start time of the process.
TIME: The CPU time used by the process, not the system time.
COMMAND: The command that generated this process.

Difference between ps aux and ps -ef:
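- ps aux uses BSD-style options. Its output includes USER, PID, %CPU, %MEM, VSZ, RSS, TTY, STAT, START, TIME, and COMMAND, so it is the better choice for checking resource usage and process state.
- ps -ef uses UNIX (System V) style options. Its output is more concise: UID, PID, PPID, C, STIME, TTY, TIME, and CMD, so it is the better choice for checking parent-child relationships.
- With either form, long command lines may be cut off at the terminal width. Append the w option twice (for example, ps auxww) for unlimited width.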

3.5.2 top

The ps command is used to view the process information statically. If you want to view the process information in real-time, you can use the top command.
Parameters:
- -d: Change the display update speed, specifying how often the top command updates. The default is 3 seconds
- -b: Use batch mode output. Generally used with the -n option to output the top result to a file
- -n: The number of updates before top exits
- -p: Specify the PID of a particular process to monitor
- -s: Run top in secure mode, which disables potentially dangerous interactive commands
- -u: Only monitor processes of a specific user
- -c: Toggle command-line display, showing the complete command and path.
- -q: Set the display speed to have no delay.

Commonly used commands:

```
top        # Displays process information
top -c     # Displays the full command line
top -b     # Displays program information in batch mode
top -S     # Displays program information in cumulative mode
top -n 2   # Sets the number of updates, and stops after two updates
top -p 139 # Displays information of the specified PID
top -n 10  # Displays updates ten times and then exits
```

Within the top command display window, you can also use the following keys for interactive operations:
- ? or h: Show interactive mode help
- P: Sort processes by CPU usage (default)
- M: Sort processes by memory usage
- N: Sort processes by PID
- T: Sort processes by cumulative CPU time, indicated by TIME+
- k: Send a signal to a specific process by PID. Often used to terminate a process; signal 9 is used for forceful termination
- r: Reset the priority (nice value) of a specific process by PID
- q: Exit the top command.

3.5.3 kill

In Linux systems, the kill command is used to send signals to specific processes to control their behavior. The basic syntax is as follows:
```
kill [-signal] PID...
```
- The signal parameter specifies the signal to send, given as a signal name or a signal number after a dash (for example, -TERM or -15). If it is omitted, TERM is sent
- The PID parameter specifies the process identifier of the target process(es). You can specify one or multiple PIDs separated by spaces

Commonly used signals with their meanings:

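| Signal Number | Signal Name | Meaning |
|---|---|---|
| 0 | EXIT | Not a real signal. `kill -0` sends nothing and only checks that the process exists and can be signaled; in a shell `trap`, 0 (EXIT) fires when the shell exits. |
| 1 | HUP | Hang-up signal, often used to make a daemon reload its configuration without terminating. |
| 2 | INT | Interrupt signal, typically used for ending a process (equivalent to `Ctrl+C`). |
| 3 | QUIT | Quit signal (equivalent to `Ctrl+\`); by default the process terminates and dumps core. |
| 9 | KILL | Kill signal, forcibly terminates a process. It cannot be caught or ignored. |
| 11 | SEGV | Segmentation fault (invalid memory reference). |
| 15 | TERM | Terminate signal, the default signal sent by the `kill` command. |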

For example, to forcefully terminate the process with PID 2246:
```
kill -9 2246
```
For more detailed signal information, you can refer to the inter-process communication signal section.

3.6 Process Creation

3.6.1 Creating Processes using fork

In Linux systems, a new process can be created using the fork function, and the process created by fork is called the child process.
The child process is almost identical to the original process, including code, data, and open file descriptors. However, they do not share memory, and the Process ID (PID) of the child process is different from that of the parent process. The PID of the child process is allocated by the operating system and is unique. The fork system call is declared as follows:
```
#include <unistd.h>
pid_t fork(void);
```
Return values:
- If the creation is successful, the fork function will return the PID of the child process in the parent process's program
- If the creation is successful, the fork function in the child process will return 0
- If the creation fails, fork returns -1 in the parent process, no child process is created, and errno is set to indicate the error

Creating a child process using fork:

```
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    pid_t pid;

    pid = fork();
    if (pid < 0)
    {
        perror("fork");
        return -1;
    }
    // Parent process: fork returned the child's PID
    if (pid > 0)
    {
        printf("This is parent, parent pid is %d\n", (int)getpid());
    }
    // Child process: fork returned 0
    if (pid == 0)
    {
        printf("This is child, child pid is %d, parent pid is %d\n", (int)getpid(), (int)getppid());
    }
    return 0;
}
```

getpid(): Obtain the PID of the current process

getppid(): Obtain the PID of the parent process of the current process

Compile and execute the program:
```
gcc -o fork fork.c
./fork
```

3.6.2 Creating Processes using exec

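In Linux systems, exec is a family of functions used to execute another program. An exec call does not create a new process: it replaces the program image of the calling process (code, data, heap, and stack) with the new program, while the PID stays the same. To run a program in a new process, call fork first and then call exec in the child.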

The exec() function family includes various variants:

```
#include <unistd.h>

int execl(const char *path, const char *arg, ... /* (char *) NULL */);
int execlp(const char *file, const char *arg, ... /* (char *) NULL */);
int execle(const char *path, const char *arg, ... /*, (char *) NULL, char *const envp[] */);
int execv(const char *path, char *const argv[]);
int execvp(const char *file, char *const argv[]);
int execvpe(const char *file, char *const argv[], char *const envp[]);  /* GNU extension, requires _GNU_SOURCE */
```

Declaration of the execl function:

```
#include <unistd.h>

int execl(const char *path, const char *arg, ...);
```

Parameter meanings:
- path: A pointer to the path of the file to be executed
- arg and subsequent ellipses: The argument list passed to the new program. The first argument after path becomes argv[0], the next becomes argv[1], and so on, and the list must end with a null pointer. argv[0] must be present even for system commands such as ls; by convention it is the program name, but its value can be any string

Using execl to execute the system command "ls":

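```
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>

int main(void)
{
    pid_t pid;

    pid = fork();
    if (pid < 0)
    {
        perror("fork");
        return -1;
    }

    // Parent process
    if (pid > 0)
    {
        printf("This is parent, parent pid is %d\n", (int)getpid());
    }
    // Child process: replace its program image with /bin/ls
    if (pid == 0)
    {
        printf("This is child, child pid is %d, parent pid is %d\n", (int)getpid(), (int)getppid());
        // Flush buffered output, which would otherwise be lost when the image is replaced
        fflush(stdout);
        // argv[0] is arbitrary here; the argument list must end with a null pointer
        execl("/bin/ls", "lsakakk", "-l", (char *)NULL);
        // Reached only if execl failed
        perror("execl");
        exit(1);
    }
    return 0;
}
```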

3.7 Orphan Processes and Zombie Processes

An orphan process is a child process whose parent exits or is killed before it does. Orphan processes are adopted by the init process (the special process with PID 1; on many modern distributions this is systemd), which becomes their new parent. Every process except init must have a parent; init is the first process that runs when the system starts, so it has none. When an orphan process terminates, init collects its exit status and its resources are reclaimed.
A zombie process is a process that has finished executing but whose parent has not yet called wait() or waitpid() to obtain its exit status. The kernel has already released the zombie's memory and most other resources, but it keeps the process table entry (PID, exit status, and resource usage statistics) so that the parent can still read it. Zombie processes therefore consume no CPU time and almost no memory, but each one occupies a process table entry and a PID. A zombie is removed when its parent calls wait() or waitpid(); if the parent exits first, the zombie is adopted by init and reaped there.
Failure to handle zombie and orphan processes properly may lead to the following consequences:
- Zombie processes occupy entries in the system process table and hold on to their PIDs
- If a long-running parent never reaps its children, zombies accumulate; once the process table or the PID range is exhausted, no new processes can be created
- Orphan processes keep running without the program that started them, so a forgotten orphan can continue to hold memory, open files, and other resources until it exits
- Orphan processes depend on init to reap them, so they do not turn into long-lived zombies as long as init works correctly

Therefore, a parent process should reclaim its terminated children by calling wait() or waitpid(), and orphan processes should be allowed to finish or be terminated once they are no longer needed.
Declaration of the wait function:
```
#include <sys/wait.h>
pid_t wait(int *status);
```
- Return value: If successful, it returns the PID of the reclaimed child process; if failed, it returns -1
Two macros related to the wait function's parameter:
- WIFEXITED(status): If this macro evaluates to true, it means that the child process terminated normally
- WEXITSTATUS(status): If the child process exited normally, this macro evaluates to the exit value of the child process
```
if(WIFEXITED(status))
{
    printf("exit value is %d\n", WEXITSTATUS(status));
}
```

Example program:

```
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/wait.h>

int main(void)
{
    pid_t pid;

    pid = fork();
    if (pid < 0)
    {
        perror("fork");
        return -1;
    }
    // Parent process: block until the child exits, then read its exit status
    if (pid > 0)
    {
        int status;
        wait(&status);
        // WIFEXITED returns a nonzero value (not necessarily 1) on normal exit
        if (WIFEXITED(status))
        {
            printf("return value is %d\n", WEXITSTATUS(status));
        }
    }
    // Child process: exit with status 6 after two seconds
    if (pid == 0)
    {
        sleep(2);
        printf("This is child\n");
        exit(6);
    }
    return 0;
}
```

3.8 Process Scheduling

Linux process scheduling refers to the process by which the operating system kernel determines which process should run. The goals of Linux process scheduling are to improve system responsiveness, throughput, and fairness. In Linux, the core of process scheduling is a scheduler responsible for deciding which process should be given a CPU time slice.
For ordinary processes, Linux uses time-sharing: each runnable process receives a time slice, and when the slice expires the scheduler suspends the running process and gives the CPU to another runnable process. This ensures that each process has a chance to run and that no process monopolizes the CPU. The default scheduler is not a plain round-robin, however: since kernel 2.6.23 it has been the Completely Fair Scheduler (CFS), which sizes time slices dynamically and favors the process that has received the least CPU time relative to its weight.
Priority also plays a role. Normal processes have a nice value (from -20, the highest priority, to 19, the lowest) that weights their share of CPU time, and the scheduler takes into account how much CPU time each process has already used. Real-time processes have a static priority from 1 to 99 and always run ahead of normal processes.
Linux scheduling policies fall into two broad classes: normal (time-sharing) scheduling and real-time scheduling. The normal policy, SCHED_OTHER, is the default and is suitable for most applications. The real-time policies, SCHED_FIFO and SCHED_RR (round-robin among processes of equal priority), are used for applications with strict response time requirements, such as control systems and embedded systems.
Linux process scheduling is a complex system that involves various factors, including the process's state, priority, and resource requirements.

3.9 Classification of Processes

In the Linux system, processes are generally classified into three categories: foreground processes, background processes, and daemon processes.

3.9.1 Daemon Processes

Daemon processes are special background processes that usually do not interact directly with users and are not affected by user logins or logouts. They run to perform specific tasks, such as providing services or monitoring system status.
Most of Linux's servers are implemented using daemon processes. Common daemon processes include the system log process syslogd, web server httpd, mail server sendmail, and database server mysqld, among others.
Daemon processes typically start running during system boot-up and often run with superuser privileges, because they may need access to special resources or privileged ports (1-1023). They continue running until the system shuts down unless forcibly terminated. A traditional daemon's parent is the init process: the process that started it forks a child and then exits, so the child becomes an orphan and is adopted by init. Since daemon processes are non-interactive programs without a controlling terminal, any output requires special handling, for example writing to a log file or to the system log. By convention, the names of daemon processes end with "d", such as sshd, xinetd, and crond.

3.9.2 Writing a Daemon Process

Process Group: A collection of one or more processes, identified by the process group ID. The process group leader has the same process ID as the process group ID, and the process group ID is not affected by the exit of the process group leader.
Session: A collection of one or more process groups. For example, from login to logout, all processes run by the user belong to the same session.
setsid Function: Creates a new session and becomes the session leader of that session. The purpose of calling setsid is to detach the process from the original session, process group, and controlling terminal.

1. Create a child process and exit the parent process.
   - When the parent process exits before the child, the child becomes an orphan process and is adopted by the init process (PID 1). Forking also guarantees that the child is not a process group leader, which setsid requires.
2. The child process creates a new session.
   - Call setsid to create a new session. The child detaches from its original session, process group, and controlling terminal, and becomes the leader of the new session.
3. Change the current working directory to the root directory.
   - A file system cannot be unmounted while it contains the working directory of a running process (for example, a daemon started from "/mnt/usb" would keep that file system busy). To avoid this, the daemon calls chdir, typically with the root directory as its new working directory.
4. Reset the file permission mask.
   - The child process inherits the file permission mask from the parent process, which may clear permission bits on the files the daemon creates. Calling umask(0) removes this restriction, so the daemon controls the permissions of its files itself.
5. Close unnecessary file descriptors.
   - The child process also inherits open file descriptors from the parent process. Descriptors that the daemon does not need waste system resources and may prevent the file system they belong to from being unmounted.
6. Daemon process exit handling.
   - Users usually stop a daemon from outside with the kill command. The daemon should therefore handle the signals that kill sends (SIGTERM by default) so that it can clean up and exit properly.

Create a daemon process:

```
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>

int main(void)
{
    pid_t pid;
    // Step 1: Create a new process
    pid = fork();
    if (pid < 0)
    {
        perror("fork");
        exit(1);
    }
    // The parent process exits directly
    if (pid > 0)
    {
        exit(0);
    }
    // Step 2: Call setsid to create a new session and detach from the controlling terminal
    if (setsid() < 0)
    {
        perror("setsid");
        exit(1);
    }
    // Step 3: Change the working directory
    if (chdir("/") < 0)
    {
        perror("chdir");
        exit(1);
    }
    // Step 4: Reset the umask
    umask(0);
    // Step 5: Close file descriptors 0, 1, and 2
    for (int i = 0; i < 3; i++)
    {
        close(i);
    }
    // The daemon's work goes here; sleep instead of spinning so the loop does not burn CPU
    while (1)
    {
        sleep(1);
    }
    return 0;
}
```
