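Re: [System Programming] Tutorial: Introducing fork, exec*, pipe, dup2 - b97902HW board

Author: LoganChien (简子翔)
Board: b97902HW
Title: Re: [System Programming] Tutorial: Introducing fork, exec*, pipe, dup2
Time: Fri Mar 19 07:06:36 2010

Introducing fork, exec*, dup2, pipe

Implementing a Command Interpreter's Pipeline: A Combined Exercise for the Previous Post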

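After reading the previous post, you should be able to write a simple
Command Interpreter with Pipeline support. A Command Interpreter is a
program such as bash, ksh, or tcsh; we also call it a shell. It is
usually the first program that runs after you log in to a system.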

The Pipeline we are talking about is somewhat like IO redirection.
Suppose I issue the following command:

  command1 | command2 | command3

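Here the stdout of command1 becomes the stdin of command2, and the
stdout of command2 becomes the stdin of command3. While this command
runs, the standard output of command1 and command2 is not shown on the
screen; only the output of command3 is.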

For example, cat /etc/passwd prints the contents of the file
/etc/passwd to stdout, while grep username reads stdin line by line
and writes every line that contains username to standard output. So
when the two are combined with a pipeline:

  cat /etc/passwd | grep username

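the screen shows only those lines of /etc/passwd that contain
username. Used flexibly, pipelines let a handful of commands deliver a
wide range of functionality, which is why they matter so much in *nix
environments. Can you write a Command Interpreter with Pipeline
support using open/close/dup2/exec*/fork?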

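Below is my half-finished code. It can already turn the command the
user types into several argv arrays that can be passed to execvp; only
the pipeline part is left unwritten. Try completing it yourself: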

  http://w.csie.org/~b97073/B/todo-pipeline-shell.c





(Spoiler guard: press Page Down to continue reading)













You can also just download the version I threw together:

  http://w.csie.org/~b97073/B/simple-pipeline-shell.c

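Nothing in this code is really new. It just uses what was introduced
earlier: IO redirection (the method used in red.c) and fork/exec to
create a child process.

When I run command1, I redirect its stdout to a file. After it
finishes, I feed that file to command2 as stdin and redirect the
stdout of command2 to another file, and so on.

Let's still take a look at two of its functions, creat_proc and
execute_cmd_seq: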

/* Purpose: Create child process and redirect io. */
void creat_proc(char **argv, int fd_in, int fd_out)
{
    /* The main job of creat_proc is to create a child process and set up
       its IO redirection.  It takes three parameters: argv is what will
       be passed to execvp; fd_in and fd_out are the input and output
       file descriptors. */

    pid_t proc = fork();

    if (proc < 0)
    {
        fprintf(stderr, "Error: Unable to fork.\n");
        exit(EXIT_FAILURE);
    }
    else if (proc == 0)
    {
        if (fd_in != STDIN_FILENO)
        {
            /* Duplicate fd_in onto STDIN_FILENO */
            dup2(fd_in, STDIN_FILENO);
            /* fd_in is no longer needed, so close it */
            close(fd_in);
        }

        if (fd_out != STDOUT_FILENO)
        {
            /* Duplicate fd_out onto STDOUT_FILENO */
            dup2(fd_out, STDOUT_FILENO);
            /* fd_out is no longer needed, so close it */
            close(fd_out);
        }

        /* Load the executable; I use argv[0] directly as the executable name */
        if (execvp(argv[0], argv) == -1)
        {
            fprintf(stderr,
                    "Error: Unable to load the executable %s.\n",
                    argv[0]);

            exit(EXIT_FAILURE);
        }

        /* NEVER REACH */
        exit(EXIT_FAILURE);
    }
    else
    {
        int status;
        wait(&status); /* Wait for the program to finish */
    }
}


/* Purpose: Create several child processes and redirect the standard output
 * of each to the standard input of the next process. */
void execute_cmd_seq(char ***argvs)
{
    int C;
    for (C = 0; C <= MAX_CMD_COUNT; ++C)
    {
        char **argv = argvs[C];
        if (!argv) { break; }

        int fd_in = STDIN_FILENO;
        int fd_out = STDOUT_FILENO;

        if (C > 0)
        {
            /* Open the temporary file for reading */
            fd_in = open(pipeline_tmp_[C - 1], O_RDONLY);

            if (fd_in == -1)
            {
                fprintf(stderr, "Error: Unable to open pipeline tmp r.\n");
                exit(EXIT_FAILURE);
            }
        }

        if (C < MAX_CMD_COUNT && argvs[C + 1] != NULL)
        {
            /* Open the temporary file for writing */
            fd_out = open(pipeline_tmp_[C], O_WRONLY | O_CREAT | O_TRUNC, 0644);

            if (fd_out == -1)
            {
                fprintf(stderr, "Error: Unable to open pipeline tmp w.\n");
                exit(EXIT_FAILURE);
            }
        }

        creat_proc(argv, fd_in, fd_out);

        if (fd_in != STDIN_FILENO) { close(fd_in); }
        if (fd_out != STDOUT_FILENO) { close(fd_out); }
    }
}



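Drawbacks of Implementing the Pipeline Directly with Temporary Files

So what are the drawbacks of building a pipeline directly on temporary
files, as above?

(1) It is slow! All we want is to let two programs communicate, so
    there is really no need to write the contents to disk. It may also
    consume a lot of space. For example, running this command is bound
    to cost a lot of time and disk space: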

    tar c / | tar xv -C .

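(2) command1, command2, .. commandN can only run one after another.
    If command1 has not finished writing and command2 reads faster,
    command2 may wrongly conclude that the output of command1 has
    ended. To avoid incomplete data, we can only start command2 after
    command1 exits, which tends to waste time.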

Is there a way around this? That brings us to the next system call to
introduce: pipe().



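pipe: A Bridge for Communication Between Two Processes

As its name suggests, pipe means a water pipe. When we call pipe, it
opens two file descriptors for us: one for writing data and one for
reading data. Its main use is to let two processes communicate
(Inter-process Communication, IPC). On most systems pipe uses memory
as its buffer, so it is more efficient than writing a file to disk.
The prototype of pipe is: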

  int pipe(int fds[2]);

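When calling pipe we must pass an int array of at least 2 elements.
pipe returns a read-only file descriptor in fds[0] and a write-only
file descriptor in fds[1]. When two processes want to communicate, the
sender simply uses the write system call to put data into the pipe,
and the receiver uses read to take it out.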

Also, unlike an ordinary file, read on a pipe keeps waiting for new
input instead of assuming it has reached EOF, unless every write-end
of the pipe has been closed.

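Note: Although we came to pipe() by way of the Pipeline, a Pipeline
      does not have to be implemented with pipe(), and the uses of
      pipe() are not limited to Pipelines. Still, implementing a
      Pipeline with pipe() is indeed an efficient approach; as far as
      I know, GNU bash implements its Pipelines with pipe().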

Let's look at a simple Multiprocess Random Generator example:

/* Code: pipe-example.c */

#include <stdlib.h>
#include <stdio.h>
#include <time.h>

#include <unistd.h>

enum { RANDOM_NUMBER_NEED_COUNT = 10 };


int main()
{
    int pipe_fd[2];

    if (pipe(pipe_fd) == -1) /* 建立 pipe */
    {
        fprintf(stderr, "Error: Unable to create pipe.\n");
        exit(EXIT_FAILURE);
    }

    pid_t pid;

    if ((pid = fork()) < 0) /* Note: on fork, the pipe's fds are duplicated into the child */
    {
        fprintf(stderr, "Error: Unable to fork process.\n");
        exit(EXIT_FAILURE);
    }
    else if (pid == 0)
    {
        /* -- In the Child Process -------- */

        /* Close Read End */
        close(pipe_fd[0]); /* Close read end, since we don't need it. */
        /* The child process only wants to be the writer, so we first
           close the read end of the pipe. */

        /* My Random Number Generator */
        srand(time(NULL));

        int i;
        for (i = 0; i < RANDOM_NUMBER_NEED_COUNT; ++i)
        {
            sleep(1); // wait 1 second

            int randnum = rand() % 100;
            /* Write the data out */
            write(pipe_fd[1], &randnum, sizeof(int));
        }

        exit(EXIT_SUCCESS);
    }
    else
    {
        /* -- In the Parent Process -------- */

        /* Close Write End */
        close(pipe_fd[1]); /* Close write end, since we don't need it. */
        /* A process that does not use the write-end must close it;
           otherwise the read-end of the pipe will never see EOF. */

        int i;
        for (i = 0; i < RANDOM_NUMBER_NEED_COUNT; ++i)
        {
            int gotnum;
            /* Take the data out of the read-end */
            read(pipe_fd[0], &gotnum, sizeof(int));

            printf("got number : %d\n", gotnum);
        }
    }

    return EXIT_SUCCESS;
}


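Although the example above shows how two processes communicate, it
does not reveal the real value of pipe. Our second example uses pipe
to intercept the standard output of another program.

The second example involves two programs, that is, two executables.
One is dedicated to producing random numbers and writes each 32-bit
int straight to standard output. The other launches that random
number generator and intercepts its standard output.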


/* Code: random-gen.c */
/* Nothing special in this file; it just keeps producing random numbers. */

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#include <unistd.h>

enum { RANDOM_NUMBER_NEED_COUNT = 10 };

int main()
{
    srand(time(NULL));

    int i;
    for (i = 0; i < RANDOM_NUMBER_NEED_COUNT; ++i)
    {
        sleep(1); /* Wait 1 second.  Simulate the complex process of
                     generating the safer random number. */

        int randnum = rand() % 100;
        /* Note: this writes to stdout. */
        write(STDOUT_FILENO, &randnum, sizeof(int));
    }

    return EXIT_SUCCESS;
}


/* Code: pipe-example-2.c */

#include <stdio.h>
#include <stdlib.h>

#include <unistd.h>

enum { RANDOM_NUMBER_NEED_COUNT = 10 };

int main()
{
    /* -- Prepare Pipe -------- */
    int pipe_fd[2];

    if (pipe(pipe_fd) == -1)
    {
        fprintf(stderr, "Error: Unable to create pipe.\n");
        exit(EXIT_FAILURE);
    }


    /* -- Create Child Process -------- */
    pid_t pid;
    if ((pid = fork()) < 0)
    {
        fprintf(stderr, "Error: Unable to create child process.\n");
        exit(EXIT_FAILURE);
    }
    else if (pid == 0) /* In Child Process */
    {
        /* Close Read End */
        close(pipe_fd[0]); /* Close read end, since we don't need it. */

        /* Bind Write End to Standard Out */
        dup2(pipe_fd[1], STDOUT_FILENO);
        /* Duplicate file descriptor pipe_fd[1] onto file descriptor
           STDOUT_FILENO. */

        /* Close pipe_fd[1] File Descriptor */
        close(pipe_fd[1]);
        /* Explanation: after the three steps above, file descriptor 1 of
           this child process is the write-end of the pipe, so everything
           written to standard output goes into our pipe, and the read-end
           on the other side receives the standard output of random-gen. */

        /* Load Another Executable */
        execl("random-gen", "./random-gen", (char *)0);

        /* This Process Should Never Go Here */
        fprintf(stderr, "Error: Unexpected flow of control.\n");
        exit(EXIT_FAILURE);
    }
    else /* In Parent Process */
    {
        /* Close pipe_fd[1] File Descriptor */
        close(pipe_fd[1]); /* Close write end, since we will not use it. */

        /* Read Random Number From Pipe */
        int i;
        for (i = 0; i < RANDOM_NUMBER_NEED_COUNT; ++i)
        {
            int gotnum = -1;
            read(pipe_fd[0], &gotnum, sizeof(int));

            printf("got number : %d\n", gotnum);
        }
    }

    return EXIT_SUCCESS;
}



Back to the Command Interpreter: Can You Do Better with the pipe() System Call?

Here is another version I wrote (the one that uses pipe()):

  http://w.csie.org/~b97073/B/faster-pipeline-shell.c

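This time I first count how many '|' the command contains, which tells
me how many pipes to prepare. Then I use fork to create a process for
every commandI, so that all the processes can run at the same time.

Implementing it with pipe() has another benefit: if command2 wants to
read something that command1 has not produced yet, the read in
command2 simply keeps waiting. So we no longer have to run the
commands one after another; all processes run concurrently unless
they block on IO. Using pipe() also spares us the trouble of naming
temporary files.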

But the pipe version needs care: every process that does not need a
write-end must remember to close it. Otherwise programs such as cat or
grep never see EOF and therefore never exit!

Let's take a quick look at the two functions execute_cmd_seq and
creat_proc:

/* Purpose: Create several child processes and redirect the standard output
 * of each to the standard input of the next process. */
void execute_cmd_seq(char ***argvs)
{
    int C, P;

    int cmd_count = 0;
    while (argvs[cmd_count]) { ++cmd_count; }

    int pipeline_count = cmd_count - 1;

    int pipes_fd[MAX_CMD_COUNT][2];

    /* Prepare enough pipes */
    for (P = 0; P < pipeline_count; ++P)
    {
        if (pipe(pipes_fd[P]) == -1)
        {
            fprintf(stderr, "Error: Unable to create pipe. (%d)\n", P);
            exit(EXIT_FAILURE);
        }
    }

    for (C = 0; C < cmd_count; ++C)
    {
        int fd_in = (C == 0) ? (STDIN_FILENO) : (pipes_fd[C - 1][0]);
        int fd_out = (C == cmd_count - 1) ? (STDOUT_FILENO) : (pipes_fd[C][1]);

        /* Call creat_proc below to create the child process */
        creat_proc(argvs[C], fd_in, fd_out, pipeline_count, pipes_fd);
    }

    /* Once all child processes are created, the parent process itself no
       longer needs the pipes, so close all of their file descriptors. */
    for (P = 0; P < pipeline_count; ++P)
    {
        close(pipes_fd[P][0]);
        close(pipes_fd[P][1]);
    }

    /* Wait for all the programs to finish */
    for (C = 0; C < cmd_count; ++C)
    {
        int status;
        wait(&status);
    }
}


/* Purpose: Create child process and redirect io. */
void creat_proc(char **argv,
                int fd_in, int fd_out,
                int pipes_count, int pipes_fd[][2])
{
    pid_t proc = fork();

    if (proc < 0)
    {
        fprintf(stderr, "Error: Unable to fork.\n");
        exit(EXIT_FAILURE);
    }
    else if (proc == 0)
    {
        /* Use fd_in and fd_out as stdin and stdout respectively. */
        if (fd_in != STDIN_FILENO) { dup2(fd_in, STDIN_FILENO); }
        if (fd_out != STDOUT_FILENO) { dup2(fd_out, STDOUT_FILENO); }

        /* Apart from stdin and stdout, every file descriptor (pipe)
           must be closed. */
        int P;
        for (P = 0; P < pipes_count; ++P)
        {
            close(pipes_fd[P][0]);
            close(pipes_fd[P][1]);
        }

        if (execvp(argv[0], argv) == -1)
        {
            fprintf(stderr,
                    "Error: Unable to load the executable %s.\n",
                    argv[0]);

            exit(EXIT_FAILURE);
        }

        /* NEVER REACH */
        exit(EXIT_FAILURE);
    }
}



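Conclusion

Starting from a simple io redirect program, we introduced the exec,
fork, dup2, and pipe system calls along the way, and wrote a simple
Command Interpreter as well. I hope these two short posts help
everyone become more familiar with these four system calls.

Note: most of the code from these two posts is available at:

  http://w.csie.org/~b97073/B/sp-article2.tar.gz

(End)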

--
   LoganChien ----- from PTT2 personal board logan -----

--



※ Origin: 批踢踢实业坊 (ptt.cc)
◆ From: 140.112.247.159
※ Edited: LoganChien      from: 140.112.247.159      (03/19 07:10)
1F: → xflash96: Push. 03/19 07:53
2F: push qcl:     Push! 03/19 09:33
3F: push louisyou: Push! 03/19 09:36
4F: push hanabi: Big push! 03/19 13:13
5F: → Daniel1147: Push 03/19 20:45
6F: push moonblack: Push 03/22 16:27
7F: → dennis2030: Push 03/26 00:05
8F: push averangeall: So impressive!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 04/18 17:39
9F：→ Bingojkt:教学文全消推2@w< 04/19 18:15