Python subprocess - karasuyamatengu's diary

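I tried to organize how to use subprocess, the module for driving commands from Python. The first half follows the manual: http://docs.python.org/library/subprocess.html I had not known that subprocess is meant to replace os.system, os.spawn, os.popen, popen2, commands and similar modules. Manuals are worth reading. The code (Python 2) should work if you paste it into a REPL (on a Unix-like OS).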

Setup



import sys,os

from subprocess import *

Capturing a command's output

Use communicate() to collect standard output and standard error.
In Perl this would be `cmd args`.



output,_=Popen(['/bin/ls', '/etc/hosts'], stdout=PIPE).communicate()

print [output]

['/etc/hosts\n']
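For readers on Python 3, here is a minimal sketch of the same capture. It is not from the original post: the helper name capture_stdout and the child command (the current interpreter printing one line) are made up for illustration, and universal_newlines=True is used so the result is text rather than bytes.

```python
import sys
from subprocess import Popen, PIPE

def capture_stdout(argv):
    """Run argv and return everything it wrote to standard output, as text."""
    # universal_newlines=True makes Popen decode the child's bytes into str.
    proc = Popen(argv, stdout=PIPE, universal_newlines=True)
    output, _ = proc.communicate()  # second item is None because stderr is not piped
    return output

# Hypothetical child command: the current interpreter printing one line.
print([capture_stdout([sys.executable, "-c", "print('hello')"])])
```

Wrapping the result in a list, as the post does, makes the trailing newline visible.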

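Starting a command and getting its PID

A replacement for os.spawnlp(os.P_NOWAIT, cmd, *argv).
Does not wait for the child process to exit.
The child process can be controlled with signals (see "Sending signals to a command").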



pid=Popen(['/bin/sleep', '3']).pid

print pid

The pid comes back immediately.

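Running a command and getting its exit status

To get the exit status synchronously (waiting for the child process to exit), use call().
A successful exit gives 0. A negative value is the number of the signal the child received, multiplied by -1.
The modern version of retcode = os.spawnlp(os.P_WAIT, cmd, *argv).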



retcode=call(['/bin/sleep', '3'])

print retcode

0

The status (0 on success) comes back after three seconds.
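To see a non-zero status, a small sketch in Python 3 syntax may help. The child command is an assumption for illustration: the current interpreter exiting with status 3.

```python
import sys
from subprocess import call

# Hypothetical child: a Python one-liner that exits with status 3.
retcode = call([sys.executable, "-c", "import sys; sys.exit(3)"])

# call() blocks until the child exits, then returns its exit status.
print(retcode)
```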

Passing environment variables to a command

Popen(.., env={}, ..)



print Popen(['/bin/echo $HOGE'], 

            shell=True, 

            env={'HOGE' : 'oh hai'}, 

            stdout=PIPE, 

            stderr=PIPE).communicate()

('oh hai\n', '')

communicate() returns (stdout, stderr) as a tuple.
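One thing worth knowing is that env replaces the child's whole environment instead of adding to it. The sketch below (Python 3 syntax, with a made-up child command) copies os.environ and adds HOGE on top, which is a common way to keep PATH and friends intact.

```python
import os
import sys
from subprocess import Popen, PIPE

# Start from the parent's environment and add one variable.
# Passing env={'HOGE': ...} alone would drop PATH, HOME and everything else.
child_env = dict(os.environ, HOGE="oh hai")

# Hypothetical child: the current interpreter printing the variable.
child_code = "import os; print(os.environ['HOGE'])"
proc = Popen([sys.executable, "-c", child_code],
             env=child_env, stdout=PIPE, universal_newlines=True)
env_out, _ = proc.communicate()
print([env_out])
```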

Pipelines: open several Popen objects and connect each stdout to the next stdin

Doing du -sk * | sort -nr in Python.
c1=Popen(.. stdout=PIPE ..); c2=Popen(.. stdin=c1.stdout ..)



du=Popen(['du -sk *'], stdout=PIPE, shell=True, cwd='.')

sort=Popen(['sort', '-nr'], stdin=du.stdout, stdout=PIPE)

du.stdout.close()

out,_=sort.communicate()

print out
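The same wiring also works without shell=True. This is a sketch in Python 3 syntax under a couple of assumptions: the producer is a made-up Python one-liner standing in for du, and sort is assumed to be on PATH.

```python
import sys
from subprocess import Popen, PIPE

# Hypothetical producer standing in for `du -sk *`: prints three numbers.
producer = Popen([sys.executable, "-c", "print('10\\n30\\n20')"], stdout=PIPE)

# The consumer reads the producer's stdout directly; no shell is involved.
consumer = Popen(["sort", "-nr"], stdin=producer.stdout,
                 stdout=PIPE, universal_newlines=True)

# Close the parent's copy so the producer sees SIGPIPE if sort exits early.
producer.stdout.close()

pipeline_out, _ = consumer.communicate()
producer.wait()  # reap the first process as well
print(pipeline_out)
```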

Writing to a command's stdin

Open it with cmd=Popen(.., stdin=PIPE, ..) and write to cmd.stdin.
A replacement for pipe = os.popen("cmd", 'w', ..).



pipe=Popen(['/usr/bin/wc', '-l'], stdin=PIPE).stdin

for n in range(10):

    pipe.write('%d\n' % n)

pipe.close()

10 is printed on the parent process's standard output.
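If you want the count back in the parent instead of on the terminal, a variation like the following might do (Python 3 syntax, assuming wc is on PATH). It keeps the Popen object around so that stdout can be read and the child reaped.

```python
from subprocess import Popen, PIPE

# Keep the Popen object so both pipes and wait() stay reachable.
proc = Popen(["wc", "-l"], stdin=PIPE, stdout=PIPE, universal_newlines=True)

for n in range(10):
    proc.stdin.write("%d\n" % n)
proc.stdin.close()  # EOF on stdin tells wc to print its result

# Some wc implementations pad the number with spaces, hence strip().
count = int(proc.stdout.read().strip())
proc.stdout.close()
proc.wait()
print(count)
```

Writing everything before reading is fine here because wc prints only one short line. For a child that produces a lot of output while it is still reading input, communicate() is the safer choice.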

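Reading a command's output

Open it with cmd=Popen(.. stdout=PIPE, ..) and read from cmd.stdout.
communicate() returns all the output at once, so the amount of data is limited by memory.
Reading from the stream a little at a time avoids that limit.

pipe=Popen(['/usr/bin/find', '/etc/'], stdout=PIPE).stdout
# Iterate over the pipe itself; readlines() would load all the output into memory first.
for line in pipe:
    print line.strip()
pipe.close()

The file names are printed on standard output. Standard error is simply inherited from the parent.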
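As a sketch of the streaming approach in Python 3 syntax, the generator below yields one line at a time, so memory use does not grow with the size of the output. The name stream_lines and the child command are made up for illustration.

```python
import sys
from subprocess import Popen, PIPE

def stream_lines(argv):
    """Yield the child's stdout line by line instead of collecting it all."""
    proc = Popen(argv, stdout=PIPE, universal_newlines=True)
    for line in proc.stdout:       # reads incrementally from the pipe
        yield line.rstrip("\n")
    proc.stdout.close()
    proc.wait()                    # reap the child once its output ends

# Hypothetical child: the current interpreter printing 0, 1, 2.
lines = list(stream_lines([sys.executable, "-c", "for i in range(3): print(i)"]))
print(lines)
```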

Sending stderr and stdout to the same place

Popen(.. stderr=STDOUT, .. )
Similar to os.popen4.



print Popen(['/bin/ls', '/etc/hosts', 'nonexistant'], 

            stdin=PIPE, stdout=PIPE, stderr=STDOUT).communicate()

('/bin/ls: cannot access nonexistant: No such file or directory\n/etc/hosts\n', None)

Note that the second item, the stderr output, is None.
Written as follows, the stdout and stderr output are kept separate.



print Popen(['/bin/ls', '/etc/hosts', 'nonexistant'], 

            stdin=PIPE, stdout=PIPE, stderr=PIPE).communicate()

('/etc/hosts\n', '/bin/ls: cannot access nonexistant: No such file or directory\n')
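Here is a small sketch of the merged case in Python 3 syntax. The child is a made-up one-liner that writes one line to each stream, flushing stdout first so the order in the shared pipe is predictable.

```python
import sys
from subprocess import Popen, PIPE, STDOUT

# Hypothetical child: one line on stdout (flushed), then one line on stderr.
child_code = ("import sys; sys.stdout.write('out\\n'); sys.stdout.flush(); "
              "sys.stderr.write('err\\n')")

proc = Popen([sys.executable, "-c", child_code],
             stdout=PIPE, stderr=STDOUT, universal_newlines=True)
merged, err = proc.communicate()

# Both lines arrive through the stdout pipe; the stderr slot stays None.
print((merged, err))
```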

Sending signals to a command

Use .wait() or .returncode to find out which signal the command received.



import signal
p=Popen(['/bin/sleep', '9999'])

p.kill()

print p.wait()==-1*signal.SIGKILL

print p.returncode==-1*signal.SIGKILL

True
True

When the status from .wait() is negative, it is the number of the signal the child process received, multiplied by -1. Besides .kill() there are also .terminate() and .send_signal(sig).



p=Popen(['/bin/sleep', '9999'])

p.send_signal(signal.SIGHUP)

print p.wait()==-1*signal.SIGHUP

True

.send_signal() can send any signal.
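For completeness, a sketch of .terminate() in Python 3 syntax, assuming a Unix-like OS. The sleeping child is a made-up stand-in for /bin/sleep.

```python
import signal
import sys
from subprocess import Popen

# Hypothetical long-running child: the current interpreter sleeping for a minute.
p = Popen([sys.executable, "-c", "import time; time.sleep(60)"])

p.terminate()      # on Unix this sends SIGTERM
status = p.wait()  # negative value: the child was killed by a signal

print(status == -signal.SIGTERM)
print(p.returncode == -signal.SIGTERM)
```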

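Use communicate() to read both stdout and stderr

However, it reads everything into a buffer at once, so the amount of data is limited by memory.
It avoids the deadlock that can happen when you try to read both stdout and stderr yourself.
(stdout, stderr) comes back as a tuple.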



print Popen(['/bin/cat'], stdout=PIPE, stderr=PIPE, stdin=PIPE).communicate('OH HAI')

('OH HAI', '')
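To check that communicate() copes when both streams carry more than a pipe buffer's worth of data, something like this sketch can be used (Python 3 syntax; the child and the 200000-byte size are arbitrary choices for illustration).

```python
import sys
from subprocess import Popen, PIPE

# Hypothetical child: writes 200000 bytes to stdout and then to stderr,
# which is more than a typical pipe buffer holds.
child_code = ("import sys; sys.stdout.write('o' * 200000); "
              "sys.stderr.write('e' * 200000)")

proc = Popen([sys.executable, "-c", child_code], stdout=PIPE, stderr=PIPE)

# communicate() drains both pipes concurrently, so neither one can fill up
# and stall the child.
out, err = proc.communicate()
print(len(out), len(err))
```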

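An example that deadlocks when reading both stdout and stderr

To make a deadlock likely, this example reads stderr first. It will most likely deadlock: while the parent waits for stderr to reach EOF, find fills the stdout pipe buffer and stops. (Run as an ordinary user, find also writes permission errors to stderr.) In general, when doing several blocking I/O operations it is safer to use select or similar to check that a read will not block before making it.

# Note: a deadlock-prone way to do it
find=Popen(['/usr/bin/find', '/'], stdout=PIPE, stderr=PIPE)
for line in find.stderr.readlines():
    pass
for line in find.stdout.readlines():
    pass

Reading the two alternately deadlocks just the same.

# Note: a deadlock-prone way to read
while True:
    print find.stderr.readline()
    print find.stdout.readline()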

A simple example of multiplexing stdout and stderr with select(2)

Avoid blocking by using select(2) to confirm that data is available before reading.
No termination handling.

import select

find=Popen(['/usr/bin/find', '/usr/local'], stdout=PIPE, stderr=PIPE)
while True:
    # Caution: nothing checks for the end of the data, so as written this loops forever.
    rready, _, _=select.select([find.stdout, find.stderr], [ ], [ ])
    if find.stdout in rready:
        print find.stdout.readline().strip()
    if find.stderr in rready:
        print find.stderr.readline().strip()

Multiplexing stdout and stderr with select(2)

With EOF checking.

import select

find=Popen(['/usr/bin/find', '/etc'], stdout=PIPE, stderr=PIPE)
rfhd=dict(out=find.stdout, err=find.stderr)
while True:
    if not rfhd:
        break               # No readable fds left: end of the input data.
    rready, _, _=select.select(rfhd.values(), [ ], [ ])
    for name,fh in rfhd.items():
        # Read only from fds that have data, so as not to block.
        if fh in rready:
            line=fh.readline()
            # On EOF, remove the file from the set being monitored.
            # Python signals EOF with ''. None might have been easier to understand...
            if line=='':
                fh.close()
                del rfhd[name]
            else:
                print name, line.strip()
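The following is a sketch of the same select loop in Python 3 syntax, not the post's own code. It swaps readline() for os.read() on the raw file descriptors, on the assumption that Python 3's buffered pipe objects can hold data select() does not know about. The name read_both and the child command are made up for illustration.

```python
import os
import select
import sys
from subprocess import Popen, PIPE

def read_both(argv):
    """Collect stdout and stderr of argv without blocking on either pipe."""
    proc = Popen(argv, stdout=PIPE, stderr=PIPE)
    names = {proc.stdout.fileno(): "out", proc.stderr.fileno(): "err"}
    chunks = {"out": [], "err": []}
    open_fds = set(names)
    while open_fds:                      # stop once both pipes reached EOF
        ready, _, _ = select.select(list(open_fds), [], [])
        for fd in ready:
            data = os.read(fd, 4096)     # returns at once: select said fd is ready
            if data == b"":              # empty read means EOF on this pipe
                open_fds.discard(fd)
            else:
                chunks[names[fd]].append(data)
    proc.stdout.close()
    proc.stderr.close()
    proc.wait()
    return b"".join(chunks["out"]), b"".join(chunks["err"])

# Hypothetical child: 100000 bytes on each stream.
child_code = ("import sys; sys.stdout.write('o' * 100000); "
              "sys.stderr.write('e' * 100000)")
both_out, both_err = read_both([sys.executable, "-c", child_code])
print(len(both_out), len(both_err))
```

This version returns raw chunks, not lines. Splitting into lines would need a small per-stream buffer on top.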

Reading from both stdout and stderr with non-blocking I/O

import errno, fcntl

# Defined first so that the loop below can call it.
def set_nonblocking(fh):
    """ Put a file handle into non-blocking mode """
    fd = fh.fileno()
    fl = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, fl | os.O_NONBLOCK)

# An ordinary user lacks some permissions, so errors are produced
cmd=Popen(['/usr/bin/find', '/etc'], stdout=PIPE, stderr=PIPE)
# Make stderr non-blocking. stdout is left as it is (blocking)
set_nonblocking(cmd.stderr)
while True:
    # stderr is drained below, so the child should not stall on it and the stdout read should not block.
    line=cmd.stdout.readline()
    if line=='':            # eof
        break
    print line.strip()
    # stderr: read in non-blocking mode
    # Reading from a non-blocking fd that has no data raises an EAGAIN error
    # ("no data right now, try again later"). Ignore that one and re-raise any other exception.
    try:
        line=cmd.stderr.readline()
    except IOError, e:
        if e.args[0]==errno.EAGAIN:
            line=None
        else:
            raise
    if line:
        print 'stderr:', line.strip()
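The EAGAIN behaviour can be seen in isolation with a plain pipe. This sketch uses Python 3 syntax and assumes Python 3.5 or later on a Unix-like OS, where os.set_blocking() does the job of the fcntl calls and EAGAIN surfaces as BlockingIOError.

```python
import os

# A bare pipe stands in for the child's stderr.
r, w = os.pipe()
os.set_blocking(r, False)   # same effect as setting O_NONBLOCK via fcntl

try:
    os.read(r, 100)         # nothing written yet
    got_eagain = False
except BlockingIOError:     # Python 3 maps EAGAIN to this exception
    got_eagain = True

os.write(w, b"hello\n")
data = os.read(r, 100)      # data is available now, so the read succeeds

os.close(r)
os.close(w)
print(got_eagain, data)
```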