Python 3
Python is the favorite language of many computational biologists. Unfortunately, running Python on your computer reaches its limits quickly while analyzing biological data. The way most researchers get more power is by starting a cloud instance, SSHing in, and running their Python script in the cloud.
Companies like Stripe have internal tooling for their engineers to train machine learning models in the cloud. It replaces the process of starting a cloud instance, SSHing in, starting the script, and then copying the results.
You get the same tooling with Toolchest: in the background, Toolchest starts a cloud instance, runs your script on the instance, and copies the input and output files. You only get charged when your script is running on the instance, which means you don't have to pay for idling cloud instances – or pay thousands of dollars after forgetting to terminate instances.
Example usage
Let's say we want to:
- Calculate the length of
input.txt
with a Python script calledcalculate_length.py
- Return the length of a file at
./my_output/length.txt
The Toolchest call is:
file_to_count = "input.txt"
length_file = "length.txt"
toolchest.python3(
script="calculate_length.py",
inputs=[file_to_count],
output_path="./my_output/",
tool_args=[f"--input-file {file_to_count} --length-file {length_file}"],
)
In the Python script, inputs are read from ./input/
, and output is written to ./output/
. That's because Toolchest places input files at ./input/
, and it only captures output files written to ./output/
.
The script, calculate_length.py
, contains:
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--input-file', metavar='input', help="Input file")
parser.add_argument('--length-file', metavar='length', help="Output file")
args = parser.parse_args()
with open(f"./input/{args.input}", "r") as input_file:
input_file_contents = input_file.read()
with open('f"./output/{args.length}", "w") as output_file:
output_file.write(f"{len(input_file_contents)}")
The Toolchest call
toolchest.python3(
script,
inputs,
output_path=None,
tool_args="",
is_async=False,
)
toolchest$python3(
script,
inputs,
output_path = NULL,
tool_args = "",
is_async = FALSE
)
Function arguments
Argument | Use for | Description |
---|---|---|
| input file locations | Path to one or more files to be used as input. Your script can access these files at |
| the Python script to run | The path of the Python 3 script to run. The script can be local or remote, see Inputs and Outputs. |
| output file location | (optional) Path to a directory (local or S3) to download output files. These are all the files written to If omitted, downloading is skipped. The outputs can be downloaded manually. |
| all other arguments | (optional) Additional arguments to be passed directly to |
| the Docker image to run | (optional) A custom Docker image to use as context for Python 3 execution. See Custom Environments. |
| Whether to run a job asynchronously. |
Custom environments
By passing a Docker image to Toolchest using custom_docker_image_id
, you can run Python in any environment you'd like via Toolchest.
Make sure that the Docker image you use:
- Has
python3
(i.e.docker run {image} python3
works) - Supports the
linux/amd64
platform - Exists on the machine where you're running Toolchest, and that the Docker engine is running
If you're building the image on an M1 Mac or Windows machine, make sure you build your Docker image with platform set to linux/amd64
.
Building and using a custom environment
In this guide, we'll build and run a custom Docker image that supports numpy via Toolchest.
Before starting, make sure that Docker engine is installed and running.
Start by creating a file named Dockerfile
that contains Python 3.9 and numpy:
FROM python:3.9
RUN pip install numpy
Next, build a Docker image from that Dockerfile.
docker build . -t python3-numpy:3.9 --platform linux/amd64
# Make sure the Docker Python library is installed (e.g. pip install docker)
import docker
client = docker.from_env()
client.images.build(
path=f"./", # This is a path to the location of the Dockerfile
dockerfile="Dockerfile",
tag="python3-numpy:3.9",
platform="linux/amd64"
)
Now let's make a Python script ("numpy_example.py") that uses numpy:
import numpy as np
a = np.array([(1, 2, 3), (4, 5, 6)])
b = np.array([(7, 8), (9, 10), (11, 12)])
output_string = np.array_str(np.matmul(a, b))
f = open("./output/output.txt", "w")
f.write()
f.close()
And finally, the last step: you can run the Python script in the custom Docker image using Toolchest:
import toolchest_client as tc
tc.python3(
script="numpy_example.py",
output_path=f"./local_output/",
custom_docker_image_id="python3-numpy:3.9"
)
That's it!
Python versions
Toolchest currently runs version 3.9.1 of Python. You can use other versions of Python by using a custom environment.
Passing arguments to your Python 3 script
Any arguments passed to tool_args
are passed to your Python script, as if it were executing on the command line. For example:
toolchest.python3(
script="my_script.py",
tool_args="--my-arg 1234",
...
)
Is processed as if the script were run on the command line like:
python3 my_script.py --my-arg 1234
Some argument names are not allowed due to conflicts with Python itself, including:
-c
-i
-m
Return value
This function call returns a Toolchest output object, which contains the run ID and locations of downloaded output files. See Output Objects.
Async runs
Set the is_async
parameter to True
if you would like to run a Python 3 job asynchronously. See Async Runs.
Updated 3 days ago