This probably isn't the page you're looking for
To run Python functions and Docker images with Toolchest, check out Lug, an open-source project that builds on Toolchest.
Python is the favorite language of many computational biologists. Unfortunately, running Python on your own computer quickly hits its limits when analyzing biological data. Most researchers get more power by starting a cloud instance, SSHing in, and running their Python script in the cloud.
Companies like Stripe have internal tooling for their engineers to train machine learning models in the cloud. It replaces the process of starting a cloud instance, SSHing in, starting the script, and then copying the results.
You get the same tooling with Toolchest: in the background, Toolchest starts a cloud instance, runs your script on the instance, and copies the input and output files. You only get charged when your script is running on the instance, which means you don't have to pay for idling cloud instances – or pay thousands of dollars after forgetting to terminate instances.
Example usage
Let's say we want to:
- Calculate the length of input.txt with a Python script called calculate_length.py
- Write that length to a file at ./my_output/length.txt
The Toolchest call is:
import toolchest_client as tc

file_to_count = "input.txt"
length_file = "length.txt"
tc.python3(
    script="calculate_length.py",
    inputs=[file_to_count],
    output_path="./my_output/",
    tool_args=f"--input-file {file_to_count} --length-file {length_file}",
)
In the Python script, inputs are read from ./input/, and output is written to ./output/. That's because Toolchest places input files at ./input/, and it only captures output files written to ./output/.
The script, calculate_length.py, contains:
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--input-file', metavar='input', help="Input file")
parser.add_argument('--length-file', metavar='length', help="Output file")
args = parser.parse_args()

# argparse stores "--input-file" as args.input_file; metavar only affects help text
with open(f"./input/{args.input_file}", "r") as input_file:
    input_file_contents = input_file.read()
with open(f"./output/{args.length_file}", "w") as output_file:
    output_file.write(f"{len(input_file_contents)}")
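Because the script depends only on the ./input/ and ./output/ directory convention, its logic can be exercised locally before a cloud run. The following is a minimal sketch and not part of Toolchest: the helper name `write_length` and its `base_dir` parameter are hypothetical, and a temporary directory stands in for the working directory that Toolchest prepares.

```python
import tempfile
from pathlib import Path


def write_length(input_name, length_name, base_dir="."):
    """Mirror calculate_length.py: read from input/, write the length to output/."""
    base = Path(base_dir)
    contents = (base / "input" / input_name).read_text()
    output_dir = base / "output"
    output_dir.mkdir(exist_ok=True)  # Hypothetical convenience for local runs
    (output_dir / length_name).write_text(str(len(contents)))
    return len(contents)


# Recreate the input/ and output/ layout in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "input").mkdir()
    (Path(tmp) / "input" / "input.txt").write_text("ACGT\n")
    length = write_length("input.txt", "length.txt", base_dir=tmp)
    written = (Path(tmp) / "output" / "length.txt").read_text()
```

If the local result looks right, the same file names can then be passed through inputs and tool_args in the Toolchest call.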
The Toolchest call
tc.python3(
    script,
    inputs,
    output_path=None,
    tool_args="",
    is_async=False,
    streaming_enabled=True,
)
toolchest$python3(
    script,
    inputs,
    output_path = NULL,
    tool_args = "",
    is_async = FALSE
)
Custom environments
By passing a Docker image to Toolchest using custom_docker_image_id, you can run Python in any environment you'd like via Toolchest.
Make sure that the Docker image you use:
- Has python3 (i.e. docker run {image} python3 works)
- Supports the linux/amd64 platform
- Exists on the machine where you're running Toolchest, with the Docker engine running
If you're building the image on an M1 Mac or Windows machine, make sure you build your Docker image with platform set to linux/amd64.
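The first two requirements can be checked before handing the image to Toolchest. This is a minimal sketch, not a Toolchest feature: the function names are hypothetical, and appending `--version` to the `python3` invocation is an assumption made here so that the container exits immediately.

```python
import subprocess


def python3_check_command(image):
    """Build the `docker run` command that checks for python3 on linux/amd64."""
    return [
        "docker", "run", "--rm",
        "--platform", "linux/amd64",
        image, "python3", "--version",
    ]


def image_has_python3(image):
    """Return True if the check command exits successfully.

    Requires the Docker CLI and a running Docker engine.
    """
    try:
        result = subprocess.run(
            python3_check_command(image), capture_output=True, text=True
        )
    except FileNotFoundError:
        return False  # The docker CLI is not installed or not on PATH
    return result.returncode == 0
```

Calling `image_has_python3("python3-numpy:3.9")` would then return False if the image is missing locally, lacks python3, or cannot run as linux/amd64.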
Building and using a custom environment
In this guide, we'll build and run a custom Docker image that supports numpy via Toolchest.
Before starting, make sure that Docker engine is installed and running.
Start by creating a file named Dockerfile that contains Python 3.9 and numpy:
```dockerfile
FROM python:3.9
RUN pip install numpy
```
Next, build a Docker image from that Dockerfile.
```shell
docker build . -t python3-numpy:3.9 --platform linux/amd64
```
Alternatively, build the image with the Docker Python library:
```python
# Make sure the Docker Python library is installed (e.g. pip install docker)
import docker

client = docker.from_env()
client.images.build(
    path="./",  # This is a path to the location of the Dockerfile
    dockerfile="Dockerfile",
    tag="python3-numpy:3.9",
    platform="linux/amd64"
)
```
Now let's make a Python script ("numpy_example.py") that uses numpy:
import numpy as np

a = np.array([(1, 2, 3), (4, 5, 6)])
b = np.array([(7, 8), (9, 10), (11, 12)])
output_string = np.array_str(np.matmul(a, b))

# Write the result to ./output/ so that Toolchest captures it
with open("./output/output.txt", "w") as f:
    f.write(output_string)
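For reference, the matrix product that this script writes can be checked by hand. The sketch below recomputes it with plain Python lists (no numpy), purely as a sanity check on the expected values; it is not part of the Toolchest workflow.

```python
a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]

# Row-by-column products: entry (i, j) sums a[i][k] * b[k][j] over k.
product = [
    [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
    for i in range(len(a))
]
print(product)  # [[58, 64], [139, 154]]
```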
Finally, run the Python script in the custom Docker image using Toolchest:
import toolchest_client as tc

tc.python3(
    script="numpy_example.py",
    output_path="./local_output/",
    custom_docker_image_id="python3-numpy:3.9"
)
That's it!
Python versions
Toolchest currently runs version 3.9.1 of Python. You can use other versions of Python by using a custom environment.
Passing arguments to your Python 3 script
Any arguments passed to tool_args are passed to your Python script, as if it were executing on the command line. For example:
tc.python3(
    script="my_script.py",
    tool_args="--my-arg 1234",
    ...
)
This call is processed as if the script were run on the command line like:
python3 my_script.py --my-arg 1234
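Inside the script, those arguments arrive like any other command-line arguments. A minimal sketch of parsing them with `argparse` follows; it assumes that `tool_args` is split the way a shell would split it, approximated here with `shlex.split`. Toolchest's exact splitting rules are not described on this page, so treat that step as an assumption.

```python
import argparse
import shlex

parser = argparse.ArgumentParser()
parser.add_argument("--my-arg", type=int, help="Example argument from tool_args")

tool_args = "--my-arg 1234"
# Assumption: shell-style splitting of the tool_args string.
argv = shlex.split(tool_args)
args = parser.parse_args(argv)
print(args.my_arg)  # 1234
```

Note that `argparse` exposes `--my-arg` as `args.my_arg`: dashes in the option name become underscores in the attribute name.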
Some argument names are not allowed due to conflicts with Python itself, including:
-c
-i
-m
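A script's callers could screen a `tool_args` string for these flags before submitting a run. This is a hypothetical client-side check rather than anything Toolchest provides: the function name is made up for illustration, the shell-style splitting is an assumption, and because the list above is introduced with "including", the set below may be incomplete.

```python
import shlex

# Flags listed above as conflicting with Python itself (possibly not exhaustive).
DISALLOWED_FLAGS = {"-c", "-i", "-m"}


def find_disallowed_flags(tool_args):
    """Return the listed disallowed flags found in a tool_args string, in order."""
    return [token for token in shlex.split(tool_args) if token in DISALLOWED_FLAGS]
```

For example, `find_disallowed_flags("--my-arg 1234")` returns an empty list, so that string would pass this check.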
Return value
This function call returns a Toolchest output object, which contains the run ID and locations of downloaded output files. See Output Objects.
Async runs
Set the is_async parameter to True if you would like to run a Python 3 job asynchronously. See Async Runs.