Adding and updating custom databases

You can add or update custom reference databases using Toolchest. The interface is the same as running any tool, except that database_path replaces the inputs argument.

Adding and updating databases are asynchronous runs. The add_database and update_database functions return as soon as the data has been transferred, providing:

  • a unique run ID that you can use with .get_status() to track the run's status
  • the new database_name and database_version
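Because these are async runs, you may want a script to wait until one finishes. The helper below is a minimal sketch, not part of toolchest_client: wait_for_run is a hypothetical name, and it assumes only that the returned object has a get_status() method that can be called repeatedly. The set of terminal statuses is also an assumption; fill it in with the values your client actually reports.

```python
import time


def wait_for_run(output, done_statuses, poll_seconds=600.0, timeout_seconds=48 * 3600):
    """Poll output.get_status() until it returns a value in done_statuses.

    Hypothetical helper: `output` is any object with a get_status() method,
    and the contents of `done_statuses` are assumptions you must supply.
    Raises TimeoutError if no terminal status is seen before the timeout.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = output.get_status()
        if status in done_statuses:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"run still {status!r} after {timeout_seconds} seconds"
            )
        # Database runs can take 24-48 hours, so poll sparingly.
        time.sleep(poll_seconds)
```

The defaults (a 10-minute poll interval and a 48-hour timeout) reflect the 24-48 hour turnaround mentioned on this page; adjust them to suit your pipeline.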

Adding a custom database

You can add a custom database for any tool that exists in Toolchest.

The custom database must already be built for the specific tool. For example, a custom Kraken 2 database must be built with kraken2-build; you cannot supply a raw collection of FASTQ files instead.
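If you build a Kraken 2 database locally, a quick pre-flight check can catch the "raw files instead of a built database" mistake before you upload anything. kraken2-build writes its index as hash.k2d, opts.k2d and taxo.k2d. The helper below (missing_kraken2_files, a hypothetical name, not part of toolchest_client) only checks that those files exist in a local directory. It does not validate their contents, does not inspect S3 paths, and does not apply to other tools, which use different file layouts.

```python
from pathlib import Path

# Index files that kraken2-build writes and kraken2 needs for classification.
KRAKEN2_DB_FILES = ("hash.k2d", "opts.k2d", "taxo.k2d")


def missing_kraken2_files(db_dir):
    """Return the required Kraken 2 index files absent from a local directory.

    An empty list means the directory looks like a built Kraken 2 database.
    Only file presence is checked; file contents are not validated.
    """
    root = Path(db_dir)
    return [name for name in KRAKEN2_DB_FILES if not (root / name).is_file()]
```

For example, you could raise an error before calling add_database whenever missing_kraken2_files("path/to/my_kraken2_db") returns a non-empty list.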

Arguments:

  • database_path: Path or list of paths (local or S3) containing the custom database.
  • tool: the Toolchest tool class that will use the database (e.g. toolchest.tools.DiamondBlastp, toolchest.tools.Kraken2).
  • database_name: name of the new custom database.
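Since database_path accepts either a single path or a list, and each entry can be local or on S3, it can help to see which inputs will have to be transferred from your machine. The sketch below is a hypothetical pre-flight helper (split_database_paths is not part of toolchest_client); it assumes that any path starting with "s3://" is an S3 URI and treats everything else as local.

```python
import os


def split_database_paths(database_path):
    """Split database_path into (s3_paths, local_paths).

    Hypothetical helper: accepts one path or a list of paths, like the
    database_path argument. Paths starting with "s3://" count as S3;
    everything else is treated as a local path.
    """
    if isinstance(database_path, (str, os.PathLike)):
        paths = [database_path]
    else:
        paths = list(database_path)
    paths = [os.fspath(p) for p in paths]
    s3_paths = [p for p in paths if p.startswith("s3://")]
    local_paths = [p for p in paths if not p.startswith("s3://")]
    return s3_paths, local_paths
```

You could, for instance, confirm that every entry in local_paths exists with os.path.exists before calling add_database.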

add_database returns a Toolchest.api.Output object containing:

  • database_name
  • database_version
  • run_id

Here's an example of adding a custom database for Kraken 2 using an S3 URI:

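import time
import toolchest_client as toolchest

toolchest.set_key("YOUR_TOOLCHEST_KEY")

# The S3 directory must hold a database already built with kraken2-build.
# time.time() gives the new database name a unique suffix.
output = toolchest.add_database(
  database_path="s3://toolchest-integration-tests/arbitrary_directory/",
  tool=toolchest.tools.Kraken2,
  database_name=f"my_database_{time.time()}",
)

# The returned object identifies the new database and the async run.
print(output.database_name, output.database_version, output.run_id)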

Note that it may take 24-48 hours for the custom database to be ready to use.

Updating a custom database

You can create a new custom version of any database for any tool in Toolchest. This works much like adding a custom database, except that database_name must refer to a database that already exists.

Arguments:

  • database_path: Path or list of paths (local or S3) containing the custom database.
  • tool: the Toolchest tool class that will use the database (e.g. toolchest.tools.DiamondBlastp, toolchest.tools.Kraken2).
  • database_name: name of the existing database.

update_database returns a Toolchest.api.Output object containing:

  • database_name
  • database_version (auto-incremented from the latest version)
  • run_id
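Toolchest assigns the new version server-side, and this page does not specify the version format. As a rough illustration of "auto-incremented from the latest version", the sketch below assumes integer-like versions; next_database_version is a hypothetical name and not Toolchest's implementation.

```python
def next_database_version(existing_versions):
    """Return the version an update would receive, assuming integer versions.

    Hypothetical illustration only: the latest (highest) existing version
    plus one. Versions are compared numerically, not as strings.
    """
    if not existing_versions:
        raise ValueError("update_database requires an existing database")
    return max(int(v) for v in existing_versions) + 1
```

Comparing numerically matters if versions are stored as strings: "10" sorts before "9" lexicographically, which would pick the wrong "latest" version.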

Here's an example of updating the standard Kraken 2 database:

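import toolchest_client as toolchest

toolchest.set_key("YOUR_TOOLCHEST_KEY")

# database_name must already exist; this creates a new version of it.
output = toolchest.update_database(
  database_path="s3://toolchest-integration-tests/arbitrary_directory/",
  tool=toolchest.tools.Kraken2,
  database_name="standard",
)

# database_version is auto-incremented from the latest existing version.
print(output.database_name, output.database_version, output.run_id)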

Please allow 24-48 hours for the new database version to be ready to use.

Security

By default, every custom database is effectively "unlisted": anyone who knows its name and version can access it.

We support fully private custom databases as a part of the managed-hosted and on-prem versions of Toolchest.
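Because unlisted access depends only on knowing the name and version, one option is to give databases names that are hard to guess. The sketch below (unlisted_database_name is a hypothetical helper, not part of toolchest_client) appends a random suffix from Python's secrets module. This is not access control: anyone who learns the name can still use the database.

```python
import secrets


def unlisted_database_name(prefix):
    """Build a database name with a random, hard-to-guess suffix.

    The suffix is 16 hex characters (8 random bytes). This only makes the
    name harder to guess; it does not restrict who can access the database.
    """
    return f"{prefix}_{secrets.token_hex(8)}"
```

For example, you could pass database_name=unlisted_database_name("my_database") to add_database instead of a timestamp-based name.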

