Setup & reference data

Subcommands and tasks for installing and configuring JASEN.

converge-catalogues

Merge WHO, TBdb, and FoHM TB mutation catalogues into a unified TBProfiler database.

jasentool converge-catalogues [--output-dir <DIR>] [--save-dbs]

Argument	Required	Default	Description
`--output-dir`	No	—	Directory to write output files
`--save-dbs`	No	False	Save all intermediate databases

Example

jasentool converge-catalogues --output-dir /path/to/output --save-dbs

download-bigsdb

Download cgMLST scheme alleles from PubMLST or BIGSdb Pasteur via OAuth1.

Note

This subcommand is based on code written by Keith Jolley (University of Oxford). Source: kjolley/BIGSdb_downloader

Initial setup

Run once per site to register your API key and obtain OAuth tokens.

jasentool download-bigsdb \
  --setup \
  --site PubMLST \
  --db seqdef_db \
  --key-name mykey

Follow the printed URL to authorise access in your browser, then paste the verifier code when prompted. Tokens are stored in the directory given by --token-dir (default ./.bigsdb_tokens).

Download scheme alleles

jasentool download-bigsdb \
  --download-scheme \
  --url https://rest.pubmlst.org/db/pubmlst_saureus_seqdef/schemes/1 \
  --site PubMLST \
  --key-name mykey \
  --output-dir /path/to/alleles

Options

Argument	Required	Default	Description
`--url`	Conditional	—	API endpoint URL (required for `--download-scheme`)
`--site`	No	—	BIGSdb site: `PubMLST` or `Pasteur`
`--key-name`	Yes	—	API key name (unique per site)
`--output-dir`	No	—	Directory for per-locus FASTA files (`--download-scheme`)
`--token-dir`	No	`./.bigsdb_tokens`	Token storage directory
`--db`	No	—	Database config name (setup only)
`--setup`	No	False	Run initial OAuth1 setup
`--download-scheme`	No	False	Download all scheme loci
`--force`	No	False	Re-download existing files (`--download-scheme`)
`--cron`	No	False	Non-interactive / cron mode
`--method`	No	`GET`	HTTP method: `GET` or `POST`
`--output-file`	No	—	Save a single API response to this file
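A minimal Bash sketch of an unattended refresh using --cron, assuming --setup has already been run interactively so that tokens exist in the token directory. The helper names tokens_present and refresh_alleles, and the TOKEN_DIR variable, are hypothetical, and a non-empty token directory is only a heuristic for valid tokens. --force is included because, per the table above, existing per-locus files are otherwise not re-downloaded.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for refreshing scheme alleles from cron.
# Assumes "jasentool download-bigsdb --setup" has already been run interactively.

# Succeed if the token directory exists and contains at least one entry.
# This is only a heuristic: it does not check that the tokens are still valid.
tokens_present() {
  local dir="$1"
  [[ -d "$dir" ]] && [[ -n "$(ls -A -- "$dir")" ]]
}

refresh_alleles() {
  local token_dir="${TOKEN_DIR:-./.bigsdb_tokens}"
  if ! tokens_present "$token_dir"; then
    echo "no tokens in $token_dir; run 'jasentool download-bigsdb --setup' first" >&2
    return 1
  fi
  # --cron: non-interactive mode; --force: re-download loci whose files already exist.
  jasentool download-bigsdb \
    --download-scheme \
    --url https://rest.pubmlst.org/db/pubmlst_saureus_seqdef/schemes/1 \
    --site PubMLST \
    --key-name mykey \
    --token-dir "$token_dir" \
    --output-dir /path/to/alleles \
    --force \
    --cron
}

# Usage (e.g. from a script scheduled with cron):
#   TOKEN_DIR=/path/to/.bigsdb_tokens refresh_alleles
```

Cron jobs usually start in the user's home directory, so set TOKEN_DIR to an absolute path rather than relying on the relative default ./.bigsdb_tokens.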

download-ncbi

Download genome FASTA and GFF from the NCBI Datasets v2 API.

jasentool download-ncbi --accession <ACC> [--accession ...] --output-dir <DIR>
                        [--bwa-index] [--fai-index] [--clean]

Argument	Required	Default	Description
`-i`/`--accession`	Yes	—	NCBI accession number(s); repeat for multiple
`-o`/`--output-dir`	Yes	—	Output directory
`--bwa-index`	No	False	Run `bwa index` on the downloaded FASTA
`--fai-index`	No	False	Run `samtools faidx` on the downloaded FASTA
`--clean`	No	False	Clear output directory before downloading

Example

jasentool download-ncbi \
  --accession GCF_000013425.1 \
  --output-dir /path/to/references \
  --fai-index
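Because --accession can be repeated, a list of references kept in a text file (one accession per line) can be expanded into repeated flags. A minimal Bash sketch; build_accession_args, the ACCESSION_ARGS array and accessions.txt are hypothetical names, and skipping blank lines and #-comments is a convention of this sketch rather than jasentool behaviour.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read accessions (one per line) from a file and store them
# in the global array ACCESSION_ARGS as repeated "--accession <ACC>" pairs.
# Blank lines and lines starting with "#" are skipped; all whitespace (including
# the carriage return of Windows line endings) is stripped from each line.
build_accession_args() {
  local file="$1" acc
  ACCESSION_ARGS=()
  # "|| [[ -n $acc ]]" keeps a final line that has no trailing newline.
  while IFS= read -r acc || [[ -n "$acc" ]]; do
    acc="${acc//[[:space:]]/}"
    [[ -z "$acc" || "$acc" == \#* ]] && continue
    ACCESSION_ARGS+=(--accession "$acc")
  done < "$file"
}

# Usage:
#   build_accession_args accessions.txt
#   jasentool download-ncbi "${ACCESSION_ARGS[@]}" --output-dir /path/to/references --fai-index
```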

transform-file-format

Convert a cgMLST targets TSV file to BED format (the default) or another output format selected with --out-format.

jasentool transform-file-format -i <FILE> [...] -o <FILE>
                                 [-f <FORMAT>] [-a <ACCESSION>]

Argument	Required	Default	Description
`-i`/`--input-file`	Yes	—	Path to targets TSV file
`-o`/`--output-file`	Yes	—	Output file path
`-f`/`--out-format`	No	`bed`	Output format
`-a`/`--accession`	No	—	Chromosome/contig accession for the BED `chrom` column

Downloading the cgMLST targets TSV

The input TSV is the locus table for your organism’s cgMLST scheme on cgMLST.org. To download it:

1. Navigate to your organism’s schema page, e.g. https://www.cgmlst.org/ncs/schema/<SCHEMA_NAME>/locus/ (the schema name differs per organism; for S. aureus it is Saureus4059).
2. Click the "Download table as CSV" button.

The downloaded file has a .csv extension but is tab-separated. Pass it directly to --input-file.
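If the file has been opened and re-saved in a spreadsheet program, it may no longer be tab-separated. One quick check before running transform-file-format is to count the tab-separated fields in the header line. A minimal Bash sketch; count_tab_fields is a hypothetical helper, and treating a count of 1 as a sign of the wrong delimiter is a rule of thumb, not a validation of the cgMLST.org format.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of tab-separated fields in the first line of a file.
# A multi-column, tab-separated table gives a count greater than 1; a count of 1
# suggests the file uses another delimiter (e.g. commas after re-saving).
count_tab_fields() {
  head -n 1 -- "$1" | awk -F'\t' '{ print NF }'
}

# Usage:
#   count_tab_fields Staphylococcus_aureus_cgMLST.csv
```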

Example

jasentool transform-file-format \
  --input-file Staphylococcus_aureus_cgMLST.csv \
  --output-file targets.bed \
  --accession NC_002951.2

Building a Kraken2 database

Option A: Download a pre-built database

Pre-built databases need no build step: download and extract them.

Full index listing: https://benlangmead.github.io/aws-indexes/k2

Database	Index size	RAM needed	Contents
`Standard`	96.8 GB	~96 GB	Archaea, bacteria, viral, plasmid, human, UniVec_Core
`Standard-16`	14.9 GB	~16 GB	Same libraries, k-mer-filtered to fit 16 GB
`Standard-8`	7.5 GB	~8 GB	Same libraries, k-mer-filtered to fit 8 GB
`PlusPF`	103.4 GB	~103 GB	Standard + protozoa & fungi
`Viral`	0.6 GB	~1 GB	RefSeq viral only
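To choose among the size tiers in the table above, compare the machine's memory with the approximate RAM figures. A minimal Bash sketch, assuming those rounded figures are good enough as thresholds (leave headroom for the operating system and other jobs); suggest_k2_db is a hypothetical helper. It covers only the Standard family, since PlusPF and Viral differ in contents rather than just size.

```shell
#!/usr/bin/env bash
# Hypothetical helper: suggest the largest Standard-family pre-built Kraken2
# database whose approximate RAM requirement (from the table) fits in the given GB.
suggest_k2_db() {
  local ram_gb="$1"
  if [[ ! "$ram_gb" =~ ^[0-9]+$ ]]; then
    echo "usage: suggest_k2_db <RAM in whole GB>" >&2
    return 2
  fi
  if (( ram_gb >= 96 )); then
    echo "Standard"
  elif (( ram_gb >= 16 )); then
    echo "Standard-16"
  elif (( ram_gb >= 8 )); then
    echo "Standard-8"
  else
    echo "less than 8 GB: no Standard-family database fits" >&2
    return 1
  fi
}

# Usage (Linux; MemTotal in /proc/meminfo is reported in kB):
#   suggest_k2_db "$(awk '/^MemTotal:/ { print int($2 / 1024 / 1024) }' /proc/meminfo)"
```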

Example (Standard-8):

# Download (check https://benlangmead.github.io/aws-indexes/k2 for the latest URL)
wget https://genome-idx.s3.amazonaws.com/kraken/k2_standard_08gb_<DATE>.tar.gz
mkdir -p /path/to/krakendb
tar -xzf k2_standard_08gb_<DATE>.tar.gz -C /path/to/krakendb

# Run classification (uses the kraken2.sif image; pull it as shown under Option B)
singularity exec kraken2.sif kraken2 \
  --db /path/to/krakendb \
  --paired --gzip-compressed \
  --output kraken_output.txt \
  --report kraken_report.txt \
  sample_R1.fastq.gz sample_R2.fastq.gz
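A Kraken2 database directory holds three files that kraken2 loads at classification time: hash.k2d, opts.k2d and taxo.k2d. Checking for them is a quick way to confirm that the archive was extracted into the directory passed to --db (the same check applies to a database built under Option B). A minimal Bash sketch; check_k2_db is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that a directory holds the three files Kraken2
# loads at classification time. Reports each missing or empty file on stderr
# and returns non-zero if any are absent.
check_k2_db() {
  local db="$1" f status=0
  for f in hash.k2d opts.k2d taxo.k2d; do
    if [[ ! -s "$db/$f" ]]; then
      echo "missing or empty: $db/$f" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage:
#   check_k2_db /path/to/krakendb && echo "database looks complete"
```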

Option B: Build a custom database

jasentool does not wrap kraken2-build. Run it directly from a Kraken2 container image; the commands below use the community-maintained StaPH-B image (staphb/kraken2):

# Pull the image (once)
singularity pull kraken2.sif docker://staphb/kraken2:latest

# Download NCBI taxonomy
singularity exec kraken2.sif kraken2-build \
  --download-taxonomy --db /path/to/krakendb

# Download one or more libraries (repeat per library)
singularity exec kraken2.sif kraken2-build \
  --download-library bacteria --db /path/to/krakendb

singularity exec kraken2.sif kraken2-build \
  --download-library viral --db /path/to/krakendb

singularity exec kraken2.sif kraken2-build \
  --download-library human --db /path/to/krakendb

# Build the database
singularity exec kraken2.sif kraken2-build \
  --build --db /path/to/krakendb --threads 8

Library names accepted by --download-library: archaea, bacteria, plasmid, viral, human, fungi, plant, protozoa, nr, nt, UniVec, UniVec_Core.
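The per-library commands above can be collapsed into a loop over any of these names. A minimal Bash sketch, assuming the same kraken2.sif image and database layout; run, build_kraken_db and the KRAKEN_SIF, KRAKEN_DB, THREADS, LIBRARIES and DRY_RUN variables are hypothetical. With DRY_RUN=1 it only prints the kraken2-build commands, which is a cheap way to review them before a long download.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical settings; override via environment variables.
KRAKEN_SIF="${KRAKEN_SIF:-kraken2.sif}"
KRAKEN_DB="${KRAKEN_DB:-/path/to/krakendb}"
THREADS="${THREADS:-8}"
# Nucleotide libraries from the list above (this sketch does not handle protein builds).
LIBRARIES=(bacteria viral human)

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Taxonomy first, then each library, then the build step (same order as above).
build_kraken_db() {
  local lib
  run singularity exec "$KRAKEN_SIF" kraken2-build --download-taxonomy --db "$KRAKEN_DB"
  for lib in "${LIBRARIES[@]}"; do
    run singularity exec "$KRAKEN_SIF" kraken2-build --download-library "$lib" --db "$KRAKEN_DB"
  done
  run singularity exec "$KRAKEN_SIF" kraken2-build --build --db "$KRAKEN_DB" --threads "$THREADS"
}

# Usage:
#   DRY_RUN=1 build_kraken_db   # review the commands
#   build_kraken_db             # run them
```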