# Setup & reference data

Subcommands and tasks for installing and configuring JASEN.

## converge-catalogues

Merge WHO, TBdb, and FoHM TB mutation catalogues into a unified TBProfiler database.

```
jasentool converge-catalogues [--output-dir ] [--save-dbs]
```

| Argument | Required | Default | Description |
|----------|----------|---------|-------------|
| `--output-dir` | No | — | Directory to write output files |
| `--save-dbs` | No | False | Save all intermediary databases |

**Example**

```bash
jasentool converge-catalogues --output-dir /path/to/output --save-dbs
```

## download-bigsdb

Download cgMLST scheme alleles from PubMLST or BIGSdb Pasteur via OAuth1.

```{note}
This subcommand is based on code written by Keith Jolley (University of Oxford).
Source: [kjolley/BIGSdb_downloader](https://github.com/kjolley/BIGSdb_downloader)
```

### Initial setup

Run once per site to register your API key and obtain OAuth tokens.

```bash
jasentool download-bigsdb \
  --setup \
  --site PubMLST \
  --db seqdef_db \
  --key-name mykey
```

Follow the printed URL to authorise access in your browser, then paste the verifier code when prompted. Tokens are stored in `--token-dir` (default `./.bigsdb_tokens`).

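
The token directory holds OAuth credentials, so on a shared machine it may be worth restricting access to it once setup has completed. The following is a minimal sketch and not part of jasentool: `restrict_token_dir` is a hypothetical helper name, and it assumes the tokens are stored as regular files under the directory given by `--token-dir`.

```bash
# Hypothetical helper (not provided by jasentool): make the token directory
# and the files inside it readable by the owning user only.
# Usage: restrict_token_dir [TOKEN_DIR]   (defaults to ./.bigsdb_tokens)
restrict_token_dir() {
    token_dir="${1:-./.bigsdb_tokens}"

    # Nothing to do if setup has not created the directory yet.
    if [ ! -d "$token_dir" ]; then
        echo "token directory not found: $token_dir" >&2
        return 1
    fi

    # Owner-only access to the directory itself.
    chmod 700 "$token_dir"

    # Owner-only read/write on every regular file below it.
    find "$token_dir" -type f -exec chmod 600 {} +
}
```

Call it as `restrict_token_dir` for the default location, or pass the same path that was given to `--token-dir`.
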
### Download scheme alleles

```bash
jasentool download-bigsdb \
  --download-scheme \
  --url https://rest.pubmlst.org/db/pubmlst_saureus_seqdef/schemes/1 \
  --site PubMLST \
  --key-name mykey \
  --output-dir /path/to/alleles
```

### Options

| Argument | Required | Default | Description |
|----------|----------|---------|-------------|
| `--url` | Conditional | — | API endpoint URL (required for `--download-scheme`) |
| `--site` | No | — | BIGSdb site: `PubMLST` or `Pasteur` |
| `--key-name` | Yes | — | API key name (unique per site) |
| `--output-dir` | No | — | Directory for per-locus FASTA files (`--download-scheme`) |
| `--token-dir` | No | `./.bigsdb_tokens` | Token storage directory |
| `--db` | No | — | Database config name (setup only) |
| `--setup` | No | False | Run initial OAuth1 setup |
| `--download-scheme` | No | False | Download all scheme loci |
| `--force` | No | False | Re-download existing files (`--download-scheme`) |
| `--cron` | No | False | Non-interactive / cron mode |
| `--method` | No | `GET` | HTTP method: `GET` or `POST` |
| `--output-file` | No | — | Save single API response to this file |

## download-ncbi

Download genome FASTA and GFF from the NCBI Datasets v2 API.

```
jasentool download-ncbi --accession [--accession ...] --output-dir [--bwa-index] [--fai-index] [--clean]
```

| Argument | Required | Default | Description |
|----------|----------|---------|-------------|
| `-i`/`--accession` | Yes | — | NCBI accession number(s); repeat for multiple |
| `-o`/`--output-dir` | Yes | — | Output directory |
| `--bwa-index` | No | False | Run `bwa index` on the downloaded FASTA |
| `--fai-index` | No | False | Run `samtools faidx` on the downloaded FASTA |
| `--clean` | No | False | Clear output directory before downloading |

**Example**

```bash
jasentool download-ncbi \
  --accession GCF_000013425.1 \
  --output-dir /path/to/references \
  --fai-index
```

## transform-file-format

Convert a cgMLST target TSV file to BED format (or another output format).

```
jasentool transform-file-format -i [...] -o [-f ] [-a ]
```

| Argument | Required | Default | Description |
|----------|----------|---------|-------------|
| `-i`/`--input-file` | Yes | — | Path to targets TSV file |
| `-o`/`--output-file` | Yes | — | Output file path |
| `-f`/`--out-format` | No | `bed` | Output format |
| `-a`/`--accession` | No | — | Chromosome/contig accession for the BED `chrom` column |

### Downloading the cgMLST targets TSV

The input TSV is the locus table for your organism's cgMLST scheme on [cgMLST.org](https://www.cgmlst.org). To download it:

1. Navigate to your organism's schema page, e.g. `https://www.cgmlst.org/ncs/schema//locus/` (the schema name differs per organism; for *S. aureus* it is `Saureus4059`).
2. Click the **Download table as CSV** button.

The downloaded file has a `.csv` extension but is tab-separated. Pass it directly to `--input-file`.

**Example**

```bash
jasentool transform-file-format \
  --input-file Staphylococcus_aureus_cgMLST.csv \
  --output-file targets.bed \
  --accession NC_002951.2
```

## Building a Kraken2 database

### Option A: Download a pre-built database

Pre-built databases are ready to use with no build step: just download and extract.

Full index listing: `https://benlangmead.github.io/aws-indexes/k2`

| Database | Index size | RAM needed | Contents |
|----------|------------|------------|---------|
| `Standard` | 96.8 GB | ~96 GB | Archaea, bacteria, viral, plasmid, human, UniVec_Core |
| `Standard-16` | 14.9 GB | ~16 GB | Same libraries, k-mer-filtered to fit 16 GB |
| `Standard-8` | 7.5 GB | ~8 GB | Same libraries, k-mer-filtered to fit 8 GB |
| `PlusPF` | 103.4 GB | ~103 GB | Standard + protozoa & fungi |
| `Viral` | 0.6 GB | ~1 GB | RefSeq viral only |

**Example (Standard-8):**

```bash
# Download (check https://benlangmead.github.io/aws-indexes/k2 for the latest URL)
wget https://genome-idx.s3.amazonaws.com/kraken/k2_standard_08gb_.tar.gz
mkdir -p /path/to/krakendb
tar -xzf k2_standard_08gb_.tar.gz -C /path/to/krakendb

# Run classification (kraken2.sif is the image pulled under Option B)
singularity exec kraken2.sif kraken2 \
  --db /path/to/krakendb \
  --paired --gzip-compressed \
  sample_R1.fastq.gz sample_R2.fastq.gz \
  --output kraken_output.txt \
  --report kraken_report.txt
```

### Option B: Build a custom database

jasentool does not wrap `kraken2-build`. Use a Kraken2 Singularity image directly (the example pulls the StaPH-B build from Docker Hub):

```bash
# Pull the image (once)
singularity pull kraken2.sif docker://staphb/kraken2:latest

# Download NCBI taxonomy
singularity exec kraken2.sif kraken2-build \
  --download-taxonomy --db /path/to/krakendb

# Download one or more libraries (repeat per library)
singularity exec kraken2.sif kraken2-build \
  --download-library bacteria --db /path/to/krakendb
singularity exec kraken2.sif kraken2-build \
  --download-library viral --db /path/to/krakendb
singularity exec kraken2.sif kraken2-build \
  --download-library human --db /path/to/krakendb

# Build the database
singularity exec kraken2.sif kraken2-build \
  --build --db /path/to/krakendb --threads 8
```

Available library names: `archaea`, `bacteria`, `plasmid`, `viral`, `human`, `fungi`, `plant`, `protozoa`, `nr`, `nt`, `UniVec`, `UniVec_Core`.
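
Whichever option is used, a usable Kraken2 database directory contains three index files: `hash.k2d`, `opts.k2d` and `taxo.k2d`. A quick check before running classification might look like the sketch below. `check_kraken_db` is a hypothetical helper name, not a jasentool or Kraken2 command, and it only tests that the files exist and are non-empty, not that they are valid.

```bash
# Hypothetical helper: verify that a directory holds the three files
# Kraken2 loads at classification time.
# Returns 0 if all are present and non-empty, 1 otherwise.
check_kraken_db() {
    db_dir="$1"
    missing=0

    for f in hash.k2d opts.k2d taxo.k2d; do
        # -s is true only for files that exist and have a size above zero.
        if [ ! -s "$db_dir/$f" ]; then
            echo "missing or empty: $db_dir/$f" >&2
            missing=1
        fi
    done

    return "$missing"
}
```

Call it as `check_kraken_db /path/to/krakendb`. A non-zero exit status usually means the archive was extracted to a different directory or the build step did not finish.
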