Command line console scripts
SoNNia provides three main command line console scripts, all accessed through the sonnia command (available after installing with pip, or callable directly as executables):
- sonnia infer: Infers a selection model with respect to a generative V(D)J model. Can infer both linear (Sonia) and non-linear (SoNNia) models, for both single-chain and paired-chain sequences.
- sonnia evaluate: Evaluates Ppost, Pgen, and selection factors (Q) of sequences according to a generative V(D)J model and selection model.
- sonnia generate: Generates CDR3 (junction) sequences, either before selection (pre-selection, like OLGA) or after selection (post-selection).
For any command, you can execute with the -h or --help flags to get detailed options.
Quick Start Examples
We offer a quick demonstration of the console scripts. This will show how to generate and evaluate sequences and infer a selection model using the default generation model for human TCR beta chains. In order to run the commands below you need to download the examples folder.
Infer a selection model:
$ sonnia infer --model humanTRB -i examples/data_seqs.csv.gz
This reads in the file, infers a non-linear selection model (SoNNia), and saves it to the folder sonnia_model (the default output directory). The command also generates several plot files: model_learning.png, marginals.png, log_Q.png, and Q_ratio.png.
Infer a linear model:
$ sonnia infer --model humanTRB -i examples/data_seqs.csv.gz --linear
This infers a linear selection model (Sonia) instead.
Generate sequences:
$ sonnia generate --model sonnia_model --post -n 100
This generates 100 human TRB CDR3 (junction) sequences from the post-selection repertoire and prints them to stdout, along with the V and J genes used to generate them.
Evaluate sequences:
$ sonnia evaluate --model sonnia_model -i examples/data_seqs.csv.gz -o evaluated_seqs.tsv
This computes Ppost, Pgen, and Q for all sequences in the input file and saves them to evaluated_seqs.tsv.
Specifying a default V(D)J model (or a custom model folder)
All of the console scripts require specifying a V(D)J model. SoNNia ships with several default models that can be indicated by name, or a custom model folder can be specified.
Single-chain models:
| Model Name | Description |
|---|---|
| humanTRA | Default human T cell alpha chain model (VJ) |
| humanTRB | Default human T cell beta chain model (VDJ) |
| humanIGH | Default human B cell heavy chain model (VDJ) |
| humanIGK | Default human B cell light kappa chain model (VJ) |
| humanIGL | Default human B cell light lambda chain model (VJ) |
| mouseTRA | Default mouse T cell alpha chain model (VJ) |
| mouseTRB | Default mouse T cell beta chain model (VDJ) |
| mouseIGH | Default mouse B cell heavy chain model (VDJ) |
Paired-chain models:
| Model Name | Description |
|---|---|
| humanTCR | Human T cell receptor (alpha-beta paired) |
| humanIGHK | Human B cell receptor (heavy-kappa paired) |
| humanIGHL | Human B cell receptor (heavy-lambda paired) |
Custom model folder
If specifying a folder for a custom VJ recombination model (e.g., an alpha or light chain model) or a custom VDJ recombination model (e.g., a beta or heavy chain model), the folder must contain the following files with the exact naming convention:
- model_params.txt
- model_marginals.txt
- V_gene_CDR3_anchors.csv (V anchor residue position and functionality file)
- J_gene_CDR3_anchors.csv (J anchor residue position and functionality file)
- features.tsv (required to load the selection model; not required for the sonnia infer command)
- log.txt (optional; contains training log)
- model.h5 (required to load a non-linear selection model; not required for the sonnia infer command)
For paired-chain models, the folder should contain heavy_chain/ and light_chain/ subdirectories, each with the above files.
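Before pointing a command at a custom folder, it can help to confirm that the expected files are present. The sketch below is a hypothetical helper, not part of SoNNia: the function name `missing_model_files` and its parameters are assumptions made for illustration, and only the file and subdirectory names are taken from the list above.

```python
from pathlib import Path

# File names copied from the list above.
RECOMBINATION_FILES = [
    "model_params.txt",
    "model_marginals.txt",
    "V_gene_CDR3_anchors.csv",
    "J_gene_CDR3_anchors.csv",
]


def missing_model_files(folder, selection=False, nonlinear=False, paired=False):
    """Return the expected files that are absent from a model folder.

    Hypothetical helper (not a SoNNia function).
    selection: also expect features.tsv (needed to load a selection model).
    nonlinear: also expect features.tsv and model.h5.
    paired:    look inside heavy_chain/ and light_chain/ instead of the folder root.
    """
    folder = Path(folder)
    required = list(RECOMBINATION_FILES)
    if selection or nonlinear:
        required.append("features.tsv")
    if nonlinear:
        required.append("model.h5")

    subdirs = ["heavy_chain", "light_chain"] if paired else ["."]
    missing = []
    for sub in subdirs:
        for name in required:
            if not (folder / sub / name).is_file():
                # Report paths relative to the model folder, with forward slashes.
                missing.append((Path(sub) / name).as_posix())
    return missing
```

An empty list means the folder has every file this sketch looks for; it does not check that the file contents are valid.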
The console scripts can read files in various formats (CSV, TSV, etc.) and automatically detect the delimiter. See the default models in the sonnia/default_models/ directory for examples.
Command-specific options
sonnia infer options
| Option | Description |
|---|---|
| -i, --infile | Path to input file (required) |
| --model | Model name or path to custom model folder (optional) |
| -o, --outdir | Output directory (default: sonnia_model) |
| --linear | Infer linear model instead of non-linear |
| --paired | Use paired-chain model. Assumes heavy and light chains are in separate columns named junction_aa_heavy, v_gene_heavy, j_gene_heavy, junction_aa_light, v_gene_light, j_gene_light. |
| --max-seqs | Maximum number of sequences to use (default: 1e8) |
| --max-gen-seqs | Maximum number of sequences to generate (default: 1e6) |
| --n-gen-seqs | Number of sequences to generate (default: 0, which auto-calculates as min(max_gen_seqs, 3 * len(data_seqs))) |
| --epochs | Number of training epochs (default: 50) |
| --batch-size | Batch size for training (default: 5000) |
| --validation-split | Validation split ratio (default: 0.2) |
| --infile-gen | Path to pre-generated sequences file (optional). If provided, uses these sequences instead of generating new ones. |
| --junction-column | Column name for junction sequences (default: junction_aa) |
| --v-gene-column | Column name for V gene (default: v_gene) |
| --j-gene-column | Column name for J gene (default: j_gene) |
| --no-header | Input file does not have a header |
| --delimiter | File delimiter (default: auto, inferred from file extension) |
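The auto-calculation described for --n-gen-seqs can be written out as a small function. This is only an illustration of the stated rule, min(max_gen_seqs, 3 * len(data_seqs)); the function name is hypothetical, and returning an explicit non-zero value unchanged is an assumption, since the option table does not say whether it is capped by --max-gen-seqs.

```python
def auto_n_gen_seqs(n_data_seqs, n_gen_seqs=0, max_gen_seqs=int(1e6)):
    """Number of sequences to generate for a given dataset size.

    Hypothetical helper mirroring the documented defaults:
    n_gen_seqs=0 triggers the automatic rule, max_gen_seqs defaults to 1e6.
    """
    if n_gen_seqs > 0:
        # Assumption: an explicit request is used as given.
        return n_gen_seqs
    return min(max_gen_seqs, 3 * n_data_seqs)


print(auto_n_gen_seqs(1_000))      # 3000: three times the data size
print(auto_n_gen_seqs(500_000))    # 1000000: capped by max_gen_seqs
```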
sonnia evaluate options
| Option | Description |
|---|---|
| -i, --infile | Path to input file (required) |
| --model | Model name or path to model folder (required) |
| -o, --outfile | Output file path (default: evaluated_seqs.tsv) |
| -m, --max_seqs | Maximum number of sequences to evaluate (default: 1e8) |
| --paired | Use paired-chain model. Assumes heavy and light chains are in separate columns named junction_aa_heavy, v_gene_heavy, j_gene_heavy, junction_aa_light, v_gene_light, j_gene_light. |
| --junction-column | Column name for junction sequences (default: junction_aa, single chain only) |
| -v, --v-gene-column | Column name for V gene (default: v_gene, single chain only) |
| -j, --j-gene-column | Column name for J gene (default: j_gene, single chain only) |
| --no-header | Input file does not have a header |
| -d, --delimiter | File delimiter (default: auto, inferred from file extension) |
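Because --paired expects fixed column names, a quick header check before running a command can catch naming mistakes early. The following is a minimal sketch using only the Python standard library; `missing_paired_columns` is a hypothetical helper, not part of SoNNia, and it assumes the file has a header row and that the caller knows the delimiter.

```python
import csv
import gzip

# Column names required by --paired, as listed in the option tables.
PAIRED_COLUMNS = [
    "junction_aa_heavy", "v_gene_heavy", "j_gene_heavy",
    "junction_aa_light", "v_gene_light", "j_gene_light",
]


def missing_paired_columns(path, delimiter=","):
    """Return the paired-chain column names absent from the file header.

    Hypothetical helper: reads only the first row, and opens .gz files
    transparently with gzip.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        header = next(csv.reader(handle, delimiter=delimiter), [])
    return [column for column in PAIRED_COLUMNS if column not in header]
```

An empty return value means all six columns are present; any names returned would need to be added or renamed in the input file.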
sonnia generate options
| Option | Description |
|---|---|
| --model | Model name or path to model folder (required) |
| -n, --number_of_seqs | Number of sequences to generate (required) |
| -o, --outfile | Output file path (optional; prints to stdout if not specified) |
| --pre | Generate sequences using pre-selection model (required: either --pre or --post must be specified) |
| --post | Generate sequences using post-selection model (required: either --pre or --post must be specified) |
| --rejection-bound | Rejection bound for post-selection (default: 10) |
| --chunk-size | Chunk size for generation (default: 1000) |
| --paired | Use paired-chain model |
| --junction-column | Column name for junction sequences (default: junction_aa) |
| --v-gene-column | Column name for V gene (default: v_gene) |
| --j-gene-column | Column name for J gene (default: j_gene) |
| --no-header | Input file does not have a header |
| --delimiter | File delimiter (default: auto, inferred from file extension) |
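When calling sonnia generate from a Python pipeline, the argument list can be assembled programmatically so that exactly one of --pre or --post is always passed. This is a sketch, not part of SoNNia: `build_generate_command` is a hypothetical helper, and it uses only flags listed in the table above.

```python
import shlex


def build_generate_command(model, number_of_seqs, post=True, outfile=None):
    """Build the argument list for a `sonnia generate` call.

    Hypothetical helper. The boolean `post` selects --post or --pre,
    so the two flags can never be combined or both omitted.
    """
    if number_of_seqs <= 0:
        raise ValueError("number_of_seqs must be a positive integer")
    command = [
        "sonnia", "generate",
        "--model", str(model),
        "--post" if post else "--pre",
        "-n", str(number_of_seqs),
    ]
    if outfile is not None:
        command += ["-o", str(outfile)]
    return command


print(shlex.join(build_generate_command("sonnia_model", 100)))
# prints: sonnia generate --model sonnia_model --post -n 100
```

The returned list can be passed to `subprocess.run(command, check=True)` once SoNNia is installed; passing a list rather than a single string avoids shell quoting issues with paths that contain spaces.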
For detailed help on any command, use:
sonnia <command> --help