🏠 Home β€’ δΈ­ζ–‡

🧬 DNBelab C Series HT scVDJ Analysis Parameters

🔬 Main Analysis Pipeline (run)


📊 Usage

$ dnbc4tools vdj run -h
usage: dnbc4tools vdj run [OPTIONS] 

optional arguments:
  -h, --help            show this help message and exit

Input Files:
  Choose ONE input method: either --fastqs (directory) OR individual FASTQ files (-1 and -2).

  --fastqs <DIR>        Input directory containing paired-end FASTQ files. The pipeline automatically detects Read1/Read2 files. Example: ./fastq_dir
  -1, --fastq1 <FILE> [<FILE> ...]
                        Read1 FASTQ file(s) (supports wildcards and comma-separated lists). Example: sample1_L01_R1.fastq.gz,sample1_L02_R1.fastq.gz
  -2, --fastq2 <FILE> [<FILE> ...]
                        Read2 FASTQ file(s) (supports wildcards and comma-separated lists). Must match --fastq1 order. Example: sample1_L01_R2.fastq.gz,sample1_L02_R2.fastq.gz

Basic Settings:
  -n, --name <STR>      Unique identifier for the sample (e.g., sample1). Used for naming output files and reports.
  -r, --ref <REF>         Reference database: 'human'/'mouse' (case-insensitive) or path to a custom reference directory containing reference.json. Examples: human | mouse | ./custom_vdj_ref
  -c, --chain <STR>     VDJ receptor type: 'IG' (B-cell receptors) or 'TR' (T-cell receptors).
  -o, --outdir <DIR>    Output directory for results and reports [default: current directory]. Example: ./output
  -t, --threads <INT>   Number of CPU threads for parallel processing [default: all available cores] (e.g., 16).
  -s, --beadstrans <FILE>
                        RNA analysis singlecell.csv file for filtering cells and merging beads information. When not provided, all cells will be kept by default (equivalent to --keep_all_cells).

Library Settings:
  Auto-detection is recommended for dark cycles. Available modes include "R1" and "unset".
  For multiple files, ensure consistent settings across all inputs.
  customize: Specify sequence structure patterns for parsing.

  --darkreaction <STR>  Dark cycle setting for VDJ library [default: auto]. Use 'R1' if dark cycles occur in Read1; otherwise leave as 'auto' or 'unset'.
  --customize <STR>     Sequence structure patterns, format: <type>,<read>:<start>-<end> separated by ';'. Types include: cb (cell barcode), umi (UMI), R1/R2 (sequence). Example:
                        "cb,R1:1-10;cb,R1:11-20;umi,R1:21-30;R1,R1:31-120;R2,R2:1-150"
  --enrichment_primers <FILE>
                        Custom inner enrichment primers file (one primer sequence per line). Required when using a custom reference database.

Analysis Settings:
  --keep_all_cells      Keep all cells in analysis without RNA data filtering. If --beadstrans is not provided, this behavior is enabled by default.
  --r2_only             Only use R2 reads for VDJ assembly. Manual setting required because Read1 assembly requirements cannot be auto-detected.
  --sample_read_pairs <INT>
                        Subsample the specified number of read pairs from the input FASTQ files (e.g., 1000000).

📝 Parameter Description

🔴 Required Parameters

⚠️ Essential parameters that must be specified for a successful analysis

-n, --name (Required)

Provide a unique name for this analysis run.

Default: None

Example:

--name sample_VDJ_001

-r, --ref (Required)

Specify the reference database to be used for VDJ analysis.

Default: None

Examples:

# Use the built-in human reference database
--ref human
# Use a custom reference database
--ref ./custom_vdj_ref
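Because --ref accepts either a built-in name (case-insensitive) or a directory containing reference.json, a small pre-flight check can catch typos before a long run. The function check_vdj_ref below is hypothetical and not part of dnbc4tools; it only mirrors the rule stated in the help text.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the --ref value (not part of dnbc4tools).
# Accepts 'human' or 'mouse' in any letter case, or a directory containing reference.json.
set -euo pipefail

check_vdj_ref() {
  local ref="$1"
  # ${ref,,} lowercases the value (bash 4+), matching the case-insensitive names.
  case "${ref,,}" in
    human|mouse) return 0 ;;
  esac
  if [[ -d "$ref" && -f "$ref/reference.json" ]]; then
    return 0
  fi
  echo "Invalid --ref '$ref': expected human, mouse, or a directory with reference.json" >&2
  return 1
}

check_vdj_ref Human && echo "Human: ok"
```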

-c, --chain (Required)

Specify the type of immune receptor to be analyzed.

Default: None

Examples:

# Analyze T-cell receptors
--chain TR
# Analyze B-cell receptors
--chain IG
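Since --chain takes a single receptor type and data from different libraries cannot be merged, a sample with both a TCR and a BCR library is processed as two separate runs. A minimal sketch, assuming hypothetical input directories ./TCR_fastq and ./BCR_fastq; it prints the commands rather than executing them.

```shell
#!/usr/bin/env bash
# Sketch: one `dnbc4tools vdj run` per library / receptor type.
# ./TCR_fastq and ./BCR_fastq are hypothetical input directories.
set -euo pipefail

declare -A fastq_dirs=( [TR]=./TCR_fastq [IG]=./BCR_fastq )

for chain in TR IG; do
  # Distinct --name and --outdir values keep the two runs apart.
  cmd=(dnbc4tools vdj run
       --name "sample1_${chain}"
       --ref human
       --chain "$chain"
       --fastqs "${fastq_dirs[$chain]}"
       --outdir "./VDJ_${chain}")
  # Dry run: print each command; replace with "${cmd[@]}" to execute.
  echo "${cmd[*]}"
done
```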

🟢 Input File Parameters

📁 Choose one input method: a FASTQ directory (--fastqs) OR individual files (-1 and -2)

--fastqs (Method 1)

Specify the path to the directory containing all FASTQ files.

Default: None

Example:

--fastqs ./VDJ_fastq_dir

-1, --fastq1 (Method 2A)

Specify one or more Read1 FASTQ files for the VDJ library individually.

Default: None

Example:

--fastq1 sample1_L01_R1.fastq.gz,sample1_L02_R1.fastq.gz

-2, --fastq2 (Method 2B)

Specify one or more Read2 FASTQ files for the VDJ library individually.

Default: None

Example:

--fastq2 sample1_L01_R2.fastq.gz,sample1_L02_R2.fastq.gz
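The Read2 list must follow the same order as the Read1 list. Wildcards are also accepted, but building both comma-separated lists from one set of Read1 files keeps the pairing explicit. A minimal sketch, assuming Read2 file names differ from Read1 names only by _R1 becoming _R2 (as in the examples above); build_fastq_lists is a hypothetical helper, and the demo uses empty placeholder files in a temporary directory.

```shell
#!/usr/bin/env bash
# Sketch: derive matching, ordered --fastq1/--fastq2 lists from Read1 files.
# Assumes each Read2 name equals its Read1 name with the last "_R1" replaced by "_R2".
set -euo pipefail

build_fastq_lists() {
  local r1 r2
  local -a r1_list=() r2_list=()
  for r1 in "$@"; do
    # Prefix before the last _R1 + "_R2" + suffix after it.
    r2="${r1%_R1*}_R2${r1##*_R1}"
    if [[ ! -f "$r2" ]]; then
      echo "Missing Read2 mate for $r1" >&2
      return 1
    fi
    r1_list+=("$r1")
    r2_list+=("$r2")
  done
  # Join with commas; the IFS change only affects the subshell.
  FASTQ1=$(IFS=,; echo "${r1_list[*]}")
  FASTQ2=$(IFS=,; echo "${r2_list[*]}")
}

# Demo with empty placeholder files in a temporary directory.
demo=$(mktemp -d)
touch "$demo"/sample1_L0{1,2}_R{1,2}.fastq.gz
# The glob expands in sorted order, so lanes stay aligned between both lists.
build_fastq_lists "$demo"/sample1_L0*_R1.fastq.gz
echo "--fastq1 $FASTQ1"
echo "--fastq2 $FASTQ2"
```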

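⚠️ Input Method Selection: Use either --fastqs or the -1/-2 pair, not both. When listing files individually, give the Read2 files in the same order as the Read1 files.

⚠️ Important Note: All FASTQ files supplied to one run must come from the same library and share the same sequencing mode and dark reaction settings. Data from different libraries cannot be merged for analysis.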


🟢 Basic Settings

-o, --outdir (Optional)

Specify the output directory for all analysis results and reports.

Default: ./ (current directory)

Example:

--outdir ./VDJ_analysis_output

-t, --threads (Optional)

Set the number of CPU threads to be used during the analysis.

Default: Use all available CPU cores

Example:

--threads 16

-s, --beadstrans (Optional)

Provide the singlecell.csv file from the scRNA analysis; it is used to filter cells and merge bead information. If this option is omitted, all cells are kept (equivalent to --keep_all_cells).

Default: None

Example:

--beadstrans ./RNA_analysis_output/outs/singlecell.csv
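Because omitting --beadstrans silently switches the pipeline to keeping all cells, it can help to check for the file explicitly. A minimal sketch that adds --beadstrans only when singlecell.csv exists and otherwise warns and passes --keep_all_cells so the choice is visible in the command; the paths are placeholders and the command is only printed.

```shell
#!/usr/bin/env bash
# Sketch: add --beadstrans only when the RNA singlecell.csv is present.
# The paths are placeholders for your own scRNA and VDJ data.
set -euo pipefail

rna_csv=./RNA_analysis_output/outs/singlecell.csv
extra_args=()

if [[ -f "$rna_csv" ]]; then
  extra_args+=(--beadstrans "$rna_csv")
else
  # Without --beadstrans all cells are kept anyway; make that explicit.
  echo "Warning: $rna_csv not found; all cells will be kept" >&2
  extra_args+=(--keep_all_cells)
fi

# Dry run: print the final command instead of executing it.
echo dnbc4tools vdj run --name sample1 --ref human --chain TR \
  --fastqs ./VDJ_fastq_dir "${extra_args[@]}"
```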

🟢 Library Settings

--darkreaction (Optional)

Configure the dark cycle settings for the VDJ library.

Default: auto

Example:

# Dark cycle present in Read1
--darkreaction R1

⚠️ Important Note: Incorrect settings may lead to cell barcode identification failure. Specify manually only if you know the library structure or if auto-detection fails.

--customize (Advanced)

Precisely define how barcodes, UMIs, and effective sequences (reads) are extracted from non-standard libraries. Each pattern has the form <type>,<read>:<start>-<end>, and multiple patterns are separated by ';'. This advanced option overrides the --darkreaction setting.

Example:

# Example of a standard VDJ library configuration
--customize "cb,R1:1-10;cb,R1:11-20;umi,R1:21-30;R1,R1:31-120;R2,R2:1-150"

⚠️ Risk Warning: Incorrect custom configurations can lead to data loss or analysis failure. Use only when standard configurations do not meet your needs.
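A malformed --customize string is easy to miss by eye. The sketch below is a hypothetical pre-flight helper (not part of dnbc4tools) that splits the string on ';', checks each pattern against <type>,<read>:<start>-<end>, and prints the length of each segment. It assumes start and end are 1-based, inclusive positions, which matches the example (cb,R1:1-10 covers 10 bases), but the tool's exact semantics may differ.

```shell
#!/usr/bin/env bash
# Hypothetical checker for --customize strings (not part of dnbc4tools).
# Assumes <start>-<end> are 1-based, inclusive positions within the read.
set -euo pipefail

check_customize() {
  local spec="$1" seg kind src start end
  local -a segments
  if [[ -z "$spec" ]]; then
    echo "Empty --customize string" >&2
    return 1
  fi
  # Split on ';' into individual patterns.
  IFS=';' read -r -a segments <<< "$spec"
  for seg in "${segments[@]}"; do
    # Expected form: <type>,<read>:<start>-<end>
    if [[ ! "$seg" =~ ^(cb|umi|R1|R2),(R1|R2):([0-9]+)-([0-9]+)$ ]]; then
      echo "Malformed pattern: '$seg'" >&2
      return 1
    fi
    kind=${BASH_REMATCH[1]}
    src=${BASH_REMATCH[2]}
    start=$((10#${BASH_REMATCH[3]}))   # force base 10 (e.g. "08")
    end=$((10#${BASH_REMATCH[4]}))
    if (( start < 1 || end < start )); then
      echo "Invalid range in '$seg'" >&2
      return 1
    fi
    printf '%-4s %s:%d-%d (%d bp)\n' "$kind" "$src" "$start" "$end" $((end - start + 1))
  done
}

check_customize "cb,R1:1-10;cb,R1:11-20;umi,R1:21-30;R1,R1:31-120;R2,R2:1-150"
```

For the standard example, this reports two 10 bp barcode segments, a 10 bp UMI, a 90 bp Read1 segment, and a 150 bp Read2 segment.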

--enrichment_primers (Optional)

Specify a plain-text file of inner enrichment primers for VDJ region-specific amplification, with one primer sequence per line. This file is required when using a custom reference database.

Default: None

Example file content:

GTCCTCGGTGGCCTCCACGTG
AGCACCTGGGGCCTCGGCCAC
CCTGGACTCCTGGGCCCCAG
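A stray space, lowercase letter, or Windows line ending in the primer file is hard to spot. A minimal sketch of a sanity check; check_primers is a hypothetical helper, and restricting lines to A, C, G, and T is an assumption, since this page does not say whether IUPAC ambiguity codes are accepted.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check an --enrichment_primers file before a run.
# Assumption: every line holds one primer made only of A, C, G, or T.
set -euo pipefail

check_primers() {
  local file="$1" bad
  if [[ ! -f "$file" ]]; then
    echo "No such file: $file" >&2
    return 1
  fi
  # -v selects lines that do NOT match; -n prefixes their line numbers.
  bad=$(grep -nvE '^[ACGT]+$' "$file" || true)
  if [[ -n "$bad" ]]; then
    echo "Invalid primer lines in $file:" >&2
    echo "$bad" >&2
    return 1
  fi
  printf '%d primers look valid\n' "$(grep -c '' "$file")"
}

# Demo using the example primers shown above, written to a temporary file.
primers=$(mktemp)
printf '%s\n' GTCCTCGGTGGCCTCCACGTG AGCACCTGGGGCCTCGGCCAC CCTGGACTCCTGGGCCCCAG > "$primers"
check_primers "$primers"
```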

🚩 Analysis Settings

--keep_all_cells (Flag)

Enable this parameter to retain all detected cells without filtering based on RNA data.

Default: Not set (enabled automatically when --beadstrans is not provided)

--r2_only (Flag)

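Enable this parameter to use only Read2 sequences for VDJ assembly. It is never enabled automatically, because the pipeline cannot detect whether Read1 should be left out of assembly; set it manually when needed.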

Default: Not set

--sample_read_pairs (Optional)

Extract a specified number of read pairs from the input FASTQ files for analysis.

Default: None (uses all data)

Example:

--sample_read_pairs 10000000
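Subsampling is convenient for a quick trial, for example to check that the inputs and library settings work before processing all of the data. A minimal sketch, assuming placeholder names and paths; the trial writes to its own output directory so its results are not mixed with the full analysis, and both commands are only printed.

```shell
#!/usr/bin/env bash
# Sketch: a quick trial on a subsample, followed by the full run.
# Names, paths, and the subsample size are placeholders.
set -euo pipefail

# Options shared by both runs.
common=(--ref human --chain TR --fastqs ./VDJ_fastq_dir)

trial=(dnbc4tools vdj run --name sample1_trial "${common[@]}"
       --sample_read_pairs 1000000 --outdir ./VDJ_trial)
full=(dnbc4tools vdj run --name sample1 "${common[@]}" --outdir ./VDJ_full)

# Dry run: print both commands; run the trial first, then the full analysis.
echo "${trial[*]}"
echo "${full[*]}"
```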

💡 Tip

This document is continuously updated. If you find any errors or have information to add, your feedback is welcome.

📝 Document Version: 3.0 beta | Last Updated: 2025


🧬 DNBelab C Series HT scVDJ Analysis Software
High-performance single-cell immune repertoire data analysis pipeline