🧬 DNBelab C Series HT scRNA Analysis Parameters

🔬 Main Analysis Pipeline (run) • 📊 Reference Database Construction (mkref) • 📋 Multi-sample Operations (multi)

🔬 Main Analysis Pipeline (run)

📊 Usage

$ dnbc4tools rna run -h
usage: dnbc4tools rna run [OPTIONS]

optional arguments:
  -h, --help            show this help message and exit

Input Files:
  Choose ONE input method: either --fastqs (directory) OR all four individual FASTQ files (-c1, -c2, -i1, -i2).

  --fastqs <DIR>        Directory containing cDNA and oligo FASTQ subfolders (e.g., cDNA/sample_cdna_R1.fastq.gz, oligo/sample_oligo_R1.fastq.gz). The pipeline automatically detects paired-end files. Example: ./fastq_dir
  -c1, --cDNAfastq1 <FILE> [<FILE> ...]
                        Read1 FASTQ file(s) for cDNA (supports wildcards and comma-separated lists). Used for gene expression data. Example: sample1_R1.fastq.gz,sample2_R1.fastq.gz
  -c2, --cDNAfastq2 <FILE> [<FILE> ...]
                        Read2 FASTQ file(s) for cDNA (supports wildcards and comma-separated lists). Must match --cDNAfastq1 order. Example: sample1_R2.fastq.gz,sample2_R2.fastq.gz
  -i1, --oligofastq1 <FILE> [<FILE> ...]
                        Read1 FASTQ file(s) for oligo (supports wildcards and comma-separated lists). Used for barcode merging. Example: sample1_oligo_R1.fastq.gz
  -i2, --oligofastq2 <FILE> [<FILE> ...]
                        Read2 FASTQ file(s) for oligo (supports wildcards and comma-separated lists). Must match --oligofastq1 order. Example: sample1_oligo_R2.fastq.gz

Basic Settings:
  -n, --name <STR>      Unique identifier for the sample (e.g., sample1). Used for naming output files and reports.
  -g, --genomeDir <DIR>
                        Path to reference genome directory containing STAR index files. Example: ./genome_index
  -o, --outdir <DIR>    Output directory for results and reports [default: current directory]. Example: ./output
  -t, --threads <INT>   Number of CPU threads for parallel processing [default: all available cores] (e.g., 16).

Filtering Settings:
  --calling_method <STR>
                        Cell detection method [default: emptydrops]. Options: barcoderanks, emptydrops.
  --expectcells <INT>   Expected number of cells to guide detection [default: auto] (e.g., 3000).
  --forcecells <INT>    Force pipeline to use exactly this number of cells, overriding detection (e.g., 5000).
  --minumi <INT>        Minimum UMI count per cell to retain [default: 1000].

Library Settings:
  Configure sequencing library settings for barcode, UMI, and read structure.
  Auto-detection is recommended for chemistry and dark cycles.
  Use --customize twice for cDNA and oligo patterns, e.g., 
  --customize "cb,R1:1-10;cb,R1:11-20;umi,R1:21-30;R1,R2:1-100" --customize "cb,R1:1-10;cb,R1:11-20;R1,R2:1-30".

  --chemistry <STR>     Library chemistry version [default: auto]. Options: scRNAv1HT, scRNAv2HT, scRNAv3HT, scRNA5Pv1, auto (automatic detection).
  --darkreaction <STR>  Dark cycle setting for cDNA and oligo libraries [default: auto]. Provide two comma-separated values: <cDNA>,<oligo> Each field options: auto (automatic detection), R1R2 (both reads), R1 (Read1 only), unset (no
                        dark cycles). Examples: R1,R1R2; R1,R1; unset,unset.
  --customize <STR>     Custom read structure for barcode, UMI, or sequence extraction, format: <type>,<read>:<start>-<end> separated by ';'. Types: cb (cell barcode), umi (UMI) R1/R2 (sequence). Examples:
                        "cb,R1:1-10;cb,R1:11-20;umi,R1:21-30;R1,R2:1-100"

Analysis Settings:
  --no_introns          Exclude intronic reads from the expression matrix to increase specificity.
  --end5                Enable 5'-end scRNA-seq analysis for 5' gene expression profiling.
  --no_bam              Skip BAM file generation to save time and disk space.
  --sample_read_pairs <INT>
                        Subsample this number of cDNA read pairs for analysis (e.g., 1000000).

📝 Parameter Description

🔴 Required Parameters

⚠️ Essential parameters that must be specified for a successful analysis

`-n, --name` (Required)

Provide a unique name for this analysis run.

Function: This name will be used as a prefix for all output files and the HTML report.
Display: In the final web report, this name will be shown as the Sample ID.

Default: None

Example:

--name sample_001

`-g, --genomeDir` (Required)

Specify the path to the reference genome directory.

Requirement: The directory must contain the index and annotation resources generated by the mkref command.
Content: Includes genome sequence, STAR alignment index, etc.

Default: None

Example:

--genomeDir /path/to/genome/database

🟢 Input File Parameters

📁 Choose one input method: Directory-based OR specify individual files

`--fastqs` (Method 1)

Specify the path to the directory containing all FASTQ files.

Function: The pipeline will automatically detect paired files within this directory (including cDNA and oligo subdirectories).
Note: This is a convenience option and cannot be used simultaneously with --cDNAfastq1 / --cDNAfastq2 / --oligofastq1 / --oligofastq2.

Default: None

Example:

--fastqs ./fastq_directory

`-c1, --cDNAfastq1` (Method 2A)

Specify one or more cDNA Read1 FASTQ files individually.

Support: You can use wildcards (*) to match files or a comma-separated list for multiple files.
Requirement: Must be used in pairs with the --cDNAfastq2 parameter, and the file order must match exactly.

Default: None

Example:

--cDNAfastq1 sample_cDNA_L01_R1.fastq.gz,sample_cDNA_L02_R1.fastq.gz

`-c2, --cDNAfastq2` (Method 2B)

Specify one or more cDNA Read2 FASTQ files individually.

Support: You can use wildcards (*) to match files or a comma-separated list for multiple files.
Requirement: Must be used in pairs with the --cDNAfastq1 parameter, and the file order must match exactly.

Default: None

Example:

--cDNAfastq2 sample_cDNA_L01_R2.fastq.gz,sample_cDNA_L02_R2.fastq.gz

`-i1, --oligofastq1` (Method 2C)

Specify one or more oligo Read1 FASTQ files individually.

Support: You can use wildcards (*) to match files or a comma-separated list for multiple files.
Requirement: Must be used in pairs with the --oligofastq2 parameter, and the file order must match exactly.

Default: None

Example:

--oligofastq1 sample_oligo_R1.fastq.gz

`-i2, --oligofastq2` (Method 2D)

Specify one or more oligo Read2 FASTQ files individually.

Support: You can use wildcards (*) to match files or a comma-separated list for multiple files.
Requirement: Must be used in pairs with the --oligofastq1 parameter, and the file order must match exactly.

Default: None

Example:

--oligofastq2 sample_oligo_R2.fastq.gz

⚠️ Input Method Selection:

🔸 Method 1: Use --fastqs to specify a directory containing cDNA and oligo subfolders.

🔸 Method 2: Use -c1/--cDNAfastq1 and -c2/--cDNAfastq2 to specify the cDNA R1 and R2 files, and -i1/--oligofastq1 and -i2/--oligofastq2 to specify the oligo R1 and R2 files.

⚠️ Important Note: All files under a parameter must come from the same library, with consistent sequencing mode and dark reaction settings. Data from different libraries cannot be merged for analysis.
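The two input methods can be checked before a run is launched. The following is a minimal sketch, assuming a hypothetical helper named `build_input_args` (not part of dnbc4tools) that assembles only the input-file portion of the command line from the flags documented above:

```python
from typing import List, Optional, Sequence


def build_input_args(
    fastqs: Optional[str] = None,
    cdna_r1: Sequence[str] = (),
    cdna_r2: Sequence[str] = (),
    oligo_r1: Sequence[str] = (),
    oligo_r2: Sequence[str] = (),
) -> List[str]:
    """Return the input-file arguments for `dnbc4tools rna run` (hypothetical helper)."""
    file_lists = [cdna_r1, cdna_r2, oligo_r1, oligo_r2]
    if fastqs is not None:
        # Method 1: a directory; it must not be mixed with individual files.
        if any(file_lists):
            raise ValueError("--fastqs cannot be combined with -c1/-c2/-i1/-i2")
        return ["--fastqs", fastqs]
    # Method 2: all four file lists are required.
    if not all(file_lists):
        raise ValueError("provide --fastqs or all of -c1, -c2, -i1, -i2")
    # Read1 and Read2 lists must pair up one-to-one, in the same order.
    if len(cdna_r1) != len(cdna_r2) or len(oligo_r1) != len(oligo_r2):
        raise ValueError("Read1 and Read2 lists must have the same length")
    return [
        "--cDNAfastq1", ",".join(cdna_r1),
        "--cDNAfastq2", ",".join(cdna_r2),
        "--oligofastq1", ",".join(oligo_r1),
        "--oligofastq2", ",".join(oligo_r2),
    ]
```

The helper only checks list lengths; it cannot verify that the files come from the same library, which remains the user's responsibility.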

🟢 Basic Settings

`-o, --outdir` (Optional)

Specify the output directory for all analysis results and reports.

Function: All analysis results will be saved in this directory, and the pipeline will automatically create a structured subdirectory named after the sample.

Default: ./ (current directory)

Example:

--outdir ./output_results

`-t, --threads` (Optional)

Set the number of CPU threads to be used during the analysis.

Function: Increasing the number of threads can significantly speed up the analysis.
Recommendation: Adjust based on the number of available CPU cores for optimal performance.

Default: Use all available CPU cores

Example:

--threads 16

🟢 Filtering Settings

`--calling_method` (Optional)

Set the cell identification method to distinguish real cells from empty droplets.

Method Comparison

barcoderanks

Principle: Uses an empirical threshold based on total UMI counts, identifying cells via the "knee point" of the UMI rank plot.
Use Case: Quick preliminary analysis or in scenarios where cells are clearly distinct from the background.

emptydrops (Default)

Principle: Applies a statistical test to each barcode's expression profile to determine whether it differs significantly from the ambient RNA background.
Use Case: Standard analysis (recommended), accurately identifies cells with low RNA content and controls for false positives.

Default: emptydrops

Example:

# Switch to barcoderanks for cell identification
dnbc4tools rna run --name sample1 --fastqs ./fq --genomeDir ./ref --calling_method barcoderanks
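To illustrate the "knee point" idea behind barcoderanks, the sketch below locates the bend of a log-log UMI rank curve as the point lying farthest above the chord between its first and last points. This is a generic heuristic chosen for illustration, not the pipeline's actual implementation; the function name `knee_cell_count` is hypothetical.

```python
import math
from typing import List, Sequence


def knee_cell_count(umi_counts: Sequence[int]) -> int:
    """Estimate the cell count from the knee of a log-log UMI rank curve."""
    # Rank barcodes by total UMI count, highest first; drop empty barcodes.
    counts: List[int] = sorted((c for c in umi_counts if c > 0), reverse=True)
    if len(counts) < 3:
        return len(counts)
    xs = [math.log10(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log10(c) for c in counts]
    # Chord from the first to the last point of the curve.
    slope = (ys[-1] - ys[0]) / (xs[-1] - xs[0])
    # Vertical gap between each point and the chord; the knee is the largest gap.
    gaps = [y - (ys[0] + slope * (x - xs[0])) for x, y in zip(xs, ys)]
    knee_index = max(range(len(gaps)), key=gaps.__getitem__)
    return knee_index + 1  # ranks are 1-based
```

On well-separated data (for example, 100 barcodes with 10,000 UMIs each followed by 1,000 barcodes with 10 UMIs each), the knee falls at the last high-count barcode. Real data is noisier, which is one reason the statistical emptydrops method is the default.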

`--expectcells` (Optional)

Set the expected number of recovered cells.

Function: Provides initial guidance for the emptydrops algorithm's preliminary screening.
Recommendation: The default auto mode is recommended, which automatically estimates the cell count based on UMI distribution features. If the effective cell count is known, you can also manually set it to 50% of that number as a preliminary screening basis.

Default: auto

Example:

# Expect to recover 3000 cells
dnbc4tools rna run --name sample1 --fastqs ./fq --genomeDir ./ref --expectcells 3000

`--forcecells` (Optional)

Force the pipeline to use an exact number of cells, overriding the software's automatic cell detection.

Function: Use when you want to analyze a cell population of a known quantity.
Priority: This is the highest-priority filtering parameter.

Default: None

Example:

# Force the output of 5000 cells for analysis
dnbc4tools rna run --name sample1 --fastqs ./fq --genomeDir ./ref --forcecells 5000

`--minumi` (Optional)

Set the minimum UMI count to retain a cell.

Function: This is a core cell quality control parameter. Cells below this threshold are considered to have poor data quality and will be excluded from subsequent analysis.
Recommendation: Use the default value for the initial analysis, then determine a more appropriate threshold based on the "UMI Count Distribution" plot in the web report.

Default: 1000

Example:

# Lower the UMI threshold for cell filtering to 500
dnbc4tools rna run --name sample1 --fastqs ./fq --genomeDir ./ref --minumi 500

Note

💡 Cell Identification Analysis Recommendations

Cell identification is a critical step in single-cell analysis. Correct parameter settings and result interpretation directly impact the quality and reliability of subsequent analyses.


1. Abnormal Cell Count

Cell count too low
Symptom: Detected cells < 50% of expected.
Cause: UMI threshold too high, severe empty droplet contamination, poor library quality.
Solution: Lower --minumi, adjust --expectcells, check raw data quality.

Cell count too high
Symptom: Detected cells > 200% of expected.
Cause: Inaccurate cell counting, UMI threshold too low, high background noise.
Solution: Increase --minumi, use --forcecells to limit the count.

Abnormal UMI distribution
Symptom: UMI rank plot shows no clear "knee point".
Cause: Insufficient sequencing depth, poor library diversity, technical failure.
Solution: Increase sequencing depth, rebuild the library.
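The 50% and 200% thresholds above can be applied to the cell count taken from the report. A minimal sketch, assuming a hypothetical function `classify_cell_count` whose suggestion strings simply restate the solutions listed above:

```python
from typing import Tuple

# Suggestions restate the "Solution" entries in this section.
SUGGESTIONS = {
    "too_low": "lower --minumi, adjust --expectcells, check raw data quality",
    "too_high": "increase --minumi, or use --forcecells to limit the count",
    "ok": "no adjustment suggested",
}


def classify_cell_count(detected: int, expected: int) -> Tuple[str, str]:
    """Compare the detected cell count with the expected one."""
    if expected <= 0:
        raise ValueError("expected must be a positive number of cells")
    ratio = detected / expected
    if ratio < 0.5:      # fewer than 50% of expected cells
        status = "too_low"
    elif ratio > 2.0:    # more than 200% of expected cells
        status = "too_high"
    else:
        status = "ok"
    return status, SUGGESTIONS[status]
```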

2. Abnormal Cell Identification Curve

Gradual decline with no knee point
Meaning: Difficult to distinguish between real cells and background empty droplets.
Solution: Use --forcecells to set a conservative cell count and combine with downstream QC.

Multiple knee points
Meaning: Presence of different cell populations or doublet contamination.
Solution: Choose the cell count corresponding to the main knee point and perform doublet detection and removal later.

Steep decline
Meaning: High-quality cells are clearly distinguished from the background, which is the ideal case.
Solution: Use the default emptydrops algorithm; you can consider lowering --minumi slightly.

Severe noise fluctuation
Meaning: High technical noise, poor data quality.
Solution: Increase the --minumi threshold, consider re-sequencing or optimizing experimental conditions.

Best Practice Tip

For the initial analysis, it is recommended to use the default parameters to get a preliminary result, then make targeted parameter adjustments based on the statistics and visualizations in the HTML report.

🟢 Library Settings

`--chemistry` (Optional)

Configure the chemistry version of the scRNA kit, which determines the sequence structure of barcodes and UMIs.

Function: Guides the software to correctly parse the barcode and UMI sequence structures.
- Supported Versions: scRNAv1HT, scRNAv2HT, scRNAv3HT, scRNA5Pv1
Smart Detection (auto): Default setting. The software automatically identifies the kit version by analyzing the sequence structure of the first 200,000 reads based on the position patterns of barcodes and UMIs. If it cannot be identified, the pipeline will prompt for manual specification. Highly recommended for initial analysis.

Default: auto

Example:

# Scenario: Library is known to be scRNAv2HT and auto-analysis failed
dnbc4tools rna run --name sample2 --fastqs ./fq --genomeDir ./ref --chemistry scRNAv2HT

⚠️ Important Note: Incorrect settings may lead to cell barcode identification failure. Specify manually only if you know the library structure or if auto-detection fails.

`--darkreaction` (Optional)

Configure the dark cycle settings for the cDNA and oligo libraries.

Function: Guides the software to correctly parse dark reaction cycles generated by the sequencing chemistry (e.g., on MGI platforms).
- Configuration Format: <cDNA_setting>,<oligo_setting> (comma-separated).
- Supported Options: auto (auto-detection), R1R2 (both ends), R1 (R1 only), unset (none).
Smart Detection (auto): Default setting. The software automatically identifies the dark cycle configuration by analyzing the first 200,000 reads, based on sequence length and fixed sequence positions. If it cannot be identified, the pipeline will prompt for manual specification. Highly recommended for initial analysis.

Default: auto

Examples:

# Example 1: cDNA library has dark cycle on R1, oligo library has dark cycles on both ends
--darkreaction R1,R1R2

# Example 2: Both libraries have dark cycles on R1 only
--darkreaction R1,R1

# Example 3: Neither library has dark cycles
--darkreaction unset,unset

⚠️ Important Note: Incorrect settings may lead to cell barcode identification failure. Specify manually only if you know the library structure or if auto-detection fails.
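A value for --darkreaction can be sanity-checked before it is passed to the pipeline. The sketch below is a hypothetical validator, not dnbc4tools code. It assumes that a bare `auto` (the documented default) means auto-detection for both libraries; whether the command line accepts that single-field form is an assumption.

```python
from typing import Tuple

ALLOWED_FIELDS = {"auto", "R1R2", "R1", "unset"}


def parse_darkreaction(value: str) -> Tuple[str, str]:
    """Split a --darkreaction value into (cDNA setting, oligo setting)."""
    fields = [field.strip() for field in value.split(",")]
    if fields == ["auto"]:
        # Assumption: the bare default applies to both libraries.
        return ("auto", "auto")
    if len(fields) != 2:
        raise ValueError("expected <cDNA_setting>,<oligo_setting>")
    for field in fields:
        if field not in ALLOWED_FIELDS:
            raise ValueError(f"unsupported dark cycle option: {field!r}")
    return (fields[0], fields[1])
```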

`--customize` (Advanced)

Precisely define the extraction structure for barcodes, UMIs, and effective sequences (reads) for non-standard libraries. This is an advanced feature that overrides --chemistry and --darkreaction settings.

Syntax: "<type>,<read>:<start>-<end>", with multiple segments separated by semicolons (;).
- Parameter Types (type):
  - cb: Cell Barcode
  - umi: UMI (Unique Molecular Identifier)
  - R1: Effective DNA sequence in Read1
  - R2: Effective DNA sequence in Read2 (for paired-end sequencing only)
Dual Configuration: You must specify the --customize parameter twice, once for the cDNA library and once for the oligo library.
Notes:
- The entire parameter string must be enclosed in quotes.
- Coordinates are 1-based and cannot exceed the read length.

Examples:

# For a cDNA library with structure: Barcode 1(1-10bp) + Barcode 2(11-20bp) + UMI(21-30bp) in R1; sequence(1-100bp) in R2
--customize "cb,R1:1-10;cb,R1:11-20;umi,R1:21-30;R1,R2:1-100"

# For a cDNA library with structure: Barcode 1(7-16bp) + Barcode 2(23-32bp) + UMI(38-47bp) in R1; sequence(1-100bp) in R2
--customize "cb,R1:7-16;cb,R1:23-32;umi,R1:38-47;R1,R2:1-100"

# For a 5'-end transcript cDNA library using data from both ends
--customize "cb,R1:1-10;cb,R1:11-20;umi,R1:21-30;R1,R1:31-120;R2,R2:1-150"

# Example: Custom sequence structures for cDNA and oligo libraries respectively
--customize "cb,R1:1-10;cb,R1:11-20;umi,R1:21-30;R1,R2:1-100" --customize "cb,R1:1-10;cb,R1:11-20;R1,R2:1-30"

⚠️ Risk Warning: Incorrect custom configurations can lead to data loss or analysis failure. Use only when standard configurations do not meet your needs.
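Because a malformed --customize string is easy to write, it can help to parse it before launching a run. The following is a minimal sketch of the syntax described above, assuming inclusive 1-based coordinates (so `1-10` covers 10 bases). The names `Segment`, `parse_customize`, and `extract` are hypothetical and are not part of dnbc4tools.

```python
import re
from typing import List, NamedTuple


class Segment(NamedTuple):
    kind: str   # "cb", "umi", "R1" or "R2"
    read: str   # read the segment is taken from: "R1" or "R2"
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


_SEGMENT = re.compile(r"^(cb|umi|R1|R2),(R1|R2):(\d+)-(\d+)$")


def parse_customize(spec: str) -> List[Segment]:
    """Parse one --customize string into its segments."""
    segments = []
    for part in spec.split(";"):
        match = _SEGMENT.match(part.strip())
        if match is None:
            raise ValueError(f"malformed segment: {part!r}")
        start, end = int(match.group(3)), int(match.group(4))
        if start < 1 or end < start:
            raise ValueError(f"invalid coordinates in segment: {part!r}")
        segments.append(Segment(match.group(1), match.group(2), start, end))
    return segments


def extract(segment: Segment, read1: str, read2: str) -> str:
    """Cut the bases a segment describes out of a read pair."""
    sequence = read1 if segment.read == "R1" else read2
    if segment.end > len(sequence):
        raise ValueError("segment exceeds the read length")
    return sequence[segment.start - 1:segment.end]
```

Parsing the first example above yields four segments: two `cb`, one `umi`, and one `R1` sequence segment read from R2.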

🚩 Analysis Settings

`--no_introns` (Flag)

Enable this parameter to filter out reads from intronic regions during analysis.

Function: Retains only reads from exonic regions for expression quantification, avoiding interference from immature transcripts.

Default: If not set, reads from intronic regions are included.

`--end5` (Flag)

Enable 5'-end single-cell transcriptome data analysis mode.

Function: Specifically for analyzing mRNA captured at the 5' end.
Note: Use this parameter only when using a 5'-end scRNA kit.

Default: Not set.

`--no_bam` (Flag)

Enable this parameter to skip the generation of BAM files.

Function: Reduces computation time and storage requirements by not writing the BAM file.
Note: Downstream analysis requiring BAM files will not be possible.

Default: If not set, BAM files are generated.

`--sample_read_pairs` (Optional)

Extract a specified number of read pairs from the input cDNA FASTQ files for analysis.

Function: Used for quick testing of large datasets before a full analysis, or for down-sampling analysis when resources are limited.

Default: None (uses all data)

Example:

--sample_read_pairs 100000000

💡 Analysis Recommendation

For the initial analysis, it is recommended to use the default parameters and then adjust them as needed based on the results report.

📊 Reference Database Construction (mkref)

📊 Usage

$ dnbc4tools rna mkref -h
usage: dnbc4tools rna mkref [-h] 

optional arguments:
  -h, --help          show this help message and exit

Input Files:
  Input genome FASTA files and gene annotation GTF files. For mixed species analysis, separate multiple files with commas.

  --fasta <FILE>      Reference genome FASTA file path(s). Separate multiple files with commas
  --ingtf <FILE>      Gene annotation GTF file path(s). Separate multiple files with commas

Basic Settings:
  --genomeDir <DIR>   Output directory for generated reference files [default: current directory]
  --species <STR>     Species identifier(s). Use commas for mixed species analysis [default: undefined]
  --threads <INT>     Number of CPU threads for parallel processing [default: 10]

Advanced Settings:
  Advanced configuration options for reference genome building.
  Use these settings to customize STAR indexing behavior and resource usage.
  Parameters in extra-args will override default parameters if conflicts exist.
  Can be a space-separated string of parameters (e.g., "--sjdbOverhang 100 --runThreadN 16").

  --chrM <STR>        Mitochondrial chromosome identifier in reference genome [default: auto]
  --limitram <INT>    Maximum RAM (GB) allowed for index generation
  --extra-args <STR>  Additional STAR parameters to pass directly to STAR index generation
  --noindex           Skip STAR index generation step

📝 Parameter Description

🔴 Required Parameters

`--fasta` (Required)

Provide the reference genome sequence file.

Requirement: Standard FASTA format, primary assembly version is recommended.

Default: None

Example:

--fasta Homo_sapiens.GRCh38.dna.primary_assembly.fa

`--ingtf` (Required)

Provide the gene structure annotation file.

Function: Used for gene expression quantification and annotation.
Requirement: Standard GTF format.
- Required Features: Must contain gene/transcript, exon type annotation entries.
- Required Attributes: Must contain gene_id/gene_name, transcript_id/transcript_name attributes.
- Chromosome Names: Must match the chromosome names in the FASTA genome file.
- Coordinates: Start and end coordinates must be valid.

Default: None

Example:

--ingtf Homo_sapiens.GRCh38.108.gtf
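The GTF requirements listed above can be pre-checked with a short script. This is a rough sketch, not the validation mkref performs: the function name `check_gtf_lines` is hypothetical, attributes are detected by plain substring search, and the attribute rule is read as "exon rows carry gene_id or gene_name, and transcript_id or transcript_name".

```python
from typing import Iterable, List, Set


def check_gtf_lines(lines: Iterable[str], fasta_chroms: Set[str]) -> List[str]:
    """Return a list of human-readable problems found in GTF lines."""
    problems: List[str] = []
    features_seen: Set[str] = set()
    for number, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            problems.append(f"line {number}: expected 9 tab-separated columns")
            continue
        chrom, _, feature, start, end, _, _, _, attributes = fields
        features_seen.add(feature)
        if chrom not in fasta_chroms:
            problems.append(f"line {number}: chromosome {chrom!r} not in FASTA")
        if not (start.isdigit() and end.isdigit() and 1 <= int(start) <= int(end)):
            problems.append(f"line {number}: invalid coordinates {start}-{end}")
        if feature == "exon":
            if "gene_id" not in attributes and "gene_name" not in attributes:
                problems.append(f"line {number}: exon lacks gene_id/gene_name")
            if "transcript_id" not in attributes and "transcript_name" not in attributes:
                problems.append(f"line {number}: exon lacks transcript_id/transcript_name")
    if "exon" not in features_seen:
        problems.append("no exon entries found")
    if not features_seen & {"gene", "transcript"}:
        problems.append("no gene or transcript entries found")
    return problems
```

The set of FASTA chromosome names can be collected from the header lines (those starting with `>`) of the genome file.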

Note

Dual-Species Analysis Configuration

For dual-species analysis, both --fasta and --ingtf parameters support providing file paths for two species, separated by commas.

Example: --fasta human.fa,mouse.fa --ingtf human.gtf,mouse.gtf
Important Note: Please ensure that the order of FASTA files, GTF files, and the --species parameter is strictly consistent, meaning each FASTA file corresponds to its respective GTF file and species parameter in the list.
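The ordering requirement can be enforced by building the command from parallel lists. A minimal sketch, assuming a hypothetical helper `build_mkref_command` that uses only the flags documented on this page:

```python
from typing import List, Sequence


def build_mkref_command(
    fastas: Sequence[str],
    gtfs: Sequence[str],
    species: Sequence[str],
    genome_dir: str,
) -> List[str]:
    """Assemble a `dnbc4tools rna mkref` command from index-aligned lists."""
    # Entry i of each list must describe the same species.
    if not (len(fastas) == len(gtfs) == len(species)) or not fastas:
        raise ValueError("fastas, gtfs and species must be non-empty and equally long")
    return [
        "dnbc4tools", "rna", "mkref",
        "--fasta", ",".join(fastas),
        "--ingtf", ",".join(gtfs),
        "--species", ",".join(species),
        "--genomeDir", genome_dir,
    ]
```

The returned list can be passed to `subprocess.run` as-is, which avoids shell quoting issues with the comma-separated values.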

🟢 Settings

`--genomeDir` (Optional)

Specify the output directory for the generated reference database.

Function: All generated reference files (index, annotations, etc.) will be stored in this directory.

Directory Structure Preview

  genomeDir/
  ├── fasta/
  │   └── genome.fa          # Processed genome sequence file
  ├── genes/
  │   └── genes.gtf          # Processed gene annotation file
  ├── star/
  │   ├── SA                 # STAR index file
  │   ├── SAindex            # STAR index core file
  │   ├── chrLength.txt      # Chromosome length information
  │   ├── chrName.txt        # Chromosome name information
  │   ├── chrNameLength.txt  # Chromosome name and length
  │   ├── chrStart.txt       # Chromosome start position
  │   ├── Genome             # Genome sequence compressed file
  │   ├── genomeParameters.txt # Genome parameter configuration
  │   ├── Log.out            # STAR index construction log
  │   ├── sjdbInfo.txt       # Splice junction database information
  │   ├── sjdbList.fromGTF.out.tab # Splice junctions extracted from GTF
  │   ├── sjdbList.out.tab   # List of all splice junctions
  │   └── mtgene.list        # List of mitochondrial genes
  └── ref.json               # Database configuration and metadata file

Default: ./ (current directory)

Example:

dnbc4tools rna mkref --fasta genome.fa --ingtf genes.gtf --genomeDir /database/scRNA/GRCh38

`--species` (Optional)

Specify one or more species names for the reference database.

Function: This name is recorded in the configuration file and used for species identification, gene annotation, and cell annotation in subsequent analyses.

Dual-Species Analysis Configuration

Naming Format: Use commas to separate multiple species names (e.g., hg38,mm10).
Order Requirement: Must be strictly consistent with the order of --fasta and --ingtf files.
Automatic Processing: The pipeline automatically adds a species prefix to genes (e.g., hg38_GENE1) and separates statistical information in the results.

Cell Annotation Support

Providing this parameter for specific species enables automatic downstream cell type annotation.

Supported: Homo_sapiens (or hg38), Mus_musculus (or mm10).
Not Supported: Cell annotation is not available for any other species.

Default: undefined

Examples:

# Single species
--species Homo_sapiens

# Dual species (human + mouse)
--species hg38,mm10

`--threads` (Optional)

Set the number of CPU threads to be used during STAR index construction.

Performance Impact: Increasing the number of threads can significantly shorten the index construction time.
Resource Balance: Be mindful of the balance between the number of threads and available RAM; too many threads can lead to insufficient memory.

Default: 10

Example:

--threads 16

🟢 Advanced Settings

`--chrM` (Optional)

Specify the name of the mitochondrial chromosome.

Function: Used to assess cell quality. High mitochondrial gene expression often indicates cell stress or death.
Auto-detection: By default, it will automatically identify from common names (e.g., chrM, MT).

Default: auto

Example:

# If the mitochondrial chromosome name is "mitochondrion"
dnbc4tools rna mkref --fasta genome.fa --ingtf genes.gtf --chrM mitochondrion

`--limitram` (Optional)

Set the maximum available memory (in GB) for the STAR genome index generation process.

Function: A reasonable memory limit can prevent system memory exhaustion and increase the success rate of index construction.

Default: None

Example:

--limitram 64

`--extra-args` (Advanced)

Pass additional command-line arguments directly to STAR index generation.

Function: For special requirements and performance optimization.
Note: Improper parameter settings can lead to index construction failure or subsequent analysis issues.

Default: None

Example:

--extra-args "--sjdbOverhang 99 --runThreadN 20"

`--noindex` (Flag)

If this parameter is set, it will only generate the configuration file without building the genome index.

Function: Use this parameter to skip the time-consuming index construction step when the index files already exist.

Default: Not set

Example:

# Generate only the configuration file, do not build the index
dnbc4tools rna mkref --fasta genome.fa --ingtf genes.gtf --noindex

Tip

📋 Database Construction Technical Notes:

For genomes with numerous and variably sized chromosomes, the database construction is adjusted to automatically determine optimal values for genomeSAindexNbases and genomeChrBinNbits.
Upon completion of database construction, a ref.json file will be generated in the database directory to record all key configuration information.
Dual-species analysis automatically adds a species prefix to each gene (e.g., hg38_GENE1, mm10_GENE2) to differentiate genes from different species.
All build parameters and version information are recorded in ref.json to ensure the reproducibility of the analysis.

📋 Single-Species ref.json File Example:

{
    "chrmt": "chrM",
    "genome": "/database/scRNA/Homo_sapiens/fasta/genome.fa",
    "genomeDir": "/database/scRNA/Homo_sapiens/star",
    "gtf": "/database/scRNA/Homo_sapiens/genes/genes.gtf",
    "input_fasta_files": [
        "genome.fa"
    ],
    "input_gtf_files": [
        "genes.gtf"
    ],
    "mtgenes": "/database/scRNA/Homo_sapiens/star/mtgene.list",
    "species": "Homo_sapiens",
    "version": "dnbc4tools 3.0beta"
}

📋 Dual-Species ref.json File Example:

{
    "chrmt": "hg38_chrM,mm10_chrM",
    "genome": "/database/scRNA/hg38_and_mm10/fasta/genome.fa",
    "genomeDir": "/database/scRNA/hg38_and_mm10/star",
    "gtf": "/database/scRNA/hg38_and_mm10/genes/genes.gtf",
    "input_fasta_files": [
        "hg38_genome.fa",
        "mm10_genome.fa"
    ],
    "input_gtf_files": [
        "hg38_genes.gtf",
        "mm10_genes.gtf"
    ],
    "mtgenes": "/database/scRNA/hg38_and_mm10/star/mtgene.list",
    "species": "hg38_and_mm10",
    "version": "dnbc4tools 3.0beta"
}
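Since ref.json stores absolute paths, a database that has been moved or copied may point at files that no longer exist. The sketch below checks the path-valued keys shown in the examples above; the function name `missing_reference_paths` is hypothetical, and the key list is taken from these examples rather than from a formal schema.

```python
import json
import os
from typing import List

# Keys that hold file or directory paths in the examples above.
PATH_KEYS = ("genome", "genomeDir", "gtf", "mtgenes")


def missing_reference_paths(ref_json_path: str) -> List[str]:
    """Return the ref.json keys whose paths are absent or do not exist."""
    with open(ref_json_path, encoding="utf-8") as handle:
        config = json.load(handle)
    missing = []
    for key in PATH_KEYS:
        path = config.get(key)
        if path is None or not os.path.exists(path):
            missing.append(key)
    return missing
```

An empty result means every referenced path was found; it does not confirm that the STAR index itself is complete.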

📋 Performance Optimization Recommendations:

For commonly used genomes (e.g., human, mouse), it is recommended to pre-build the index and reuse it across multiple projects.
Index construction for dual-species analysis takes longer than for a single species, so run it when ample computational resources are available.
Regularly check for updates from databases like Ensembl to keep reference genomes and annotation files current.

📋 Multi-sample Operations (multi)

📊 Usage

$ dnbc4tools rna multi -h
usage: dnbc4tools rna multi [-h] 

optional arguments:
  -h, --help            show this help message and exit
  --list <LIST>         Path to the sample list file. Each line should contain sample name, cDNA FASTQ paths, and oligo FASTQ paths.
  --genomeDir <DATABASE>
                        Path to the directory containing genome files.
  --outdir <OUTDIR>     Output directory. [default: current directory].
  --threads <CORENUM>   Number of threads used for analysis. [default: 20].
  --end5                Perform 5'-end single-cell transcriptome analysis.

📝 Parameter Description

🔴 Required Parameters

`--list` (Required)

Specify the path to the list file containing information for multiple samples.

File Format: Tab-separated (\t) text file, UTF-8 encoding recommended.
Column Structure:
1. Sample Name
2. cDNA Data Path
3. Oligo Data Path

Path Format Rules

Multiple FASTQ files: Paths for multiple FASTQ files from the same library should be separated by commas (,).
R1 and R2 files: Paths for paired R1 and R2 files should be separated by semicolons (;).
Path Type: Both absolute and relative paths are supported.

Default: None

Example:

# Example 1: SampleA, with 1 pair of R1/R2 files for cDNA and oligo each
SampleA	/path/to/A_cDNA_R1.fq.gz;/path/to/A_cDNA_R2.fq.gz	/path/to/A_oligo_R1.fq.gz;/path/to/A_oligo_R2.fq.gz

# Example 2: SampleB, with 2 pairs of R1/R2 files for cDNA, and 1 pair for oligo
SampleB	/path/to/B_cDNA_L01_R1.fq.gz,/path/to/B_cDNA_L02_R1.fq.gz;/path/to/B_cDNA_L01_R2.fq.gz,/path/to/B_cDNA_L02_R2.fq.gz	/path/to/B_oligo_R1.fq.gz;/path/to/B_oligo_R2.fq.gz
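A sample list line can be validated against the path format rules above before it is handed to the multi command. A minimal sketch with a hypothetical function `parse_sample_line`; it checks structure only and does not test whether the files exist:

```python
from typing import Dict, List, Tuple


def _split_reads(column: str) -> Tuple[List[str], List[str]]:
    """Split 'r1a,r1b;r2a,r2b' into the R1 list and the R2 list."""
    halves = column.split(";")
    if len(halves) != 2:
        raise ValueError("R1 and R2 paths must be separated by one semicolon")
    read1, read2 = halves[0].split(","), halves[1].split(",")
    if len(read1) != len(read2):
        raise ValueError("R1 and R2 must list the same number of files")
    return read1, read2


def parse_sample_line(line: str) -> Dict[str, object]:
    """Parse one tab-separated line: sample name, cDNA paths, oligo paths."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3:
        raise ValueError("expected 3 tab-separated columns")
    name, cdna, oligo = fields
    cdna_r1, cdna_r2 = _split_reads(cdna)
    oligo_r1, oligo_r2 = _split_reads(oligo)
    return {
        "name": name,
        "cdna_r1": cdna_r1,
        "cdna_r2": cdna_r2,
        "oligo_r1": oligo_r1,
        "oligo_r2": oligo_r2,
    }
```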

📝 Parameter Inheritance Note

For other analysis parameter settings, please refer to the corresponding parameters of the dnbc4tools rna run command. All samples should use the same reference database.

💡 Tip

This document is continuously updated. If you find any errors or have information to add, your feedback is welcome.

📝 Document Version: 3.0 beta | Last Updated: 2025

🧬 DNBelab C Series HT scRNA Analysis Software
High-performance single-cell transcriptome data analysis pipeline
