🧬 DNBelab C Series HT scVDJ Analysis Parameters

📊 Usage

$ dnbc4tools vdj run -h
usage: dnbc4tools vdj run [OPTIONS] 

optional arguments:
  -h, --help            show this help message and exit

Input Files:
  Choose ONE input method: either --fastqs (directory) OR individual FASTQ files (-1 and -2).

  --fastqs <DIR>        Input directory containing paired-end FASTQ files. The pipeline automatically detects Read1/Read2 files. Example: ./fastq_dir
  -1, --fastq1 <FILE> [<FILE> ...]
                        Read1 FASTQ file(s) (supports wildcards and comma-separated lists). Example: sample1_L01_R1.fastq.gz,sample1_L02_R1.fastq.gz
  -2, --fastq2 <FILE> [<FILE> ...]
                        Read2 FASTQ file(s) (supports wildcards and comma-separated lists). Must match --fastq1 order. Example: sample1_L01_R2.fastq.gz,sample1_L02_R2.fastq.gz

Basic Settings:
  -n, --name <STR>      Unique identifier for the sample (e.g., sample1). Used for naming output files and reports.
  -r, --ref <REF>       Reference database: 'human'/'mouse' (case-insensitive) or path to a custom reference directory containing reference.json. Examples: human | mouse | ./custom_vdj_ref
  -c, --chain <STR>     VDJ receptor type: 'IG' (B-cell receptors) or 'TR' (T-cell receptors).
  -o, --outdir <DIR>    Output directory for results and reports [default: current directory]. Example: ./output
  -t, --threads <INT>   Number of CPU threads for parallel processing [default: all available cores] (e.g., 16).
  -s, --beadstrans <FILE>
                        RNA analysis singlecell.csv file for filtering cells and merging beads information. When not provided, all cells will be kept by default (equivalent to --keep_all_cells).

Library Settings:
  Auto-detection is recommended for dark cycles. Available modes include "R1" and "unset".
  For multiple files, ensure consistent settings across all inputs.
  customize: Specify sequence structure patterns for parsing.

  --darkreaction <STR>  Dark cycle setting for VDJ library [default: auto]. Use 'R1' if dark cycles occur in Read1; otherwise leave as 'auto' or 'unset'.
  --customize <STR>     Sequence structure patterns, format: <type>,<read>:<start>-<end> separated by ';'. Types include: cb (cell barcode), umi (UMI) R1/R2 (sequence). Example:
                        "cb,R1:1-10;cb,R1:11-20;umi,R1:21-30;R1,R1:31-120;R2,R2:1-150"
  --enrichment_primers <FILE>
                        Custom inner enrichment primers file (one primer sequence per line). Required when using a custom reference database.

Analysis Settings:
  --keep_all_cells      Keep all cells in analysis without RNA data filtering. If --beadstrans is not provided, this behavior is enabled by default.
  --r2_only             Only use R2 reads for VDJ assembly. Manual setting required because Read1 assembly requirements cannot be auto-detected.
  --sample_read_pairs <INT>
                        Subsample the specified number of read pairs from the input FASTQ files (e.g., 1000000).

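📝 Parameter Reference

🔴 Required Parameters

⚠️ Basic parameters that must be specified for a successful analysis

`-n, --name` (required)

A unique sample name for this analysis.

Purpose: used as the prefix for all output files and the HTML report.
Display: shown as the sample ID in the final web report.

Default: none

Example: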

--name sample_VDJ_001

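`-r, --ref` (required)

Specifies the reference database used for VDJ analysis.

Built-in: the software ships with human and mouse reference databases (the names are case-insensitive).
Custom: alternatively, give the path to a custom reference directory that contains reference.json.

Default: none

Example:

# Use the built-in human reference database
--ref human

# Use a custom reference database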
--ref ./custom_vdj_ref
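The lookup rule for `--ref` can be sketched in Python. This is a minimal illustration that assumes only the behavior described above (case-insensitive `human`/`mouse`, otherwise a directory containing reference.json); `resolve_ref` is a hypothetical helper, not part of dnbc4tools, and the real pipeline may validate the directory differently.

```python
from pathlib import Path

# Built-in reference names documented for --ref.
BUILTIN_REFS = {"human", "mouse"}


def resolve_ref(ref: str) -> tuple[str, str]:
    """Classify a --ref value as a built-in name or a custom directory (hypothetical helper)."""
    if ref.lower() in BUILTIN_REFS:
        # Built-in names are matched case-insensitively.
        return ("builtin", ref.lower())
    ref_dir = Path(ref)
    # A custom reference must be a directory that contains reference.json.
    if not (ref_dir / "reference.json").is_file():
        raise ValueError(f"{ref!r} is neither 'human'/'mouse' nor a directory with reference.json")
    return ("custom", str(ref_dir.resolve()))
```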

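`-c, --chain` (required)

Specifies the type of immune receptor to analyze.

Purpose: determines which V(D)J gene segments are identified and analyzed for recombination.
TR: T-cell receptor, for T-cell studies.
IG: immunoglobulin (B-cell receptor), for B-cell studies.

Default: none

Example:

# Analyze T-cell receptors
--chain TR

# Analyze B-cell receptors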
--chain IG

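🟢 Input File Parameters

📁 Choose one input method: a directory OR individually specified files

`--fastqs` (method 1)

Path to the directory containing all FASTQ files.

Purpose: the pipeline automatically detects the paired files (R1/R2) in this directory.
Note: this is a convenience option and cannot be used together with --fastq1 / --fastq2.

Default: none

Example: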

--fastqs ./VDJ_fastq_dir
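The filename rules that `--fastqs` uses to detect Read1/Read2 files are not documented here. The sketch below assumes a hypothetical convention in which `<prefix>_R1.fastq.gz` pairs with `<prefix>_R2.fastq.gz` (also `.fastq`, `.fq`, `.fq.gz`) and shows one way such pairing could work; `pair_fastqs` is illustrative only.

```python
import re
from pathlib import Path

# Hypothetical naming rule: <prefix>_R1<ext> pairs with <prefix>_R2<ext>.
FASTQ_RE = re.compile(r"^(?P<prefix>.+)_R(?P<read>[12])(?P<ext>\.f(?:ast)?q(?:\.gz)?)$")


def pair_fastqs(fastq_dir: str) -> list[tuple[Path, Path]]:
    """Pair Read1/Read2 files in a directory by their shared filename prefix."""
    groups: dict[tuple[str, str], dict[str, Path]] = {}
    for path in sorted(Path(fastq_dir).iterdir()):
        match = FASTQ_RE.match(path.name)
        if match is None:
            continue  # skip files that do not look like R1/R2 FASTQ files
        key = (match["prefix"], match["ext"])
        groups.setdefault(key, {})[match["read"]] = path
    pairs = []
    for key in sorted(groups):
        reads = groups[key]
        if set(reads) != {"1", "2"}:
            # Report a missing mate instead of silently dropping the file.
            raise ValueError(f"unpaired FASTQ file(s) for prefix {key[0]!r}: {sorted(reads.values())}")
        pairs.append((reads["1"], reads["2"]))
    return pairs
```

Files that do not match the pattern are skipped, and a prefix with only one mate raises an error.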

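`-1, --fastq1` (method 2A)

One or more Read1 FASTQ files from the VDJ library.

Supports: wildcards (*) to match files and comma-separated lists for multiple files.
Requirement: must be used together with --fastq2, and the file order must match exactly.

Default: none

Example:

--fastq1 sample1_L01_R1.fastq.gz,sample1_L02_R1.fastq.gz

`-2, --fastq2` (method 2B)

One or more Read2 FASTQ files from the VDJ library.

Supports: wildcards (*) to match files and comma-separated lists for multiple files.
Requirement: must be used together with --fastq1, and the file order must match exactly.

Default: none

Example: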

--fastq2 sample1_L01_R2.fastq.gz,sample1_L02_R2.fastq.gz
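Because `--fastq1` and `--fastq2` accept both comma-separated lists and wildcards, and their order must match, it can help to check the two lists before launching a run. This is a hedged sketch: the helper names are hypothetical, wildcard matches are sorted alphabetically by assumption, and the mate check assumes an R2 name differs from its R1 name only by replacing the last `_R1` with `_R2`.

```python
import glob

WILDCARD_CHARS = "*?["


def expand_fastq_args(args: list[str]) -> list[str]:
    """Flatten comma-separated lists and expand wildcards, keeping the user's order."""
    files: list[str] = []
    for arg in args:
        for item in arg.split(","):
            item = item.strip()
            if not item:
                continue
            if any(ch in item for ch in WILDCARD_CHARS):
                matches = sorted(glob.glob(item))  # sorted so R1 and R2 globs line up
                if not matches:
                    raise FileNotFoundError(f"no files match {item!r}")
                files.extend(matches)
            else:
                files.append(item)
    return files


def check_pair_order(r1_files: list[str], r2_files: list[str]) -> None:
    """Raise if the R1 and R2 lists differ in length or are not in matching order."""
    if len(r1_files) != len(r2_files):
        raise ValueError(f"{len(r1_files)} Read1 file(s) but {len(r2_files)} Read2 file(s)")
    for r1, r2 in zip(r1_files, r2_files):
        # Hypothetical convention: R2 name = R1 name with its last "_R1" replaced by "_R2".
        head, sep, tail = r1.rpartition("_R1")
        if not sep or head + "_R2" + tail != r2:
            raise ValueError(f"{r2!r} does not look like the mate of {r1!r}")
```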

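⚠️ Choosing an input method:

🔸 Method 1: use --fastqs to specify a directory containing paired files.

🔸 Method 2: use -1, --fastq1 and -2, --fastq2 to specify the R1 and R2 files separately.

⚠️ Important: all files supplied through either method must come from the same library and share the same sequencing mode and dark-reaction setting. Data from different libraries cannot be merged into one analysis.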
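The input-method rule can be enforced when scripting runs. The sketch below builds an argument list for `dnbc4tools vdj run` using only options documented on this page; `build_vdj_command` is a hypothetical wrapper that rejects calls mixing `--fastqs` with `--fastq1`/`--fastq2` or supplying neither.

```python
from typing import Optional, Sequence


def build_vdj_command(
    name: str,
    ref: str,
    chain: str,
    fastqs: Optional[str] = None,
    fastq1: Optional[Sequence[str]] = None,
    fastq2: Optional[Sequence[str]] = None,
    outdir: Optional[str] = None,
    threads: Optional[int] = None,
) -> list[str]:
    """Assemble a `dnbc4tools vdj run` argument list using exactly one input method."""
    if chain not in ("IG", "TR"):
        raise ValueError("chain must be 'IG' or 'TR'")
    uses_dir = fastqs is not None
    uses_files = fastq1 is not None or fastq2 is not None
    if uses_dir == uses_files:
        raise ValueError("use either fastqs (method 1) or fastq1 + fastq2 (method 2), not both or neither")
    cmd = ["dnbc4tools", "vdj", "run", "--name", name, "--ref", ref, "--chain", chain]
    if uses_dir:
        cmd += ["--fastqs", fastqs]
    else:
        if not fastq1 or not fastq2 or len(fastq1) != len(fastq2):
            raise ValueError("fastq1 and fastq2 must both be given, with the same number of files")
        # Comma-separated lists are one of the documented ways to pass several files.
        cmd += ["--fastq1", ",".join(fastq1), "--fastq2", ",".join(fastq2)]
    if outdir is not None:
        cmd += ["--outdir", outdir]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where dnbc4tools is installed.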

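🟢 Basic Settings

`-o, --outdir` (optional)

Output directory for all analysis results and reports.

Purpose: all results are saved in this directory, and the pipeline automatically creates a structured subdirectory named after the sample.

Default: ./ (current directory)

Example:

--outdir ./VDJ_analysis_output

`-t, --threads` (optional)

Number of CPU threads the analysis may use.

Purpose: more threads can substantially shorten run time.
Recommendation: set this according to the number of available CPU cores for best performance.

Default: all available CPU cores

Example: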

--threads 16

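`-s, --beadstrans` (optional)

The singlecell.csv file from the scRNA analysis, used for cell filtering and information integration.

Purpose: integrates the 5' scRNA analysis results to merge beads and filter cells, establishing an exact correspondence between single-cell RNA expression profiles and VDJ recombination sequences.
Requirement: requires the singlecell.csv output from the 5' scRNA analysis of the same sample.
Note: if this parameter is omitted, the bead-merging step is skipped and all detected cells are kept by default (equivalent to enabling --keep_all_cells).

Default: none

Example: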

--beadstrans ./RNA_analysis_output/outs/singlecell.csv

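🟢 Library Settings

`--darkreaction` (optional)

Configures the dark-cycle setting for the VDJ library.

Purpose: tells the software how to correctly parse the dark-reaction cycles introduced by the sequencing chemistry.
Auto-detection (auto): the default. The software identifies the setting by analyzing the sequence structure. Strongly recommended for a first analysis.
Manual setting: R1 (dark cycles in Read1) or unset (no dark cycles).

Default: auto

Example:

# Read1 contains dark cycles
--darkreaction R1

⚠️ Important: an incorrect setting can cause cell barcode recognition to fail. Set this manually only if you know the library structure or auto-detection fails.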

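`--customize` (advanced)

Precisely defines how cell barcodes, UMIs, and effective read sequences are extracted for non-standard libraries. This is an advanced option and overrides the --darkreaction setting.

Syntax: "<type>,<read>:<start>-<end>", with multiple segments separated by semicolons (;).
- Type (type): cb (cell barcode), umi (UMI), R1/R2 (effective read sequence).
Notes:
- The entire argument string must be enclosed in quotes.
- Coordinates are 1-based and must not exceed the read length.

Example:

# Example configuration for a standard VDJ library
--customize "cb,R1:1-10;cb,R1:11-20;umi,R1:21-30;R1,R1:31-120;R2,R2:1-150"

⚠️ Risk: an incorrect custom configuration can cause data loss or analysis failure. Use it only when the standard configuration does not meet your needs.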
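To see how a `--customize` string maps onto read positions, the following sketch parses the documented `<type>,<read>:<start>-<end>` syntax and converts the 1-based inclusive coordinates to Python slices. It is not the pipeline's own parser; the read lengths must be supplied by the caller (the test assumes 120 and 150 bases, taken from the example string), and how multiple `cb` segments are combined is not modeled.

```python
from typing import NamedTuple

VALID_TYPES = {"cb", "umi", "R1", "R2"}
VALID_READS = {"R1", "R2"}


class Segment(NamedTuple):
    kind: str   # cb, umi, R1 or R2
    read: str   # FASTQ read the segment is taken from
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def parse_customize(spec: str, read_lengths: dict[str, int]) -> list[Segment]:
    """Parse "<type>,<read>:<start>-<end>;..." and check coordinates against read lengths."""
    segments = []
    for part in spec.split(";"):
        part = part.strip()
        if not part:
            continue
        try:
            kind, location = part.split(",", 1)
            read, span = location.split(":", 1)
            start_s, end_s = span.split("-", 1)
            start, end = int(start_s), int(end_s)
        except ValueError as exc:
            raise ValueError(f"malformed segment {part!r}") from exc
        if kind not in VALID_TYPES:
            raise ValueError(f"{part!r}: unknown type {kind!r}")
        if read not in VALID_READS:
            raise ValueError(f"{part!r}: read must be R1 or R2")
        if not 1 <= start <= end <= read_lengths[read]:
            raise ValueError(f"{part!r}: need 1 <= start <= end <= {read_lengths[read]}")
        segments.append(Segment(kind, read, start, end))
    return segments


def extract(seq: str, seg: Segment) -> str:
    """Cut a segment out of a read; 1-based inclusive [start, end] becomes seq[start-1:end]."""
    return seq[seg.start - 1 : seg.end]
```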

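`--enrichment_primers` (optional)

File of inner enrichment primers used for specific amplification of the VDJ region.

Use case: VDJ libraries from species other than human or mouse, or built with custom primer designs.
Format: plain text file with one primer sequence per line.
Requirement: must be provided when using a custom reference database.

Default: none

Example file content: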

GTCCTCGGTGGCCTCCACGTG
AGCACCTGGGGCCTCGGCCAC
CCTGGACTCCTGGGCCCCAG
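A quick pre-flight check of a primer file can catch formatting problems before a run. This sketch assumes the one-sequence-per-line format above and, as a deliberately strict assumption, accepts only A/C/G/T (case-insensitive) and skips blank lines; the tool itself may accept other characters. `load_primers` is hypothetical.

```python
import re


def load_primers(path: str) -> list[str]:
    """Read one primer per line, rejecting anything other than A/C/G/T."""
    primers = []
    with open(path) as handle:
        for lineno, line in enumerate(handle, start=1):
            seq = line.strip().upper()
            if not seq:
                continue  # blank lines are skipped in this sketch
            if not re.fullmatch(r"[ACGT]+", seq):
                raise ValueError(f"line {lineno}: {line.strip()!r} is not an A/C/G/T sequence")
            primers.append(seq)
    if not primers:
        raise ValueError(f"{path}: no primer sequences found")
    return primers
```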

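🚩 Analysis Settings

`--keep_all_cells` (flag)

Enable this flag to keep all detected cells without filtering based on RNA data.

Purpose: enabled automatically when --beadstrans is not provided. Suitable for standalone VDJ analysis or when maximizing cell recovery.

Default: not set (but enabled automatically when --beadstrans is absent)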
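The default described for `--keep_all_cells` amounts to a simple rule: RNA-based filtering happens only when `--beadstrans` is supplied and `--keep_all_cells` is not set. The sketch below expresses that rule; the function names are hypothetical, the RNA cell barcodes are assumed to be already loaded as a set (the layout of singlecell.csv is not described here), and bead merging is not modeled.

```python
from typing import Iterable, Optional, Set


def keep_all_cells_enabled(keep_all_cells: bool, beadstrans: Optional[str]) -> bool:
    """--keep_all_cells is implied whenever --beadstrans is omitted."""
    return keep_all_cells or beadstrans is None


def select_cells(
    vdj_barcodes: Iterable[str],
    rna_cell_barcodes: Set[str],
    keep_all_cells: bool,
    beadstrans: Optional[str],
) -> list[str]:
    """Return the VDJ cell barcodes that would be kept under the rule above."""
    barcodes = sorted(set(vdj_barcodes))
    if keep_all_cells_enabled(keep_all_cells, beadstrans):
        return barcodes
    # Otherwise keep only barcodes that the RNA analysis called as cells.
    return [bc for bc in barcodes if bc in rna_cell_barcodes]
```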

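`--r2_only` (flag)

Enable this flag to use only Read2 sequences for VDJ assembly.

Purpose: intended for library designs in which Read1 contains only barcode and UMI information.
Note: the software cannot detect this case automatically; set it manually according to the library design.

Default: not set

`--sample_read_pairs` (optional)

Extracts the specified number of read pairs from the input FASTQ files for analysis.

Purpose: quick tests on large datasets before a full analysis, or downsampling when resources are limited.
Note: subsampling can affect the detection of low-frequency clonotypes; use all data for the final analysis.

Default: none (all data are used)

Example: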

--sample_read_pairs 10000000
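This page does not say how `--sample_read_pairs` chooses reads (the first N pairs or a random subset). As an assumption-labeled illustration, the sketch below keeps the first N pairs from one R1/R2 file pair by reading both files in lockstep, which keeps mates together; `head_read_pairs` is hypothetical, and a random strategy would need something like reservoir sampling instead.

```python
import gzip
from itertools import islice
from typing import IO, Iterator, Tuple


def open_fastq(path: str) -> IO[str]:
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def fastq_records(handle: IO[str]) -> Iterator[Tuple[str, ...]]:
    """Yield FASTQ records as tuples of four lines."""
    while True:
        record = tuple(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def head_read_pairs(r1_path: str, r2_path: str, n_pairs: int, out1: str, out2: str) -> int:
    """Write the first n_pairs read pairs to out1/out2 and return how many were written."""
    written = 0
    with open_fastq(r1_path) as in1, open_fastq(r2_path) as in2, \
            open(out1, "w") as o1, open(out2, "w") as o2:
        # Reading R1 and R2 together keeps each pair's mates in step.
        pairs = zip(fastq_records(in1), fastq_records(in2))
        for rec1, rec2 in islice(pairs, n_pairs):
            o1.writelines(rec1)
            o2.writelines(rec2)
            written += 1
    return written
```

Because `zip` stops at the shorter input, this sketch does not detect R1/R2 files with different record counts.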

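💡 Tips

This document is continuously updated. Feedback on errors or missing information is welcome.

📝 Document version: 3.0 beta | Last updated: 2025

🧬 DNBelab C Series HT scVDJ Analysis Software
High-performance single-cell immune repertoire analysis pipeline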
