🧬 DNBelab C Series HT scVDJ Analysis Output Documentation

A Complete Guide to Single-Cell V(D)J Sequencing Analysis Output Files

📁 Directory Structure • 📋 File Details • 📊 Analysis Metrics • 📊 Report Interpretation

📖 Overview

After the single-cell VDJ analysis is complete, a standardized set of files and subdirectories is generated in the specified output directory, specifically for immune receptor repertoire analysis. This document details the content, format, and purpose of each output file to help users fully understand and efficiently utilize the V(D)J analysis results.

💡 Tip: VDJ analysis requires 5' end RNA sequencing data, and all output files adhere to the AIRR standard and are compatible with mainstream immunoinformatics tools.

⚠️ Prerequisite: 5' end single-cell RNA sequencing analysis must be completed first.

📁 Output Directory Structure

.
├── airr_annotations.tsv                    # Annotation file in AIRR standard format
├── all_contig_annotations.csv              # Annotation information for all assembled sequences
├── all_contig.fasta                        # FASTA file of all assembled sequences
├── all_contig.fasta.fai                    # Index file for all assembled sequences
├── clonotypes.csv                          # Clonotype analysis results
├── consensus_annotations.csv               # Annotation information for consensus sequences
├── consensus.fasta                         # FASTA file of consensus sequences
├── consensus.fasta.fai                     # Index file for consensus sequences
├── filtered_contig_annotations.csv         # Annotation information for filtered assembled sequences
├── filtered_contig.fasta                   # FASTA file of filtered assembled sequences
├── filtered_contig.fasta.fai               # Index file for filtered assembled sequences
├── metrics_summary.xls                     # Summary of analysis quality metrics
└── *_scVDJ_TR(IG)_report.html              # Analysis report in HTML format

📋 Detailed File Description

🧬 VDJ Assembly and Annotation Files

🎯 Core Content: Results of V(D)J contig sequence assembly, precise annotation, and quality assessment, covering the complete information of TCR and BCR rearranged sequences.

-----------

🧵 V(D)J Transcript Structure and Composition

Diagram of a Typical V(D)J Transcript Structure:

🔍 Explanation of Important Terms:

Region	Abbreviation	Biological Function
Untranslated Region	UTR	Regulates mRNA stability and translation efficiency; does not encode protein.
Framework Region	FWR	Maintains the conserved structural framework of the immunoglobulin fold.
Complementarity Determining Region	CDR	The key variable region that directly contacts the antigen and determines binding specificity.

🧬 Technical Advantage: The V(D)J analysis pipeline can accurately identify and provide the amino acid and nucleotide sequences of the framework (FWR) and complementarity determining (CDR) regions. All V(D)J annotation information for assembled contigs and clonotype consensus sequences is output in various standard formats.

-----------

🔍 Explanation of Important Annotation Standards

📋 Full-Length Sequence Determination Criteria (Full Length)

A contig sequence is identified as a full-length sequence if it meets the following strict conditions simultaneously:

✅ The contig sequence perfectly matches the 5' start region of an annotated V gene.
✅ The contig sequence extends completely to the 3' end region of a J gene.

🧬 Productive Sequence Determination Criteria (Productive)

A contig sequence is identified as a productive sequence (i.e., functionally active) if it meets all of the following conditions simultaneously:

✅ Meets all the requirements for a full-length sequence as described above.
✅ Contains a valid start codon (ATG) at the correct position.
✅ No premature stop codons are present within the V-J spanning region.
✅ The start codon of the V gene and the stop codon of the J gene are in the same reading frame.
✅ A complete CDR3 variable region is successfully identified.
✅ The length of the V-J spanning region is within the biologically plausible range for the respective gene.

🎯 High-Confidence Sequence Determination (High Confidence)

🔬 Expected Receptor Configurations for Different Cell Types:

Cell Type	Standard Receptor Configuration	Biological Significance
T Cell	1 productive TRA chain + 1 productive TRB chain	Normal TCR α/β heterodimer
B Cell	1 productive heavy chain + 1 productive light chain (κ or λ)	Normal BCR heavy/light chain pairing

🤔 Principles for Marking Low-Confidence Sequences:

⚠️ Important Note: The presence of extra productive contigs beyond the normal configuration is typically an anomaly and may arise from:

Anomaly Type	Cause Analysis
Ambient Contamination	Non-specific capture of free-floating mRNA, possibly from external sources or nucleic acids released by apoptotic cells.
Doublet Events	Droplets containing multiple cells (doublets), making it impossible to distinguish receptor signals from different cells.
Technical Artifacts	Artificial sequences generated during PCR amplification or sequencing, including chimeric sequences or incorrect primer binding.

📉 Basis for Determining Low-Confidence Sequences:

Abnormal receptor configuration patterns that are biologically highly improbable.
Suspicious sequences with significantly low UMI support.
A number of extra productive chains that clearly exceeds expectations.

-----------

📄 airr_annotations.tsv

Contains annotated and consensus sequences of V(D)J rearrangements in the AIRR standard format.

Purpose:
- Standardized Data Exchange: Serves as an exchange format compliant with AIRR community standards, facilitating integration with other immune repertoire analysis tools.
- In-depth Annotation: Provides detailed V, D, J gene call information, CIGAR strings, sequence alignment results, and the nucleotide and amino acid sequences of the CDR3 region.

Content and Format:

The file is in the AIRR standard TSV format.
The fields included in the file are shown in the table below:

Field Name	Detailed Description
`cell_id`	A unique identifier for the cell to which this rearrangement belongs, used for linking single-cell data.
`clone_id`	A clonotype ID that identifies the specific clonal group to which this rearrangement belongs, used for clonotype analysis.
`sequence_id`	The unique name or identifier of the contig (rearranged sequence).
`sequence`	The complete nucleotide sequence of the V(D)J rearrangement, including all variable, diversity, and joining regions.
`sequence_aa`	The amino acid sequence translated from the rearranged region, reflecting the functional protein product.
`productive`	Indicates whether the rearrangement is productive (biologically functional), requiring conditions like in-frame translation and no stop codons.
`rev_comp`	Indicates if the sequence is a reverse complement (default: false), used for sequence orientation marking.
`v_call`	The name of the identified V (variable) gene segment.
`v_cigar`	The CIGAR string for the V gene alignment, recording detailed alignment information (matches, insertions, deletions, etc.).
`d_call`	The name of the identified D (diversity) gene segment (only for heavy and beta chains).
`d_cigar`	The CIGAR string for the D gene alignment, detailing the alignment results of the diversity region.
`j_call`	The name of the identified J (joining) gene segment, a key element for completing V(D)J recombination.
`j_cigar`	The CIGAR string for the J gene alignment, recording precise alignment information of the joining region.
`c_call`	The name of the identified C (constant) gene segment, which determines the functional type of the antibody/receptor.
`c_cigar`	The CIGAR string for the C gene alignment, recording the alignment details of the constant region.
`sequence_alignment`	Detailed alignment result of the V(D)J rearranged region against the reference germline sequence, showing mutations and variations.
`germline_alignment`	Inferred full-length germline sequence alignment result, used for somatic hypermutation analysis.
`junction`	The nucleotide sequence of the V(D)J rearrangement's junction (CDR3 region), which determines antigen-binding specificity.
`junction_aa`	The amino acid sequence of the junction (CDR3 amino acids), the key domain for antigen recognition.
`junction_length`	The length of the CDR3 nucleotide sequence (in bp), which affects antigen-binding affinity and specificity.
`junction_aa_length`	The length of the CDR3 amino acid sequence (in aa), which determines the spatial structure of the antigen-binding loop.
`v_sequence_start`	The start position of the V region in the rearranged sequence (1-based coordinate system).
`v_sequence_end`	The end position of the V region in the rearranged sequence (1-based coordinate system).
`d_sequence_start`	The start position of the D region in the rearranged sequence (1-based coordinate system).
`d_sequence_end`	The end position of the D region in the rearranged sequence (1-based coordinate system).
`j_sequence_start`	The start position of the J region in the rearranged sequence (1-based coordinate system).
`j_sequence_end`	The end position of the J region in the rearranged sequence (1-based coordinate system).
`c_sequence_start`	The start position of the C region in the rearranged sequence (1-based coordinate system).
`c_sequence_end`	The end position of the C region in the rearranged sequence (1-based coordinate system).
`consensus_count`	The total number of reads supporting this rearrangement, reflecting sequencing depth and sequence reliability.
`duplicate_count`	The number of unique UMI molecules supporting this rearrangement, used for deduplication and quantitative analysis.
`is_cell`	Indicates whether this rearrangement originates from a real cell (TRUE: cell; FALSE: background/empty droplet).

-----------

📄 all_contig_annotations.csv

Contains detailed annotation information for all contig sequences (from both cellular and background barcodes).

Purpose:
- Comprehensive Data Review: Provides all assembled contig data, including low-quality or background signals, for in-depth quality control analysis.
- Complete Annotation: Offers full annotation of V(D)J gene segments and CDR/FWR regions.

Content and Format:

The file is in CSV text format.
The fields included in the file are shown in the table below:

Field Name	Description
`sample`	Sample name of the VDJ library.
`barcode`	The cell ID (or barcode) corresponding to this contig.
`is_cell`	A boolean value indicating if this cell ID was identified as a cell (TRUE for cell, FALSE for background).
`contig_id`	A unique identifier for this contig.
`high_confidence`	A boolean value indicating if this contig was marked as high confidence (not likely to be a chimera or other artifact).
`length`	The nucleotide length of the contig sequence (bp).
`chain`	The chain type associated with this contig: TRA, TRB, IGK, IGL, or IGH.
`v_gene`	The highest-scoring V gene segment, e.g., TRAV1-1.
`d_gene`	The highest-scoring D gene segment, e.g., TRBD1.
`j_gene`	The highest-scoring J gene segment, e.g., TRAJ1-1.
`full_length`	A boolean value indicating if this contig was declared as full-length.
`productive`	A boolean value indicating if this contig was declared as productive.
`fwr1`	The predicted FWR1 amino acid sequence.
`fwr1_nt`	The predicted FWR1 nucleotide sequence.
`cdr1`	The predicted CDR1 amino acid sequence.
`cdr1_nt`	The predicted CDR1 nucleotide sequence.
`fwr2`	The predicted FWR2 amino acid sequence.
`fwr2_nt`	The predicted FWR2 nucleotide sequence.
`cdr2`	The predicted CDR2 amino acid sequence.
`cdr2_nt`	The predicted CDR2 nucleotide sequence.
`fwr3`	The predicted FWR3 amino acid sequence.
`fwr3_nt`	The predicted FWR3 nucleotide sequence.
`cdr3`	The predicted CDR3 amino acid sequence.
`cdr3_nt`	The predicted CDR3 nucleotide sequence.
`fwr4`	The predicted FWR4 amino acid sequence.
`fwr4_nt`	The predicted FWR4 nucleotide sequence.
`reads`	The number of reads mapped to this contig.
`umis`	The number of distinct UMIs mapped to this contig.
`raw_clonotype_id`	The clonotype ID assigned to this cell barcode.
`raw_consensus_id`	The consensus sequence ID to which this contig was assigned.
`exact_subclonotype_id`	The exact subclonotype ID to which this cell barcode was assigned.

-----------

📄 all_contig.fasta

Contains the nucleotide sequences of all assembled contigs.

Purpose:
- Sequence Database: Serves as a sequence database for all contigs, which can be used for igBLAST alignment or other sequence analyses.
- Data Integrity: Provides the most original assembly results.
Content and Format:
- Standard FASTA format, where each sequence corresponds to a contig, and the sequence identifier is the unique name of the contig.

-----------

📄 filtered_contig_annotations.csv

A high-quality subset of all_contig_annotations.csv, containing only the annotation results for high-confidence contigs derived from cells.

Purpose:
- Core Downstream Analysis: This is the recommended input file for defining clonotypes and for most downstream analyses.
- High-Quality Data: Contains only contigs identified as from real cells and with high confidence, ensuring the accuracy of the analysis results.
Content and Format:
- The file format is identical to all_contig_annotations.csv.

-----------

📄 filtered_contig.fasta

A high-quality subset of all_contig.fasta, containing only high-quality contig sequences that have passed quality filtering and cell calling.

Purpose:
- Trusted Sequence Set: Provides a high-confidence set of rearranged sequences for subsequent functional analysis or experimental validation.
Content and Format:
- Standard FASTA format, with the sequence identifier being the contig ID.

📊 Clonotype Analysis Files

🎯 Core Content: Precise identification, frequency statistics, and CDR3 sequence diversity analysis of TCR and BCR clonotypes.

-----------

📄 clonotypes.csv

A statistical analysis file for clonotypes, providing detailed descriptive information for each unique clonotype.

Purpose:
- Clonotype Abundance Analysis: Statistics on the cell count (frequency) and proportion of each clonotype, used to assess the degree of clonal expansion.
- Immune Diversity Assessment: Analysis of clonotype distribution to study the diversity of the immune repertoire.
- CDR3 Sequence Analysis: Provides the precise amino acid and nucleotide sequences of the CDR3 for each clonotype.

Content and Format:

The file is in CSV format.
The fields included in the file are shown in the table below:

Field Name	Detailed Description
`clonotype_id`	A unique identifier for the clonotype assigned to this consensus sequence, used to link and track all related cells of a specific clonal group.
`frequency`	The absolute number of cells observed with this clonotype, reflecting the degree of clonal expansion and the strength of the immune response.
`proportion`	The relative proportion of cells of this clonotype within the total cell population, used to assess clonal dominance and diversity distribution.
`cdr3s_aa`	A semicolon-separated list of chain:sequence pairs, formatted as "chain_name:CDR3_amino_acid_sequence". Chain names include TRA, TRB, TRG, TRD (for T-cell receptors) and IGK, IGL, IGH (for B-cell receptors). The CDR3 amino acid sequence determines antigen-binding specificity and functional activity.
`cdr3s_nt`	A semicolon-separated list of chain:sequence pairs, formatted as "chain_name:CDR3_nucleotide_sequence". Provides the DNA sequence of the CDR3 region, used for somatic hypermutation analysis, clonal evolution tracking, and molecular marker design.

-----------

📄 consensus_annotations.csv

Provides detailed annotation information for each clonotype's consensus sequence.

Purpose:
- Representative Sequence Annotation: Provides a complete V(D)J gene, CDR/FWR region annotation for a representative sequence of each clonotype.
- Clonotype-Level Analysis: Supports sequence feature analysis at the clonotype level.

Content and Format:

The file is in CSV format.
The fields included in the file are shown in the table below:

Field Name	Description
`clonotype_id`	The clonotype ID assigned to this consensus sequence, corresponding to the clonotype identifier in clonotypes.csv.
`consensus_id`	A unique identifier for this consensus sequence, used to link to the sequence in the FASTA file.
`sample`	Sample name of the VDJ library.
`length`	The nucleotide length of the consensus sequence.
`chain`	The chain type associated with this consensus sequence: TRA, TRB, IGK, IGL, or IGH.
`v_gene`	The highest-scoring V gene segment call.
`d_gene`	The highest-scoring D gene segment call (if applicable).
`j_gene`	The highest-scoring J gene segment call.
`c_gene`	The highest-scoring C gene segment call.
`full_length`	A boolean value indicating if this consensus sequence was declared as full-length.
`productive`	A boolean value indicating if this consensus sequence was declared as productive.
`cdr3`	The predicted CDR3 amino acid sequence.
`cdr3_nt`	The predicted CDR3 nucleotide sequence.
`reads`	The total number of reads supporting this consensus sequence.
`umis`	The number of distinct UMIs supporting this consensus sequence.

-----------

📄 consensus.fasta

A FASTA file containing the consensus sequence for each clonotype.

Purpose:
- Representative Sequence Library: Provides a representative sequence for each clonotype, which can be used for functional prediction or comparison with other datasets.
- High-Quality Sequence: The consensus sequence is generated by a clonotype grouping algorithm and is ideally a full-length sequence (from the 5’ UTR start to the C-gene primer binding site).
Content and Format:
- Standard FASTA format, with the sequence identifier being the consensus_id.

📝 Analysis Metrics Summary

🎯 Core Content: A comprehensive evaluation and summary of statistical metrics for V(D)J assembly quality, providing complete data quality control information.

-----------

📄 metrics_summary.xls

A summary table of key analysis metrics in Excel format, providing a comprehensive assessment of the overall experiment quality.

Purpose:
- Quality Assessment: Quickly evaluate core metrics such as sequencing quality, cell identification, gene mapping, and assembly effectiveness.
- Results Overview: Get a comprehensive understanding of the analysis results without having to view all the files.

Content and Format:

Includes five main categories of key metrics:

Metric Category	Content Included
Basic Statistics	Basic sequencing metrics such as total reads, valid barcode ratio, UMI quality, Q30 base quality, etc.
Cell Identification	Cell calling results such as estimated number of cells, fraction of reads in cells, mean reads per cell, etc.
Gene Mapping	V(D)J gene mapping ratio, chain-specific mapping statistics, gene usage analysis.
Assembly Quality	Assembly effectiveness evaluation such as full-length sequence ratio, productive sequence ratio, CDR3 identification success rate, etc.
Clonotype Analysis	Immune repertoire features such as clonotype diversity, pairing success rate, major clonotype frequencies, etc.

Built-in recommended quality control standards for user convenience:
Recommended Quality Thresholds:
- ✅ Valid Barcode Rate: >70%
- ✅ Q30 Base Quality: >75% (for barcodes and UMIs)
- ✅ V(D)J Gene Mapping Rate: >30%
- ✅ Productive Pairing Rate: >20%
- ✅ Mean Reads per Cell: >5,000

-----------

📄 *_scVDJ_TR(IG)_report.html

An interactive comprehensive analysis report in HTML web format.

Purpose:
- Results Visualization: Intuitively displays key results such as QC, rearrangement analysis, and clonotype analysis in the form of interactive charts.
- Results Interpretation: Provides the biological significance and technical explanation of each metric to help users interpret the data in depth.
- Easy Sharing: A single HTML file that is easy to circulate and share.
Content and Format:
- Can be opened in any modern browser without an internet connection.
- For a detailed interpretation of the report, please refer to the Web Report Interpretation section below.

📊 Web Report Interpretation

🎯 Overview: The HTML web report provides a comprehensive visual display and detailed interpretation of single-cell V(D)J sequencing analysis results, including an evaluation of key performance indicators to help users quickly understand the experimental quality and analysis outcomes.

The HTML web report is a comprehensive platform for displaying single-cell VDJ sequencing analysis, integrating complete results from data quality control to downstream immune repertoire analysis. The report uses an interactive visual design to help users quickly assess experimental quality, understand analysis results, and guide future research directions.

💡 Usage Suggestion: It is recommended to review the metrics in the order they are presented in the report.

⚠️ Quality Standards: Each metric is provided with recommended thresholds and quality levels. Please conduct a comprehensive evaluation based on your specific experimental goals.

📊 Main Report Content and Structure

🧬 Detailed Explanation of Core Analysis Metrics

🧬 VDJ Analysis Metrics

🎯 Core Function: Cell identification, quality assessment, and immune receptor assembly statistics, providing key indicators of overall experimental effectiveness.

📊 Quality Control Standards:

Note: The following standards are for reference only. Actual quality assessment should consider multiple factors such as tissue type, cell state, and experimental objectives. Significant differences may exist between samples, so judgment should be based on the specific experimental context.

Metric Name	Recommended	Acceptable	Needs Improvement
Mean reads per cell	≥ 10,000	5,000–10,000	< 5,000
Fraction of Reads in Cells	≥ 50%	20–50%	< 20%

🔍 Detailed Metric Explanations:

Metric Name	Detailed Explanation and Technical Requirements
Estimated number of cells Estimated Cell Count	Definition: An estimate of the number of barcodes associated with cells that express the target V(D)J transcripts. Influencing Factors: The number of cells loaded and the proportion of cells expressing V(D)J transcripts. Quality Interpretation: Abnormal Causes: Inaccurate cell counting, poor T/B cell enrichment, poor sample or library quality, low sequencing depth.
Mean reads per cell Mean Reads per Cell	Definition: The ratio of the total number of input sequencing read pairs to the estimated number of valid cells. Technical Requirements: Minimum sequencing depth: 5,000 read pairs per cell (for paired-end sequencing). For single-end sequencing, it is recommended to double the depth to 10,000 reads per cell. Quality Interpretation: Insufficient sequencing depth can lead to reduced accuracy in V(D)J cell identification and lower assembly quality.
Fraction of Reads in Cells Fraction of Reads in Cells	Definition: The ratio of the number of reads with cell-associated barcodes to the total number of reads with valid barcodes. Quality Interpretation: High-Quality Sample Trait: A high ratio indicates good cell capture efficiency and effective control of background noise. Indicator of Quality Issues: A low ratio may indicate problems with the biological sample, improper cell concentration, issues with library construction quality control, or technical errors.
Median TRA/TRB or IGH/IGK/IGL UMIs per cell Median UMIs per Cell for Specific Chains	Definition: The median number of UMI molecules assigned to transcripts of a specific immune receptor chain (e.g., IGH, TRA, TRB, IGK, IGL). Biological Significance: This metric directly reflects the TCR/BCR expression level and transcriptional activity of each cell.
Number of cells with TRA/TRB or IGH/IGK/IGL contig Cells with TRA/TRB or IGH/IGK/IGL Contigs	Definition: Cells in which at least one T-cell receptor (TRA/TRB) or B-cell receptor (IGH/IGK/IGL) gene rearrangement was detected via single-cell sequencing. Note: This includes both complete and incomplete VDJ rearrangement events. It only requires the presence of a contig for the relevant gene and does not require it to be functional. It may include fragmented contigs that do not span the V-J region or non-productive rearrangements.
Cells with V-J spanning TRA/TRB or IGH/IGK/IGL contig Cells with V-J Spanning TRA/TRB or IGH/IGK/IGL Contigs	Definition: Requires the contig to span the recombination junction between the V and J genes. This is stricter than the first category but still includes cells with non-productive rearrangements. Note: Excludes invalid contigs where V-J recombination is incomplete.
Cells with productive TRA/TRB or IGH/IGK/IGL contig Cells with Functional TRA/TRB or IGH/IGK/IGL Contigs	Definition: Must simultaneously meet strict criteria: V-J spanning (for TRA/IGK/IGL) or V-D-J spanning (for TRB/IGH), `productive` is true (no frameshift mutations and a complete CDR3), and is in-frame.
Paired clonotype diversity Paired Clonotype Diversity	Definition: The effective diversity of paired clonotypes, calculated as the inverse Simpson's index of the clonotype frequencies. A value of 1 indicates a sample with minimal diversity—only one distinct clonotype was detected. A value equal to the estimated number of cells indicates a sample with maximum diversity. Quality Interpretation: This is a sample-type-dependent metric. Clonotype diversity reflects the complexity and functional state of the immune system. A lower-than-expected value may be due to a low proportion of B or T cells in the sample, poor sample quality, poor library quality, or low sequencing depth.

-----------

🔬 Sequencing Metrics

🎯 Core Function: Basic quality assessment of sequencing data, including barcode recognition rate, alignment quality, and sequencing accuracy.

📊 Quality Control Standards:

Note: The following standards are for reference only. Actual quality assessment should consider multiple factors such as tissue type, cell state, and experimental objectives. Significant differences may exist between samples, so judgment should be based on the specific experimental context.

Metric Name	Recommended	Acceptable	Needs Improvement
Valid barcodes	≥ 80%	70–80%	< 70%
Valid UMIs	≥ 80%	70–80%	< 70%
Q30 Base Quality	≥ 85%	75–85%	< 75%

🔍 Detailed Metric Explanations:

Metric Name	Detailed Explanation and Technical Requirements
Valid barcodes Valid Barcode Rate	Definition: The proportion of all reads whose Cell Barcode can be matched to the predefined whitelist (with error correction). Biological Significance: Reflects the effectiveness of cell labeling. Quality Interpretation: A low rate usually suggests sample quality issues leading to barcode degradation and adapter contamination, or a high error rate during the sequencing process.
Valid UMIs Valid UMI Rate	Definition: The proportion of all reads whose Unique Molecular Identifier (UMI) sequence does not contain 'N' bases and is not a homopolymer (e.g., AAAAAA). Biological Significance: Reflects the sequencing quality of the UMI sequence, which is key to accurate molecular counting.
Q30 bases Quality Q30 Base Rate	Definition: The proportion of bases with a sequencing quality score of Q30 or higher in the cell barcode, UMI, and RNA read sequences. Significance: Q30 represents a base sequencing error rate of less than 0.1%. This metric directly affects the accuracy of cell identity, molecular counting, and gene alignment.

-----------

🧬 Enrichment Metrics

🎯 Core Function: Evaluation of V(D)J gene enrichment efficiency, reflecting the capture effectiveness of immune receptor sequences.

📊 Quality Control Standards:

Note: The following standards are for reference only. Actual quality assessment should consider multiple factors such as tissue type, cell state, and experimental objectives. Significant differences may exist between samples, so judgment should be based on the specific experimental context.

Metric Category	Recommended	Acceptable	Needs Improvement
Reads mapped to any V(D)J gene	≥ 50%	30–50%	< 30%

🔍 Detailed Metric Explanations:

Metric Name	Detailed Explanation and Technical Requirements
Reads mapped to any V(D)J gene Fraction of Reads Mapped to Any V(D)J Gene	Definition: The fraction of reads with valid barcodes that map partially or fully to any germline V(D)J gene segment. Quality Interpretation: Quality Warning Threshold (<30%): May be caused by a low proportion of B or T cells in the sample, poor sample quality, inefficient library enrichment, or a mismatched reference genome.
Reads mapped to TRA/TRB/IGH/IGK/IGL Fraction of Reads Mapped to Specific TRA/TRB/IGH/IGK/IGL Chains	Type Definition: TRA vs TRB: TRA (α chain) expression is typically lower than TRB (β chain), reflecting the normal expression pattern of T-cell receptors. IGH vs IGK/IGL: Heavy and light chains show paired expression characteristics, and their mapping ratios reflect the relative expression abundance of each immune receptor chain. Calculation Basis Note: The above enrichment metrics are all calculated with the total number of valid barcode reads as the denominator.

Metric Name

Detailed Explanation and Technical Requirements

Reads mapped to any V(D)J gene
Fraction of Reads Mapped to Any V(D)J Gene

Definition: The fraction of reads with valid barcodes that map partially or fully to any germline V(D)J gene segment.
Quality Interpretation:
- Quality Warning Threshold (<30%): May be caused by a low proportion of B or T cells in the sample, poor sample quality, inefficient library enrichment, or a mismatched reference genome.

Reads mapped to TRA/TRB/IGH/IGK/IGL
Fraction of Reads Mapped to Specific TRA/TRB/IGH/IGK/IGL Chains

Type Definition:

TRA vs TRB: TRA (α chain) expression is typically lower than TRB (β chain), reflecting the normal expression pattern of T-cell receptors.
IGH vs IGK/IGL: Heavy and light chains show paired expression characteristics, and their mapping ratios reflect the relative expression abundance of each immune receptor chain.

Calculation Basis Note: The above enrichment metrics are all calculated with the total number of valid barcode reads as the denominator.

-----------

🧬 V(D)J Annotation Analysis (V(D)J Annotation)

🎯 Core Function: Analysis of productive rearrangement pairing to assess the functional expression level of immune receptors.

📊 Quality Control Standards:

Note: The following standards are for reference only. Actual quality assessment should consider multiple factors such as tissue type, cell state, and experimental objectives. Significant differences may exist between samples, so judgment should be based on the specific experimental context.

Metric Name	Recommended	Acceptable	Needs Improvement
Cells with productive V-J spanning pair	≥ 40%	20–40%	< 20%

🔍 Detailed Metric Explanations:

Metric Name	Detailed Explanation and Technical Requirements
Number of Cells with Productive V-J Spanning Pair Absolute Number of Cells with Productive V-J Spanning Pairs	Definition: The total number of cells with at least one productive contig for a TRA/TRB pair or an immunoglobulin heavy/light chain pair.
Cells with productive V-J spanning pair Fraction of Cells with Productive V-J Spanning Pairs	Definition: The fraction of cell-associated barcodes that have at least one complete receptor pair (with a productive contig for each chain). Criteria for a Productive Contig: Spanning Integrity: The contig annotation completely spans from the 5' end of the V region to the 3' end of the corresponding chain's J region. Start Codon: A valid start codon (ATG) is successfully identified at the expected position in the V sequence. CDR3 Integrity: A complete, in-frame CDR3 amino acid motif is found. Correct Reading Frame: No premature stop codons are present in the aligned V-J region (no frameshift mutations).
Cells with productive V-J spanning (IGK, IGH) pair Fraction of Cells with Productive IGK/IGH Pairs	Definition: The fraction of cell-associated barcodes with an (IGK, IGH) immunoglobulin receptor pair where each chain has at least one productive contig. Note: A specific metric for B-cell datasets. Depends on the proportion of B-cell subpopulations expressing the κ light chain (IGK) in the sample. The usage ratio of κ/λ light chains varies by species and individual.
Cells with productive V-J spanning (IGL, IGH) pair Fraction of Cells with Productive IGL/IGH Pairs	Definition: The fraction of cell-associated barcodes with an (IGL, IGH) immunoglobulin receptor pair where each chain has at least one productive contig. Note: A specific metric for B-cell datasets. Depends on the proportion of B-cell subpopulations expressing the λ light chain (IGL) in the sample. Complements the IGK pairing to collectively reflect the B-cell light chain usage pattern.
Cells with productive V-J spanning (TRA, TRB) pair Fraction of Cells with Productive TRA/TRB Pairs	Definition: The fraction of cell-associated barcodes with a (TRA, TRB) T-cell receptor pair where each chain has at least one productive contig. Note: A core metric for T-cell datasets. Reflects the successful pairing of TCR α and β chains. Indicates the functional receptor expression status of αβ T-cells.

📈 Visualization Chart 1

🎯 Core Function: A multi-dimensional visual display for V(D)J cell quality control, UMI analysis, and immune receptor expression assessment.

📊 V(D)J Barcode Rank Plot

Chart Function: Visualizes the UMI count distribution for each cell (counting only UMIs from productive contigs), providing an intuitive view of cell quality control results and background noise levels.

How to Interpret:

Axes:
- X-axis (Barcode Rank): All cell barcodes are ranked in descending order of total UMI counts (log scale).
- Y-axis (UMI Counts): The total UMI count for each cell (log scale).
Visual Encoding:
- 🔵 Blue Line: Identified valid cells.
- ⚫ Gray Line: Background noise cells.
- 🔷 Blue Gradient Area: The mixed transition zone between cells and background noise.
Quality Assessment:
- An ideal sample should show good separation between cell-associated barcodes and the background, indicated by a steep drop in the curve.
- VDJ-B data may show a group of cells with high UMI counts, which are typically high-expressing plasma cells.

-----------

📈 Visualization Chart 2

🎯 Core Function: A visual display for clonotype abundance analysis and immune receptor diversity assessment.

📊 Clonotype Abundance Analysis

Chart Function: Shows the relative abundance distribution of clonotypes in the sample and the degree of concentration of the immune response.

How to Interpret:

Top Chart (Top 10 Clonotypes): A bar chart showing the percentage of cells occupied by the 10 most abundant clonotypes in the sample. It intuitively reflects the relative abundance distribution of clonotypes and the concentration of the immune response.
Bottom Table (Detailed Information): Provides complete descriptive information for the top 10 most abundant clonotypes, including their clonotype ID, CDR3 amino acid/nucleotide sequences, absolute frequency, and relative proportion.

-----------

🎯 More Resources

Document Type	Resource Link and Description
🚀 Quick Start	Quick Start Guide - A complete tutorial for your first analysis.
⚙️ Parameter Reference	Parameter Reference Manual - Detailed descriptions of all configurable parameters.
🔬 Analysis Pipeline	Analysis Pipeline Description - Technical details of the entire analysis workflow.
🔧 Installation & Setup	Installation and Setup Guide - System requirements, installation steps, and environment configuration.

💡 Tip

This document is continuously updated. If you find any errors or need additional information, please feel free to provide feedback.

📝 Document Version: 3.0 beta | Last Updated: 2025

-----------

🔬 DNBelab C Series HT scVDJ Analysis Software
High-Performance Single-Cell V(D)J Sequencing Data Analysis Pipeline

🧬 DNBelab C Series HT scVDJ Analysis Output Documentation

📖 Overview

📁 Output Directory Structure

📋 Detailed File Description

🧬 VDJ Assembly and Annotation Files

🧵 V(D)J Transcript Structure and Composition

🔍 Explanation of Important Annotation Standards

📋 Full-Length Sequence Determination Criteria (Full Length)

🧬 Productive Sequence Determination Criteria (Productive)

🎯 High-Confidence Sequence Determination (High Confidence)

📄 airr_annotations.tsv

📄 all_contig_annotations.csv

📄 all_contig.fasta

📄 filtered_contig_annotations.csv

📄 filtered_contig.fasta

📊 Clonotype Analysis Files

📄 clonotypes.csv

📄 consensus_annotations.csv

📄 consensus.fasta

📝 Analysis Metrics Summary

📄 metrics_summary.xls

📄 *_scVDJ_TR(IG)_report.html

📊 Web Report Interpretation

📊 Main Report Content and Structure

🧬 Detailed Explanation of Core Analysis Metrics

🧬 VDJ Analysis Metrics

🔬 Sequencing Metrics

🧬 Enrichment Metrics

🧬 V(D)J Annotation Analysis (V(D)J Annotation)

📈 Visualization Chart 1

📊 V(D)J Barcode Rank Plot

📈 Visualization Chart 2

📊 Clonotype Abundance Analysis

🎯 More Resources

📚 Related Documentation