
🧬 DNBelab C Series HT scVDJ Analysis Output Documentation

A Complete Guide to Single-Cell V(D)J Sequencing Analysis Output Files



📖 Overview

After the single-cell VDJ analysis is complete, a standardized set of files and subdirectories is generated in the specified output directory, specifically for immune receptor repertoire analysis. This document details the content, format, and purpose of each output file to help users fully understand and efficiently utilize the V(D)J analysis results.

💡 Tip: VDJ analysis requires 5' end RNA sequencing data, and all output files adhere to the AIRR standard and are compatible with mainstream immunoinformatics tools.

⚠️ Prerequisite: 5' end single-cell RNA sequencing analysis must be completed first.


📁 Output Directory Structure

.
├── airr_annotations.tsv                    # Annotation file in AIRR standard format
├── all_contig_annotations.csv              # Annotation information for all assembled sequences
├── all_contig.fasta                        # FASTA file of all assembled sequences
├── all_contig.fasta.fai                    # Index file for all assembled sequences
├── clonotypes.csv                          # Clonotype analysis results
├── consensus_annotations.csv               # Annotation information for consensus sequences
├── consensus.fasta                         # FASTA file of consensus sequences
├── consensus.fasta.fai                     # Index file for consensus sequences
├── filtered_contig_annotations.csv         # Annotation information for filtered assembled sequences
├── filtered_contig.fasta                   # FASTA file of filtered assembled sequences
├── filtered_contig.fasta.fai               # Index file for filtered assembled sequences
├── metrics_summary.xls                     # Summary of analysis quality metrics
└── *_scVDJ_TR(IG)_report.html              # Analysis report in HTML format
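
As a quick sanity check after a run, the listing above can be turned into a completeness test. The following is a minimal sketch, not part of the pipeline: the function name `missing_outputs` is hypothetical, and the report is matched with the wildcard `*_report.html` on the assumption that only the sample prefix and the TR/IG part of its name vary.

```python
from pathlib import Path

# File names copied from the directory listing above.
EXPECTED_FILES = [
    "airr_annotations.tsv",
    "all_contig_annotations.csv",
    "all_contig.fasta",
    "all_contig.fasta.fai",
    "clonotypes.csv",
    "consensus_annotations.csv",
    "consensus.fasta",
    "consensus.fasta.fai",
    "filtered_contig_annotations.csv",
    "filtered_contig.fasta",
    "filtered_contig.fasta.fai",
    "metrics_summary.xls",
]

# Assumption: the HTML report name ends in "_report.html".
REPORT_PATTERN = "*_report.html"


def missing_outputs(outdir):
    """Return the expected output names that are absent from outdir."""
    root = Path(outdir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    if not any(root.glob(REPORT_PATTERN)):
        missing.append(REPORT_PATTERN)
    return missing
```

An empty list means that every file in the listing was found; any returned names point to a step that may not have completed.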

📋 Detailed File Description

🧬 VDJ Assembly and Annotation Files

🎯 Core Content: Results of V(D)J contig sequence assembly, precise annotation, and quality assessment, covering the complete information of TCR and BCR rearranged sequences.

-----------

🧡 V(D)J Transcript Structure and Composition

Diagram of a Typical V(D)J Transcript Structure:

V(D)J Transcript Structure Diagram

🔍 Explanation of Important Terms:

| Region | Abbreviation | Biological Function |
| --- | --- | --- |
| Untranslated Region | UTR | Regulates mRNA stability and translation efficiency; does not encode protein. |
| Framework Region | FWR | Maintains the conserved structural framework of the immunoglobulin fold. |
| Complementarity Determining Region | CDR | The key variable region that directly contacts the antigen and determines binding specificity. |

🧬 Technical Advantage: The V(D)J analysis pipeline can accurately identify and provide the amino acid and nucleotide sequences of the framework (FWR) and complementarity determining (CDR) regions. All V(D)J annotation information for assembled contigs and clonotype consensus sequences is output in various standard formats.

-----------

🔍 Explanation of Important Annotation Standards

📋 Full-Length Sequence Determination Criteria (Full Length)

A contig sequence is identified as a full-length sequence if it meets the following strict conditions simultaneously:

🧬 Productive Sequence Determination Criteria (Productive)

A contig sequence is identified as a productive sequence (i.e., functionally active) if it meets all of the following conditions simultaneously:

🎯 High-Confidence Sequence Determination (High Confidence)

🔬 Expected Receptor Configurations for Different Cell Types:

| Cell Type | Standard Receptor Configuration | Biological Significance |
| --- | --- | --- |
| T Cell | 1 productive TRA chain + 1 productive TRB chain | Normal TCR α/β heterodimer |
| B Cell | 1 productive heavy chain + 1 productive light chain (κ or λ) | Normal BCR heavy/light chain pairing |

🤔 Principles for Marking Low-Confidence Sequences:

⚠️ Important Note: The presence of extra productive contigs beyond the normal configuration is typically an anomaly and may arise from:

| Anomaly Type | Cause Analysis |
| --- | --- |
| Ambient Contamination | Non-specific capture of free-floating mRNA, possibly from external sources or nucleic acids released by apoptotic cells. |
| Doublet Events | Droplets containing multiple cells (doublets), making it impossible to distinguish receptor signals from different cells. |
| Technical Artifacts | Artificial sequences generated during PCR amplification or sequencing, including chimeric sequences or incorrect primer binding. |
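
The expected configurations and the note on extra productive contigs can be illustrated with a small check. This is only a sketch: the function `receptor_status` and its labels (`standard`, `extra`, `incomplete`) are hypothetical and do not reproduce the pipeline's own confidence rules.

```python
from collections import Counter


def receptor_status(productive_chains, cell_type):
    """Compare a cell's productive chains with the expected configuration.

    productive_chains: chain names of the cell's productive contigs,
        e.g. ["TRA", "TRB"].
    cell_type: "T" (TRA + TRB) or "B" (IGH + IGK or IGL).
    Returns one of the illustrative labels "standard", "extra", "incomplete".
    """
    counts = Counter(productive_chains)
    if cell_type == "T":
        first, second = counts["TRA"], counts["TRB"]
    elif cell_type == "B":
        # Either light chain (kappa or lambda) can pair with the heavy chain.
        first, second = counts["IGH"], counts["IGK"] + counts["IGL"]
    else:
        raise ValueError("cell_type must be 'T' or 'B'")

    if first == 1 and second == 1:
        return "standard"
    if first > 1 or second > 1:
        # More productive contigs than the normal configuration allows.
        return "extra"
    return "incomplete"
```

For example, `receptor_status(["TRA", "TRA", "TRB"], "T")` returns `"extra"`, the situation that the table above attributes to ambient contamination, doublets, or technical artifacts.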

📉 Basis for Determining Low-Confidence Sequences:

-----------

📄 airr_annotations.tsv

Contains annotated and consensus sequences of V(D)J rearrangements in the AIRR standard format.
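
Because the file is tab-separated, it can be read with standard tooling. The sketch below counts productive rearrangements per locus. It assumes that the file carries the AIRR Rearrangement fields `locus` and `productive` (with booleans written as `T`/`F`); check the header of your own file before relying on these names. The function name and the example rows are illustrative only.

```python
import csv
import io
from collections import Counter


def productive_by_locus(handle):
    """Count productive rearrangements per locus in an AIRR-style TSV."""
    reader = csv.DictReader(handle, delimiter="\t")
    counts = Counter()
    for row in reader:
        # AIRR TSV files encode booleans as "T"/"F"; "TRUE" is accepted too.
        if (row.get("productive") or "").strip().upper() in {"T", "TRUE"}:
            counts[row.get("locus") or "unknown"] += 1
    return counts


# Made-up rows, only to show the call; real files have many more columns.
example = io.StringIO(
    "sequence_id\tlocus\tproductive\n"
    "contig_1\tTRA\tT\n"
    "contig_2\tTRB\tT\n"
    "contig_3\tTRB\tF\n"
)
example_counts = productive_by_locus(example)
```

For a real file, pass an open handle instead, for example `open("airr_annotations.tsv", newline="")`.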

-----------

📄 all_contig_annotations.csv

Contains detailed annotation information for all contig sequences (from both cellular and background barcodes).

-----------

📄 all_contig.fasta

Contains the nucleotide sequences of all assembled contigs.
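
The accompanying `all_contig.fasta.fai` index listed in the directory structure allows a single contig to be read without scanning the whole FASTA file. The sketch below assumes that the index follows the common `samtools faidx` layout (name, length, byte offset, bases per line, bytes per line); the function names are hypothetical.

```python
def read_fai(fai_path):
    """Parse a faidx-style index into {name: (length, offset, linebases, linewidth)}."""
    index = {}
    with open(fai_path) as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 5:
                continue  # skip blank or malformed lines
            name = fields[0]
            index[name] = tuple(int(x) for x in fields[1:5])
    return index


def fetch_contig(fasta_path, index, name):
    """Return the sequence of one contig using its index entry."""
    length, offset, linebases, linewidth = index[name]
    if length == 0:
        return ""
    # Each full sequence line adds (linewidth - linebases) newline bytes.
    newline_bytes = (length - 1) // linebases * (linewidth - linebases)
    with open(fasta_path, "rb") as fh:
        fh.seek(offset)
        raw = fh.read(length + newline_bytes)
    return raw.decode("ascii").replace("\n", "").replace("\r", "")
```

A typical call would be `fetch_contig("all_contig.fasta", read_fai("all_contig.fasta.fai"), contig_id)`, where `contig_id` is a sequence name taken from the index.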

-----------

📄 filtered_contig_annotations.csv

A high-quality subset of all_contig_annotations.csv, containing only the annotation results for high-confidence contigs derived from cells.

-----------

📄 filtered_contig.fasta

A high-quality subset of all_contig.fasta, containing only high-quality contig sequences that have passed quality filtering and cell calling.


📊 Clonotype Analysis Files

🎯 Core Content: Precise identification, frequency statistics, and CDR3 sequence diversity analysis of TCR and BCR clonotypes.

-----------

📄 clonotypes.csv

A statistical analysis file for clonotypes, providing detailed descriptive information for each unique clonotype.

-----------

📄 consensus_annotations.csv

Provides detailed annotation information for each clonotype's consensus sequence.

-----------

📄 consensus.fasta

A FASTA file containing the consensus sequence for each clonotype.


Analysis Metrics Summary

🎯 Core Content: A comprehensive evaluation and summary of statistical metrics for V(D)J assembly quality, providing complete data quality control information.

-----------

📄 metrics_summary.xls

A summary table of key analysis metrics in Excel format, providing a comprehensive assessment of the overall experiment quality.

-----------

📄 *_scVDJ_TR(IG)_report.html

An interactive comprehensive analysis report in HTML web format.


📊 Web Report Interpretation

🎯 Overview: The HTML web report presents the single-cell V(D)J sequencing analysis results interactively, from data quality control through downstream immune repertoire analysis. It includes an evaluation of key performance indicators so that users can quickly assess experimental quality, understand the analysis results, and plan follow-up work.

💡 Usage Suggestion: Review the metrics in the order in which the report presents them.

⚠️ Quality Standards: Each metric is provided with recommended thresholds and quality levels. Please conduct a comprehensive evaluation based on your specific experimental goals.

📊 Main Report Content and Structure

scVDJ Web Report

🧬 Detailed Explanation of Core Analysis Metrics

🧬 VDJ Analysis Metrics

🎯 Core Function: Cell identification, quality assessment, and immune receptor assembly statistics, providing key indicators of overall experimental effectiveness.

📊 Quality Control Standards:

Note: The following standards are for reference only. Actual quality assessment should consider multiple factors such as tissue type, cell state, and experimental objectives. Significant differences may exist between samples, so judgment should be based on the specific experimental context.

| Metric Name | Recommended | Acceptable | Needs Improvement |
| --- | --- | --- | --- |
| Mean reads per cell | ≥ 10,000 | 5,000–10,000 | < 5,000 |
| Fraction of Reads in Cells | ≥ 50% | 20–50% | < 20% |
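
The reference thresholds above can be applied programmatically when many samples are screened. The following is a minimal sketch with a hypothetical helper, `classify_metric`; fractions are expressed as values between 0 and 1, and the result is only a reference level, not a verdict on the sample.

```python
# Thresholds copied from the table above: (recommended minimum, acceptable minimum).
THRESHOLDS = {
    "Mean reads per cell": (10_000, 5_000),
    "Fraction of Reads in Cells": (0.50, 0.20),
}


def classify_metric(name, value):
    """Map a metric value to the reference quality level from the table."""
    recommended_min, acceptable_min = THRESHOLDS[name]
    if value >= recommended_min:
        return "Recommended"
    if value >= acceptable_min:
        return "Acceptable"
    return "Needs Improvement"
```

For example, `classify_metric("Mean reads per cell", 7_500)` returns `"Acceptable"`.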

🔍 Detailed Metric Explanations:

Metric Name Detailed Explanation and Technical Requirements
Estimated number of cells
Estimated Cell Count
  • Definition: An estimate of the number of barcodes associated with cells that express the target V(D)J transcripts.
  • Influencing Factors: The number of cells loaded and the proportion of cells expressing V(D)J transcripts.
  • Quality Interpretation:
    • Abnormal Causes: Inaccurate cell counting, poor T/B cell enrichment, poor sample or library quality, low sequencing depth.
Mean reads per cell
Mean Reads per Cell
  • Definition: The ratio of the total number of input sequencing read pairs to the estimated number of valid cells.
  • Technical Requirements:
    • Minimum sequencing depth: 5,000 read pairs per cell (for paired-end sequencing).
    • For single-end sequencing, it is recommended to double the depth to 10,000 reads per cell.
  • Quality Interpretation: Insufficient sequencing depth can lead to reduced accuracy in V(D)J cell identification and lower assembly quality.
Fraction of Reads in Cells
Fraction of Reads in Cells
  • Definition: The ratio of the number of reads with cell-associated barcodes to the total number of reads with valid barcodes.
  • Quality Interpretation:
    • High-Quality Sample Trait: A high ratio indicates good cell capture efficiency and effective control of background noise.
    • Indicator of Quality Issues: A low ratio may indicate problems with the biological sample, improper cell concentration, issues with library construction quality control, or technical errors.
Median TRA/TRB or IGH/IGK/IGL UMIs per cell
Median UMIs per Cell for Specific Chains
  • Definition: The median number of UMI molecules assigned to transcripts of a specific immune receptor chain (e.g., IGH, TRA, TRB, IGK, IGL).
  • Biological Significance: This metric directly reflects the TCR/BCR expression level and transcriptional activity of each cell.
Number of cells with TRA/TRB or IGH/IGK/IGL contig
Cells with TRA/TRB or IGH/IGK/IGL Contigs
  • Definition: Cells in which at least one T-cell receptor (TRA/TRB) or B-cell receptor (IGH/IGK/IGL) gene rearrangement was detected via single-cell sequencing.
  • Note: This includes both complete and incomplete VDJ rearrangement events. It only requires the presence of a contig for the relevant gene and does not require it to be functional. It may include fragmented contigs that do not span the V-J region or non-productive rearrangements.
Cells with V-J spanning TRA/TRB or IGH/IGK/IGL contig
Cells with V-J Spanning TRA/TRB or IGH/IGK/IGL Contigs
  • Definition: Cells with at least one contig that spans the recombination junction between the V and J genes. This criterion is stricter than the preceding metric but still includes cells with non-productive rearrangements.
  • Note: Excludes contigs in which V-J recombination is incomplete.
Cells with productive TRA/TRB or IGH/IGK/IGL contig
Cells with Productive TRA/TRB or IGH/IGK/IGL Contigs
  • Definition: Cells with at least one contig that meets all of the following criteria: it is V-J spanning (for TRA/IGK/IGL) or V-D-J spanning (for TRB/IGH), and its `productive` flag is true, meaning that the sequence is in-frame, has no frameshift mutations, and contains a complete CDR3.
Paired clonotype diversity
Paired Clonotype Diversity
  • Definition: The effective diversity of paired clonotypes, calculated as the inverse Simpson's index of the clonotype frequencies. A value of 1 indicates a sample with minimal diversity, in which only one distinct clonotype was detected. A value equal to the estimated number of cells indicates a sample with maximum diversity.
  • Quality Interpretation:
    • This is a sample-type-dependent metric. Clonotype diversity reflects the complexity and functional state of the immune system.
    • A lower-than-expected value may be due to a low proportion of B or T cells in the sample, poor sample quality, poor library quality, or low sequencing depth.
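
The inverse Simpson's index used for paired clonotype diversity is 1 / Σ p_i², where p_i is the frequency of clonotype i. The sketch below shows the calculation on per-clonotype cell counts; the function name is hypothetical, and the pipeline's exact input (which cells and clonotypes are counted) is not specified here. A single clonotype gives 1, and n equally abundant clonotypes give n.

```python
def inverse_simpson(clonotype_counts):
    """Inverse Simpson's index from a list of per-clonotype cell counts."""
    total = sum(clonotype_counts)
    if total <= 0:
        raise ValueError("at least one cell is required")
    # Sum of squared clonotype frequencies (Simpson's index).
    simpson = sum((count / total) ** 2 for count in clonotype_counts)
    return 1.0 / simpson
```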
-----------

🔬 Sequencing Metrics

🎯 Core Function: Basic quality assessment of sequencing data, including barcode recognition rate, alignment quality, and sequencing accuracy.

📊 Quality Control Standards:

Note: The following standards are for reference only. Actual quality assessment should consider multiple factors such as tissue type, cell state, and experimental objectives. Significant differences may exist between samples, so judgment should be based on the specific experimental context.

| Metric Name | Recommended | Acceptable | Needs Improvement |
| --- | --- | --- | --- |
| Valid barcodes | ≥ 80% | 70–80% | < 70% |
| Valid UMIs | ≥ 80% | 70–80% | < 70% |
| Q30 Base Quality | ≥ 85% | 75–85% | < 75% |

🔍 Detailed Metric Explanations:

Metric Name Detailed Explanation and Technical Requirements
Valid barcodes
Valid Barcode Rate
  • Definition: The proportion of all reads whose Cell Barcode can be matched to the predefined whitelist (with error correction).
  • Biological Significance: Reflects the effectiveness of cell labeling.
  • Quality Interpretation: A low rate usually suggests sample quality issues leading to barcode degradation and adapter contamination, or a high error rate during the sequencing process.
Valid UMIs
Valid UMI Rate
  • Definition: The proportion of all reads whose Unique Molecular Identifier (UMI) sequence does not contain 'N' bases and is not a homopolymer (e.g., AAAAAA).
  • Biological Significance: Reflects the sequencing quality of the UMI sequence, which is key to accurate molecular counting.
Q30 bases Quality
Q30 Base Rate
  • Definition: The proportion of bases with a sequencing quality score of Q30 or higher in the cell barcode, UMI, and RNA read sequences.
  • Significance: A quality score of Q30 corresponds to a base-call error probability of 0.1% (1 in 1,000), so bases at Q30 or higher have an error probability of at most 0.1%. This metric directly affects the accuracy of cell identity, molecular counting, and gene alignment.
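
The UMI validity rule and the Q30 definition above can be sketched as follows. The function names are hypothetical, and the quality string is assumed to use the common Phred+33 encoding; the pipeline's own implementation may differ. The Phred relation is P = 10^(-Q/10), so Q30 gives P = 0.001.

```python
def is_valid_umi(umi):
    """True if the UMI contains no 'N' base and is not a homopolymer."""
    umi = umi.upper()
    return bool(umi) and "N" not in umi and len(set(umi)) > 1


def phred_error_probability(q):
    """Base-call error probability for a Phred quality score q."""
    return 10 ** (-q / 10)


def q30_fraction(quality_string, offset=33):
    """Fraction of bases with quality >= 30 in a FASTQ quality string.

    Assumption: qualities are encoded as ASCII characters with the given
    offset (Phred+33 by default).
    """
    if not quality_string:
        return 0.0
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(score >= 30 for score in scores) / len(scores)
```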
-----------

🧬 Enrichment Metrics

🎯 Core Function: Evaluation of V(D)J gene enrichment efficiency, reflecting the capture effectiveness of immune receptor sequences.

📊 Quality Control Standards:

Note: The following standards are for reference only. Actual quality assessment should consider multiple factors such as tissue type, cell state, and experimental objectives. Significant differences may exist between samples, so judgment should be based on the specific experimental context.

| Metric Category | Recommended | Acceptable | Needs Improvement |
| --- | --- | --- | --- |
| Reads mapped to any V(D)J gene | ≥ 50% | 30–50% | < 30% |

🔍 Detailed Metric Explanations:

Metric Name Detailed Explanation and Technical Requirements
Reads mapped to any V(D)J gene
Fraction of Reads Mapped to Any V(D)J Gene
  • Definition: The fraction of reads with valid barcodes that map partially or fully to any germline V(D)J gene segment.
  • Quality Interpretation:
    • Quality Warning Threshold (<30%): May be caused by a low proportion of B or T cells in the sample, poor sample quality, inefficient library enrichment, or a mismatched reference genome.
Reads mapped to TRA/TRB/IGH/IGK/IGL
Fraction of Reads Mapped to Specific TRA/TRB/IGH/IGK/IGL Chains
  • Type Definition:
    • TRA vs TRB: TRA (α chain) expression is typically lower than TRB (β chain), reflecting the normal expression pattern of T-cell receptors.
    • IGH vs IGK/IGL: Heavy and light chains show paired expression characteristics, and their mapping ratios reflect the relative expression abundance of each immune receptor chain.
  • Calculation Basis Note: The above enrichment metrics are all calculated with the total number of valid barcode reads as the denominator.
-----------

🧬 V(D)J Annotation Analysis (V(D)J Annotation)

🎯 Core Function: Analysis of productive rearrangement pairing to assess the functional expression level of immune receptors.

📊 Quality Control Standards:

Note: The following standards are for reference only. Actual quality assessment should consider multiple factors such as tissue type, cell state, and experimental objectives. Significant differences may exist between samples, so judgment should be based on the specific experimental context.

| Metric Name | Recommended | Acceptable | Needs Improvement |
| --- | --- | --- | --- |
| Cells with productive V-J spanning pair | ≥ 40% | 20–40% | < 20% |

🔍 Detailed Metric Explanations:

Metric Name Detailed Explanation and Technical Requirements
Number of Cells with Productive V-J Spanning Pair
Absolute Number of Cells with Productive V-J Spanning Pairs
  • Definition: The total number of cells with at least one productive contig for a TRA/TRB pair or an immunoglobulin heavy/light chain pair.
Cells with productive V-J spanning pair
Fraction of Cells with Productive V-J Spanning Pairs
  • Definition: The fraction of cell-associated barcodes that have at least one complete receptor pair (with a productive contig for each chain).
  • Criteria for a Productive Contig:
    • Spanning Integrity: The contig annotation completely spans from the 5' end of the V region to the 3' end of the corresponding chain's J region.
    • Start Codon: A valid start codon (ATG) is successfully identified at the expected position in the V sequence.
    • CDR3 Integrity: A complete, in-frame CDR3 amino acid motif is found.
    • Correct Reading Frame: No premature stop codons are present in the aligned V-J region (no frameshift mutations).
Cells with productive V-J spanning (IGK, IGH) pair
Fraction of Cells with Productive IGK/IGH Pairs
  • Definition: The fraction of cell-associated barcodes with an (IGK, IGH) immunoglobulin receptor pair where each chain has at least one productive contig.
  • Note:
    • A specific metric for B-cell datasets.
    • Depends on the proportion of B-cell subpopulations expressing the κ light chain (IGK) in the sample.
    • The usage ratio of κ/λ light chains varies by species and individual.
Cells with productive V-J spanning (IGL, IGH) pair
Fraction of Cells with Productive IGL/IGH Pairs
  • Definition: The fraction of cell-associated barcodes with an (IGL, IGH) immunoglobulin receptor pair where each chain has at least one productive contig.
  • Note:
    • A specific metric for B-cell datasets.
    • Depends on the proportion of B-cell subpopulations expressing the λ light chain (IGL) in the sample.
    • Complements the IGK pairing to collectively reflect the B-cell light chain usage pattern.
Cells with productive V-J spanning (TRA, TRB) pair
Fraction of Cells with Productive TRA/TRB Pairs
  • Definition: The fraction of cell-associated barcodes with a (TRA, TRB) T-cell receptor pair where each chain has at least one productive contig.
  • Note:
    • A core metric for T-cell datasets.
    • Reflects the successful pairing of TCR α and β chains.
    • Indicates the functional receptor expression status of αβ T-cells.
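
The pair-based fractions in this section share one calculation: among cell-associated barcodes, count those with at least one productive contig for each chain of the pair. The sketch below illustrates this with a hypothetical function and a simplified input of `(barcode, chain, productive)` tuples; it is not the pipeline's implementation.

```python
from collections import defaultdict


def productive_pair_fraction(contigs, cell_barcodes, pair=("TRA", "TRB")):
    """Fraction of cells with a productive contig for both chains in pair.

    contigs: iterable of (barcode, chain, productive) tuples.
    cell_barcodes: barcodes called as cells (the denominator).
    pair: the two chains to require, e.g. ("IGK", "IGH") or ("IGL", "IGH").
    """
    productive_chains = defaultdict(set)
    for barcode, chain, productive in contigs:
        if productive:
            productive_chains[barcode].add(chain)

    cells = set(cell_barcodes)
    if not cells:
        return 0.0
    required = set(pair)
    paired = sum(1 for bc in cells if required <= productive_chains.get(bc, set()))
    return paired / len(cells)
```

Passing `pair=("IGK", "IGH")` or `pair=("IGL", "IGH")` gives the corresponding B-cell metrics under the same assumptions.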

📈 Visualization Chart 1

🎯 Core Function: A multi-dimensional visual display for V(D)J cell quality control, UMI analysis, and immune receptor expression assessment.

📊 V(D)J Barcode Rank Plot

Chart Function: Visualizes the UMI count distribution for each cell (counting only UMIs from productive contigs), providing an intuitive view of cell quality control results and background noise levels.

V(D)J Barcode Rank Plot

How to Interpret:

-----------

📈 Visualization Chart 2

🎯 Core Function: A visual display for clonotype abundance analysis and immune receptor diversity assessment.

📊 Clonotype Abundance Analysis

Chart Function: Shows the relative abundance distribution of clonotypes in the sample and the degree of concentration of the immune response.

scVDJ Clonotype Analysis Chart

How to Interpret:

-----------

🎯 More Resources

| Document Type | Resource Link and Description |
| --- | --- |
| 🚀 Quick Start | Quick Start Guide - A complete tutorial for your first analysis. |
| ⚙️ Parameter Reference | Parameter Reference Manual - Detailed descriptions of all configurable parameters. |
| 🔬 Analysis Pipeline | Analysis Pipeline Description - Technical details of the entire analysis workflow. |
| 🔧 Installation & Setup | Installation and Setup Guide - System requirements, installation steps, and environment configuration. |

💡 Tip

This document is continuously updated. If you find any errors or need additional information, please feel free to provide feedback.

📝 Document Version: 3.0 beta | Last Updated: 2025

-----------

🔬 DNBelab C Series HT scVDJ Analysis Software
High-Performance Single-Cell V(D)J Sequencing Data Analysis Pipeline