
🧬 DNBelab C Series HT scATAC Analysis Output Documentation

A Complete Guide to Single-Cell ATAC Sequencing Analysis Output Files

📁 Directory Structure • 📋 File Details • 🧬 Data Matrix • 📊 Analysis Summary • 📊 Report Interpretation


📖 Overview

When a single-cell ATAC sequencing analysis finishes, the pipeline writes a standardized set of files and subdirectories to the specified output directory for chromatin accessibility analysis and epigenomic research. This document describes the content, format, and purpose of each output file so that users can interpret the scATAC analysis results and use them efficiently.

💡 Tip: All output files use standard formats compatible with mainstream single-cell epigenomic analysis tools (e.g., Signac, ArchR), adhering to internationally accepted data format standards.


📁 Directory Structure

.
├── alignment.fragments.sorted.tagged.bam       # QC-filtered alignment results (requires 'need_bam' parameter for analysis)
├── alignment.fragments.sorted.tagged.bam.bai   # Index file for the alignment results
├── filter_peak_matrix/                         # Directory for the filtered peak matrix in MEX format
│   ├── barcodes.tsv.gz                         # Barcode information for filtered cells
│   ├── matrix.mtx.gz                           # Peak signal data in sparse matrix format for filtered data
│   └── peaks.bed.gz                            # Peak position information for filtered data
├── fragments.tsv.gz                            # Contains all fragments aligned to the genome
├── fragments.tsv.gz.tbi                        # Index file for fragments, for fast random access
├── filtered.fragments.tsv.gz                   # QC-filtered ATAC fragment file, containing only fragments from filtered cells
├── filtered.fragments.tsv.gz.tbi               # Tabix index for the filtered fragments file, for fast querying of genomic intervals
├── metrics_summary.xls                         # Summary table of analysis quality metrics
├── raw_peak_matrix/                            # Directory for the raw peak matrix in MEX format
│   ├── barcodes.tsv.gz                         # Raw cell barcode information
│   ├── matrix.mtx.gz                           # Raw peak signal data in sparse matrix format
│   └── peaks.bed.gz                            # Raw peak position information
├── singlecell.csv                              # Summary table of cell information
└── *_scATAC_report.html                        # Analysis report in HTML format

📋 File Details

🧬 ATAC Fragment and Peak Files

🎯 Core Content: ATAC-seq fragment information and peak identification results, containing complete chromatin accessibility data and cell barcode tags.

-----------

📄 fragments.tsv.gz

fragments.tsv.gz is a compressed TSV file containing ATAC-seq fragment information and is one of the core inputs for downstream analysis.
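
For orientation, the sketch below tallies fragments per barcode from such a file. It assumes the five-column layout commonly used for fragment files by tools such as Signac and ArchR (chromosome, start, end, barcode, read count). That layout is not stated in this document, so check it against your own file. The function names are illustrative only.

```python
import gzip
from collections import Counter


def count_fragments_per_barcode(lines):
    """Count fragments per barcode from tab-separated fragment lines.

    Assumed columns (not confirmed by this document):
    chrom, start, end, barcode, read_count.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and '#' header/comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        counts[fields[3]] += 1
    return counts


def count_from_file(path):
    """Open a gzip-compressed fragments file and count fragments per barcode."""
    with gzip.open(path, "rt") as handle:
        return count_fragments_per_barcode(handle)
```

For example, `count_from_file("fragments.tsv.gz").most_common(5)` would list the five barcodes with the most fragments.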

-----------

📄 fragments.tsv.gz.tbi

The Tabix index file for fragments.tsv.gz.

-----------

📄 filtered.fragments.tsv.gz

This is the ATAC-seq fragment file after cell quality control and filtering. It is a subset of fragments.tsv.gz, containing only fragments from high-quality cells.

-----------

📄 filtered.fragments.tsv.gz.tbi

The Tabix index file for filtered.fragments.tsv.gz.

-----------

📄 alignment.fragments.sorted.tagged.bam

This is the ATAC-seq alignment result file containing all fragments that have a valid barcode and were successfully aligned.

-----------

📄 alignment.fragments.sorted.tagged.bam.bai

The index file for alignment.fragments.sorted.tagged.bam.


📈 Peak Matrix

🎯 Core Content: The single-cell peak signal count matrix, divided into raw and quality-controlled filtered data, using the standard sparse matrix format.

📁 Filtered Peak Matrix (filter_peak_matrix/)

Contains the peak count matrix after high-quality cell filtering, serving as the core data for downstream quantitative analysis.

-----------

📁 Raw Peak Matrix (raw_peak_matrix/)

Contains the raw peak count matrix for all detected cell barcodes (without filtering).


📝 Analysis Summary

🎯 Core Content: A summary of experimental quality assessment and statistical metrics, providing complete data quality control information.

📄 metrics_summary.xls

An Excel-formatted summary table of key analysis metrics, providing a comprehensive assessment of the overall experiment quality.

-----------

📄 singlecell.csv

A CSV-formatted table of cell-level quality control information, recording detailed statistics for each cell barcode.

-----------

📄 *_scATAC_report.html

An interactive, comprehensive analysis report in HTML web format.


📄 File Format Description

Technical Specification: A detailed description of the standard formats used for the output files.

📊 Matrix Market Format (.mtx.gz)

Market Exchange Format (MEX) is a standard format for storing sparse count matrices in single-cell analysis, known for its space efficiency and high compatibility.
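
As an illustration of the on-disk layout, the following sketch parses the Matrix Market coordinate text used by matrix.mtx.gz with the standard library only: a `%%MatrixMarket` header, optional `%` comment lines, a size line (rows, columns, stored entries), then one 1-based `row column value` triplet per line. Whether rows correspond to peaks and columns to barcodes is an assumption based on common MEX usage and should be checked against peaks.bed.gz and barcodes.tsv.gz. The function names are hypothetical.

```python
import gzip


def read_mtx(lines):
    """Parse Matrix Market coordinate text into (n_rows, n_cols, entries).

    entries maps zero-based (row, col) tuples to the stored value.
    """
    it = iter(lines)
    header = next(it)
    if not header.startswith("%%MatrixMarket matrix coordinate"):
        raise ValueError("not a Matrix Market coordinate file")
    # Skip comment lines; the first non-comment line is the size line.
    for line in it:
        if line.strip() and not line.startswith("%"):
            n_rows, n_cols, n_entries = (int(x) for x in line.split())
            break
    else:
        raise ValueError("missing size line")
    entries = {}
    for line in it:
        if not line.strip():
            continue
        row, col, value = line.split()
        # File indices are 1-based; convert to 0-based.
        entries[(int(row) - 1, int(col) - 1)] = float(value)
    if len(entries) != n_entries:
        raise ValueError("entry count does not match the size line")
    return n_rows, n_cols, entries


def read_mtx_gz(path):
    """Read a gzip-compressed .mtx file, e.g. filter_peak_matrix/matrix.mtx.gz."""
    with gzip.open(path, "rt") as handle:
        return read_mtx(handle)
```

For real analyses, a sparse-matrix reader from an established library is usually preferable; this version only makes the format explicit.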


📊 Web Report Interpretation

🎯 Overview: The HTML web report provides a comprehensive visualization and detailed interpretation of the single-cell ATAC sequencing analysis results, including an evaluation of key performance indicators to help users quickly understand the experiment's quality and results.

The report brings together the complete results, from data quality control to downstream epigenomic analysis, in interactive visualizations that help users evaluate experiment quality, understand the analysis results, and plan subsequent research.

💡 Usage Suggestion: Review the metrics in the order in which the report presents them.

⚠️ Quality Standards: Recommended thresholds and quality levels are provided for each metric. Please conduct a comprehensive evaluation based on specific experimental goals.

📊 Main Report Content and Structure

scATAC Web Report

🧬 Core Analysis Metrics Explained

🧬 Cell Metrics

🎯 Core Function: Cell identification, quality assessment, and chromatin accessibility statistics, providing key indicators for the overall effectiveness of the experiment.

📊 Quality Control Standards:

Note: These standards are for reference only. Actual quality assessment should take into account the organism, cell state, and experimental goals, and samples may differ from one another. Judge each result in the context of the specific experiment.

Metric Name | Recommended | Acceptable | Needs Improvement
Median fragments per cell | ≥ 10,000 | 2,000–10,000 | < 2,000
TSS enrichment score | ≥ 6 | 4–6 | < 4
Median fraction of fragments overlapping peaks | ≥ 30% | 15–30% | < 15%
Median fraction of fragments overlapping TSS | ≥ 20% | 10–20% | < 10%
Fraction fragments in cells | ≥ 50% | 20–50% | < 20%
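
The thresholds in the table above can be applied programmatically. The sketch below is one possible encoding: the dictionary keys and the function name are hypothetical, fractions are written as values between 0 and 1 rather than percentages, and a value that sits exactly on a boundary is assigned to the better level.

```python
# Thresholds copied from the table above: (recommended_min, acceptable_min).
# Keys are illustrative names, not field names used by the software.
CELL_QC_THRESHOLDS = {
    "median_fragments_per_cell": (10_000, 2_000),
    "tss_enrichment_score": (6, 4),
    "median_fraction_overlapping_peaks": (0.30, 0.15),
    "median_fraction_overlapping_tss": (0.20, 0.10),
    "fraction_fragments_in_cells": (0.50, 0.20),
}


def classify_metric(name, value):
    """Return the quality level of a metric value according to the table."""
    recommended_min, acceptable_min = CELL_QC_THRESHOLDS[name]
    if value >= recommended_min:
        return "Recommended"
    if value >= acceptable_min:
        return "Acceptable"
    return "Needs Improvement"
```

For instance, `classify_metric("tss_enrichment_score", 5.2)` falls in the "Acceptable" band. As the note above says, these levels are reference points and not pass/fail criteria.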

🔍 Detailed Metric Explanations:

Metric Name Detailed Explanation & Technical Requirements
Estimated number of cells
  • Definition: The total number of valid cells identified from the sequencing data (as distinct from background noise or empty droplets).
  • Calculation Process: After merging barcodes from the same droplet, cells are filtered based on parameters like the number of fragments in peak regions and TSS proportion.
  • Quality Interpretation:
    • Abnormal Causes: Inaccurate cell counting, poor cell lysis, poor sample or library quality, low sequencing depth.
Species
  • Definition: The species or reference genome version used for the analysis.
  • Note: This information is derived from the reference genome specified for the analysis and is used to ensure the accuracy of alignment and annotation.
Median fragments per cell
  • Definition: The median number of valid ATAC-seq fragments contained within a single cell.
  • Biological Significance: This metric directly reflects the capture efficiency of open chromatin regions within a single nucleus and the sequencing depth. A higher value indicates better single-cell data quality.
  • Quality Interpretation:
    • High-Quality Standard: ≥ 10,000
    • Recommended Minimum: ≥ 2,000
    • Note: This value is highly dependent on cell type and sequencing depth.
Mean raw read pairs per cell
  • Definition: The average number of raw sequencing read pairs assigned to each cell.
  • Calculation: `Total Raw Read Pairs / Estimated Number of Cells`
  • Quality Interpretation: A value of ≥ 25,000 is recommended to ensure adequate chromatin coverage.
Fraction overlapping peaks
  • Definition: The proportion of a single cell's fragments that fall into identified open chromatin regions (Peaks).
  • Biological Significance: This is a key signal-to-noise ratio metric. A high proportion indicates that transposase activity was more concentrated in open chromatin, resulting in a high signal-to-noise ratio.
  • Quality Interpretation:
    • Quality Warning: < 15% may indicate sample quality issues.
Fraction overlapping TSS
  • Definition: The proportion of a single cell's fragments that fall within the ±2 kb region of a Transcription Start Site (TSS).
  • Biological Significance: A key metric for assessing chromatin activity in promoter regions and sequencing specificity.
  • Quality Interpretation:
    • Quality Warning: < 10% may indicate sample quality issues.
Fraction of fragments in cells
  • Definition: The proportion of all valid fragments that are successfully assigned to a high-quality cell ID.
  • Biological Significance: Reflects the efficiency of cell capture and the signal-to-noise ratio.
  • Quality Interpretation:
    • High-Quality Sample: A high ratio (e.g., > 50%) indicates high cell capture efficiency and low background noise.
    • Quality Issue: A low ratio may indicate poor sample quality or library construction anomalies.
Number of peaks
  • Definition: The total number of open chromatin regions (peaks) identified across the genome after aggregating the signal from all cells.
  • Biological Significance: Reflects the overall complexity of the sample and the number of detectable regulatory elements.
  • Influencing Factors: Affected by the number of cells, cell type heterogeneity, and sequencing depth.
  • Typical Range: 50,000 – 150,000 peaks.
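
Several of the definitions above reduce to simple arithmetic on per-barcode fragment counts. The sketch below computes three of them from hypothetical inputs: a mapping of barcode to fragment count, the list of barcodes called as cells, and the total number of raw read pairs. It follows the definitions as written here and is not the software's own implementation.

```python
from statistics import median


def summarize_cells(fragments_per_barcode, cell_barcodes, total_raw_read_pairs):
    """Compute cell-level summary metrics from per-barcode fragment counts."""
    cells = set(cell_barcodes)
    if not cells:
        raise ValueError("no cell barcodes supplied")
    total_fragments = sum(fragments_per_barcode.values())
    if total_fragments == 0:
        raise ValueError("no fragments supplied")
    per_cell = [fragments_per_barcode.get(bc, 0) for bc in cells]
    return {
        "estimated_number_of_cells": len(cells),
        "median_fragments_per_cell": median(per_cell),
        # Total Raw Read Pairs / Estimated Number of Cells
        "mean_raw_read_pairs_per_cell": total_raw_read_pairs / len(cells),
        # Fragments assigned to cells / all valid fragments
        "fraction_of_fragments_in_cells": sum(per_cell) / total_fragments,
    }
```

With counts `{"A": 100, "B": 300, "C": 200, "bg1": 50, "bg2": 50}`, cells `A`, `B`, `C`, and 3,000 raw read pairs, the median is 200 fragments per cell, the mean is 1,000 read pairs per cell, and 600 of 700 fragments fall in cells.
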
-----------

🔬 Sequencing Metrics

🎯 Core Function: Basic quality assessment of sequencing data, including barcode identification rate, alignment quality, and sequencing accuracy.

📊 Quality Control Standards:

Note: These standards are for reference only. Actual quality assessment should take into account the organism, cell state, and experimental goals, and samples may differ from one another. Judge each result in the context of the specific experiment.

Metric Category | Recommended | Acceptable | Needs Improvement
Valid barcodes | ≥ 80% | 70–80% | < 70%
Q30 bases in barcode | > 85% | 75–85% | < 75%
Q30 bases in read | > 85% | 75–85% | < 75%
Reads mapped to genome | > 80% | 50–80% | < 50%

🔍 Detailed Metric Explanations:

Metric Name Detailed Explanation & Technical Requirements
Total read pairs
  • Definition: The total number of raw sequencing read pairs allocated to the sample.
  • Significance: Represents the overall volume of sequencing data.
Valid barcodes
  • Definition: The proportion of reads whose cell barcode sequence can be successfully matched to the predefined whitelist (with error correction).
  • Biological Significance: Reflects the effectiveness of cell labeling.
  • Quality Interpretation: A low proportion usually suggests issues in library construction (e.g., barcode degradation, contamination) or a high sequencing error rate.
Reads mapped to genome
  • Definition: The proportion of all reads that successfully align to any location on the reference genome.
  • Quality Interpretation:
    • Needs Attention: < 50% may indicate sample contamination or species mismatch.
Mitochondria reads ratio
  • Definition: The proportion of all aligned reads that map to the mitochondrial genome.
  • Biological Significance: This is an important indicator of cell health.
  • Quality Interpretation: An excessively high ratio (e.g., > 10%) often suggests cell death or excessive lysis, leading to the capture of a large amount of mitochondrial DNA from the cytoplasm.
Nucleosome-free regions
  • Definition: The proportion of fragments originating from open chromatin regions (i.e., nucleosome-free regions).
  • Biological Significance: Reflects the strength of the valid ATAC-seq signal.
  • Quality Interpretation: A high proportion (e.g., > 40%) indicates good chromatin accessibility and efficient transposase activity.
Mono-nucleosome regions
  • Definition: The proportion of fragment regions containing a single nucleosome.
  • Biological Significance: Reflects the integrity of the chromatin structure. This metric, together with the 'Nucleosome-free regions' proportion, is used to assess the chromatin state.
Q30 bases in barcode
  • Definition: The proportion of bases with a sequencing quality score of Q30 or higher in the cell barcode sequence.
  • Significance: A Q30 score corresponds to a base-call error probability of 0.1% (1 in 1,000), so bases at Q30 or higher have an error probability of at most 0.1%. This metric directly affects the accuracy of cell identification.
Q30 bases in read
  • Definition: The proportion of bases with a sequencing quality score of Q30 or higher in the sequencing read.
  • Significance: Reflects the overall quality level of the sequencing data and is fundamental to the accuracy of subsequent alignment and fragment identification.
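
The Q30 metrics follow from the Phred relation Q = -10 · log10(p), where p is the base-call error probability. The sketch below converts a score to a probability and computes the share of Q30 bases in one FASTQ quality string. The Phred+33 ASCII offset is an assumption (it is the usual encoding for current FASTQ files), and the function names are illustrative.

```python
def phred_to_error_probability(q):
    """Convert a Phred quality score to a base-call error probability."""
    return 10 ** (-q / 10)


def fraction_q30(quality_string, offset=33):
    """Return the fraction of bases with quality >= 30 in a FASTQ quality string.

    offset=33 assumes Phred+33 encoding.
    """
    if not quality_string:
        raise ValueError("empty quality string")
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(score >= 30 for score in scores) / len(scores)
```

In the quality string `"II5?"`, the characters decode to scores 40, 40, 20, and 30, so three of the four bases count as Q30.
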
-----------

📈 Visualization Chart 1

🎯 Core Function: Multi-dimensional visualization for cell quality control, fragment analysis, and chromatin accessibility assessment.

📊 Barcode Rank Plot

Chart Function:
This plot distinguishes high-quality real cells from background noise by ranking all cell barcodes by their fragment count.
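
The ranking behind such a plot can be sketched as follows, given a hypothetical mapping of barcode to fragment count. The function only produces the ordered data; plots of this kind are commonly drawn with both axes on a log scale, which is an assumption about the report and not something stated here.

```python
def barcode_rank(fragments_per_barcode):
    """Return (rank, barcode, count) tuples, with rank 1 for the most fragments.

    Ties are broken alphabetically by barcode so the output is deterministic.
    """
    ordered = sorted(fragments_per_barcode.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, bc, n) for rank, (bc, n) in enumerate(ordered, start=1)]
```

Real cells are expected at the low ranks with high counts, and background barcodes at the high ranks with low counts.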

How to Interpret:

-----------
📊 Droplet Beads Distribution

Chart Function:
Shows the distribution of the number of cell barcodes (Beads) captured in real cell droplets.

How to Interpret:

-----------
📊 Cell Data Distribution

Chart Function:
Displays the distribution of three key quality metrics (Fragments, TSS Proportion, and Peak Proportion) for high-quality cells using three separate violin plots.

How to Interpret:

-----------
📊 Fragment Length Distribution

Chart Function:
Shows the insertion length distribution of deduplicated ATAC-seq fragments, which is a key chart for assessing sample quality and chromatin structure integrity.
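
A comparable distribution can be derived from a fragments file, as in the sketch below. It assumes one deduplicated fragment per line with BED-style coordinates in columns 2 and 3, so that length = end - start. This layout is an assumption, as is the function name.

```python
from collections import Counter


def fragment_length_histogram(lines):
    """Count fragments by length from tab-separated fragment lines.

    Assumed columns: chrom, start, end, ... with BED-style coordinates.
    """
    hist = Counter()
    for line in lines:
        # Skip blank lines and '#' header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        hist[int(fields[2]) - int(fields[1])] += 1
    return hist
```

Sorting the result with `sorted(hist.items())` gives (length, count) pairs ready for plotting.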

How to Interpret:



📈 Other Key Metrics

Percent duplicates

-----------

Jaccard threshold


📈 Visualization Chart 2

🎯 Core Function: Advanced visualizations for cell clustering, TSS enrichment patterns, saturation assessment, and bead similarity.

🌀 Cluster Analysis

Chart Function:
Identifies potential cell subgroups by grouping cells with similar chromatin accessibility patterns. UMAP reduces the data to a 2D space for display, and Louvain clustering assigns the cluster labels.

How to Interpret:

-----------
📈 TSS Enrichment Profile

Chart Function:
Displays the enrichment of ATAC-seq fragment cleavage sites around the Transcription Start Sites (TSS) of all genes, serving as a core metric for ATAC-seq signal-to-noise ratio and data quality.

How to Interpret:

-----------
📊 Single Cell Targeting Plot

Chart Function:
Evaluates the effectiveness of the cell calling algorithm by displaying two key quality metrics for each cell in a scatter plot.

How to Interpret:

-----------
📈 Saturation Curve

Chart Function:
Assesses the sufficiency of sequencing depth and data complexity, i.e., whether further sequencing will yield more unique fragments.
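
This document does not describe how the software computes the curve. As a generic illustration of the idea, the sketch below downsamples a list of reads, each labelled with the fragment it came from, and counts the unique fragments seen at each depth. The input representation, the sampling fractions, and the function name are all assumptions.

```python
import random


def saturation_curve(read_fragment_ids, fractions=(0.25, 0.5, 0.75, 1.0), seed=0):
    """Return (n_reads, n_unique_fragments) at increasing sampling depths.

    read_fragment_ids has one entry per sequenced read pair; equal values
    mean the reads are duplicates of the same fragment.
    """
    reads = list(read_fragment_ids)
    # Shuffle once and take prefixes so each depth extends the previous one.
    random.Random(seed).shuffle(reads)
    curve = []
    for fraction in fractions:
        n_reads = int(len(reads) * fraction)
        n_unique = len(set(reads[:n_reads]))
        curve.append((n_reads, n_unique))
    return curve
```

When the unique-fragment count stops rising as depth increases, additional sequencing mostly yields duplicates.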

How to Interpret:

-----------
📊 Bead Similarity Ranking

Chart Function:
Used in the C4 ATAC technology to merge multiple beads from the same cell droplet by calculating Jaccard similarity, a key step to ensure unique cell identity.
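
The Jaccard similarity of two sets A and B is |A ∩ B| / |A ∪ B|. The sketch below ranks bead pairs by this value. It assumes each bead is represented by the set of fragment positions it carries. This document does not specify which features the software compares, and the names used here are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Return |A intersect B| / |A union B|, or 0.0 when both sets are empty."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def rank_bead_pairs(bead_fragments):
    """Return (bead_1, bead_2, similarity) tuples, most similar pair first."""
    pairs = [
        (b1, b2, jaccard(bead_fragments[b1], bead_fragments[b2]))
        for b1, b2 in combinations(sorted(bead_fragments), 2)
    ]
    return sorted(pairs, key=lambda p: (-p[2], p[0], p[1]))
```

Pairs whose similarity exceeds a chosen cut-off (see "Jaccard threshold" under Other Key Metrics) would be candidates for merging into one cell.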

How to Interpret:


🎯 More Resources

Document Type | Resource Link & Description
🚀 Quick Start | Quick Start Guide - A complete tutorial for your first analysis.
⚙️ Parameter Reference | Parameter Reference Manual - Detailed descriptions of all configurable parameters.
🔬 Analysis Pipeline | Analysis Pipeline Description - Technical details of the entire analysis workflow.
🔧 Installation & Setup | Installation & Setup Guide - System requirements, installation steps, and environment configuration.

💡 Tip

This document is continuously updated. If you find any errors or have information to add, feedback is welcome.

📝 Document Version: 3.0 beta | Last Updated: 2025


🔬 DNBelab C Series HT scATAC Analysis Software
A High-Performance Pipeline for Single-Cell ATAC Sequencing Data Analysis