🧬 DNBelab C Series HT Tool-based Analysis Parameters

🛠️ GTF File Operations (mkgtf) • 📄 BAM to FASTQ (bam2fastq) • 🧬 Chromosome Splitting (chromsplit) • 📝 FASTQ Subsetting (fqsubC4)


🛠️ GTF File Operations (mkgtf)

🧬 Core Functionality

A tool for GTF file operations: gene type statistics, filtering by gene type, and file format validation. It produces standardized gene annotation files for single-cell analysis.

📊 Usage

$ dnbc4tools tools mkgtf -h

optional arguments:
  -h, --help            show this help message and exit

Basic Settings:
  --action <STR>        Select action type: 'mkgtf'(filter), 'stat'(statistics) or 'check'(validation) [default: mkgtf]
  --ingtf <FILE>        Path to input GTF annotation file
  --output <FILE>       Path to output file

Filter Settings:
  GTF file format requirements:
                  RNA analysis requires "gene"/"transcript" and "exon" types, plus gene_id/name and transcript_id/name attributes.

  --include <STR>       Set filter parameters in 'mkgtf' mode, multiple filters separated by commas. Default includes: protein_coding, lncRNA, lincRNA, antisense, IG_*/TR_* genes
  --type <STR>          Set according to gene type tag in GTF attributes [default: gene_biotype]
  --feature <STR>       Select information from feature column. If no 'gene' rows, select 'transcript' [default: gene]

📝 Parameter Description

🔴 Required Parameters

--ingtf (Required)

Specify the path to the input GTF gene annotation file.

Default: None

Example:

--ingtf Homo_sapiens.GRCh38.108.gtf

--output (Required)

Specify the output file for the processing results.

Default: None

Examples:

# When action is 'mkgtf' (filter)
--output ./filtered_genes.gtf
# When action is 'stat' (statistics)
--output ./gene_statistics.txt
# When action is 'check' (validation)
--output ./corrected.gtf

🟢 Optional Parameters

--action (Optional)

Select the type of operation to perform.

Default: mkgtf

Example:

--action stat

--include (Optional)

In mkgtf mode, specify the gene types to keep, separated by commas.

Default: protein_coding,lncRNA,lincRNA,antisense,IG_*,TR_*

Example:

--include protein_coding,lncRNA

--type (Optional)

Specify the tag in the GTF attributes used to identify the gene type.

Default: gene_biotype

Example:

--type gene_type
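Which value to pass to --type depends on the annotation source: Ensembl GTFs label the gene type with gene_biotype, while GENCODE GTFs use gene_type. The sketch below is one way to check which tag a file carries before filtering. The two GTF records are made up for illustration, and the temporary file stands in for your own annotation.

```shell
# Build a tiny, made-up GTF so the sketch is self-contained (hypothetical records).
gtf=$(mktemp)
printf 'chr1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "G1"; gene_name "A"; gene_biotype "protein_coding";\n' > "$gtf"
printf 'chr1\tsrc\tgene\t200\t300\t.\t-\t.\tgene_id "G2"; gene_name "B"; gene_biotype "lncRNA";\n' >> "$gtf"

# Count 'gene' rows that carry each candidate tag in the attribute column (column 9).
n_biotype=$(awk -F '\t' '$3 == "gene" && $9 ~ /gene_biotype "/' "$gtf" | wc -l | tr -d ' ')
n_type=$(awk -F '\t' '$3 == "gene" && $9 ~ /gene_type "/' "$gtf" | wc -l | tr -d ' ')

# Pick whichever tag is present more often.
if [ "$n_biotype" -ge "$n_type" ]; then tag="gene_biotype"; else tag="gene_type"; fi
echo "Use: --type $tag"

rm -f "$gtf"
```

If the file has no 'gene' rows at all, the same check can be repeated with "transcript" in place of "gene", matching the fallback described for --feature.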

--feature (Optional)

Specify which feature type (the value in the GTF feature column) to read gene information from. If the GTF has no 'gene' rows, use 'transcript'.

Default: gene

Example:

--feature transcript

💡 Usage Examples
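A minimal sketch of two typical calls, using only the flags listed in the help text above: first collect gene type statistics, then filter. The file names are placeholders taken from the parameter examples, and the commands are only executed when dnbc4tools is installed and the input file exists; otherwise the script just prints them.

```shell
# Hypothetical input path; replace with your own annotation file.
ingtf="Homo_sapiens.GRCh38.108.gtf"

# One command per action (paths must not contain spaces for this simple form).
stat_cmd="dnbc4tools tools mkgtf --action stat --ingtf $ingtf --output gene_statistics.txt"
filter_cmd="dnbc4tools tools mkgtf --action mkgtf --ingtf $ingtf --output filtered_genes.gtf --include protein_coding,lncRNA --type gene_biotype"

# Show what would be run.
echo "$stat_cmd"
echo "$filter_cmd"

# Execute only when the tool and the input are available; otherwise this is a dry run.
if command -v dnbc4tools >/dev/null 2>&1 && [ -f "$ingtf" ]; then
    $stat_cmd && $filter_cmd
fi
```

Running the statistics step first shows which gene types the annotation contains, which helps when choosing the --include list.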


📄 BAM to FASTQ (bam2fastq)

📄 Professional Conversion Tool

An efficient BAM file manipulation tool specialized for converting C4 RNA BAM files into FASTQ format. It supports multi-threaded parallel processing and flexible output configuration.

📊 Usage

$ bam2fastq --help
BAM to FASTQ Converter for C4 Single Cell RNA seq Data

Usage: bam2fastq [OPTIONS] <BAM> <OUTPUT>

Arguments:
  <BAM>     Path to the input BAM file
  <OUTPUT>  Directory where FASTQ files will be written

Options:
  -t, --threads <THREADS>        Number of CPU threads for parallel processing [default: 4]
  -r, --locus <REGION>           Process reads from a specific genomic region (format: chr1:1000-2000)
  -n, --reads-per-fastq <READS>  Maximum number of reads per FASTQ file. All reads go to a single file if not specified.
      --max-memory <MEMORY>      Maximum memory to use in MB. If not specified, will be automatically determined based on system resources.
      --no-compress              Disable gzip compression for output FASTQ files
  -h, --help                     Print help
  -V, --version                  Print version

📝 Parameter Description

🔴 Required Parameters

<BAM> (Required)

Specify the path to the input BAM file.

Default: None

Example:

/path/to/your.bam

<OUTPUT> (Required)

Specify the directory for the output FASTQ files.

Default: None

Example:

/path/to/output_dir

🟢 Optional Parameters

-t, --threads (Optional)

Set the number of CPU threads for parallel processing.

Default: 4

Example:

-t 8

-r, --locus (Optional)

Process only reads from a specific genomic region.

Default: None

Example:

-r chr1:1000-2000

-n, --reads-per-fastq (Optional)

Set the maximum number of reads per output FASTQ file.

Default: None (all reads are written to a single file)

Example:

-n 10000000

--max-memory <MEMORY> (Optional)

Set the maximum memory the tool can use (in MB).

Default: Auto-determined

Example:

--max-memory 8192

--no-compress (Flag)

Disable gzip compression for the output FASTQ files. Skipping compression speeds up the conversion at the cost of larger output files.

Default: Not set

💡 Usage Examples
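A minimal sketch of two calls built from the options listed above: a whole-file conversion split into chunks, and a conversion restricted to one region. The paths are the placeholders from the parameter examples, and the commands are only executed when bam2fastq is installed and the BAM file exists.

```shell
bam="/path/to/your.bam"        # hypothetical input BAM
outdir="/path/to/output_dir"   # hypothetical output directory

# Whole file, 8 threads, at most 10 million reads per FASTQ file.
full_cmd="bam2fastq -t 8 -n 10000000 $bam $outdir"
# One region only, written without gzip compression.
region_cmd="bam2fastq -r chr1:1000-2000 --no-compress $bam $outdir"

# Show what would be run.
echo "$full_cmd"
echo "$region_cmd"

# Execute only when the tool and the input are available; otherwise this is a dry run.
if command -v bam2fastq >/dev/null 2>&1 && [ -f "$bam" ]; then
    $full_cmd
fi
```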


🧬 Chromosome Splitting (chromsplit)

🧬 Core Functionality

A genome sequence splitting tool that chooses split points so that gene annotations stay intact. It is primarily used in ATAC library construction to ensure that no chromosome exceeds the length limit of 2^29-1 bp (536,870,911 bp).
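To see whether a genome needs splitting at all, sequence lengths can be compared against the 2^29-1 limit mentioned above. The sketch below reads a FASTA index (.fai) file, such as the one written by samtools faidx, in which column 2 holds the sequence length. The index rows here are made up for illustration.

```shell
# 2^29 - 1, the chromosome length limit named above.
limit=$(( (1 << 29) - 1 ))

# Made-up .fai rows: name, length, offset, bases per line, bytes per line.
fai=$(mktemp)
printf 'chr1\t248956422\t6\t60\t61\n' > "$fai"
printf 'chrBig\t700000000\t253105708\t60\t61\n' >> "$fai"

# Names of sequences longer than the limit; these are the candidates for splitting.
too_long=$(awk -F '\t' -v limit="$limit" '$2 > limit { print $1 }' "$fai")

echo "limit=$limit"
echo "exceeds limit: $too_long"

rm -f "$fai"
```

Note that the default --max_length of 500000000 sits below this limit, so fragments produced with the default settings stay within it.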

📊 Usage

$ chromsplit --help

Usage: chromsplit [OPTIONS] --fasta <FA> --prefix <PREFIX>

Options:
  -f, --fasta <FA>           Input genome sequence file in FASTA format
  -g, --gtf <GTF>            Optional GTF/GFF annotation file for the genome
  -o, --prefix <PREFIX>      Prefix for output files
  --min_length <MIN_LENGTH>  Minimum length of output scaffold fragments [default: 300000000]
  --max_length <MAX_LENGTH>  Maximum length of output scaffold fragments [default: 500000000]
  --cut_site <CUT_SITE>      Optional cut site file containing predefined split positions
  -h, --help                 Print help
  -V, --version              Print version

📝 Parameter Description

🔴 Required Parameters

-f, --fasta <FA> (Required)

Specify the input genome sequence file.

Default: None

Example:

--fasta genome.fasta

-o, --prefix <PREFIX> (Required)

Specify the prefix for the output files.

Default: None

Example:

--prefix split_genome

🟢 Optional Parameters

-g, --gtf <GTF> (Optional)

Specify the gene annotation file (GTF/GFF format).

Default: None

Example:

--gtf annotation.gtf

--min_length <MIN_LENGTH> (Optional)

Set the minimum length of the output fragments (unit: bp).

Default: 300000000

Example:

--min_length 300000000

--max_length <MAX_LENGTH> (Optional)

Set the maximum length of the output fragments (unit: bp).

Default: 500000000

Example:

--max_length 500000000

--cut_site <CUT_SITE> (Optional)

Provide a text file containing predefined split positions.

Default: None

Example:

--cut_site predefined_cuts.txt

💡 Usage Examples
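A minimal sketch of a call that splits a genome together with its annotation, using only the options listed above and their documented default lengths. The file names are the placeholders from the parameter examples, and the command is only executed when chromsplit is installed and both inputs exist.

```shell
fasta="genome.fasta"     # hypothetical input genome
gtf="annotation.gtf"     # hypothetical annotation file

# Split with the documented default fragment lengths, passing the annotation along.
split_cmd="chromsplit --fasta $fasta --gtf $gtf --prefix split_genome --min_length 300000000 --max_length 500000000"

# Show what would be run.
echo "$split_cmd"

# Execute only when the tool and both inputs are available; otherwise this is a dry run.
if command -v chromsplit >/dev/null 2>&1 && [ -f "$fasta" ] && [ -f "$gtf" ]; then
    $split_cmd
fi
```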


📝 FASTQ Subsetting (fqsubC4)

📝 Core Functionality

A professional tool for extracting regions from FASTQ sequences, supporting precise sequence position clipping. It is mainly used to resolve data format inconsistencies from multiple sequencing runs, ensuring standardized processing of C4 sequencing data.

📊 Usage

$ fqsubC4 --help

Usage: fqsubC4 [OPTIONS] --input <FILE> --output <FILE> --regions <REGIONS>

Options:
  -i, --input <FILE>           Path to input FASTQ file
  -o, --output <FILE>          Path to output FASTQ file
  -r, --regions <REGIONS>      Comma-separated regions in format start:end (e.g., 7:16,23:32,38:47)
  -b, --batch-size <BATCH_SIZE>  Batch size for processing [default: 100000]
  --buffer-size <BUFFER_SIZE>  Buffer size for channel between reader and writer [default: 500]
  -h, --help                   Print help
  -V, --version                Print version

📝 Parameter Description

🔴 Required Parameters

-i, --input <FILE> (Required)

Specify the path to the input FASTQ file.

Default: None

Example:

--input sample_R1.fastq.gz

-o, --output <FILE> (Required)

Specify the path for the output FASTQ file.

Default: None

Example:

--output extracted_R1.fastq.gz

-r, --regions <REGIONS> (Required)

Specify the regions to be extracted from the sequences.

Default: None

Example:

--regions 7:16,23:32,38:47

🟢 Optional Parameters

-b, --batch-size <BATCH_SIZE> (Optional)

Set the number of records per batch for processing (i.e., the number of FASTQ records read into memory at one time).

Default: 100000

Example:

--batch-size 200000

--buffer-size <BUFFER_SIZE> (Optional)

Set the buffer size for the channel between the reader and writer.

Default: 500

Example:

--buffer-size 1000

💡 Usage Example
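The sketch below first illustrates what a --regions value such as 7:16,23:32,38:47 selects, then assembles a matching fqsubC4 call. It assumes that positions are 1-based and inclusive and that the selected pieces are concatenated in order. This is an assumption inferred from each example region being 10 bases wide, not something the help text confirms. The read is made up, and cut serves only as a stand-in to show the slicing.

```shell
# Made-up 50-base read, written in segments so positions are easy to follow:
# 1-6 A, 7-16 C, 17-22 G, 23-32 T, 33-37 A, 38-47 C, 48-50 G.
seq="AAAAAA""CCCCCCCCCC""GGGGGG""TTTTTTTTTT""AAAAA""CCCCCCCCCC""GGG"

regions="7:16,23:32,38:47"

# Translate start:end into the start-end form that cut expects (1-based, inclusive).
ranges=$(printf '%s' "$regions" | tr ':' '-')

# Keep only the selected positions, concatenated in order (assumed fqsubC4 behavior).
sub=$(printf '%s\n' "$seq" | cut -c "$ranges")
echo "$sub"

# The corresponding tool call, with placeholder file names from the parameter examples.
sub_cmd="fqsubC4 --input sample_R1.fastq.gz --output extracted_R1.fastq.gz --regions $regions"
echo "$sub_cmd"

# Execute only when the tool and the input are available; otherwise this is a dry run.
if command -v fqsubC4 >/dev/null 2>&1 && [ -f sample_R1.fastq.gz ]; then
    $sub_cmd
fi
```

Checking the result on a few reads before processing a full run is a cheap way to confirm the coordinate convention the tool actually uses.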


💡 Tip

This document is continuously updated. If you find any errors or have information to add, your feedback is welcome.

📝 Document Version: 3.0 beta | Last Updated: 2025

