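FASTX-Toolkit is a collection of command-line tools for processing short-read FASTA/FASTQ files. It includes a rich set of commands for FASTA/FASTQ format conversion, statistics, and related preprocessing. Download page: http://hannonlab.cshl.edu/fastx_toolkit/download.html
Its tools are described below: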
- FASTQ-to-FASTA converter: converts FASTQ files to FASTA files.
Command: usage: fastq_to_fasta [-h] [-r] [-n] [-v] [-z] [-i INFILE] [-o OUTFILE]
Input file: FASTQ file
Output file: FASTA file
- FASTX Statistics (quality statistics)
Command: fastx_quality_stats [-h] [-i INFILE] [-o OUTFILE]
Input file: FASTA/FASTQ file
Output file: text file
- FASTQ Information: charts quality statistics and nucleotide distribution.
Input file: output of fastx_quality_stats
Output file: PNG file
- FASTQ/A Collapser: collapses identical sequences in a FASTQ/A file into a single sequence (while maintaining read counts).
- FASTQ/A Trimmer: shortens reads in a FASTQ or FASTA file (removing barcodes or noise).
- FASTQ/A Renamer: renames the sequence identifiers in a FASTQ/A file (batch renaming of sequences).
- FASTQ/A Clipper: removes sequencing adapters / linkers from FASTA/FASTQ reads.
Command: fastx_clipper [-h] [-a ADAPTER] [-D] [-l N] [-n] [-d N] [-c] [-C] [-o] [-v] [-z] [-i INFILE] [-o OUTFILE]
Input file: FASTA/FASTQ
Output file: FASTA/FASTQ
$ fastx_clipper -Q 33 -l 18 -a TGGAATTCTCGGGTGCCAAGG -v -i input.fastq -o input_clipped.fastq
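Here -v prints a summary of the input and output read counts, and -l 18 discards reads shorter than 18 nt. Run fastx_clipper -h to see all the options and pick the ones you need. -Q 33 should be added to FASTX-Toolkit commands whenever the FASTQ quality values are Phred+33 encoded. It is not listed in -h, and the only explanation I have found so far is this: "-Q is an undocumented parameter to indicate that quality values use ASCII 33 encoding". The output looks like this: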
Clipping Adapter: TGGAATTCTCGGGTGCCAAGG
Min. Length: 18
Input: 15344568 reads.
Output: 10454576 reads.
discarded 4708543 too-short reads.
discarded 31383 adapter-only reads.
discarded 150066 N reads.
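As a quick sanity check on the summary above, the reported counts can be added up to confirm that every input read was either written out or discarded. This is only an arithmetic sketch using the numbers printed by fastx_clipper; the variable names are made up for illustration.

```python
# Counts copied from the fastx_clipper -v summary above.
input_reads = 15_344_568
output_reads = 10_454_576
discarded = {
    "too_short": 4_708_543,
    "adapter_only": 31_383,
    "contains_N": 150_066,
}

total_discarded = sum(discarded.values())

# Every input read should be either kept or discarded exactly once.
assert input_reads == output_reads + total_discarded

kept_fraction = output_reads / input_reads
print(f"discarded: {total_discarded} reads")
print(f"kept: {kept_fraction:.1%} of input")
```

Most of the loss comes from reads that were shorter than 18 nt after clipping, which is what the -l 18 cutoff controls.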
The next step is to remove low-quality reads, which can be done with fastq_quality_filter:
$ fastq_quality_filter -Q 33 -v -q 30 -p 80 -i input_clipped.fastq -o input_clipped_qualified.fastq
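Here -q is the minimum quality score and -p is the minimum percentage of bases in a read that must reach that score. According to the figure in the original post, the number of reads removed with -q 30 -p 80 falls between the numbers removed with -q 20 -p 90 and -q 20 -p 100.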
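The -q / -p rule can be sketched in a few lines of Python. This is a minimal illustration of the idea (keep a read when at least -p percent of its bases have quality of at least -q), assuming Phred+33 encoding as selected by -Q 33. The function name `passes_quality_filter` is hypothetical, and the toolkit's own implementation may differ in details such as rounding.

```python
def passes_quality_filter(quality_string, min_quality=30, min_percent=80, offset=33):
    """Return True if at least min_percent of bases have quality >= min_quality.

    min_quality mirrors -q, min_percent mirrors -p, offset mirrors -Q.
    """
    if not quality_string:
        return False
    # Decode each ASCII quality character into a Phred score.
    scores = [ord(ch) - offset for ch in quality_string]
    good = sum(1 for score in scores if score >= min_quality)
    return 100.0 * good / len(scores) >= min_percent


# Under the +33 offset, 'I' encodes Phred 40 and '#' encodes Phred 2.
print(passes_quality_filter("IIIIIIII##"))  # 8 of 10 bases reach Q30
print(passes_quality_filter("IIIIIII###"))  # 7 of 10 bases reach Q30
```

With -q 30 -p 80, the first read is kept (80% of its bases reach Q30) and the second is dropped (only 70% do). Raising -q or -p both make the filter stricter, which is why different combinations can remove similar numbers of reads.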
- FASTQ/A Reverse-Complement: produces the reverse-complement of each sequence in a FASTQ/FASTA file.
Input file: FASTA/FASTQ
Output file: FASTA/FASTQ
- FASTQ/A Barcode splitter: splits a FASTQ/FASTA file containing multiple samples into one file per sample, based on barcodes.
- FASTA Formatter: changes the width of sequence lines in a FASTA file
- FASTA Nucleotide Changer: converts FASTA sequences from/to RNA/DNA
- FASTQ Quality Filter: filters sequences based on quality
- FASTQ Quality Trimmer: trims (cuts) sequences based on quality
- FASTQ Masker: masks nucleotides with 'N' (or another character) based on quality
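Several of the tools listed above are simple per-read transformations, and a short Python sketch may make their behavior easier to picture. The functions below illustrate the ideas behind the reverse-complement, collapser, and masker tools. They are not the toolkit's code, the function names are hypothetical, and the masking threshold of 10 and the +33 offset are assumptions chosen for the example.

```python
from collections import Counter

# DNA complement table; N stays N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement each base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def collapse(sequences):
    """Collapse identical sequences into (sequence, count) pairs, most frequent first."""
    return Counter(sequences).most_common()


def mask_low_quality(seq, qual, min_quality=10, mask_char="N", offset=33):
    """Replace bases whose Phred score is below min_quality with mask_char."""
    return "".join(
        base if ord(q) - offset >= min_quality else mask_char
        for base, q in zip(seq, qual)
    )


print(reverse_complement("ACGTN"))
print(collapse(["ACGT", "ACGT", "TTTT"]))
print(mask_low_quality("ACGT", "II#I"))
```

For real data the toolkit commands themselves are the better choice, since they stream large files and handle FASTA/FASTQ parsing details that this sketch ignores.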