Trimmomatic安装与使用

用途:trim, crop, remove adapters

支持gz, bz2的fastq

http://www.usadellab.org/cms/?page=trimmomatic

http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/TrimmomaticManual_V0.32.pdf

先下载二进制包

http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/Trimmomatic-0.33.zip

然后解压unzip

我有写MP文库,我的目的是将read开头第一个碱基的N去掉,如果第六个碱基还是N, 那前六个都去掉,去掉后最后保留前40个碱基

ILLUMINACLIP: Cut adapter and other illumina-specific sequences from the read.

SLIDINGWINDOW: Performs a sliding window trimming approach. It starts scanning at the 5? end and clips the read once the average quality within the window falls below a threshold.

MAXINFO: An adaptive quality trimmer which balances read length and error rate to maximise the value of each read

LEADING: Cut bases off the start of a read, if below a threshold quality

TRAILING: Cut bases off the end of a read, if below a threshold quality

CROP: Cut the read to a specified length by removing bases from the end

HEADCROP: Cut the specified number of bases from the start of the read

MINLEN: Drop the read if it is below a specified length

AVGQUAL: Drop the read if the average quality is below the specified level

TOPHRED33: Convert quality scores to Phred-33

TOPHRED64: Convert quality scores to Phred-64

解压后直接就可以用了

从0.32版本,软件会自动识别是33还是64.

Processing Order The different processing steps occur in the order in which the steps are specified on the command line. It is recommended in most cases that adapter clipping, if required, is done as early as possible, since correctly identifying adapters using partial matches is more difficult.

我不需要去接头,因为对于MP文库,接头在后面,

输入和输出的名字要起好:

For input files,

either of the following can be used:

Explicitly naming the 2 input files Naming the forward file using the -basein flag, where the reverse file can be determined automatically.

The second file is determined by looking for common patterns of file naming, and changing the appropriate character to reference the reverse file.

Examples which should be correctly handled include:

Sample_Name_R1_001.fq.gh -> Sample_Name_R2_001.fq.gz

Sample_Name.f.fastq -> Sample_Name.r.fastq

Sample_Name.1.sequence.txt -> Sample_Name.2.sequence.txt

For output files, either of the following can be used:

Explicity naming the 4 output files Providing a base file name using the –baseout flag,

from which the 4 output files can be derived. If the name “mySampleFiltered.fq.gz” is provided, the following 4 file names will be used:

mySampleFiltered_1P.fq.gz - for paired forward reads

mySampleFiltered_1U.fq.gz - for unpaired forward reads

mySampleFiltered_2P.fq.gz - for paired reverse reads

mySampleFiltered_2U.fq.gz - for unpaired reverse reads

Most processing steps take one or more settings, delimited by ‘:‘ (a colon)

SLIDINGWINDOW
Perform a sliding window trimming, cutting once the average quality within the window falls
below a threshold. By considering multiple bases, a single poor quality base will not cause the
removal of high quality data later in the read.
SLIDINGWINDOW:<windowSize>:<requiredQuality>
windowSize: specifies the number of bases to average across
requiredQuality: specifies the average quality required.

The SLIDINGWINDOW trimmer will cut the leftmost position in the window where the average quality drops below the threshold and remove the rest of the read. However if there is low quality in the very beginning of the read then it will fail the minimum length tests and be removed completely - the remaining 3-prime end (even if it is good quality will not be printed)

Consider the following test file t.fq:

@1\1
AATGATCGTAGCGATGCAAGCTAGCCCGATGCCCGATCGCATCG
+
eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeEFCB
@2\1
AATGATCGTAGCGATGCAAGCTAGCCCGATGCCCGATCGCATCG
+
EFCBeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee

and processing with:

$ java -jar trimmomatic.jar SE -phred64 t.fq tt.fq SLIDINGWINDOW:4:15
TrimmomaticSE: Started with arguments: -phred64 t.fq tt.fq SLIDINGWINDOW:4:15
Automatically using 16 threads
Input Reads: 2 Surviving: 1 (50.00%) Dropped: 1 (50.00%)
TrimmomaticSE: Completed successfully

The output file looks like the following:

@1\1
AATGATCGTAGCGATGCAAGCTAGCCCGATGCCCGATCGC
+
eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee

As you can see the read with the poor quality at the beginning has been removed completely

LEADING
Remove low quality bases from the beginning. As long as a base has a value below this
threshold the base is removed and the next base will be investigated.
LEADING:<quality>
quality: Specifies the minimum quality required to keep a base.

TRAILING
Remove low quality bases from the end. As long as a base has a value below this threshold
the base is removed and the next base (which as trimmomatic is starting from the 3? prime end
would be base preceding the just removed base) will be investigated. This approach can be
used removing the special illumina „low quality segment? regions (which are marked with
quality score of 2), but we recommend Sliding Window or MaxInfo instead
TRAILING:<quality>
quality: Specifies the minimum quality required to keep a base.

CROP
Removes bases regardless of quality from the end of the read, so that the read has maximally
the specified length after this step has been performed. Steps performed after CROP might of
course further shorten the read.
CROP:<length>
length: The number of bases to keep, from the start of the read.

HEADCROP
Removes the specified number of bases, regardless of quality, from the beginning of the read.
HEADCROP:<length>
length: The number of bases to remove from the start of the read.

Paired End:
java -jar trimmomatic-0.30.jar PE s_1_1_sequence.txt.gz s_1_2_sequence.txt.gz
lane1_forward_paired.fq.gz lane1_forward_unpaired.fq.gz lane1_reverse_paired.fq.gz
lane1_reverse_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3
TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36

This will perform the following in this order

1,Remove Illumina adapters provided in the TruSeq3-PE.fa file (provided). Initially Trimmomatic will look for seed matches (16 bases) allowing maximally 2 mismatches. These seeds will be extended and clipped if in the case of paired end reads a score of 30 is reached (about 50 bases), or in the case of single ended reads a score of 10, (about 17 bases).
2,Remove leading low quality or N bases (below quality 3)
3,Remove trailing low quality or N bases (below quality 3)
4,Scan the read with a 4-base wide sliding window, cutting when the average quality per base drops below 15
5, Drop reads which are less than 36 bases long after these steps

最后我的命令如下:

先把开头N去掉,然后前40个碱基,缺点是有些第六个碱基是N的,没法去掉! 估计要写脚本,但是脚本太慢了。。。改天问问老师有没有什么好工具!

freemao

FAFU

时间: 2024-10-06 01:19:23

Trimmomatic安装与使用的相关文章

Trimmomatic过滤Illumina低质量序列

1. 下载安装 直接去官网下载二进制软件,解压后的trimmomatic-0.36.jar即为我们需要的软件 wget http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/Trimmomatic-0.36.zip unzip Trimmomatic 2. 运行软件 一般我们使用默认参数运行即可,具体使用方法可参见官网http://www.usadellab.org/cms/?page=trimmomatic 使用默认参数运

安装Windows7系统时,提示:缺少所需的CD/DVD驱动器设备驱动程序

      测试机型:HP probook 430 g3       系统:Windows 7 Pro x64 现在笔记本电脑主板集成的USB口大多为3.0版本,而且一些厂商为了追求PC的轻薄,不再集成光驱,所以我们在安装系统时,一般只能通过U盘或U口外接光驱. 而当我们因为需要(安装OEM系统),在通过刻录软件(如UltraISO)将系统写入U盘或光盘的方式安装系统时,此时问题就可能悄悄出现了:因为Win7官方原版系统没有集成USB3.0驱动,所以可能的报错如下: 点击"浏览"或通过

Windows8.1-KB2999226-x64安装提示 此更新不适用你的计算机

如题 Windows8.1-KB2999226-x64.msu  双击安装 安装提示 此更新不适用你的计算机 . 解决方案: 放在D:\update\目录下 windows键+X  选择  命令提示符(管理员)  一定要是管理员 打开cmd 分别执行下面两句.红色部分就是自己的更新程序了.其他安装同理 例如Windows8.1-KB2919442-x64.msu 等 1    expand –F:* D:\update\Windows8.1-KB2999226-x64.msu D:\update

pip安装提示PermissionError: [WinError 5]错误问题解决

 问题现象 新安装python3.6版本后使用pip安装第三方模块失败,报错信息如下: C:\Users\linyfeng>pip install lxml Collecting lxml Downloading http://pypi.doubanio.com/packages/fb/41/b8d5c869d01fcb77c72d7d226a847a3946034ef19c244ac12920b71cd036/lxml-3.8.0-cp36-cp36m-win32.whl (2.9MB) 10

windows安装TortoiseGit详细使用教程【基础篇】

环境:win8.1 64bit 安装准备: 首先你得安装windows下的git msysgit1.9.5 安装版本控制器客户端tortoisegit  tortoisegit1.8.12.0 [32和64别下载错,不习惯英文的朋友,也可以下个语言包] 一.安装图解: 先安装GIT[一路默认即可] 安装好git以后,右键,会发现菜单多了几项关于GIT的选项 2.安装tortoisegit[一路默认即可] 安装好以后,右键,会发现菜单多了几项关于tortoisegit的选项 到此,安装算完成了,相

在Win10 Anaconda中安装Tensorflow

有需要的朋友可以参考一下 1.安装Anaconda 下载:https://www.continuum.io/downloads,我用的是Python 3.5 下载完以后,安装. 安装完以后,打开Anaconda Prompt,输入清华的仓库镜像,更新包更快: conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/ conda config --set show_channel_url

Linux下WebSphereV8.5.5.0 安装详细过程

Linux下WebSphereV8.5.5.0 安装详细过程 自WAS8以后安装包不再区别OS,一份介质可以安装到多个平台.只针对Installation Manager 进行了操作系统的区分 ,Websphere产品介质必须通过专门的工具Install Managere安装.进入IBM的官网http://www.ibm.com/us/en/进行下载.在云盘http://yun.baidu.com/share/linkshareid=2515770728&uk=4252782771 中是Linu

Python学习1-Python和Pycharm的下载与安装

本文主要介绍Python的下载安装和Python编辑器Pycharm的下载与安装. 一.Python的下载与安装 1.下载 到Python官网上下载Python的安装文件,进入网站后显示如下图: 网速访问慢的话可直接在这里下载:python-2.7.11.amd64 在Downloads中有对应的支持的平台,这里我们是在Windows平台下运行,所以点击Windows,出现如下: 在这里显示了Python更新的所有版本,其中最上面两行分别是Python2.X和Python3.X对应的最后更新版本

oracle安装故障:完美解决xhost +报错: unable to open display “”

oracle安装 先切换到root用户,执行xhost + 然后再切换到oracle用户,执行export DISPLAY=:0.0 出现乱码执行export LANG=US_en 在这里给大家介绍下两种情况的常见问题: 一种是本地运行的命令,另一种则是远程ssh命令安装. DISPLAY科普 DISPLAY变量是用来设置将图形显示到何处.比如CENTOS,你用图形界面登录进去,DISPLAY自动设置为DISPLAY=:0.0表示显式到本地监视器,那么通过终端工具(例如:xshell)进去,运行