Using tophat, cufflinks, cuffcompare, and cuffmerge

Cole Trapnell said:

There are three strategies:

1) merge bams and assemble in a single run of Cufflinks
2) assemble each bam and cuffcompare them to get a combined.gtf
3) assemble each bam and cuffmerge them to get a merged.gtf
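
The three strategies can be sketched as command lines. This is a minimal illustration, not a tested pipeline: the file and directory names (`all_samples.bam`, `asm_*`, `cuffcmp`, `merged_asm`, `assemblies.txt`) are hypothetical, and the function only builds argument lists without running anything. For option 3, `assemblies.txt` is assumed to be a text file listing one `transcripts.gtf` path per line.

```python
from pathlib import Path


def per_sample_gtfs(bams):
    """Hypothetical output locations: one Cufflinks directory per BAM."""
    return [f"asm_{Path(b).stem}/transcripts.gtf" for b in bams]


def strategy_commands(bams, threads=4):
    """Return, for each strategy, the commands to run in order (as argv lists)."""
    t = str(threads)
    gtfs = per_sample_gtfs(bams)
    # One Cufflinks run per BAM, shared by strategies 2 and 3.
    per_sample = [
        ["cufflinks", "-p", t, "-o", str(Path(g).parent), b]
        for g, b in zip(gtfs, bams)
    ]
    return {
        # 1) merge BAMs, then a single Cufflinks run
        "1_merge_bams": [
            ["samtools", "merge", "all_samples.bam", *bams],
            ["cufflinks", "-p", t, "-o", "asm_all", "all_samples.bam"],
        ],
        # 2) per-sample assemblies, then cuffcompare -> cuffcmp.combined.gtf
        "2_cuffcompare": per_sample + [["cuffcompare", "-o", "cuffcmp", *gtfs]],
        # 3) per-sample assemblies, then cuffmerge -> merged_asm/merged.gtf
        "3_cuffmerge": per_sample
        + [["cuffmerge", "-p", t, "-o", "merged_asm", "assemblies.txt"]],
    }


if __name__ == "__main__":
    for name, cmds in strategy_commands(["s1.bam", "s2.bam"]).items():
        print(name)
        for cmd in cmds:
            print("   ", " ".join(cmd))
```

Building the commands as lists keeps them ready for `subprocess.run` once the paths are adapted to a real project.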

All three options work a little differently depending on whether you're also trying to integrate reference transcripts from UCSC or another annotation source.

#1 is quite different from #2 and #3, so I'll discuss its pros and cons first. The advantage here is simplicity of workflow. It's one Cufflinks run, so no need to worry about the details of the other programs. As turnersd mentions, you might also think this maximizes the accuracy of the resulting assembly, and that might be the case, but it also might not (for technical reasons that I don't want to get into right now). The disadvantage of this approach is that your computer might not be powerful enough to run it. More data and more isoforms mean substantially more memory and running time. I haven't actually tried this on something like the human body map, but I would be very impressed and surprised if Cufflinks can deal with all of that on a machine owned by mere mortals.

#2 and #3 are very similar: both are designed to gracefully merge full-length and partial transcript assemblies without ever merging transfrags that disagree on splicing structure. Consider two transfrags, A and B, each with a couple of exons. If A and B overlap, and they don't disagree on splicing structure, we can (and according to Cufflinks' assembly philosophy, we should) merge them. The difference between Cuffcompare and Cuffmerge is that Cuffcompare will only merge them if A is "contained" in B, or vice versa. That is, only if one of the transfrags is essentially redundant. Otherwise, they both get included. Cuffmerge, on the other hand, will merge them if they overlap, agree on splicing, and are in the same orientation. As turnersd noted, this is done by converting the transfrags into SAM alignments and running Cufflinks on them.
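
The two merge rules described above can be illustrated with a toy model. This is a simplified sketch, not the actual Cuffcompare or Cuffmerge code: the `Transfrag` class, the inclusive exon coordinates, the strand check in the Cuffcompare-like rule, and the test "both transfrags have the same introns inside their shared span" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfrag:
    strand: str
    exons: tuple  # sorted, non-overlapping (start, end) pairs, inclusive

    @property
    def span(self):
        return (self.exons[0][0], self.exons[-1][1])

    @property
    def introns(self):
        # Gaps between consecutive exons.
        return [(e1 + 1, s2 - 1)
                for (_, e1), (s2, _) in zip(self.exons, self.exons[1:])]


def splicing_agrees(a, b):
    """True if a and b overlap and show the same introns in the shared span."""
    start = max(a.span[0], b.span[0])
    end = min(a.span[1], b.span[1])
    if start > end:
        return False  # no overlap at all

    def introns_in_shared(t):
        return {i for i in t.introns if i[0] <= end and i[1] >= start}

    return introns_in_shared(a) == introns_in_shared(b)


def contains(outer, inner):
    return outer.span[0] <= inner.span[0] and inner.span[1] <= outer.span[1]


def cuffcompare_like_merge(a, b):
    # Merge only when one transfrag is redundant (contained in the other).
    return (a.strand == b.strand and splicing_agrees(a, b)
            and (contains(a, b) or contains(b, a)))


def cuffmerge_like_merge(a, b):
    # Merge whenever the transfrags overlap, agree on splicing, share a strand.
    return a.strand == b.strand and splicing_agrees(a, b)


if __name__ == "__main__":
    A = Transfrag("+", ((100, 200), (300, 400)))
    B = Transfrag("+", ((150, 200), (300, 500)))  # extends past A's 3' end
    print(cuffcompare_like_merge(A, B), cuffmerge_like_merge(A, B))
```

In this model, A and B share an intron but neither contains the other, so only the Cuffmerge-like rule joins them; a transfrag lying entirely inside A with the same intron would be merged under both rules.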

The other thing that distinguishes these two options is how they deal with a reference annotation. You can read on our website how the Cufflinks Reference Annotation Based Transcript assembler (RABT) works. Cuffcompare doesn't do any RABT assembly; it just includes the reference annotation in the combined.gtf and discards partial transfrags that are contained in and compatible with the reference. Cuffmerge actually runs RABT when you provide a reference, and this happens during the step where transfrags are converted into SAM alignments and assembled. We do this to improve quantification accuracy and reduce errors downstream. I should also say that Cuffmerge runs cuffcompare in order to annotate the merged assembly with certain helpful features for use later on.
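
As a rough sketch of how a reference annotation is passed to each tool: `cuffcompare` takes it with `-r`, while `cuffmerge` takes it with `-g` (plus, optionally, the genome FASTA with `-s`). The file names below (`ref.gtf`, `genome.fa`, `assemblies.txt`, and the output names) are hypothetical placeholders, and the function only builds the argument lists.

```python
import shlex


def reference_aware_commands(gtfs, ref_gtf, genome_fa):
    """Build cuffcompare/cuffmerge command lines that include a reference."""
    return {
        # Reference is carried into cuffcmp.combined.gtf; no RABT assembly.
        "cuffcompare": ["cuffcompare", "-r", ref_gtf, "-o", "cuffcmp", *gtfs],
        # Reference is used for RABT while the transfrags are re-assembled.
        # assemblies.txt is assumed to list the per-sample GTF paths.
        "cuffmerge": ["cuffmerge", "-g", ref_gtf, "-s", genome_fa,
                      "-o", "merged_asm", "assemblies.txt"],
    }


if __name__ == "__main__":
    cmds = reference_aware_commands(
        ["asm_s1/transcripts.gtf", "asm_s2/transcripts.gtf"],
        "ref.gtf",
        "genome.fa",
    )
    for name, cmd in cmds.items():
        print(f"{name}: {shlex.join(cmd)}")
```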

So we recommend #3 for a number of reasons, chiefly because it is the closest in spirit to #1 while still being reasonably fast. For reasons that I don't want to get into here (pretty arcane details about the Cufflinks assembler) I also feel that option #3 is actually the most accurate in most experimental settings.

https://www.biostars.org/p/15693/

https://www.biostars.org/p/160808/

http://seqanswers.com/forums/showthread.php?t=16422

https://www.biostars.org/p/139186/

https://www.biostars.org/p/10219/

https://www.biostars.org/p/138521/

http://www.broadinstitute.org/cancer/software/genepattern/rna-seq-analysis

Time: 2024-10-22 15:05:09
