Trinity-based DEG analysis

Identifying Differentially Expressed Trinity Transcripts

Our current system for identifying differentially expressed transcripts
relies on the edgeR Bioconductor package. The protocol and scripts described
below identify differentially expressed transcripts and cluster them according
to their expression profiles. The process is somewhat interactive: both
automated and manual approaches are described for refining gene clusters and
examining their corresponding expression patterns.

Run edgeR

Note: This system is not yet compatible with biological replicates, but will
soon be updated to leverage such data.

First, join the RSEM-estimated abundance values for each of your samples by
running:

TRINITY_RNASEQ_ROOT/util/RSEM_util/merge_RSEM_counts_single_table.pl  sampleA.RSEM.isoform.results sampleB.RSEM.isoform.results ... > all.counts.matrix

Edit the column headers in the matrix file to your liking, since this is how
the samples will be named in the downstream analysis steps.
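Conceptually, the merge step joins per-sample count tables on transcript ID. The following is a minimal Python sketch of that idea, not the logic of merge_RSEM_counts_single_table.pl itself. It assumes each sample has already been reduced to a mapping of transcript ID to estimated count; the sample names, transcript IDs, and the choice to fill missing transcripts with 0 are all illustrative assumptions.

```python
def merge_counts(samples):
    """Join per-sample {transcript_id: count} dicts into one matrix.

    samples: dict mapping sample name -> {transcript_id: count}.
    Returns (header, rows), with rows sorted by transcript ID.
    Transcripts absent from a sample get 0 (an assumption of this sketch).
    """
    names = list(samples)
    ids = sorted({tid for counts in samples.values() for tid in counts})
    header = [""] + names
    rows = [[tid] + [samples[n].get(tid, 0) for n in names] for tid in ids]
    return header, rows


# Hypothetical counts for two samples.
samples = {
    "sampleA": {"comp1_c0_seq1": 10.0, "comp2_c0_seq1": 0.0},
    "sampleB": {"comp1_c0_seq1": 25.0, "comp3_c0_seq1": 4.0},
}
header, rows = merge_counts(samples)
print("\t".join(header))
for row in rows:
    print("\t".join(str(v) for v in row))
```

The header row produced here is where the sample names come from, which is why editing the column headers changes how samples are labeled downstream.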

Using the all.counts.matrix file created above, perform TMM (trimmed mean of
M-values) normalization and identify differentially expressed transcripts
resulting from pairwise comparisons among the samples like so:

TRINITY_RNASEQ_ROOT/Analysis/DifferentialExpression/run_EdgeR.pl --matrix all.counts.matrix --transcripts Trinity.fasta --output edgeR_results_dir

If you have a single reference sample to which all other samples should be
compared, rather than all-vs-all comparisons, specify that sample’s column
heading, exactly as it appears in the all.counts.matrix file, with:
--reference ref_column_name

Each pairwise comparison generates a ${samplea}_vs_${sampleb}.results.txt
output file listing the differentially expressed transcripts, log fold-changes
in expression, P-values, and FDR-corrected P-values. Because no biological
replicates are assumed, an edgeR dispersion factor of 0.1 is used (the script
default, which you can adjust) to minimize false-positive calls; see the edgeR
manual for details. In addition to the tabulated differentially expressed
transcripts, an MA-plot is generated for each comparison (the corresponding
.eps file), as shown below. The column on the left of the MA-plot corresponds
to transcripts that have read counts in only one of the two conditions.
Transcripts drawn as red dots in the MA-plot are those defined as
differentially expressed.
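For readers unfamiliar with MA-plots, the coordinates can be sketched as follows. This is a minimal Python illustration of the standard definition (M is the log2 ratio, A is the mean log2 abundance), assuming the two input values are already normalized. The function name and the use of None for transcripts seen in only one condition are assumptions of this sketch, not part of run_EdgeR.pl or edgeR.

```python
import math


def ma_point(a, b):
    """Return (A, M) for two normalized abundance values.

    M = log2(a) - log2(b)            (log fold-change)
    A = (log2(a) + log2(b)) / 2      (mean log abundance)
    Returns None when either value is zero, since the log ratio is
    then undefined; such transcripts are the ones drawn in the
    separate left-hand column of the plot.
    """
    if a <= 0 or b <= 0:
        return None
    la, lb = math.log2(a), math.log2(b)
    return (0.5 * (la + lb), la - lb)


print(ma_point(8.0, 2.0))
print(ma_point(0.0, 5.0))
```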

The TMM and length-normalized (FPKM) expression values are provided in a
file: transcript_read_counts.RAW.normalized.FPKM, which can be examined using
additional methods described below.
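As a rough guide to what these values mean, the sketch below applies the conventional FPKM definition, with the TMM factor applied as a scaling of the library size. Both the function and the example numbers are hypothetical, and the exact computation inside run_EdgeR.pl may differ.

```python
def fpkm(count, length_bp, effective_library_size):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return count * 1e9 / (length_bp * effective_library_size)


raw_library_size = 2_000_000  # hypothetical total mapped fragments
tmm_factor = 0.5              # hypothetical TMM normalization factor
effective = raw_library_size * tmm_factor

# 100 fragments on a 1 kb transcript, 1 million effective fragments.
print(fpkm(100, 1000, effective))
```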

Analyzing Differentially Expressed Transcripts


An initial step in analyzing differential expression is to extract those
transcripts that are most differentially expressed (most significant P-values
and fold-changes) and to cluster the transcripts according to their patterns
of differential expression across the samples. To do this, run the following
from within the edgeR output directory:

TRINITY_RNASEQ_ROOT/Analysis/DifferentialExpression/analyze_diff_expr.pl --matrix transcript_read_counts.RAW.normalized.FPKM -P 1e-3 -C 2

which will extract all genes that have P-values of at most 1e-3 and are at
least 2^2-fold (4-fold) differentially expressed. The FPKM-normalized values
for these genes are retrieved and written to a file:
diffExpr.P${pvalue}_C${fold_change}.matrix. These data are then
log2-transformed, mean-centered, and clustered using R, generating a heatmap
file: diffExpr.P${pvalue}_C${fold_change}.matrix.heatmap.eps, as shown below:
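The selection and transformation steps described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code of analyze_diff_expr.pl: the function names are hypothetical, and the pseudocount of 1 added before the log2 transform is an assumption made here to avoid taking the log of zero.

```python
import math


def passes(pvalue, log2_fc, max_p=1e-3, min_log2_fc=2.0):
    """True if a transcript meets both the -P and -C style thresholds."""
    return pvalue <= max_p and abs(log2_fc) >= min_log2_fc


def log2_center(values, pseudocount=1.0):
    """Log2-transform one transcript's FPKM values, then mean-center them."""
    logged = [math.log2(v + pseudocount) for v in values]
    mean = sum(logged) / len(logged)
    return [x - mean for x in logged]


print(passes(5e-4, -2.5))   # significant and more than 4-fold down
print(passes(5e-4, 1.0))    # significant but only 2-fold
print(log2_center([1.0, 3.0, 7.0]))
```

Mean-centering each transcript puts the emphasis on the shape of its expression pattern across samples rather than on its absolute level, which is what the clustering relies on.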

The above is mostly just a visual reference. To study and define your gene
clusters more rigorously, you will need to interact with the data as described
below. The clusters and all data required for interrogating and defining them
are saved as an R session in the local file all.RData, which is used in the
steps described below.

Automatically Defining a K-number of Gene Clusters

Run the command below to automatically split the data set into $num_clusters
clusters (similar to k-means clustering, in that the number of clusters is
fixed in advance).

TRINITY_RNASEQ_ROOT/Analysis/DifferentialExpression/define_clusters_by_cutting_tree.pl -K $num_clusters

A directory called clusters_fixed_K_${num_clusters}/ will be created,
containing the expression matrix for each of the clusters.
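To illustrate what cutting a hierarchical tree into K clusters means, here is a small pure-Python sketch. It merges the two closest clusters repeatedly until K remain, which gives the same grouping as cutting the corresponding dendrogram at the level with K branches. The Euclidean distance, the complete linkage, and the transcript names are assumptions of this sketch; the script itself works on the tree stored in all.RData and may use different settings.

```python
import math
from itertools import combinations


def euclid(u, v):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def cluster_into_k(profiles, k):
    """Agglomerate profiles (name -> centered values) into k clusters."""
    clusters = [[name] for name in sorted(profiles)]

    def linkage(c1, c2):
        # Complete linkage: distance between the two farthest members.
        return max(euclid(profiles[a], profiles[b]) for a in c1 for b in c2)

    while len(clusters) > k:
        # Find the closest pair of clusters (i < j) and merge j into i.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return sorted(clusters)


# Hypothetical mean-centered profiles across three samples.
profiles = {
    "t1": [-1.0, 0.0, 1.0],
    "t2": [-0.9, 0.0, 0.9],
    "t3": [1.0, 0.0, -1.0],
    "t4": [0.8, 0.1, -0.9],
}
print(cluster_into_k(profiles, 2))
```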

To plot the mean-centered expression patterns for each cluster, visit that
directory and run:

TRINITY_RNASEQ_ROOT/Analysis/DifferentialExpression/plot_expression_patterns.pl subcluster_*

This will generate a summary image file: my_cluster_plots.pdf, as shown
below:

Manually Defining Gene Clusters

Manually defining your clusters is the best way to organize the data to your
liking. This is an interactive process. Fire up R from within your output
directory, being sure it contains the all.RData file, and enter the
following commands:

R

load("all.RData")

source("TRINITY_RNASEQ_ROOT/Analysis/DifferentialExpression/R/manually_define_clusters.R")

manually_define_clusters(hc_genes, centered_data)

This should yield a display containing the hierarchically clustered genes, as
shown below:

Now, manually define your clusters from left to right (order matters here, so
that you can decipher the results later!) by clicking on the vertical branch
that defines the clade of interest. After you click on the branch, a red box
will be drawn around the selected clade, as shown below:

Right click with the mouse (or double-touch a touchpad) to exit from cluster
selection.

The selected clusters will be written to a subdirectory,
manually_defined_clusters_$count_clusters, in a format similar to that of the
automated cluster selection described above. Likewise, you can generate plots
of the expression patterns for each cluster using the
plot_expression_patterns.pl script.

from:
https://github.com/genome-vendor/trinity/blob/master/docs/analysis/diff_expression_analysis.asciidoc#identifying-differentially-expressed-trinity-transcripts

Time: 2024-11-07 20:38:43
