7. RNAseq Downstream Analysis

Created by Dennis C Wylie, last modified on Jun 29, 2015

Available methods include machine learning (clustering, dimensionality reduction, classification and regression modeling, resampling techniques, etc.), ANOVA modeling, and empirical Bayes analysis.

Unsupervised Analysis

Unsupervised methods provide exploratory data analysis that gives a big-picture view of your data: they can supply valuable QC information and help both to confirm expected trends and to identify unexpected patterns.

  • Deliverables:

    • Plots in png and pdf format
    • Results from any additional algorithms applied may be provided in tab-delimited or Excel-formatted tables as appropriate
  • Tools Used:

    • Hierarchical Clustering: of both genes and samples.
    • Principal Components Analysis: PCA biplot of the data after centering along both the gene and sample axes (optionally also scaling along the gene axis).
    • Other methods (e.g., k-means clustering, self-organizing maps, multidimensional scaling) are available if desired.
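As a rough illustration of the PCA step, the following pure-Python sketch double-centers a genes x samples matrix (subtracting gene means and sample means, then adding back the grand mean) and extracts the first principal component by power iteration. The function names, the toy data, and the use of power iteration are illustrative assumptions; the source does not say how the PCA is computed, and in practice a full SVD (e.g. R's `prcomp` or `numpy.linalg.svd`) would normally be used.

```python
import math
import random


def double_center(matrix, scale_genes=False):
    """Center a genes x samples matrix along both axes.

    Subtracts each gene (row) mean and each sample (column) mean and adds back
    the grand mean, so every row and every column of the result sums to zero.
    If scale_genes is True, each row is then divided by its standard deviation.
    """
    n_genes, n_samples = len(matrix), len(matrix[0])
    row_means = [sum(row) / n_samples for row in matrix]
    col_means = [sum(row[j] for row in matrix) / n_genes for j in range(n_samples)]
    grand_mean = sum(row_means) / n_genes
    centered = [[matrix[i][j] - row_means[i] - col_means[j] + grand_mean
                 for j in range(n_samples)] for i in range(n_genes)]
    if scale_genes:
        scaled = []
        for row in centered:
            sd = math.sqrt(sum(v * v for v in row) / n_samples)  # row mean is already 0
            scaled.append([v / sd for v in row] if sd > 0 else row)
        centered = scaled
    return centered


def leading_component(x, n_iter=100, seed=0):
    """First singular triplet (u, sigma, v) of x by power iteration on x^T x.

    u holds gene loadings; sigma * v holds the sample scores on PC1.
    Assumes x is not all zeros.
    """
    n_genes, n_samples = len(x), len(x[0])
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    for _ in range(n_iter):
        u = [sum(x[i][j] * v[j] for j in range(n_samples)) for i in range(n_genes)]  # u = X v
        v = [sum(x[i][j] * u[i] for i in range(n_genes)) for j in range(n_samples)]  # v = X^T u
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
    u = [sum(x[i][j] * v[j] for j in range(n_samples)) for i in range(n_genes)]
    sigma = math.sqrt(sum(c * c for c in u))
    return [c / sigma for c in u], sigma, v


# Hypothetical toy data: 4 genes x 6 samples; samples 0-2 and 3-5 differ on genes 0-2.
data = [
    [5, 5, 5, 9, 9, 9],
    [2, 2, 2, 6, 6, 6],
    [7, 7, 7, 3, 3, 3],
    [4, 4, 4, 4, 4, 4],
]
centered = double_center(data)
gene_loadings, sigma, v = leading_component(centered)
sample_scores = [sigma * c for c in v]
```

For this toy matrix, PC1 places samples 0-2 and samples 3-5 on opposite sides of zero, and the gene loadings show which genes drive that split; a biplot draws both sets of coordinates on the same axes.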

Empirical Bayes Differential Expression Analysis

RNAseq experiments yield simultaneous measurements of many intrinsically similar variables (gene expression levels), often with limited sample sizes. Empirical Bayes methods are designed for exactly this situation: they "borrow strength" across genes to increase statistical power and reduce false discoveries.
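To make "borrowing strength" concrete, here is a minimal sketch of the variance shrinkage behind limma-style moderated t-statistics, for a single gene in a two-group comparison. The prior degrees of freedom and prior variance are hypothetical values passed in by hand (limma estimates them from the distribution of variances across all genes), and the function names are illustrative.

```python
import math
import statistics


def pooled_variance(a, b):
    """Pooled within-group variance of one gene and its residual degrees of freedom."""
    df = len(a) + len(b) - 2
    ss = (sum((x - statistics.fmean(a)) ** 2 for x in a)
          + sum((x - statistics.fmean(b)) ** 2 for x in b))
    return ss / df, df


def moderated_t(a, b, prior_df, prior_var):
    """Two-group t-statistic using an empirical-Bayes (shrunken) variance.

    The gene's own variance s2 is replaced by a weighted average with a common
    prior variance: (d0 * s0^2 + d * s2) / (d0 + d). Here prior_df (d0) and
    prior_var (s0^2) are supplied by hand, a simplifying assumption.
    Returns the statistic and its degrees of freedom (d0 + d).
    """
    s2, df = pooled_variance(a, b)
    s2_tilde = (prior_df * prior_var + df * s2) / (prior_df + df)
    diff = statistics.fmean(b) - statistics.fmean(a)
    return diff / math.sqrt(s2_tilde * (1 / len(a) + 1 / len(b))), prior_df + df


# Hypothetical gene whose three replicates per group happen to agree almost perfectly.
ctrl = [5.00, 5.01, 4.99]
trt = [5.10, 5.11, 5.09]
t_plain, df_plain = moderated_t(ctrl, trt, prior_df=0, prior_var=0.04)  # ordinary t-test
t_mod, df_mod = moderated_t(ctrl, trt, prior_df=4, prior_var=0.04)      # shrunken variance
```

Here the ordinary t-statistic is above 12 on 4 degrees of freedom, while the shrunken version falls below 1 on 8: a variance that looks implausibly small relative to the other genes is pulled toward the common value, which is how such "lucky" low-variance genes are kept from becoming false discoveries.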

  • Deliverables:

    • Tables of model parameters, p-values, and FDR q-values (in tab-delimited and Excel formats)
    • Boxplots (stratified by sample group) and pairs plots of top genes provided in png and pdf format
  • Tools Used:

    • Limma: applies empirical Bayes methods when fitting linear models (e.g., t-tests, ANOVA) for a wide variety of experimental designs. Originally designed for microarray data analysis, Limma has since been substantially extended by its developers to handle RNAseq data as well.
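The FDR q-values listed among the deliverables are most commonly Benjamini-Hochberg adjusted p-values (the default `adjust.method` in limma's `topTable`); the source does not name the procedure, so this is an assumption. A small sketch of the adjustment, with an illustrative function name:

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR q-values), in the input order.

    With sorted p-values p_(1) <= ... <= p_(m), q_(i) = min over k >= i of p_(k) * m / k.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


q = bh_adjust([0.01, 0.04, 0.03, 0.005])
```

For these four p-values the q-values are 0.02, 0.04, 0.04 and 0.02 (up to floating-point rounding), the same values R's `p.adjust(..., method = "BH")` returns.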

Supervised Analysis

Many classification and regression methods are available, chosen as appropriate to your analysis. Model performance may be assessed with standard metrics evaluated under cross-validation, or on independent test sets if available. Analysis will be conducted using R and/or Python scripts.
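A minimal sketch of k-fold cross-validation as described above, assuming classification accuracy as the performance metric. `cross_val_accuracy` accepts any pair of `fit`/`predict` functions; the nearest-centroid model, the fold scheme (shuffled, unstratified), and the toy data are hypothetical choices for illustration only.

```python
import random


def kfold_indices(n, k, seed=0):
    """Shuffle range(n) and deal it into k folds of (nearly) equal size; assumes k <= n."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cross_val_accuracy(X, y, fit, predict, k=5, seed=0):
    """Mean held-out accuracy: each fold is predicted by a model fit on the other k - 1 folds."""
    scores = []
    for test in kfold_indices(len(y), k, seed):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = predict(model, [X[i] for i in test])
        scores.append(sum(p == y[i] for p, i in zip(preds, test)) / len(test))
    return sum(scores) / len(scores)


def fit_centroids(X, y):
    """Hypothetical toy model: the mean feature profile of each class."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()}


def predict_centroids(model, X):
    """Assign each sample to the class whose centroid is nearest (squared Euclidean distance)."""
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [min(model, key=lambda label: dist2(x, model[label])) for x in X]


# Hypothetical data: 20 samples x 2 features in two well-separated groups.
X = [[0.1 * i, 0.0] for i in range(10)] + [[5 + 0.1 * i, 5.0] for i in range(10)]
y = ["A"] * 10 + ["B"] * 10
acc = cross_val_accuracy(X, y, fit_centroids, predict_centroids, k=5)
```

Each sample is predicted exactly once, by a model fit without it, so the averaged accuracy estimates performance on unseen samples rather than on the training data.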

  • Deliverables:

    • Tables of results (in tab-delimited and Excel formats)
    • Plots in png and pdf format
    • R and/or Python source files
    • Binary, JSON, or XML representations of R or Python objects can be made available if desired
    • Further reports in the form of slides or text documents may be provided in standard formats (pdf, doc, ppt) if desired
  • Methods Available:

    • Diagonal linear discriminant analysis (DLDA, a form of linear naive Bayes classification)
    • Linear and quadratic discriminant analysis
    • Logistic regression including L1/lasso and/or L2/ridge regularization if desired
    • Partial least squares (PLS) discriminant analysis and regression
    • k-nearest neighbors (KNN)
    • Support vector machines (SVM)
    • Decision tree ensembles (Random Forests or AdaBoost).
    • Other methods are available on request.
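Diagonal LDA, the first method listed, is simple enough to sketch in a few lines. This pure-Python version assumes Gaussian features sharing one pooled variance per feature and no covariances between features (hence the naive Bayes connection); the source does not specify an implementation, and the function names and toy data are hypothetical.

```python
import math


def fit_dlda(X, y):
    """Diagonal LDA: per-class means and one pooled within-class variance per feature.

    Ignoring covariances between features is what makes DLDA a (linear) naive
    Bayes classifier. Assumes every feature varies within at least one class.
    """
    classes = sorted(set(y))
    n, p = len(X), len(X[0])
    means = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        means[c] = [sum(col) / len(rows) for col in zip(*rows)]
    # Pooled variance per feature with n - K degrees of freedom (K = number of classes).
    pooled = [sum((x[j] - means[label][j]) ** 2 for x, label in zip(X, y)) / (n - len(classes))
              for j in range(p)]
    priors = {c: y.count(c) / n for c in classes}
    return means, pooled, priors


def predict_dlda(model, X):
    """Assign each sample to the class with the highest discriminant score."""
    means, pooled, priors = model

    def score(x, c):
        # Log prior plus Gaussian log-likelihood under the shared diagonal covariance,
        # dropping normalizing constants that are the same for every class.
        return math.log(priors[c]) - 0.5 * sum(
            (x[j] - means[c][j]) ** 2 / pooled[j] for j in range(len(x)))

    return [max(means, key=lambda c: score(x, c)) for x in X]


# Hypothetical data: 6 samples x 2 genes (log expression); gene 0 separates the groups.
X = [[1.0, 10.0], [1.2, 10.5], [0.8, 9.5], [3.0, 10.2], [3.2, 9.8], [2.8, 10.0]]
y = ["ctrl", "ctrl", "ctrl", "trt", "trt", "trt"]
model = fit_dlda(X, y)
preds = predict_dlda(model, [[1.1, 10.0], [2.9, 10.0]])
```

Because the variances are shared across classes, the quadratic terms in x cancel between classes and the decision boundary is linear, which is why DLDA stays stable even when genes far outnumber samples.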