miRNA分析--比对（二）

miRNA分析--数据过滤（一）

在比对之前为了减少比对时间，将每一个样本中的reads进行合并，得到fasta格式，其命名规则如下：

样本_r数子_x数字

r 中的数字表示reads序号；

x 中的数字表示该条reads重复次数

比对分为两条策略

1、根据本物种已有的miRNA序列进行比对，

已知当miRNA序列从 miRBase或者 sRNAanno得到

(应该将clean reads比对到所研究物种到tRNA, rRNA, snoRNA,mRNA等数据，允许一个错配，将比对上等reads过滤，也可以比对到参考基因组，将为未比对到到reads过滤掉，但是本次我没有这么做)

对于第一种情况，我采用bowtie将reads比对到成熟miRNA

1 ##建立索引
2 bowtie-build ref.fa
3 ##比对
4 bowtie -v 0 -m 30 -p 10 -f  ref.fa sample.fa sample.bwt
5 参数解释
6 -v: 允许0个错配
7 -p: 10个线程
8 -m: 当比对超过这个数时，认为时未比对
9 -f： 输入序列fasta

根据.bwt 文件可以计算出每个已知当miRNA中比对上当reads数量，别忘记乘以 x后面的数

2、直接比对到参考基因组并进行新miRNA鉴定

采用miR-PREFeR进行Novel miRNA鉴定

其githup主页：https://github.com/hangelwen/miR-PREFeR

需安装ViennaRNA ( 最好是1.8.5或2.1.2, 2.1.5版本)

1 #我装的是最新版
2 wget https://www.tbi.univie.ac.at/RNA/download/sourcecode/2_4_x/ViennaRNA-2.4.10.tar.gz
3 tar zvxf ViennaRNA-2.4.10.tar.gz
4 cd ViennaRNA-2.4.10
5 ./configure --prefix="/user/tools/ViennaRNA/" --without-perl
6 make
7 make install

安装

1 git clone https://github.com/hangelwen/miR-PREFeR.git

数据准备

1??ref.fa

2??miRNA 比对到ref.fa的sam文件

3??gff 文件（可选，记录需要屏蔽掉的信息，比如重复序列等）

bowtie 比对

1 bowtie -v 0 -m 30 -p 10 -f ref.fa sample.fa -S sample.sam

准备configure 文件

 1 #example configuration file for the miR-PREFeR pipeline.
 2 #lines start with ‘#‘ are comments
 3
 4 #miR-PREFeR path, please change to your path to the script folder.
 5 #Absolute path perfered.
 6 PIPELINE_PATH = /miR
 7
 8 #Genomic sequence file in fasta format.  Absolute path perfered. If a path
 9 #relative if used, it‘s relatvie to the working directory where you execute
10 #the pipeline.
11 FASTA_FILE =  genome_v1.fa
12
13 #Small RNA read alignment file in SAM format. The SAM file should contain
14 #the SAM header. If N samples are used, then N file names are listed here,
15 #separated by comma. please note that before doing alignment, process the
16 #reads fasta files using the provided script ‘process-reads-fasta.py‘ to
17 #collapse and rename the reads. Absolute path perfered. If a path
18 #relative if used, it‘s relatvie to the working directory where you execute
19 #the pipeline.
20 ALIGNMENT_FILE = ./trm_XX-1_L1_I309.R1.fastq_trm_fa.fa.sam, ./trm_XX-2_L1_I310.R1.fastq_trm_fa.fa.sam, ./trm_XX-3_L1_I311.R1.fastq_trm_fa.fa.sam, ./trm_XY-1_L1_I312.R1.fastq_trm_fa.fa.sam, ./trm_XY-2_L1_I313.R1.fastq_trm_fa.fa.sam, ./trm_XY-3_L1_I314.R1.fastq_trm_fa.fa.sam, ./trm_YY-1_L1_I315.R1.fastq_trm_fa.fa.sam, ./trm_YY-2_L1_I316.R1.fastq_trm_fa.fa.sam, ./trm_YY-3_L1_I332.R1.fastq_trm_fa.fa.sam
21
22 #GFF file which list all existing annotations on genomic sequences FASTA_FILE.
23 #If no GFF file is availble, comment this line out or leave the value blank.
24 #Absolute path perfered. If a path relative if used, it‘s relatvie to the
25 #working directory where you execute the pipeline.
26 #CAUTION: please only list the CDS regions, not the entire miRNA region, because
27 #miRNAs could be in introns. This option is mutual exclusive with ‘GFF_FILE_INCLUDE‘
28 #option.
29 # If you have a GFF file that contains regions in which you want to predict whehter
30 # they include miRNAs, please use the ‘GFF_FILE_INCLUDE‘ option instead.
31 GFF_FILE_EXCLUDE = CDS.gff
32
33 # Only predict miRNAs from the regions given in the GFF file. This option is mutual
34 # exclusive with ‘GFF_FILE_EXCLUDE‘. Thus, only one of them can be used.
35 #GFF_FILE_INCLUDE = ./TAIR10.chr1.candidate.gff
36
37 #The max length of a miRNA precursor. The range is from 60 to 3000. The default
38 #is 300.
39 PRECURSOR_LEN = 300
40
41 #The first step of the pipeline is to identify candidate regions of the miRNA
42 #loci. If READS_DEPTH_CUTOFF = N, then bases that the mapped depth is smaller
43 #than N is not considered. The value should >=2.
44 READS_DEPTH_CUTOFF = 20
45
46 #Number of processes for this computation. Using more processes speeds up the computation,
47 #especially if you have a multi-core processor. If you have N cores avalible for the
48 #computation, it‘s better to set this value in the range of N to 2*N.
49 #If comment out or leave blank, 1 is used.
50 NUM_OF_CORE = 4
51
52 #Outputfolder. If not specified, use the current working directory. Please make sure that
53 #you have enough disk space for the folder, otherwise the pipeline may fail.
54 OUTFOLDER = spinach-result
55
56 #Absolute path of the folder that contains intermidate/temperary files during the
57 # run of the pipeline. If not specified, miR-PREFeR uses a folder with suffix "_tmp"
58 #under OUTFOLDER by default. Please make sure that you have enough disk space for the
59 # folder, otherwise the pipeline may fail.
60 #TMPFOLDER = /tmp/exmaple
61 TMPFOLDER =
62
63 #prefix for naming the output files. For portability, please DO NOT contain any
64 #spaces and special characters. The prefered includes ‘A-Z‘, ‘a-z‘, ‘0-9‘, and
65 #underscore ‘_‘.
66 NAME_PREFIX = spinach-example
67
68 #Maximum gap length between two contigs to form a candidate region.
69 MAX_GAP = 100
70
71 # Minimum and maximum length of the mature sequence. Default values are 18 and 24.
72 MIN_MATURE_LEN = 18
73 MAX_MATURE_LEN = 24
74
75 # If this is ‘Y‘, then the criteria that requries the star sequence must be expressed
76 # is loosed if the expression pattern is good enough (.e.g. the majority of the reads
77 # mapped to the region are mapped to the mature position.). There are lots of miRNAs
78 # which do not have star sequence expression. The default value is Y.
79 ALLOW_NO_STAR_EXPRESSION = Y
80
81 # In most cases, the mature star duplex has 2nt 3‘ overhangs. If this is set to ‘Y‘, then
82 # 3nt overhangs are allowed. Default is ‘N‘.
83 ALLOW_3NT_OVERHANG = N
84
85 #The pipeline makes a checkpoint after each major step. In addition, because the
86 #folding stage is the most time consuming stage, it makes a checkpiont for each
87 #folding process after folding every CHECKPOINT_SIZE sequences. If the pipeline
88 #is killed for some reason in the middle of folding, it can be restarted using
89 #‘recover‘ command from where it was stopped. The default value is 3000. On my
90 #system this means making a checkpoint about every 5 minutes.
91 CHECKPOINT_SIZE = 3000

运行

1 python miR_PREFeR.py -L -k pipeline configfile

pipeline里包含prepare, candidate, fold, predict四步。如果某步中断了，还可以续跑

1 python miR_PREFeR.py -L recover configfile

输出结果

根据example_miRNA.detail.csv 文件写一个脚本提取每个miRNA的reads 数量，进而做差异分析

差异分析

差异分析采用DESeq2，可看我之前写的miRAN 分析以及mRNA分析

ref

1、计算已知miRNA当表达量

2、省心省事的植物miRNA分析软件miR-PREFeR,值得拥有

原文地址：https://www.cnblogs.com/zhanmaomao/p/12150502.html

时间： 2024-10-08 01:12:45

miRNA分析--比对（二）的相关文章

miRNA分析--靶基因预测（三）

miRNA分析--数据过滤(一) miRNA分析--比对(二) 根据miRNA Target Prediction in Plants, miRNA并非所有区域都要求严格匹配,其中第1位碱基和第14位以后的碱基是允许错配(以miRNA 5'为始). miRNA 文件提交不管是U或者T都是可以的 miRNA 靶基因预测我采用了3个工具 1.psRNATarget: A plant Samll RNA Target Analysis Server 进去以后,提交自己miRNA文件,fasta格式,并

linux程序分析工具介绍(二)—-ldd,nm

本文要介绍的ldd和nm是linux下,两个用来分析程序很实用的工具.ldd是用来分析程序运行时需要依赖的动态库的工具:nm是用来查看指定程序中的符号表相关内容的工具.下面通过例子,分别来介绍一下这两个工具: 1. ldd, 先看下面的例子, 用ldd查看cs程序所依赖的动态库: [email protected]:~/Public$ ldd cs linux-gate.so.1 => (0xffffe000) libz.so.1 => /lib/libz.so.1 (0xb7f8c000)

Android 4.2 Bluetooth 分析总结（二）蓝牙enable 的整个过程

转载请标明出处:Android 4.2 Bluetooth 分析总结(二) 蓝牙enable 的整个过程现在开始我们分析 Android4.2 Bluetooth 打开的整个过程,由于是新手,难免有很多错误,记录只是为了以后方便查找,如发错误敬请指出. 我们整个分析过程有可能有点繁琐,但请仔细阅读,读完之后必然发现还是会有一点点收获的,虽然写的不好.搜先我们上一份enable 打开蓝牙整个过程的打印:然后我们跟踪打印来窥探 Android4.2Bluetooth 工作的流程. D/Blueto

iOS Crash 分析（文二）-崩溃日志组成

iOS Crash 分析(文二)-崩溃日志组成现在我们看一个淘宝iOS主客崩溃的例子: ### 1.进程信息 ### Incident Identifier: E4201F10-6F5F-40F9-B938-BB3DA8ED7D50 CrashReporter Key: TODO Hardware Model: iPhone4,1 Process: Taobao4iPhone [3538] Path: /var/mobile/Applications/E3B51E77-D44D-4B3E-87

"别踩白块儿"游戏源代码分析和下载(二)

四.游戏交互实现 1.前面已经介绍在 Block 类实现了每个block的触碰监听,block 实现触碰监听,当按下时,调起在GameScene中实现的touchBlock方法.下面来看改方法的实. /** * 点击到Block时进行的逻辑处理 * * @param pBlock * 所点击的block */ public void touchBlock(Block pBlock) { if (gameStatus == ConstantUtil.GAME_START) {

Hadoop源代码分析（三二）

搞定ClientProtocol,接下来是DatanodeProtocol部分.接口如下: public DatanodeRegistration register(DatanodeRegistration nodeReg ) throws IOException 用亍DataNode向NameNode登记.输入和输出参数都是DatanodeRegistration,类图如下: 前面讨论DataNode的时候,我们已绊讲过了DataNode的注册过程,我们来看NameNode的过程.下面是主要步

Python 爬虫知识点 - 淘宝商品检索结果抓包分析（续二）

一.URL分析通过对“Python机器学习”结果抓包分析,有两个无规律的参数:_ksTS和callback.通过构建如下URL可以获得目标关键词的检索结果,如下所示: https://s.taobao.com/search?data-key=s&data-value=44&ajax=true&_ksTS=1482325509866_2527&callback=jsonp2528&q=Python机器学习&imgfile=&js=1&stat

车道检测源码分析系列（二）

本节分析一个国人开发的简单的车道检测程序(不涉及跟踪) simple_lane_tracking 源码地址作者主页概述采用opencv2编写 C++ 算法核心步骤提取车道标记特征,封装在laneExtraction类中车道建模(两条边,单车道),封装在laneModeling类中对于断断续续的斑点车道标记(虚线)使用RANSAC匹配直线,对每幅图像,检测结果可能是感兴趣的左右车道都检测到或者全都没检测到主程序框架 track.cpp 主程序依次读入源文件夹下的图片,进行处理后输出到

java B2B2C电子商务平台分析之十二-----Spring Cloud Sleuth

一.简介 Spring Cloud Sleuth 主要功能就是在分布式系统中提供追踪解决方案,并且兼容支持了 zipkin,你只需要在pom文件中引入相应的依赖即可.愿意了解源码的朋友直接求求交流分享技术:二一四七七七五六三三二.服务追踪分析微服务架构上通过业务来划分服务的,通过REST调用,对外暴露的一个接口,可能需要很多个服务协同才能完成这个接口功能,如果链路上任何一个服务出现问题或者网络超时,都会形成导致接口调用失败.随着业务的不断扩张,服务之间互相调用会越来越复杂. 三.术语 Spr