Using ALLPATHS-LG

Software download and documentation: http://www.broadinstitute.org/software/allpaths-lg/blog/?page_id=12

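The raw data should reach a sequencing depth of at least 100x.

At least two libraries are required: a long-insert library and a short-insert library.

The two reads of each pair in the short-insert library must overlap, and the insert sizes of that library should vary by no more than 20%.

The long-insert library should have inserts of about 3000 bases, and its size distribution may be wider.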
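As a rough illustration of the 100x guideline, depth can be estimated as total sequenced bases divided by genome size. The sketch below is only a back-of-the-envelope check; the function name and the read counts are hypothetical and not part of ALLPATHS-LG.

```python
def estimated_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Return sequencing depth: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


# Hypothetical example: 3 million 100-base reads on a 2.8 Mb genome.
depth = estimated_coverage(num_reads=3_000_000, read_length=100, genome_size=2_800_000)
print(f"estimated coverage: {depth:.1f}x")
print("meets the 100x guideline" if depth >= 100 else "below the 100x guideline")
```

This counts every read as usable, so the real depth after filtering and error correction will be somewhat lower.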

ALLPATHS‐LG requires a minimum of 2 paired‐end libraries – one short and one long. The short library average separation size must be slightly less than twice the read size, such that the reads from a pair will likely overlap – for example, for 100 base reads the insert size should be 180 bases. The distribution of sizes should be as small as possible, with a standard deviation of less than 20%. The long library insert size should be approximately 3000 bases long and can have a larger size distribution. Additional optional longer insert libraries can be used to help disambiguate larger repeat structures and may be generated at lower coverage.

A fragment library is a library with a short insert separation, less than twice the read length, so that the reads may overlap (e.g., 100bp Illumina reads taken from 180bp inserts.) A jumping library has a longer separation, typically in the 3kbp‐10kbp range, and may include sheared or EcoP15I libraries or other jumping‐library construction; ALLPATHS can handle read chimerism in jumping libraries. Note that fragment reads should be long enough to ensure the overlap.
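The fragment-library requirements above (insert shorter than twice the read length, standard deviation under 20% of the mean) can be checked before starting an assembly. This is a minimal sketch with hypothetical names; it only encodes the two rules quoted in the text and is not something ALLPATHS-LG provides.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FragmentLibrary:
    read_length: int      # bases per read
    insert_mean: float    # mean insert size in bases
    insert_stddev: float  # standard deviation of the insert size in bases


def expected_overlap(lib: FragmentLibrary) -> float:
    """Bases shared by the two reads of an average pair (<= 0 means no overlap)."""
    return 2 * lib.read_length - lib.insert_mean


def check_fragment_library(lib: FragmentLibrary) -> List[str]:
    """Return a list of problems; an empty list means both rules are satisfied."""
    problems = []
    if expected_overlap(lib) <= 0:
        problems.append("insert size is not below twice the read length")
    if lib.insert_stddev >= 0.2 * lib.insert_mean:
        problems.append("insert size standard deviation is 20% of the mean or more")
    return problems


# The example from the text: 100-base reads from 180-base inserts.
# The standard deviation of 18 bases is a made-up value for illustration.
lib = FragmentLibrary(read_length=100, insert_mean=180, insert_stddev=18)
print("expected overlap:", expected_overlap(lib), "bases")
print("problems:", check_fragment_library(lib))
```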

PacBio data can now be added as well, but only for fungal genomes.

If you have a reference genome, provide it.

Input files required by ALLPATHS:

1. Base, quality score, and pairing information files in the DATA directory, for example:

<REF>/<DATA>/frag_reads_orig.fastb
<REF>/<DATA>/frag_reads_orig.qualb
<REF>/<DATA>/frag_reads_orig.pairs
<REF>/<DATA>/jump_reads_orig.fastb
<REF>/<DATA>/jump_reads_orig.qualb
<REF>/<DATA>/jump_reads_orig.pairs

2. A ploidy file must also be present. The ploidy file is a single-line file containing a number. Its specific file name is:

<REF>/<DATA>/ploidy
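Before launching an assembly it can help to confirm that all seven files listed above are in place. The following sketch lists the expected paths and writes a ploidy file by hand. The helper names are hypothetical, the `staph/mydata` layout simply reuses the directory names from this post, and the positive-integer check is an assumption of this example.

```python
import tempfile
from pathlib import Path
from typing import List

READ_SETS = ("frag_reads_orig", "jump_reads_orig")
EXTENSIONS = ("fastb", "qualb", "pairs")


def required_inputs(data_dir: Path) -> List[Path]:
    """Paths of the six read files plus the ploidy file inside the DATA directory."""
    files = [data_dir / f"{name}.{ext}" for name in READ_SETS for ext in EXTENSIONS]
    files.append(data_dir / "ploidy")
    return files


def missing_inputs(data_dir: Path) -> List[Path]:
    """Required files that do not exist yet."""
    return [path for path in required_inputs(data_dir) if not path.is_file()]


def write_ploidy(data_dir: Path, ploidy: int) -> Path:
    """Write the single-line ploidy file (assumed here to be a positive integer)."""
    if ploidy < 1:
        raise ValueError("ploidy must be a positive integer")
    data_dir.mkdir(parents=True, exist_ok=True)
    path = data_dir / "ploidy"
    path.write_text(f"{ploidy}\n")
    return path


# Demonstration in a throwaway directory: only the ploidy file gets created,
# so the six read files are reported as missing.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "staph" / "mydata"
    write_ploidy(data_dir, 2)
    for path in missing_inputs(data_dir):
        print("missing:", path.name)
```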

How are these input files generated?

Use the bundled Perl script, PrepareAllPathsInputs.pl. The script needs two configuration files:

in_groups.csv and in_libs.csv.

CSV stands for comma-separated values.

First, the in_groups.csv file:

group_name: a UNIQUE nickname for this specific data set.
library_name: the library to which the data set belongs.
file_name: the absolute path to the data file.
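To make the three fields concrete, here is a small sketch that renders an in_groups.csv with one fragment and one jumping data set. The group names, library names, and file paths are all hypothetical placeholders, and the plain comma-separated layout with a header row is an assumption based on the field names listed above.

```python
import csv
import io
from typing import Dict, List

GROUP_FIELDS = ["group_name", "library_name", "file_name"]


def render_in_groups(rows: List[Dict[str, str]]) -> str:
    """Render rows as CSV text, insisting that every group_name is unique."""
    names = [row["group_name"] for row in rows]
    if len(names) != len(set(names)):
        raise ValueError("group_name values must be unique")
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=GROUP_FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical data sets; replace the paths with absolute paths to real files.
groups = [
    {"group_name": "frag_run1", "library_name": "lib_frag_180",
     "file_name": "/data/reads/frag_run1.bam"},
    {"group_name": "jump_run1", "library_name": "lib_jump_3k",
     "file_name": "/data/reads/jump_run1.bam"},
]
print(render_in_groups(groups), end="")
```

The library_name values used here must also appear in in_libs.csv, since that field links the two files.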

Next, the in_libs.csv file, which describes your libraries:

library_name: matches the same field in in_groups.csv.
project_name: a string naming the project.
organism_name: the organism.
type: fragment, jumping, EcoP15, etc. This field is only informative.
paired: 0: Unpaired reads; 1: paired reads.
frag_size: average number of bases in the fragments (only defined for FRAGMENT libraries).
frag_stddev: estimated standard deviation of the fragments sizes (only defined for FRAGMENT libraries).
insert_size: average number of bases in the inserts (only defined for JUMPING libraries; if larger than 20 kb, the library is considered to be a LONG JUMPING library).
insert_stddev: estimated standard deviation of the inserts sizes (only defined for JUMPING libraries).
read_orientation: inward or outward. Outward oriented reads will be reversed.
genomic_start: index of the FIRST genomic base in the reads. If non‐zero, all the bases before genomic_start will be trimmed out.
genomic_end: index of the LAST genomic base in the reads. If non‐zero, all the bases after genomic_end will be trimmed out.
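The field list above can be turned into a concrete in_libs.csv as follows. This is a sketch under assumptions: the library, project, and organism names and the size values are invented, the fields that do not apply to a library are left empty, and `library_row` is a hypothetical helper rather than anything shipped with ALLPATHS-LG.

```python
import csv
import io
from typing import Dict, List

LIB_FIELDS = [
    "library_name", "project_name", "organism_name", "type", "paired",
    "frag_size", "frag_stddev", "insert_size", "insert_stddev",
    "read_orientation", "genomic_start", "genomic_end",
]


def library_row(library_name: str, project_name: str, organism_name: str,
                lib_type: str, size: int, stddev: int,
                read_orientation: str) -> Dict[str, str]:
    """Build one paired-library row; sizes go in the frag_* or insert_* fields."""
    if lib_type not in ("fragment", "jumping"):
        raise ValueError("this sketch only handles fragment and jumping libraries")
    if read_orientation not in ("inward", "outward"):
        raise ValueError("read_orientation must be 'inward' or 'outward'")
    row = {field: "" for field in LIB_FIELDS}
    row.update(library_name=library_name, project_name=project_name,
               organism_name=organism_name, type=lib_type, paired="1",
               read_orientation=read_orientation,
               genomic_start="0", genomic_end="0")  # 0 means no trimming
    prefix = "frag" if lib_type == "fragment" else "insert"
    row[f"{prefix}_size"] = str(size)
    row[f"{prefix}_stddev"] = str(stddev)
    return row


def render_in_libs(rows: List[Dict[str, str]]) -> str:
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=LIB_FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical libraries matching the sizes discussed earlier in this post.
libs = [
    library_row("lib_frag_180", "demo", "S.aureus", "fragment", 180, 18, "inward"),
    library_row("lib_jump_3k", "demo", "S.aureus", "jumping", 3000, 500, "outward"),
]
print(render_in_libs(libs), end="")
```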

Once these two files are ready, you can run the Perl script:

PrepareAllPathsInputs.pl \
DATA_DIR=<full path to REFERENCE dir>/mydata \
PICARD_TOOLS_DIR=<path to picard tools> \
IN_GROUPS_CSV=in_groups.csv \
IN_LIBS_CSV=in_libs.csv \
INCLUDE_NON_PF_READS=0 \
PHRED_64=0 \
PLOIDY=2

DATA_DIR: is the location of the ALLPATHS DATA directory where the converted reads will be placed.

PICARD_TOOLS_DIR: is the path to the Picard tools needed for data conversion, if your data is in BAM format.

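IN_GROUPS_CSV: location of the in_groups.csv file you prepared; may be omitted if it is in the current directory.

IN_LIBS_CSV: location of the in_libs.csv file you prepared; may be omitted if it is in the current directory.

INCLUDE_NON_PF_READS: 0 uses only PF (passing-filter) reads; 1 also includes non-PF reads.

PHRED_64: 0 means base qualities are encoded as Phred+33; 1 means Phred+64.

PLOIDY: generates the ploidy file; use 1 for a haploid genome and 2 for a diploid one.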
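Because every argument is a KEY=VALUE token with no spaces around the equals sign, the command line is easy to assemble programmatically. The sketch below only builds and prints the command and does not execute it. The function name and the example paths are hypothetical, and only the arguments described in this post are used.

```python
import shlex
from typing import List, Optional


def build_prepare_command(data_dir: str,
                          picard_tools_dir: Optional[str] = None,
                          in_groups_csv: str = "in_groups.csv",
                          in_libs_csv: str = "in_libs.csv",
                          include_non_pf_reads: bool = False,
                          phred_64: bool = False,
                          ploidy: int = 2) -> List[str]:
    """Return the argument list for PrepareAllPathsInputs.pl (nothing is run)."""
    args = {"DATA_DIR": data_dir}
    if picard_tools_dir is not None:  # only needed when the reads are in BAM format
        args["PICARD_TOOLS_DIR"] = picard_tools_dir
    args.update({
        "IN_GROUPS_CSV": in_groups_csv,
        "IN_LIBS_CSV": in_libs_csv,
        "INCLUDE_NON_PF_READS": str(int(include_non_pf_reads)),
        "PHRED_64": str(int(phred_64)),
        "PLOIDY": str(ploidy),
    })
    return ["PrepareAllPathsInputs.pl"] + [f"{key}={value}" for key, value in args.items()]


# Hypothetical paths, shown only to illustrate the resulting command line.
cmd = build_prepare_command("/assemblies/staph/mydata", picard_tools_dir="/opt/picard")
print(" ".join(shlex.quote(token) for token in cmd))
```

The resulting list could be handed to `subprocess.run(cmd, check=True)` on a machine where the script is on the PATH.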

When the script finishes, all the input files ALLPATHS needs are ready.

The next step is to run ALLPATHS:

RunAllPathsLG \
PRE=<user pre> \
DATA_SUBDIR=mydata \
RUN=myrun \
REFERENCE_NAME=staph \
TARGETS=standard

This will create (if it doesn’t already exist) the following pipeline directory structure:
<user pre>/staph/mydata/myrun

Where staph is the REFERENCE directory, mydata is the DATA directory containing the imported data, and myrun is the RUN directory.
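The relationship between the four arguments and the resulting directory can be sketched as follows. The helper names and the `/assemblies` prefix are hypothetical. The code only computes the path and the argument list and does not run the assembler.

```python
from pathlib import PurePosixPath
from typing import List


def run_directory(pre: str, reference_name: str, data_subdir: str, run: str) -> PurePosixPath:
    """PRE/REFERENCE_NAME/DATA_SUBDIR/RUN, the pipeline directory described above."""
    return PurePosixPath(pre) / reference_name / data_subdir / run


def build_run_command(pre: str, reference_name: str, data_subdir: str,
                      run: str, targets: str = "standard") -> List[str]:
    """Return the RunAllPathsLG argument list as KEY=VALUE tokens (nothing is run)."""
    return [
        "RunAllPathsLG",
        f"PRE={pre}",
        f"DATA_SUBDIR={data_subdir}",
        f"RUN={run}",
        f"REFERENCE_NAME={reference_name}",
        f"TARGETS={targets}",
    ]


# "/assemblies" stands in for <user pre>; the other names come from the example.
print(run_directory("/assemblies", "staph", "mydata", "myrun"))
print(" ".join(build_run_command("/assemblies", "staph", "mydata", "myrun")))
```

Note that the DATA_DIR given to the preparation script should be the same PRE/REFERENCE_NAME/DATA_SUBDIR location, so that the run finds the converted reads.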

The command as actually used:

freemao

FAFU

Time: 2024-10-09 22:01:53
