AUGUSTUS, gene prediction, training

Genome annotation

Train AUGUSTUS first, then run MAKER.

Website:

http://bioinf.uni-greifswald.de/augustus/

The analysis can be run online,

or locally.

Online training URL:

http://bioinf.uni-greifswald.de/webaugustus/training/create

You have to give a species name (it must not contain spaces!) and a genome file.

Header requirements for the reference genome and cDNA FASTA files:

no whitespaces in the headers
no special characters in the headers (e.g. !#@&|;)
make the headers as short as possible
start each header with a letter, not a number
use only letters and numbers in the headers

In the following we give some header examples that will not cause problems:

>entry1
>contig1000
>est20
>scaffold239
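
The header rules above can be met by renaming every record to a short prefix plus a running number. The following is a minimal sketch under that assumption, not something taken from the AUGUSTUS documentation: rename_fasta_headers is a hypothetical helper, and the prefix, file names, and mapping file are placeholders. It keeps a tab-separated map so the original names can be recovered later.

```shell
# rename_fasta_headers PREFIX IN.fa OUT.fa MAP.tsv   (hypothetical helper)
# Rewrites every header to ">PREFIX<n>" and records "new<TAB>old" in MAP.tsv.
rename_fasta_headers() {
  # The prefix itself must follow the rules: start with a letter, letters and digits only.
  case $1 in
    ''|[!A-Za-z]*|*[!A-Za-z0-9]*)
      echo "prefix must start with a letter and contain only letters and digits" >&2
      return 1
      ;;
  esac
  awk -v prefix="$1" -v map="$4" '
    /^>/ {
      n++
      new = prefix n
      # Keep the original header (without ">") so names can be mapped back.
      printf "%s\t%s\n", new, substr($0, 2) > map
      print ">" new
      next
    }
    { print }   # sequence lines pass through unchanged
  ' "$2" > "$3"
}

# Example call (paths are placeholders):
#   rename_fasta_headers scaffold genome.fa genome.renamed.fa genome.map.tsv
```

Renaming rather than trimming the old headers avoids accidental duplicates when two long names share the same leading characters.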

Detailed online training tutorial:

http://bioinf.uni-greifswald.de/webaugustus/trainingtutorial.gsp

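For online training, neither the genome file nor the cDNA file may be larger than 100M, so the web service is usable for fungi. For plants, train locally instead: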

/share/bioinfo/zhangxt/software/augustus-2.4/scripts/autoAug.pl --species=Carya --genome=../Carya.fa --cdna=../Carya_400cDNA.fa
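
To decide between the web service and a local run, the 100M limit mentioned above can be checked before uploading. This is only a sketch: fits_webaugustus is a hypothetical helper, and reading "100M" as 100 MiB (104857600 bytes) is an assumption, so the limit can be overridden with a second argument.

```shell
# fits_webaugustus FILE [LIMIT_BYTES]   (hypothetical helper)
# Returns 0 if FILE is at most LIMIT_BYTES, 1 if it is larger, 2 if it is missing.
fits_webaugustus() {
  [ -f "$1" ] || return 2
  # Assumption: "100M" means 100 MiB; pass another limit as the second argument if needed.
  limit=${2:-104857600}
  # Arithmetic expansion strips the padding some wc implementations add.
  size=$(( $(wc -c < "$1") ))
  [ "$size" -le "$limit" ]
}

# Example (paths taken from the command above):
#   fits_webaugustus ../Carya.fa && fits_webaugustus ../Carya_400cDNA.fa \
#     && echo "small enough for online training" \
#     || echo "train locally with autoAug.pl"
```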

If it complains that the environment variable has not been set, add it to your profile:

vi ~/.bash_profile

export AUGUSTUS_CONFIG_PATH=/share/bioinfo/zhangxt/software/augustus-2.4/config

source ~/.bash_profile
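
After sourcing the profile, it can help to confirm that the variable is actually visible in the current shell and points at a real config directory. The sketch below is an illustration only: check_augustus_config is a hypothetical helper, and it assumes the usual layout in which the AUGUSTUS config directory contains a species/ subdirectory.

```shell
# check_augustus_config   (hypothetical helper)
# Returns 0 if AUGUSTUS_CONFIG_PATH is set and contains a species/ directory,
# 1 if the variable is unset or empty, 2 if the directory layout looks wrong.
check_augustus_config() {
  if [ -z "${AUGUSTUS_CONFIG_PATH:-}" ]; then
    echo "AUGUSTUS_CONFIG_PATH is not set" >&2
    return 1
  fi
  # Assumption: a valid config directory holds one folder per species under species/.
  if [ ! -d "$AUGUSTUS_CONFIG_PATH/species" ]; then
    echo "no species/ directory under $AUGUSTUS_CONFIG_PATH" >&2
    return 2
  fi
  echo "AUGUSTUS_CONFIG_PATH looks usable: $AUGUSTUS_CONFIG_PATH"
}

# Example: run this before starting autoAug.pl
#   check_augustus_config || echo "fix ~/.bash_profile first"
```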

freemao

FAFU

Date: 2024-10-05 06:11:28
