1. Variant Call Format (VCF)


##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">
##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">
##FILTER=<ID=q10,Description="Quality below 10">
##FILTER=<ID=s50,Description="Less than 50% of samples have data">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
#CHROM POS    ID        REF  ALT     QUAL FILTER INFO                              FORMAT      Sample1        Sample2        Sample3
2      4370   rs6057    G    A       29   .      NS=2;DP=13;AF=0.5;DB;H2           GT:GQ:DP:HQ 0|0:48:1:52,51 1|0:48:8:51,51 1/1:43:5:.,.
2      7330   .         T    A       3    q10    NS=5;DP=12;AF=0.017               GT:GQ:DP:HQ 0|0:46:3:58,50 0|1:3:5:65,3   0/0:41:3
2      110696 rs6055    A    G,T     67   PASS   NS=2;DP=10;AF=0.333,0.667;AA=T;DB GT:GQ:DP:HQ 1|2:21:6:23,27 2|1:2:0:18,2   2/2:35:4
2      130237 .         T    .       47   .      NS=2;DP=16;AA=T                   GT:GQ:DP:HQ 0|0:54:7:56,60 0|0:48:4:56,51 0/0:61:2
2      134567 microsat1 GTCT G,GTACT 50   PASS   NS=2;DP=9;AA=G                    GT:GQ:DP    0/1:35:4       0/2:17:2       1/1:40:3
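The header and data lines above can be read with plain string splitting. The Python sketch below is illustrative only. parse_meta, parse_gt and parse_record are hypothetical helper names. Values are kept as strings rather than typed from the ##INFO/##FORMAT declarations. The record is split on any whitespace because the example above is space-aligned; real VCF files are tab-delimited, so split('\t') is the safer choice there.

```python
import re

# Structured meta lines such as ##INFO=<ID=DP,Number=1,...>
META_RE = re.compile(r'^##(INFO|FORMAT|FILTER)=<(.*)>$')
# key=value pairs; a double-quoted value may itself contain commas
ATTR_RE = re.compile(r'(\w+)=("[^"]*"|[^,]*)')
FIXED = ['CHROM', 'POS', 'ID', 'REF', 'ALT', 'QUAL', 'FILTER', 'INFO']


def parse_meta(line):
    """Return (kind, attributes) for an ##INFO/##FORMAT/##FILTER line, else None."""
    m = META_RE.match(line.strip())
    if m is None:
        return None
    kind, body = m.groups()
    attrs = {key: value.strip('"') for key, value in ATTR_RE.findall(body)}
    return kind, attrs


def parse_gt(gt):
    """Split a GT value into allele indices; '|' means phased, '/' unphased."""
    alleles = [None if a == '.' else int(a) for a in re.split(r'[|/]', gt)]
    return alleles, '|' in gt


def parse_record(line, sample_names):
    """Parse one data line into a dict (values are not checked against the header)."""
    fields = line.split()  # any whitespace; real VCF files use tabs
    rec = dict(zip(FIXED, fields[:8]))
    rec['POS'] = int(rec['POS'])
    rec['ALT'] = [] if rec['ALT'] == '.' else rec['ALT'].split(',')
    # INFO: ';'-separated entries; a bare key is a Flag (Number=0)
    info = {}
    for item in rec['INFO'].split(';'):
        if item == '.':
            continue
        key, sep, value = item.partition('=')
        info[key] = value.split(',') if sep else True
    rec['INFO'] = info
    # FORMAT names the ':'-separated values in each sample column;
    # trailing values may be omitted, and zip() then simply drops those keys
    samples = {}
    if len(fields) > 8:
        keys = fields[8].split(':')
        for name, column in zip(sample_names, fields[9:]):
            samples[name] = dict(zip(keys, column.split(':')))
    rec['SAMPLES'] = samples
    return rec


row = ("2 110696 rs6055 A G,T 67 PASS NS=2;DP=10;AF=0.333,0.667;AA=T;DB "
       "GT:GQ:DP:HQ 1|2:21:6:23,27 2|1:2:0:18,2 2/2:35:4")
rec = parse_record(row, ['Sample1', 'Sample2', 'Sample3'])
print(rec['ALT'], rec['INFO']['AF'], rec['SAMPLES']['Sample3'])
print(parse_gt(rec['SAMPLES']['Sample1']['GT']))
```

The sample row is the rs6055 line from the example. Because zip() stops at the shorter list, a sample column that omits trailing FORMAT values (Sample3 has no HQ) ends up with fewer keys in its dictionary.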
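
A simpler five-column listing (CHROM, POS, ID, REF, ALT) follows. In it, "-" stands for an empty allele, so a row such as "CTAT -" is a deletion of CTAT.

chr1    45796269        .       G       C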
chr1    45797505        .       C       G
chr1    45798555        .       T       C
chr1    45798901        .       C       T
chr1    45805566        .       G       C
chr2    47703379        .       C       T
chr2    48010488        .       G       A
chr2    48030838        .       A       T
chr2    48032875        .       CTAT    -
chr2    48032937        .       T       C
chr2    48033273        .       TTTTTGTTTTAATTCCT       -
chr2    48033551        .       C       G
chr2    48033910        .       A       T
chr2    215632048       .       G       T
chr2    215632125       .       TT      -
chr2    215632155       .       T       C
chr2    215632192       .       G       A
chr2    215632255       .       CA      TG
chr2    215634055       .       C       T
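The REF/ALT pairs in this listing fall into a few classes that can be told apart by allele length alone. Below is a minimal Python sketch. It assumes, as the rows suggest, that "-" means an empty allele. classify and parse_listing are hypothetical names. This convention differs from standard VCF, where a deletion keeps the preceding reference base in both REF and ALT. The sketch does not try to interpret such padded alleles and reports them as "complex".

```python
def classify(ref, alt):
    """Classify a REF/ALT pair; '-' is assumed to mean an empty allele."""
    ref_len = 0 if ref == '-' else len(ref)
    alt_len = 0 if alt == '-' else len(alt)
    if ref_len == 1 and alt_len == 1:
        return 'SNV'
    if alt_len == 0:
        return 'deletion'
    if ref_len == 0:
        return 'insertion'
    if ref_len == alt_len:
        return 'MNV'  # same-length multi-base change, e.g. CA -> TG
    return 'complex'  # unequal non-empty alleles; not resolved further here


def parse_listing(text):
    """Parse CHROM POS ID REF ALT rows into tuples with a variant class."""
    rows = []
    for line in text.strip().splitlines():
        chrom, pos, _id, ref, alt = line.split()[:5]
        rows.append((chrom, int(pos), ref, alt, classify(ref, alt)))
    return rows


listing = """\
chr2    48032875        .       CTAT    -
chr2    48032937        .       T       C
chr2    215632255       .       CA      TG
"""
for row in parse_listing(listing):
    print(row)
```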

2. Gene Predictions (Extended) (GPD)

The following definition is used for extended gene prediction tables. In alternative-splicing situations, each transcript has a row in this table. The refGene table is an example of the genePredExt format.

table genePredExt
"A gene prediction with some additional info."
    string name;        	"Name of gene (usually transcript_id from GTF)"
    string chrom;       	"Chromosome name"
    char[1] strand;     	"+ or - for strand"
    uint txStart;       	"Transcription start position"
    uint txEnd;         	"Transcription end position"
    uint cdsStart;      	"Coding region start"
    uint cdsEnd;        	"Coding region end"
    uint exonCount;     	"Number of exons"
    uint[exonCount] exonStarts; "Exon start positions"
    uint[exonCount] exonEnds;   "Exon end positions"
    int score;            	"Score"
    string name2;       	"Alternate name (e.g. gene_id from GTF)"
    string cdsStartStat; 	"enum('none','unk','incmpl','cmpl')"
    string cdsEndStat;   	"enum('none','unk','incmpl','cmpl')"
    lstring exonFrames; 	"Exon frame offsets {0,1,2}"
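A genePredExt row maps directly onto the fields above. The Python sketch below makes three assumptions. First, each row is tab-separated with exactly these 15 columns; a table that carries an extra leading column, such as UCSC's bin column, would need that column dropped first. Second, it uses UCSC's 0-based, half-open coordinates, so an exon's length is exonEnd minus exonStart. Third, the function names and the sample row are hypothetical.

```python
GENEPRED_EXT_FIELDS = ['name', 'chrom', 'strand', 'txStart', 'txEnd',
                       'cdsStart', 'cdsEnd', 'exonCount', 'exonStarts',
                       'exonEnds', 'score', 'name2', 'cdsStartStat',
                       'cdsEndStat', 'exonFrames']
INT_FIELDS = {'txStart', 'txEnd', 'cdsStart', 'cdsEnd', 'exonCount', 'score'}
INT_LIST_FIELDS = {'exonStarts', 'exonEnds', 'exonFrames'}


def int_list(text):
    """Parse a comma-separated list; UCSC writes a trailing comma ("100,300,")."""
    return [int(x) for x in text.split(',') if x]


def parse_genepred_ext(line):
    """Parse one tab-separated genePredExt row into a dict."""
    fields = line.rstrip('\n').split('\t')
    if len(fields) != len(GENEPRED_EXT_FIELDS):
        raise ValueError(f'expected 15 columns, got {len(fields)}')
    rec = {}
    for key, value in zip(GENEPRED_EXT_FIELDS, fields):
        if key in INT_FIELDS:
            rec[key] = int(value)
        elif key in INT_LIST_FIELDS:
            rec[key] = int_list(value)
        else:
            rec[key] = value
    if not (len(rec['exonStarts']) == len(rec['exonEnds']) == rec['exonCount']):
        raise ValueError('exon lists do not match exonCount')
    return rec


def exon_lengths(rec):
    # half-open intervals: length = end - start
    return [end - start for start, end in zip(rec['exonStarts'], rec['exonEnds'])]


def coding_length(rec):
    """Number of exon bases that fall inside [cdsStart, cdsEnd)."""
    total = 0
    for start, end in zip(rec['exonStarts'], rec['exonEnds']):
        total += max(0, min(end, rec['cdsEnd']) - max(start, rec['cdsStart']))
    return total


# Hypothetical two-exon transcript
row = '\t'.join(['TX1', 'chr1', '+', '100', '500', '150', '451', '2',
                 '100,300,', '200,500,', '0', 'GENE1', 'cmpl', 'cmpl', '0,2,'])
rec = parse_genepred_ext(row)
print(exon_lengths(rec), coding_length(rec))
```

For the hypothetical row, the exon lengths are 100 and 200, and 201 of those bases lie in the coding region.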

3. PSL format

PSL lines represent alignments, and are typically taken from files generated by BLAT or psLayout. See the BLAT documentation for more details. All of the following fields are required on each data line within a PSL file:

  1. matches - Number of bases that match that aren't repeats
  2. misMatches - Number of bases that don't match
  3. repMatches - Number of bases that match but are part of repeats
  4. nCount - Number of "N" bases
  5. qNumInsert - Number of inserts in query
  6. qBaseInsert - Number of bases inserted in query
  7. tNumInsert - Number of inserts in target
  8. tBaseInsert - Number of bases inserted in target
  9. strand - "+" or "-" for query strand. For translated alignments, a second "+" or "-" gives the genomic strand
  10. qName - Query sequence name
  11. qSize - Query sequence size
  12. qStart - Alignment start position in query
  13. qEnd - Alignment end position in query
  14. tName - Target sequence name
  15. tSize - Target sequence size
  16. tStart - Alignment start position in target
  17. tEnd - Alignment end position in target
  18. blockCount - Number of blocks in the alignment (a block contains no gaps)
  19. blockSizes - Comma-separated list of sizes of each block
  20. qStarts - Comma-separated list of starting positions of each block in query
  21. tStarts - Comma-separated list of starting positions of each block in target
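The three comma-separated block lists are easiest to use once they are zipped together. Below is a minimal Python sketch. It assumes a tab-separated, non-translated (nucleotide-to-nucleotide) PSL data line, with any file header lines already skipped. The function names and the sample line are hypothetical. For such alignments, the target end should equal the last tStart plus the last block size, which gives a cheap consistency check.

```python
PSL_FIELDS = ['matches', 'misMatches', 'repMatches', 'nCount',
              'qNumInsert', 'qBaseInsert', 'tNumInsert', 'tBaseInsert',
              'strand', 'qName', 'qSize', 'qStart', 'qEnd',
              'tName', 'tSize', 'tStart', 'tEnd', 'blockCount',
              'blockSizes', 'qStarts', 'tStarts']
STR_FIELDS = {'strand', 'qName', 'tName'}
PSL_LIST_FIELDS = {'blockSizes', 'qStarts', 'tStarts'}


def parse_psl(line):
    """Parse one tab-separated 21-column PSL data line into a dict."""
    fields = line.rstrip('\n').split('\t')
    if len(fields) != len(PSL_FIELDS):
        raise ValueError(f'expected 21 columns, got {len(fields)}')
    rec = {}
    for key, value in zip(PSL_FIELDS, fields):
        if key in STR_FIELDS:
            rec[key] = value
        elif key in PSL_LIST_FIELDS:
            rec[key] = [int(x) for x in value.split(',') if x]  # trailing comma
        else:
            rec[key] = int(value)
    return rec


def blocks(rec):
    """(qStart, tStart, size) for each gap-free block."""
    return list(zip(rec['qStarts'], rec['tStarts'], rec['blockSizes']))


def target_gaps(rec):
    """Target bases skipped between consecutive blocks (e.g. introns for mRNA vs genome)."""
    b = blocks(rec)
    return [t2 - (t1 + s1) for (_q1, t1, s1), (_q2, t2, _s2) in zip(b, b[1:])]


def target_end_consistent(rec):
    # Holds for non-translated alignments, whose tStarts are on the + strand
    return rec['tEnd'] == rec['tStarts'][-1] + rec['blockSizes'][-1]


# Hypothetical 100-base query aligned in two blocks with a 400-base target gap
line = '\t'.join(map(str, [95, 5, 0, 0, 0, 0, 1, 400, '+', 'q1', 100, 0, 100,
                           'chr1', 100000, 1000, 1500, 2,
                           '60,40,', '0,60,', '1000,1460,']))
rec = parse_psl(line)
print(blocks(rec), target_gaps(rec), target_end_consistent(rec))
```

On "-" strand alignments, the qStarts are positions on the reverse complement of the query. They therefore need converting before they are compared with qStart and qEnd. The sketch leaves that step out.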

Time: 2024-09-30 07:03:45
