利用 freebayes call SNP

1，软件介绍

FreeBayes
is a Bayesian
genetic variant detector designed to find small polymorphisms, specifically SNPs
(single-nucleotide polymorphisms), indels (insertions and deletions),

MNPs (multi-nucleotide polymorphisms), and complex events
(composite insertion and substitution events) smaller than the length of a
short-read sequencing alignment.

FreeBayes is haplotype-based, in the sense that it calls variants
based on the literal sequences of reads aligned to a particular target, not
their precise alignment.

This model is a straightforward generalization of previous ones (e.g.
PolyBayes, samtools, GATK) which detect or report variants based on
alignments.

This method avoids one of the core problems with alignment-based variant
detection--- that identical sequences may have multiple possible alignments.

2，安装与使用

服务器已经安装好软件，下面举例说明用法，

参考序列为：

RAP_cDAN.fasta

bam文件为：

LCS173-1.sorted.rmp.rg.bam

LCS173-2.sorted.rmp.rg.bam

LCS173-3.sorted.rmp.rg.bam

RCS173-1.sorted.rmp.rg.bam

RCS173-2.sorted.rmp.rg.bam

RCS173-3.sorted.rmp.rg.bam

这是样本CS173的六组数据，每组bam数据都经过了sort, remove duplicate, add read group
步骤，这几个前期处理数据步骤是必须的。对bam文件的处理

详见本博客其他文章。除了这几步以为，还要对bam文件进行index,即每个bam文件都要有相应的bai文件。数据处理好后，就可以进行call
snp了。

命令：

freebayes -f RAP_cDAN.fasta -L bamfile.list > CS173.vcf

其中，-f 后跟参考序列。-L 后面跟纯文本文件，里面就是bam文件的名字，因为文件较多，所以将名字都放到一个文件中。如果你只有一个文件，那直接将

文件名放于参考序列后面就可以了。 > 是进行重定向，即将结果保存到CS173.vcf文件中。

也可以同时call 不同的样本，只要将不同的样本数据放到bam.list文件中就可以了，注意，不同样本数据的read
group必须唯一，不能重复。

当命令结束后，就会得到结果文件，文件中记录了snp的信息。详细的vcf格式说明另见博客文章。

这是最简单的使用方法，还有一些参数，如果感兴趣的话可以调试，如-C ，默认值为2，即变异位点处至少有两个和参考碱基不同，才会考虑是snp位点。

假如，此某位点的覆盖度为10，但之后一个和参考碱基不同，剩下9个是一样的，那么这个位点不会被call出。

另一个参数-F 默认为0.2
,即至少有20%的碱基和参考碱基不同才能被call出。假如某位点深度为100，但是只有10个碱基和参考序列不同，是不会被call出的。

这主要是考虑到了测序错误，比对错误等原因，才设置下限，目的是提高call
snp的准确性。但是要根据你自己的数据情况，假如你用RNA-seq数据研究等位基因特异性表达，

某些位点也许表达量很多，深度达到了1000，其中有100个碱基和参考序列有差别，如果你用默认设置，此时并不能将snp call
出，但这可能是一个很重要的特异表达位点。

所以要根据自己的使用目的来设置参数。

3，更多的例子。

examples:

# call variants assuming a diploid sample

freebayes -f ref.fa aln.bam >var.vcf

# require at least 5 supporting observations to consider a
variant

freebayes -f ref.fa -C 5 aln.bam
>var.vcf

# use a different ploidy

freebayes -f ref.fa -p 4 aln.bam
>var.vcf

# assume a pooled sample with a known number of genome
copies

freebayes -f ref.fa -p 20
--pooled-discrete aln.bam >var.vcf

# generate frequency-based calls for all variants passing
input thresholds

freebayes -f ref.fa -F 0.01 -C 1
--pooled-continuous aln.bam >var.vcf

# use an input VCF (bgzipped + tabix indexed) to force
calls at particular alleles

freebayes -f ref.fa [email protected] in.vcf.gz aln.bam
>var.vcf

# generate long haplotype calls over known variants

freebayes -f ref.fa --haplotype-basis-alleles
in.vcf.gz \ --haplotype-length 50 aln.bam

# naive variant calling: simply annotate observation
counts of SNPs and indels

freebayes -f ref.fa --haplotype-length 0
--min-alternate-count 1 \ --min-alternate-fraction 0
--pooled-continuous --report-monomorphic >var.vcf

3，进一步阅读

https://github.com/ekg/freebayes

在装有freebayes的服务器中敲击 freebayes -h

By freemao

FAFU.

[email protected]

时间： 2024-10-29 10:47:49

利用 freebayes call SNP

利用 freebayes call SNP的相关文章

互信息应用于SNP特征选择的局限

全基因组重测序基础及高级分析知识汇总

利用千人基因组数据库查看SNP在不同地区、国家、洲的频率及个数

如何用 samtools 和 bcftools call snp

GATK的BaseRecalibration程序在无标准SNP数据库情况下，是否还能进行BaseRecalibration?

利用MATLAB进行曲线拟合

利用 Django REST framework 编写 RESTful API

LR如何利用siteScope监控MySQL性能

利用PHP实现登录与注册功能以及使用PHP读取mysql数据库——以表格形式显示数据