Extracting info from VCF files

R, Bioconductor

filterVcf: Extract Variants of Interest from a Large VCF File (Paul Shannon)

We demonstrate three methods: filtering by genomic region, filtering on attributes of
each specific variant call, and intersecting with known regions of interest (exons, splice
sites, regulatory regions, etc.).

http://www.bioconductor.org/packages/release/bioc/vignettes/VariantAnnotation/inst/doc/filterVcf.pdf
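The first two ideas, filtering by region and filtering on per-variant attributes, can be sketched outside R with plain awk. This is not the vignette's filterVcf code, only a rough illustration of the concept; the file name toy.vcf, the coordinates, and the QUAL threshold of 30 are all made up for the example.

```shell
# Build a tiny VCF for illustration (hypothetical data, tab-separated)
printf '##fileformat=VCFv4.2\n' > toy.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> toy.vcf
printf 'chr1\t100\t.\tA\tG\t50\tPASS\t.\n' >> toy.vcf
printf 'chr1\t900\t.\tC\tT\t10\tPASS\t.\n' >> toy.vcf
printf 'chr2\t150\t.\tG\tA\t80\tPASS\t.\n' >> toy.vcf

# Region filter: keep header lines, plus records on chr1 between positions 1 and 500
awk -F'\t' '/^#/ || ($1 == "chr1" && $2 >= 1 && $2 <= 500)' toy.vcf > region.vcf

# Attribute filter: keep header lines, plus records with QUAL of at least 30
awk -F'\t' '/^#/ || ($6 >= 30)' toy.vcf > qual.vcf
```

For large or compressed files, an indexed tool is likely a better fit than a linear awk scan; the sketch only shows what the two filters select.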

Java

SelectVariants: Select a subset of variants from a larger callset (GATK SelectVariants)

Often, a VCF containing many samples and/or variants will need to be subset to facilitate certain analyses (e.g. comparing cases vs. controls, extracting variant or non-variant loci that meet certain requirements, or displaying just a few samples in a browser such as IGV). SelectVariants can be used for this purpose.

https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_variantutils_SelectVariants.php

Biostars

Question: How To Split Multiple Samples In Vcf File Generated By Gatk?
I did variant calling using BWA + Picard + GATK and have just got the filtered VCF files from GATK. When running GATK, I used a list of inputs (11 samples), and most steps produced only one output file. Now I have two VCF files (one for SNPs and the other for indels), each of which contains 11 samples. I can see the names of the 11 samples in the header of the VCF files, and each sample seems to have one column of data. So I am wondering how to split each VCF file into individual per-sample VCF files?

https://www.biostars.org/p/78929/
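In a multi-sample VCF, the #CHROM header line names one column per sample, starting at column 10. A minimal sketch of the split, assuming a small uncompressed file and using only standard Unix tools, is shown here; the file name multi.vcf and the samples S1 and S2 are hypothetical.

```shell
# Toy two-sample VCF (hypothetical data, tab-separated)
printf '##fileformat=VCFv4.2\n' > multi.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n' >> multi.vcf
printf 'chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t0/0\n' >> multi.vcf
printf 'chr1\t200\t.\tC\tT\t60\tPASS\t.\tGT\t0/0\t1/1\n' >> multi.vcf

# List sample names: columns 10 onward of the #CHROM line, one per line
grep '^#CHROM' multi.vcf | cut -f10- | tr '\t' '\n' > samples.txt

# Write one single-sample VCF per name: the 9 fixed columns plus that sample's column
while read -r sample; do
  awk -F'\t' -v OFS='\t' -v s="$sample" '
    /^##/     { print; next }                                      # meta lines pass through
    /^#CHROM/ { for (i = 10; i <= NF; i++) if ($i == s) col = i }  # locate the sample column
    col       { print $1, $2, $3, $4, $5, $6, $7, $8, $9, $col }
  ' multi.vcf > "$sample.vcf"
done < samples.txt
```

Unlike the dedicated tools listed in the following sections, this sketch keeps every site (including ones where the sample is homozygous reference) and does not recompute INFO fields such as AC or AN.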

bcftools

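# Write one bgzip-compressed VCF per sample for every VCF in the current directory (bash)
for file in *.vcf*; do
  # bcftools query -l prints the sample names, one per line
  for sample in $(bcftools query -l "$file"); do
    # -c1 (--min-ac 1) keeps only sites with at least one non-reference allele
    bcftools view -c1 -Oz -s "$sample" -o "${file/.vcf*/.$sample.vcf.gz}" "$file"
  done
done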

https://www.biostars.org/p/12535/#115691

vcf-subset

vcf-subset -c S1 bigfile.vcf > S1.vcf

https://www.biostars.org/p/78929/

http://campagnelab.org/software/goby/reference-documentation/modes/vcf-subset/

REF:

http://samtools.github.io/hts-specs/VCFv4.2.pdf

Time: 2024-08-26 13:51:38
