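A tool that ships with PyVCF. It works better than the alternatives: I don't really trust other filters, and what I wrote myself is too limited, so I've switched to vcf_filter.py.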
Filtering a VCF file based on some properties of interest is a common enough operation that PyVCF offers an extensible script. vcf_filter.py does the work of reading input, updating the metadata and filtering the records.
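To make the "filtering the records" step concrete, here is a minimal sketch of how a chain of filters might be applied to one site, including the effect of the --no-short-circuit and --no-filtered flags from the usage text. It is based on my reading of the option descriptions, not on the script's source. The `Site` class, the `apply_filters` function and the filter labels are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Site:
    """Hypothetical stand-in for a VCF record with only the fields used here."""
    qual: float
    depth: int
    filters: List[str] = field(default_factory=list)


# A filter is (label, test); the test returns a value if the site fails, else None.
SiteFilter = Tuple[str, Callable[[Site], Optional[float]]]


def apply_filters(site: Site, chain: List[SiteFilter],
                  short_circuit: bool = True,
                  drop_filtered: bool = False) -> Optional[Site]:
    """Return the site with its filter labels updated, or None if it is dropped."""
    for label, test in chain:
        if test(site) is None:
            continue  # this filter was not triggered
        if drop_filtered:
            return None  # --no-filtered: failing sites are not written at all
        site.filters.append(label)
        if short_circuit:
            break  # default: stop at the first triggered filter
    return site


# Two toy filters mirroring the sq and dps defaults (labels are assumptions).
chain: List[SiteFilter] = [
    ("sq30", lambda s: s.qual if s.qual < 30 else None),
    ("dps5", lambda s: s.depth if s.depth < 5 else None),
]
```

With the defaults, a site failing both filters is labelled with the first one only. Passing `short_circuit=False` records both labels, and `drop_filtered=True` removes the site from the output.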
usage: vcf_filter.py [-h] [--no-short-circuit] [--no-filtered]
[--output OUTPUT] [--local-script LOCAL_SCRIPT]
input filter [filter_args] [filter [filter_args]] ...
Filter a VCF file
positional arguments:
input File to process (use - for STDIN) (default: None)
optional arguments:
-h, --help Show this help message and exit. (default: False)
--no-short-circuit Do not stop filter processing on a site if any filter
is triggered (default: False)
--output OUTPUT Filename to output [STDOUT] (default: <open file
'<stdout>', mode 'w' at 0x2b0f9435c150>)
--no-filtered Output only sites passing the filters (default: False)
--local-script LOCAL_SCRIPT
Python file in current working directory with the
filter classes (default: None)
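As a rough illustration of what a --local-script file might contain, here is a sketch of a custom filter class. It assumes the pattern from PyVCF's documentation: a subclass of `vcf.filters.Base` with a `name`, a `customize_parser` hook, and a `__call__` that returns a non-None value when the site should be filtered. The `MaxDepth` filter, its `--max-depth` option and the fallback `Base` stub are hypothetical. The stub exists only so the sketch runs without PyVCF installed.

```python
import argparse

try:
    from vcf.filters import Base  # PyVCF's filter base class, if available
except ImportError:
    class Base(object):
        """Minimal stand-in (an assumption) so the sketch runs without PyVCF."""
        name = 'f'

        def filter_name(self):
            # Label written to the FILTER column: name followed by threshold
            return '%s%s' % (self.name, self.threshold)


class MaxDepth(Base):
    """Filter sites whose total depth is suspiciously high (hypothetical)"""

    name = 'maxdp'

    @classmethod
    def customize_parser(cls, parser):
        # Register this filter's own command-line option
        parser.add_argument('--max-depth', type=int, default=200,
                            help='Filter sites above this total depth')

    def __init__(self, args):
        self.threshold = args.max_depth

    def __call__(self, record):
        # Return a value to flag the site, or None to let it pass
        depth = record.INFO.get('DP')
        if depth is not None and depth > self.threshold:
            return depth
        return None
```

Saved as, say, my_filters.py in the working directory, this would presumably be used along the lines of `vcf_filter.py --local-script my_filters.py input.vcf maxdp --max-depth 100`, following the usage line above.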
sq:
Filter low quality sites
--site-quality SITE_QUALITY
Filter sites below this quality (default: 30)
dps:
Threshold read depth per sample
--depth-per-sample DEPTH_PER_SAMPLE
Minimum required coverage in each sample (default: 5)
avg-dps:
Threshold average read depth per sample (read_depth / sample_count)
--avg-depth-per-sample AVG_DEPTH_PER_SAMPLE
Minimum required average coverage per sample (default:
3)
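The difference between dps and avg-dps is easy to miss: the first looks at the worst-covered sample, the second at the site's total depth divided by the number of samples. A small sketch of both checks follows. The function names are hypothetical and the inputs are plain numbers rather than PyVCF records. Each function returns the offending value when the site would be filtered and None otherwise.

```python
from typing import List, Optional


def depth_per_sample(sample_depths: List[int], threshold: int = 5) -> Optional[int]:
    """dps idea: flag the site if any single sample is below the threshold."""
    lowest = min(sample_depths)
    return lowest if lowest < threshold else None


def avg_depth_per_sample(site_depth: int, sample_count: int,
                         threshold: float = 3) -> Optional[float]:
    """avg-dps idea: flag the site if read_depth / sample_count is too low."""
    average = site_depth / sample_count
    return average if average < threshold else None
```

For three samples with depths 12, 4 and 9, the site fails dps (one sample has only 4 reads) but passes avg-dps, since 25 reads over 3 samples averages above 3.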
eb:
Filter sites that look like correlated sequencing errors. Some sequencing
technologies, notably pyrosequencing, produce mutation hotspots where
there is a constant level of noise, producing some reference and some
heterozygote calls. This filter computes a Bayes Factor for each site by
comparing the binomial likelihood of the observed allelic depths under:
* A model with constant error equal to the MAF.
* A model where each sample is the ploidy reported by the caller.
The test value is the log of the Bayes factor. Higher values are more likely
to be errors. Note: this filter requires rpy2
--eblr EBLR Filter sites above this error log odds ratio (default:
-10)
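The eb description is dense, so here is a rough pure-Python sketch of the idea it describes. This is not PyVCF's implementation, which relies on rpy2. The assumptions are mine: the "MAF" is taken as the pooled fraction of alternate reads across samples, each genotype's expected alternate fraction is alt_allele_count / ploidy, and both are clamped by a hypothetical `eps` so the logarithms stay finite.

```python
import math
from typing import List, Tuple

# One call per sample: (ref_depth, alt_depth, alt_allele_count, ploidy)
Call = Tuple[int, int, int, int]


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Log of the binomial probability of k successes in n trials."""
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coeff + k * math.log(p) + (n - k) * math.log(1 - p)


def error_log_bayes_factor(calls: List[Call], eps: float = 0.01) -> float:
    """Log likelihood of the constant-error model minus the genotype model."""
    total_alt = sum(alt for _, alt, _, _ in calls)
    total = sum(ref + alt for ref, alt, _, _ in calls)
    # Pooled alternate-read fraction, clamped away from 0 and 1
    maf = min(max(total_alt / total, eps), 1 - eps)
    log_error = 0.0
    log_genotype = 0.0
    for ref, alt, n_alt, ploidy in calls:
        n = ref + alt
        # Alternate fraction expected from the called genotype, clamped
        p_gt = min(max(n_alt / ploidy, eps), 1 - eps)
        log_error += log_binom_pmf(alt, n, maf)
        log_genotype += log_binom_pmf(alt, n, p_gt)
    return log_error - log_genotype


# Every sample shows about 20% alternate reads regardless of its call
noisy = [(80, 20, 0, 2), (80, 20, 1, 2), (80, 20, 0, 2), (80, 20, 1, 2)]
# Allelic depths agree with the called genotypes
clean = [(100, 0, 0, 2), (50, 50, 1, 2), (0, 100, 2, 2)]
```

Under these assumptions the `noisy` site scores well above the default cut-off of -10 and would be filtered, while the `clean` site scores far below it and would pass.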
snp-only:
Choose only SNP variants
mgq:
Filters sites with only low quality variants. It is possible to have a
high site quality with many low quality calls. This filter demands at
least one call be above a threshold quality.
--genotype-quality GENOTYPE_QUALITY
Filter sites with no genotypes above this quality
(default: 50)
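The mgq rule can be summarised in a few lines: look only at the variant calls and flag the site when even the best of them falls below the threshold. A minimal sketch follows, with a hypothetical function name and plain tuples in place of PyVCF call objects.

```python
from typing import List, Optional, Tuple

# One entry per sample: (is_variant, genotype_quality)
GenotypeCall = Tuple[bool, float]


def max_genotype_quality(calls: List[GenotypeCall],
                         threshold: float = 50) -> Optional[float]:
    """Return the best variant GQ if it is below the threshold, else None."""
    variant_gqs = [gq for is_variant, gq in calls if is_variant]
    if not variant_gqs:
        return None  # no variant calls, so nothing for this filter to judge
    best = max(variant_gqs)
    return best if best < threshold else None
```

A site with variant calls at GQ 20 and 35 would be flagged even if a reference call has GQ 99. A single variant call at GQ 60 is enough to keep the site.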
Too lazy to translate the rest; read it for yourself.
by freemao
FAFU