Filter FASTA files

Filter sequences from a FASTA file by id using a regular expression, e.g. to keep only certain chromosomes of a genome. The existing alternatives are tools bundled in larger packages that must be installed (and that have no regex support), awkward awk-based bash solutions (sorry for the pun), and scripts that depend on extra packages and still have no regex support. This, by contrast, is a small, straightforward Python script for one simple task. It does nothing else and needs nothing but a stock Python installation. It is based on the FASTA reader snippet.
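The filtering idea described above can be sketched in a few lines of standard-library Python. This is not the code of FASTAfilter.py itself: the function names read_fasta and filter_fasta are hypothetical, and the sketch assumes that the id is the first whitespace-delimited token of the header line and that the regex is applied with re.search.

```python
import io
import re


def read_fasta(handle):
    """Yield (header, sequence_lines) for each entry in a FASTA stream."""
    header, lines = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, lines
            # Drop the leading '>' so the id never contains it.
            header, lines = line[1:], []
        elif header is not None and line:
            lines.append(line)
    if header is not None:
        yield header, lines


def filter_fasta(pattern, infile, outfile):
    """Copy entries whose id matches pattern; return the number kept."""
    regex = re.compile(pattern)
    kept = 0
    for header, lines in read_fasta(infile):
        # Assumption: the id is the first whitespace-delimited token.
        tokens = header.split()
        seq_id = tokens[0] if tokens else ""
        if regex.search(seq_id):
            outfile.write(">" + header + "\n")
            for seq_line in lines:
                outfile.write(seq_line + "\n")
            kept += 1
    return kept


# Small in-memory demonstration instead of real files.
demo = io.StringIO(">chr1 test\nACGT\nGGCC\n>chr5\nTTTT\n>chr3\nAAAA\n")
out = io.StringIO()
kept = filter_fasta(r"^chr[1-4]$", demo, out)
```

Here the entries chr1 and chr3 are written to out, while chr5 is skipped. Working on file-like objects keeps the sketch testable without touching the disk.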

Download here.

Usage:

python FASTAfilter.py [-h] regex infile outfile

From a FASTA-file with multiple >entries, filter by sequence ids using a
regex.

positional arguments:
  regex       Regex to filter entry ids, e.g. 'chr[1-4]'. Note that the id
              does not contain the initial > character.
  infile      A FASTA input file, usually with multiple entries.
  outfile     The new file with only the matching entries.

optional arguments:
  -h, --help  show this help message and exit
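A command-line interface with this help text could be declared with argparse roughly as follows. This is a sketch reconstructed from the help output only, not the script's actual source, and build_parser is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser with the three positional arguments shown in the help."""
    parser = argparse.ArgumentParser(
        prog="FASTAfilter.py",
        description="From a FASTA-file with multiple >entries, "
                    "filter by sequence ids using a regex.",
    )
    parser.add_argument(
        "regex",
        help="Regex to filter entry ids, e.g. 'chr[1-4]'. Note that the id "
             "does not contain the initial > character.",
    )
    parser.add_argument(
        "infile",
        help="A FASTA input file, usually with multiple entries.",
    )
    parser.add_argument(
        "outfile",
        help="The new file with only the matching entries.",
    )
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["chr[1-4]", "in.fa", "out.fa"])
```

The -h/--help option is added by argparse automatically, which is why it does not appear in the declaration.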

INSTALL:

cd /data/software
wget http://dm516.user.srcf.net/fastafilter/FASTAfilter.zip
unzip FASTAfilter.zip
easy_install argparse

USAGE:

python FASTAfilter.py '^([1-9]|1[0-8]|X)$' \
/dat2/INPUT.fa \
/dat2/OUTPUT.fa
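A note on the pattern: a bracket expression such as [1-9,10,11,12,13,14,15,16,17,18,X] is a character class, so it matches a single character (a digit, a comma or X) rather than the listed numbers. To select chromosomes 1 to 18 and X, an anchored alternation is needed. The sketch below compares the two on a hypothetical list of ids, assuming the ids are bare chromosome names and that matching is done with re.search.

```python
import re

# Hypothetical sequence ids, as they might appear in a genome FASTA file.
ids = ["1", "9", "10", "18", "19", "20", "X", "Y", "MT"]

# Character class: matches any id containing one of 0-9, ',' or 'X'.
char_class = re.compile(r"[1-9,10,11,12,13,14,15,16,17,18,X]")

# Anchored alternation: matches exactly 1-9, 10-18 or X.
alternation = re.compile(r"^([1-9]|1[0-8]|X)$")

loose = [i for i in ids if char_class.search(i)]
strict = [i for i in ids if alternation.search(i)]

print(loose)   # ['1', '9', '10', '18', '19', '20', 'X']
print(strict)  # ['1', '9', '10', '18', 'X']
```

On the command line, the pattern should also be wrapped in single quotes so that the shell does not try to expand the brackets, parentheses or pipe characters.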

Error:

Traceback (most recent call last):
  File "FASTAfilter.py", line 3, in <module>
    import argparse
ImportError: No module named argparse

Solution:

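Run "easy_install argparse" as the root user. argparse is part of the standard library from Python 2.7 and 3.2 onward, so this error only appears on older interpreters.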

http://dm516.user.srcf.net/?p=314

Time: 2024-10-29 15:53:19
