How To Use Coordinates To Extract Sequences In Fasta File

[1] bedtools (https://github.com/arq5x/bedtools2)

bedtools (https://github.com/arq5x/bedtools2) provides the getfasta subcommand, which extracts the sequence for each interval listed in a BED file. It uses Erik's code under the hood.

$ cat test.fa
>chr1
AAAAAAAACCCCCCCCCCCCCGCTACTGGGGGGGGGGGGGGGGGG

$ cat test.bed
chr1 5 10

$ bedtools getfasta -fi test.fa -bed test.bed -fo test.fa.out

$ cat test.fa.out
>chr1:5-10
AAACC
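
The output above follows from the BED convention: intervals are 0-based and half-open, so chr1 5 10 covers the sixth through tenth bases, exactly like the Python slice seq[5:10]. A minimal pure-Python sketch of that convention follows. It is not how bedtools is implemented; the helper names parse_fasta and extract_bed are made up for illustration, and fields are split on any whitespace even though real BED files are tab-delimited.

```python
def parse_fasta(text):
    """Parse FASTA text into {name: sequence}, joining wrapped lines."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            # Keep only the first word of the header as the record name.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def extract_bed(records, bed_text):
    """Yield (header, subsequence) for each BED line (0-based, half-open)."""
    for line in bed_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # A BED interval maps directly onto a Python slice.
        yield f"{chrom}:{start}-{end}", records[chrom][start:end]


fasta_text = ">chr1\nAAAAAAAACCCCCCCCCCCCCGCTACTGGGGGGGGGGGGGGGGGG\n"
bed_text = "chr1 5 10\n"

records = parse_fasta(fasta_text)
results = list(extract_bed(records, bed_text))
for header, seq in results:
    print(f">{header}")
    print(seq)
```

This loads whole sequences into memory, which is fine for small files but not for a full genome; the indexed tools below avoid that.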

Docs: http://bedtools.readthedocs.org/en/latest/content/tools/getfasta.html

And it is wrapped in pybedtools as well: http://pythonhosted.org/pybedtools/autodocs/pybedtools.BedTool.sequence.html?highlight=fasta

Older bedtools project page: https://code.google.com/p/bedtools/

[2] samtools faidx

Usage: samtools faidx <ref.fasta> [region1 [...]]

Index a reference sequence in FASTA format, or extract subsequences from an indexed reference sequence. If no region is specified, faidx indexes the file and creates <ref.fasta>.fai on disk. If regions are specified, the subsequences are retrieved and printed to stdout in FASTA format.

You first have to create the index of the reference genome FASTA file, and can then use this command to extract regions.
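
The index is what makes extraction fast: each line of a .fai file records a sequence's name, length, byte offset of its first base, bases per line, and bytes per line, so a region can be located with arithmetic and a single seek. samtools regions are 1-based and inclusive, so chr1:6-10 names the same bases as the BED interval chr1 5 10. The following is a simplified sketch of that idea, not samtools' actual code; build_fai and fetch are hypothetical names, and it assumes all lines within a record (except the last) have the same length.

```python
import os
import tempfile


def build_fai(fasta_path):
    """Return {name: (length, offset, linebases, linewidth)}, as in a .fai file."""
    index, name = {}, None
    with open(fasta_path, "rb") as handle:
        while True:
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                name = line[1:].split()[0].decode()
                # The sequence starts right after the header line.
                index[name] = [0, handle.tell(), 0, 0]
            elif name is not None and line.strip():
                entry = index[name]
                bases = len(line.rstrip(b"\r\n"))
                if entry[2] == 0:  # first sequence line defines the layout
                    entry[2], entry[3] = bases, len(line)
                entry[0] += bases
    return {key: tuple(value) for key, value in index.items()}


def fetch(fasta_path, index, name, start, end):
    """Fetch bases start..end (1-based, inclusive) with one seek and one read."""
    length, offset, linebases, linewidth = index[name]
    start0, end0 = max(start - 1, 0), min(end, length)  # to 0-based, half-open
    if start0 >= end0:
        return ""

    def byte_pos(pos):
        # Whole lines before pos cost linewidth bytes each (bases + newline).
        return offset + (pos // linebases) * linewidth + pos % linebases

    with open(fasta_path, "rb") as handle:
        handle.seek(byte_pos(start0))
        raw = handle.read(byte_pos(end0) - byte_pos(start0))
    return raw.replace(b"\n", b"").replace(b"\r", b"").decode()


seq = "AAAAAAAACCCCCCCCCCCCCGCTACTGGGGGGGGGGGGGGGGGG"
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.fa")
    with open(path, "w", newline="\n") as out:
        out.write(">chr1\n")
        for i in range(0, len(seq), 10):  # wrap at 10 bases per line
            out.write(seq[i:i + 10] + "\n")
    fai = build_fai(path)
    region = fetch(path, fai, "chr1", 6, 10)    # within one line
    spanning = fetch(path, fai, "chr1", 6, 25)  # crosses line breaks
    print(region)
    print(spanning)
```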

[3] pyfaidx: a Python implementation of faidx, on GitHub

https://github.com/mdshw5/pyfaidx
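
A short sketch of how pyfaidx might be used for the same extraction, assuming the package is installed (pip install pyfaidx). The wrapper name extract_interval is made up for illustration; Fasta and record slicing are part of the pyfaidx API. Slicing a record uses Python-style 0-based, end-exclusive coordinates, which matches BED.

```python
def extract_interval(fasta_path, chrom, start, end):
    """Return the bases of a 0-based, half-open interval (BED convention)."""
    # Imported here so this module still loads where pyfaidx is not installed.
    from pyfaidx import Fasta  # third-party package

    genome = Fasta(fasta_path)  # builds fasta_path + ".fai" if it is missing
    try:
        # Only the requested slice is read from disk, via the index.
        return genome[chrom][start:end].seq
    finally:
        genome.close()
```

With the test.fa file from section [1], extract_interval("test.fa", "chr1", 5, 10) should return the same five bases that bedtools getfasta printed.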

[4] UCSC twoBitToFa

Use UCSC twoBitToFa, which extracts sequence from a .2bit file. Binaries are available at http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/

See also http://genome.ucsc.edu/goldenPath/help/twoBit.html

[5] UCSC DAS

A Python script can fetch sequences from the UCSC DAS server:
http://genome.ucsc.edu/cgi-bin/das/h...r4:35654,35695

[6] Ensembl BioMart

Ref:

https://www.biostars.org/p/81087/

http://stackoverflow.com/questions/23089388/a-fast-way-to-get-human-genome-sequence-by-coordinate

http://seqanswers.com/forums/showthread.php?t=42463

Time: 2024-10-08 14:09:30
