Biopython - basics

Introduction

According to the Biopython website, the project's goal is to “make it as easy as possible to use Python for bioinformatics by creating high-quality, reusable modules and scripts.” These modules use the Biopython tutorial as a template for what you will learn here. Here are some of the most common tools, databases and data formats in computational biology that Biopython supports.

Resource             Note
Blast                finds regions of local similarity between sequences
ClustalW             multiple sequence alignment program
GenBank              NCBI sequence database
PubMed and Medline   document database
ExPASy               SIB resource portal (Enzyme and Prosite)
SCOP                 Structural Classification of Proteins (e.g. 'dom', 'lin')
UniGene              computationally identifies transcripts from the same locus
SwissProt            annotated and non-redundant protein sequence database

Biopython's other principal features include:

  • A standard sequence class that deals with sequences, ids on sequences, and sequence features.
  • Tools for performing common operations on sequences, such as translation, transcription and weight calculations.
  • Code to perform classification of data using k Nearest Neighbors, Naive Bayes or Support Vector Machines.
  • Code for dealing with alignments, including a standard way to create and deal with substitution matrices.
  • Code making it easy to split up parallelizable tasks into separate processes.
  • GUI-based programs to do basic sequence manipulations, translations, BLASTing, etc.

Getting started

>>> import Bio
>>> Bio.__version__
'1.58'

Some examples will also require a working internet connection in order to run.

>>> from Bio.Seq import Seq
>>> my_seq = Seq("AGTACACTGGT")
>>> my_seq
Seq('AGTACACTGGT', Alphabet())
>>> aStringSeq = str(my_seq)
>>> aStringSeq
'AGTACACTGGT'
>>> my_seq_complement = my_seq.complement()
>>> my_seq_complement
Seq('TCATGTGACCA', Alphabet())
>>> my_seq_reverse = my_seq[::-1]  # Seq has no reverse() method; slice with a step of -1 instead
>>> my_seq_rc = my_seq.reverse_complement()
>>> my_seq_rc
Seq('ACCAGTGTACT', Alphabet())
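
To make clear what the complement and reverse complement calls compute, here is a minimal pure-Python sketch. It is not Biopython's implementation: it assumes Python 3 and an unambiguous DNA string, and the helper names `complement` and `reverse_complement` are plain functions defined here for illustration only.

```python
# Pure-Python sketch (not Biopython's code) for unambiguous DNA strings.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(dna):
    """Swap each base for its Watson-Crick partner."""
    return dna.translate(COMPLEMENT)


def reverse_complement(dna):
    """Complement the sequence, then read it in the opposite direction."""
    return complement(dna)[::-1]


seq = "AGTACACTGGT"
print(complement(seq))          # TCATGTGACCA
print(seq[::-1])                # TGGTCACATGA
print(reverse_complement(seq))  # ACCAGTGTACT
```

The first and last results match the Seq objects shown in the session above.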

There is much more to explore, but first we should figure out how to get sequences into and out of Python.

File download

FASTA is the standard format for storing sequence data. Here is a brief reminder of the codes used to write nucleic acid sequences.

Nucleic acid code   Note              Nucleic acid code   Note
A                   adenosine         K                   G/T (keto)
T                   thymidine         M                   A/C (amino)
C                   cytidine          R                   G/A (purine)
G                   guanine           S                   G/C (strong)
N                   A/G/C/T (any)     W                   A/T (weak)
U                   uridine           B                   G/T/C
D                   G/A/T             Y                   T/C (pyrimidine)
H                   A/C/T             V                   G/C/A
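
The codes above can also be written as a lookup table. The following is a small pure-Python sketch, assuming Python 3; the dictionary name `IUPAC` and the helper `expand` are made up for this illustration and are not part of Biopython.

```python
from itertools import product

# Each nucleotide code from the table, mapped to the bases it can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "K": "GT", "M": "AC", "R": "AG", "S": "CG", "W": "AT", "Y": "CT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def expand(seq):
    """Return every unambiguous sequence an ambiguous sequence could represent."""
    choices = [IUPAC[base] for base in seq.upper()]
    return ["".join(combo) for combo in product(*choices)]


print(expand("ARY"))  # ['AAC', 'AAT', 'AGC', 'AGT']
```

Note that the number of expansions grows multiplicatively with each ambiguous position, so this is only practical for short sequences.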

Here is a quick look at how Biopython works with sequences.

>>> import os
>>> from Bio import SeqIO
>>> for seq_record in SeqIO.parse(os.path.join("data","ls_orchid.fasta"), "fasta"):
...     print(seq_record.id)
...     print(repr(seq_record.seq))
...     print(len(seq_record))
...
gi|2765658|emb|Z78533.1|CIZ78533
Seq('CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGATGAGACCGTGG...CGC', SingleLetterAlphabet())
740
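
To show roughly what a FASTA parser has to extract from each record (an id from the ">" header line, and a sequence joined from the lines that follow), here is a minimal pure-Python sketch. It is not how SeqIO is implemented; it assumes Python 3, the function name `read_fasta` is hypothetical, and the two-record input is toy data made up for illustration.

```python
import io


def read_fasta(handle):
    """Yield (record_id, sequence) pairs from a FASTA-formatted file handle."""
    record_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if record_id is not None:
                yield record_id, "".join(chunks)
            # The id is the first whitespace-delimited word after '>'.
            words = line[1:].split()
            record_id, chunks = (words[0] if words else ""), []
        else:
            chunks.append(line)
    if record_id is not None:
        yield record_id, "".join(chunks)


# Toy input, made up for illustration (not real sequence data).
toy = io.StringIO(">seq1 first toy record\nACGT\nAC\n>seq2\nGGTTA\n")
for record_id, sequence in read_fasta(toy):
    print(record_id, len(sequence))
```

For real work the SeqIO.parse call shown above is the better choice, since it also keeps the description line and supports many other file formats.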
Time: 2024-10-28 19:24:51
