DNA Sequencing
A DNA molecule consists of two strands that wrap around each other to resemble a twisted ladder whose sides, made of sugar and phosphate molecules, are connected by rungs of nitrogen-containing chemicals called
bases. Each strand is a linear arrangement of repeating similar units called nucleotides, which are each composed of one sugar, one phosphate, and a nitrogenous base. Four different bases are present in DNA: adenine (A), thymine (T), cytosine (C), and guanine
(G). The particular order of the bases arranged along the sugar-phosphate backbone is called the DNA sequence; the sequence specifies the exact genetic instructions required to create a particular organism with its own unique traits.
Geneticists often compare DNA strands and are interested in finding the longest common base sequence in the two strands. Note that these strands can be represented as strings consisting of the letters a, t, c and g. So, the
longest common sequence in the two strands atgc and tga is tg. It is entirely possible that two different common sequences exist that are the same length and are the longest possible common sequences. For example, in the strands atgc and gctg,
the longest common sequences are gc and tg.
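To make the definition concrete, here is a minimal sketch that treats a "common sequence" as a contiguous substring, which is how the accepted program further down appears to read it. It uses a plain O(n*m) dynamic-programming table rather than the suffix array of that program, and the function name longestCommonSubstrings is made up for this illustration.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns every longest common substring of a and b,
// de-duplicated and in lexicographic order. An empty result means the two
// strands share no base at all.
std::vector<std::string> longestCommonSubstrings(const std::string& a,
                                                 const std::string& b) {
    // prev[j] / cur[j]: length of the longest common suffix of a[0..i) and
    // b[0..j) for the previous / current value of i.
    std::vector<int> prev(b.size() + 1, 0), cur(b.size() + 1, 0);
    int best = 0;
    std::set<std::string> found;  // std::set keeps its elements sorted
    for (std::size_t i = 1; i <= a.size(); ++i) {
        for (std::size_t j = 1; j <= b.size(); ++j) {
            cur[j] = (a[i - 1] == b[j - 1]) ? prev[j - 1] + 1 : 0;
            if (cur[j] > best) {  // strictly longer match: restart the set
                best = cur[j];
                found.clear();
            }
            if (best > 0 && cur[j] == best)
                found.insert(a.substr(i - best, best));
        }
        std::swap(prev, cur);
    }
    return std::vector<std::string>(found.begin(), found.end());
}
```

For the strands atgc and gctg this returns gc and tg, matching the example in the text.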
Input and Output
Write a program that accepts as input two strings representing DNA strands, and prints as output the longest common sequence(s) in lexicographical order.
If there isn't any common sequence between the two strings, just print: "No common sequence."
If there is more than one test case, there must be a blank line between consecutive test cases, in both the input and the output.
The strings are at most 300 characters long.
Sample Input
atgc
tga

atgc
gctg
Sample Output
tg

gc
tg
Run time: 0 ms
AC (accepted) code
#include <stdio.h>
#include <string.h>
#include <algorithm>
using namespace std;

char str1[660], str2[660];
int sa[660], c[660], t1[660], t2[660], s[660];
int rnk[660], height[660];  // named rnk, not rank, to avoid a clash with std::rank
int len1, len2;

// Build the suffix array of s[0..n-1] by prefix doubling with counting sort.
// m is one more than the largest symbol value in s.
void build_sa(int s[], int n, int m)
{
    int i, j, p, *x = t1, *y = t2;
    for (i = 0; i < m; i++) c[i] = 0;
    for (i = 0; i < n; i++) c[x[i] = s[i]]++;
    for (i = 1; i < m; i++) c[i] += c[i - 1];
    for (i = n - 1; i >= 0; i--) sa[--c[x[i]]] = i;
    for (j = 1; j <= n; j <<= 1) {
        p = 0;
        for (i = n - j; i < n; i++) y[p++] = i;
        for (i = 0; i < n; i++) if (sa[i] >= j) y[p++] = sa[i] - j;
        for (i = 0; i < m; i++) c[i] = 0;
        for (i = 0; i < n; i++) c[x[y[i]]]++;
        for (i = 1; i < m; i++) c[i] += c[i - 1];
        for (i = n - 1; i >= 0; i--) sa[--c[x[y[i]]]] = y[i];
        swap(x, y);
        p = 1;
        x[sa[0]] = 0;
        for (i = 1; i < n; i++)
            x[sa[i]] = y[sa[i - 1]] == y[sa[i]] && y[sa[i - 1] + j] == y[sa[i] + j] ? p - 1 : p++;
        if (p >= n) break;
        m = p;
    }
}

// height[i] = length of the common prefix of the suffixes at sa[i-1] and sa[i].
void getHeight(int s[], int n)
{
    int i, j, k = 0;
    for (i = 0; i <= n; i++) rnk[sa[i]] = i;
    for (i = 0; i < n; i++) {
        if (k) k--;
        j = sa[rnk[i] - 1];
        while (s[i + k] == s[j + k]) k++;
        height[rnk[i]] = k;
    }
}

// Returns 1 if two adjacent suffixes from different strands share a prefix
// of length at least k.
int judge(int len, int k)
{
    for (int i = 1; i <= len; i++) {
        if (height[i] >= k) {
            if (sa[i] > len1 && sa[i - 1] <= len1) return 1;
            if (sa[i - 1] > len1 && sa[i] <= len1) return 1;
        }
    }
    return 0;
}

int main()
{
    int flag = 0;
    while (scanf("%s%s", str1, str2) == 2) {
        int i, j, k;
        if (flag) printf("\n");  // blank line between test cases
        flag = 1;
        len1 = strlen(str1);
        len2 = strlen(str2);
        // Concatenate as str1 + separator (27) + str2 + sentinel (0).
        for (i = 0; i < len1; i++) s[i] = str1[i] - 'a' + 1;
        s[len1] = 27;
        int n = len1 + 1;
        for (i = 0; i < len2; i++) s[n++] = str2[i] - 'a' + 1;
        s[n] = 0;
        build_sa(s, n + 1, 28);
        getHeight(s, n);
        // Binary search on the length of the longest common sequence.
        int l = 0, r = min(len1, len2), ans = 0;
        while (l <= r) {
            int mid = (l + r) >> 1;
            if (judge(n, mid)) { ans = mid; l = mid + 1; }
            else r = mid - 1;
        }
        if (!ans) {
            printf("No common sequence.\n");
            continue;
        }
        // Each maximal run with height >= ans is one distinct candidate;
        // print it if the run mixes suffixes from both strands.
        for (i = 1; i <= n; i++) {
            if (height[i] >= ans) {
                for (j = i; j <= n && height[j] >= ans; j++) ;
                for (k = i; k < j; k++) {
                    if (sa[k] > len1 && sa[k - 1] < len1) break;
                    if (sa[k - 1] > len1 && sa[k] < len1) break;
                }
                if (j != k) {
                    for (int st = 0; st < ans; st++) printf("%c", s[sa[k] + st] + 'a' - 1);
                    printf("\n");
                }
                i = j - 1;
            }
        }
    }
    return 0;
}
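The core idea of the program above (join the two strands with a separator, sort all suffixes, then look at neighbouring suffixes that start in different strands) may be easier to see in a deliberately naive form. The sketch below swaps the prefix-doubling construction and the binary search for a plain comparison sort and a single scan; lcsViaSortedSuffixes is a name chosen for this illustration, and '#' is assumed not to occur in either input.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <vector>

// Naive illustration of the suffix-array idea, not the accepted program.
std::vector<std::string> lcsViaSortedSuffixes(const std::string& a,
                                              const std::string& b) {
    const std::string t = a + '#' + b;  // assumption: '#' is in neither strand
    const int n = static_cast<int>(t.size());
    const int sep = static_cast<int>(a.size());  // index of the separator

    // Suffix array by direct comparison of suffixes; adequate for inputs of
    // a few hundred characters.
    std::vector<int> sa(n);
    std::iota(sa.begin(), sa.end(), 0);
    std::sort(sa.begin(), sa.end(), [&](int x, int y) {
        return t.compare(x, std::string::npos, t, y, std::string::npos) < 0;
    });

    // Length of the common prefix of the suffixes starting at x and y.
    auto lcp = [&](int x, int y) {
        int k = 0;
        while (x + k < n && y + k < n && t[x + k] == t[y + k]) ++k;
        return k;
    };

    int best = 0;
    std::vector<std::string> result;  // filled in lexicographic order
    for (int i = 1; i < n; ++i) {
        const int p = sa[i - 1], q = sa[i];
        if (p == sep || q == sep) continue;     // suffix starting at '#'
        if ((p < sep) == (q < sep)) continue;   // both from the same strand
        const int k = lcp(p, q);
        if (k == 0 || k < best) continue;
        if (k > best) { best = k; result.clear(); }
        const std::string cand = t.substr(q, k);
        // Equal candidates are adjacent because suffixes are sorted.
        if (result.empty() || result.back() != cand) result.push_back(cand);
    }
    return result;
}
```

Because the strand after the separator contains no '#', a common prefix between suffixes from different strands can never run across the separator, which is why no extra bounds check is needed there.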