UVA Problem 760: DNA Sequencing (longest common substring of two strings via suffix array, output in lexicographical order)

 DNA Sequencing 

A DNA molecule consists of two strands that wrap around each other to resemble a twisted ladder whose sides, made of sugar and phosphate molecules, are connected by rungs of nitrogen-containing chemicals called bases. Each strand is a linear arrangement of repeating similar units called nucleotides, which are each composed of one sugar, one phosphate, and a nitrogenous base. Four different bases are present in DNA: adenine (A), thymine (T), cytosine (C), and guanine (G). The particular order of the bases arranged along the sugar-phosphate backbone is called the DNA sequence; the sequence specifies the exact genetic instructions required to create a particular organism with its own unique traits.

Geneticists often compare DNA strands and are interested in finding the longest common base sequence in the two strands. Note that these strands can be represented as strings consisting of the letters a, t, c and g. So, the longest common sequence in the two strands atgc and tga is tg. It is entirely possible that two different common sequences exist that are the same length and are the longest possible common sequences. For example, in the strands atgc and gctg, the longest common sequences are gc and tg.
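For checking answers on small inputs, the definition above can be sketched as a direct dynamic-programming search over all pairs of end positions. This is not the suffix-array method used in the accepted code later in this post; the function name `longestCommonSubstrings` is a hypothetical helper introduced only for illustration, and the O(|a|·|b|) cost is acceptable here only because the strands are short.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): returns every longest
// common substring of a and b, sorted and without duplicates.
// dp[j] is the length of the longest common suffix of a[0..i) and b[0..j).
std::vector<std::string> longestCommonSubstrings(const std::string& a,
                                                 const std::string& b) {
    std::vector<int> dp(b.size() + 1, 0);
    std::set<std::string> best;  // std::set keeps the answers sorted and unique
    std::size_t bestLen = 0;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        // Walk j downwards so that dp[j - 1] still belongs to the previous row.
        for (std::size_t j = b.size(); j >= 1; --j) {
            if (a[i - 1] == b[j - 1]) {
                dp[j] = dp[j - 1] + 1;
                std::size_t len = static_cast<std::size_t>(dp[j]);
                if (len > bestLen) {  // strictly longer match: restart the list
                    bestLen = len;
                    best.clear();
                }
                if (len == bestLen) best.insert(a.substr(i - len, len));
            } else {
                dp[j] = 0;
            }
        }
    }
    assert(bestLen <= a.size() && bestLen <= b.size());
    return std::vector<std::string>(best.begin(), best.end());
}
```

An empty result corresponds to the "No common sequence." case of the problem.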

Input and Output

Write a program that accepts as input two strings representing DNA strands, and prints as output the longest common sequence(s) in lexicographical order.

If there isn't any common sequence between the two strings, just print: "No common sequence."

If there is more than one test case, a blank line must separate consecutive test cases, in both the input and the output.

The strings are at most 300 characters long.
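The blank-line rule is easy to get wrong, so here is a minimal sketch of the output layout, assuming the answers for each test case have already been computed and sorted. The function name `formatOutput` and the convention that an empty list means "no common sequence" are assumptions made for this illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: joins the per-test-case answers into the output text.
// An empty answer list stands for the "No common sequence." case.
std::string formatOutput(const std::vector<std::vector<std::string> >& cases) {
    std::string out;
    for (std::size_t i = 0; i < cases.size(); ++i) {
        if (i > 0) out += "\n";  // blank line only between consecutive cases
        if (cases[i].empty()) {
            out += "No common sequence.\n";
            continue;
        }
        for (std::size_t k = 0; k < cases[i].size(); ++k) {
            assert(!cases[i][k].empty());  // an answer is never the empty string
            out += cases[i][k] + "\n";
        }
    }
    return out;
}
```

For the two sample cases this produces the sample output, with no blank line after the last case.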

Sample Input

atgc
tga

atgc
gctg

Sample Output

tg

gc
tg

Runtime: 0 ms

Accepted (AC) code

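#include<stdio.h>
#include<string.h>
#include<algorithm>
#include<utility>
// Arguments are parenthesized so that expressions expand safely.
#define min(a,b) ((a)>(b)?(b):(a))
// Import only swap: a blanket "using namespace std" makes the global array
// rank below ambiguous with std::rank on C++11 and later compilers.
using std::swap;
char str1[660],str2[660];   // the two input strands
int sa[660],c[660],t2[660]; // suffix array, counting-sort buckets, scratch buffer
int t1[660],s[660];         // scratch buffer, concatenated integer string
int rank[660],height[660];  // rank[i]: position of suffix i in sa; height[i]: LCP of suffixes sa[i-1] and sa[i]
int len1,len2;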
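// Builds the suffix array sa of s[0..n-1] by prefix doubling with a counting
// (radix) sort. Every value in s must lie in [0,m), and s[n-1] must be a
// unique smallest sentinel so that rank comparisons stay inside the string.
void build_sa(int s[],int n,int m)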
{
    int i,j,p,*x=t1,*y=t2;
    for(i=0;i<m;i++)
        c[i]=0;
    for(i=0;i<n;i++)
        c[x[i]=s[i]]++;
    for(i=1;i<m;i++)
        c[i]+=c[i-1];
    for(i=n-1;i>=0;i--)
        sa[--c[x[i]]]=i;
    for(j=1;j<=n;j<<=1)
    {
        p=0;
        for(i=n-j;i<n;i++)
            y[p++]=i;
        for(i=0;i<n;i++)
            if(sa[i]>=j)
                y[p++]=sa[i]-j;
        for(i=0;i<m;i++)
            c[i]=0;
        for(i=0;i<n;i++)
            c[x[y[i]]]++;
        for(i=1;i<m;i++)
            c[i]+=c[i-1];
        for(i=n-1;i>=0;i--)
            sa[--c[x[y[i]]]]=y[i];
        swap(x,y);
        p=1;
        x[sa[0]]=0;
        for(i=1;i<n;i++)
            x[sa[i]]=y[sa[i-1]]==y[sa[i]]&&y[sa[i-1]+j]==y[sa[i]+j]?p-1:p++;
        if(p>=n)
            break;
        m=p;
    }
}
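// Fills rank[] and height[] (Kasai's method). Here n is the length without the
// sentinel, so sa[] holds n+1 entries and sa[0] is the sentinel suffix.
// height[i] is the longest common prefix of suffixes sa[i-1] and sa[i].
void getHeight(int s[],int n)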
{
    int i,j,k=0;
    for(i=0;i<=n;i++)
        rank[sa[i]]=i;
    for(i=0;i<n;i++)
    {
        if(k)
            k--;
        j=sa[rank[i]-1];
        while(s[i+k]==s[j+k])
            k++;
        height[rank[i]]=k;
    }
}
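// Returns 1 if two adjacent suffixes share a prefix of length >= k while one
// starts in str1 (before index len1) and the other in str2 (after index len1).
int judge(int len,int k)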
{
	int i;
	for(i=1;i<=len;i++)
	{
		if(height[i]>=k)
		{
			if(sa[i]>len1&&sa[i-1]<=len1)
				return 1;
			if(sa[i-1]>len1&&sa[i]<=len1)
				return 1;
		}
	}
	return 0;
}
int main()
{
	int flag=0;
	while(scanf("%s%s",str1,str2)!=EOF)
	{
		int i,j,k;
		if(flag)
			printf("\n");
		flag=1;
		len1=strlen(str1);
		len2=strlen(str2);
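		// Map letters to 1..26 and join the strands with a unique separator (27).
		for(i=0;i<len1;i++)
		{
			s[i]=str1[i]-'a'+1;
		}
		s[len1]=27;
		int n=len1+1;
		for(i=0;i<len2;i++)
			s[n++]=str2[i]-'a'+1;
		s[n]=0; // sentinel, smaller than every other symbol
		build_sa(s,n+1,28);
		getHeight(s,n);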
		// Binary search for the largest length that judge() accepts.
		int l=0,r=min(len1,len2),ans=0;
		while(l<=r)
		{
			int mid=(l+r)>>1;
			if(judge(n,mid))
			{
				ans=mid;
				l=mid+1;
			}
			else
				r=mid-1;
		}
		if(!ans)
		{
			printf("No common sequence.\n");
			continue;
		}
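		// Scan the suffix array in sorted order: each maximal block with height>=ans
		// that mixes suffixes of both strands yields one answer, already in lexicographic order.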
		for(i=1;i<=n;i++)
		{
			if(height[i]>=ans)
			{
				for(j=i;j<=n&&height[j]>=ans;j++)
					;
				for(k=i;k<j;k++)
				{
					if(sa[k]>len1&&sa[k-1]<len1)
						break;
					if(sa[k-1]>len1&&sa[k]<len1)
						break;
				}
				if(j!=k)
				{
					int st;
					for(st=0;st<ans;st++)
					{
						printf("%c",s[sa[k]+st]+'a'-1);
					}
					printf("\n");
				}
				i=j-1;
			}
		}
	}
}
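To make the roles of `sa` and `height` in the code above easier to follow, here is a sketch that builds both arrays naively by sorting suffixes directly, then reads the answers off adjacent suffixes that come from different strands. It is an illustration under stated assumptions, not the author's implementation: it skips prefix doubling and the binary search, it uses the character '{' (the one right after 'z', matching the value 27 in the mapping above) as separator, and the names `SuffixInfo`, `buildNaive` and `lcsViaSuffixes` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct SuffixInfo {
    std::vector<int> sa;      // sa[i]: start index of the i-th smallest suffix
    std::vector<int> height;  // height[i]: LCP of suffixes sa[i-1] and sa[i]; height[0] = 0
};

// Naive construction: sort start positions by comparing the suffixes themselves.
SuffixInfo buildNaive(const std::string& t) {
    int n = static_cast<int>(t.size());
    SuffixInfo r;
    r.sa.resize(n);
    for (int i = 0; i < n; ++i) r.sa[i] = i;
    std::sort(r.sa.begin(), r.sa.end(), [&t](int x, int y) {
        return t.compare(x, std::string::npos, t, y, std::string::npos) < 0;
    });
    r.height.assign(n, 0);
    for (int i = 1; i < n; ++i) {
        int p = r.sa[i - 1], q = r.sa[i], k = 0;
        while (p + k < n && q + k < n && t[p + k] == t[q + k]) ++k;
        r.height[i] = k;
    }
    return r;
}

// Longest common substrings of a and b, in lexicographic order.
std::vector<std::string> lcsViaSuffixes(const std::string& a, const std::string& b) {
    // Assumption: '{' does not occur in the input, so it can act as separator.
    assert(a.find('{') == std::string::npos && b.find('{') == std::string::npos);
    std::string t = a + '{' + b;
    SuffixInfo info = buildNaive(t);
    int sep = static_cast<int>(a.size());
    int best = 0;
    std::vector<std::string> out;
    for (int i = 1; i < static_cast<int>(t.size()); ++i) {
        int p = info.sa[i - 1], q = info.sa[i];
        bool differentStrands = (p < sep) != (q < sep);
        if (!differentStrands || info.height[i] == 0) continue;
        if (info.height[i] > best) {  // longer match found: restart the list
            best = info.height[i];
            out.clear();
        }
        if (info.height[i] == best) {
            std::string cand = t.substr(q, best);
            // Equal candidates are contiguous in sorted order, so this removes duplicates.
            if (out.empty() || out.back() != cand) out.push_back(cand);
        }
    }
    return out;
}
```

Because the suffix array is already sorted, the answers come out in lexicographic order without an extra sort, which is the same property the accepted code relies on.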

Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission.

Time: 2024-12-20 11:55:54
