POJ 3080 (KMP + enumeration)

Blue Jeans

Time Limit: 1000MS   Memory Limit: 65536K
Total Submissions: 20163   Accepted: 8948

Description

The Genographic Project is a research partnership between IBM and The National Geographic Society that is analyzing DNA from hundreds of thousands of contributors to map how the Earth was populated.

As an IBM researcher, you have been tasked with writing a program that will find commonalities amongst given snippets of DNA that can be correlated with individual survey information to identify new genetic markers.

A DNA base sequence is noted by listing the nitrogen bases in the order in which they are found in the molecule. There are four bases: adenine (A), thymine (T), guanine (G), and cytosine (C). A 6-base DNA sequence could be represented as TAGACC.

Given a set of DNA base sequences, determine the longest series of bases that occurs in all of the sequences.

Input

Input to this problem will begin with a line containing a single integer n indicating the number of datasets. Each dataset consists of the following components:

  • A single positive integer m (2 <= m <= 10) indicating the number of base sequences in this dataset.
  • m lines each containing a single base sequence consisting of 60 bases.

Output

For each dataset in the input, output the longest base subsequence common to all of the given base sequences. If the longest common subsequence is less than three bases in length, display the string "no significant commonalities" instead. If multiple subsequences of the same longest length exist, output only the subsequence that comes first in alphabetical order.

Sample Input

3
2
GATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
3
GATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATACCAGATA
GATACTAGATACTAGATACTAGATACTAAAGGAAAGGGAAAAGGGGAAAAAGGGGGAAAA
GATACCAGATACCAGATACCAGATACCAAAGGAAAGGGAAAAGGGGAAAAAGGGGGAAAA
3
CATCATCATCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
ACATCATCATAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AACATCATCATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT

Sample Output

no significant commonalities
AGATAC
CATCATCAT

Source

South Central USA 2006

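Problem: The input contains t datasets, each with n strings of 60 characters (t and n here correspond to n and m in the statement above). Find their longest common substring; although the statement says "subsequence", the match must be contiguous. If several candidates have the same length, output the lexicographically smallest one. If the length is less than 3, output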

no significant commonalities

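Approach: Pick any one of the strings (the code uses the first) and enumerate its suffixes, with the start position running from the 1st to the 58th character (indices 0 to 57), so that every suffix has at least 3 characters. Match each suffix against the other n-1 strings with KMP and record, for each of those strings, the longest prefix of the suffix that occurs in it. For a given suffix, take the minimum of these lengths over the n-1 strings, which guarantees that this prefix occurs in all of them. Then take the maximum over the 58 suffixes, which gives the longest common substring of the n strings.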
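
The min-over-strings, max-over-suffixes logic described above can be sketched without KMP, using a plain quadratic scan for the matching step. This is only an illustration under stated assumptions, not the code from this post: the names `longest_prefix_in` and `longest_common` are hypothetical, and the strings are passed as an array of pointers rather than read from the judge's input.

```c
#include <assert.h>
#include <string.h>

/* Length of the longest prefix of pat[0..pat_len-1] that occurs anywhere in text.
   Naive scan: try every start position in text. */
int longest_prefix_in(const char *pat, int pat_len, const char *text)
{
    int text_len = (int)strlen(text);
    int best = 0;
    for (int start = 0; start < text_len; start++) {
        int k = 0;
        while (k < pat_len && start + k < text_len && pat[k] == text[start + k])
            k++;
        if (k > best)
            best = k;
    }
    return best;
}

/* Writes the longest substring common to seqs[0..count-1] into out
   (lexicographically smallest on ties) and returns its length.
   out must have room for strlen(seqs[0]) + 1 characters. */
int longest_common(const char *seqs[], int count, char *out)
{
    assert(count >= 1);
    int len0 = (int)strlen(seqs[0]);
    int best = 0;
    out[0] = '\0';
    for (int i = 0; i < len0; i++) {
        /* Minimum over the other strings: the prefix must occur in all of them. */
        int common = len0 - i;
        for (int j = 1; j < count; j++) {
            int m = longest_prefix_in(seqs[0] + i, len0 - i, seqs[j]);
            if (m < common)
                common = m;
        }
        /* Maximum over suffixes, breaking ties alphabetically. */
        if (common > best ||
            (common == best && common > 0 &&
             strncmp(seqs[0] + i, out, (size_t)common) < 0)) {
            memcpy(out, seqs[0] + i, (size_t)common);
            out[common] = '\0';
            best = common;
        }
    }
    return best;
}
```

A caller would print `out` when the returned length is at least 3 and "no significant commonalities" otherwise. With 10 strings of 60 characters even this naive scan is cheap; KMP only tightens the matching step.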

Code:

#include<stdio.h>
#include<string.h>
char s[12][62],p[62];
char ans[62];
int next[62];
int N;
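// Build the failure table for the pattern p[0..n-1]:
// next[j] is the index of the last character of the longest proper border
// of p[0..j], or -1 if there is none.
int getnext(int n)
{
  next[0]=-1;
  int j=1,k=-1;
  while(j<n)
  {
    while(k>-1&&p[j]!=p[k+1])
    {
      k=next[k];
    }
    if(p[j]==p[k+1])
      k++;
    next[j]=k;
    j++;
  }
  return 0;
}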
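// For the pattern p[0..n-1], find the longest prefix of p that occurs in each
// of s[1..N-1], and return the smallest of those lengths.
int kmp(int n)
{
  getnext(n);
  int i,j,k,mx=0;
  int max=100; // larger than any possible match length (at most 60)
  for(i=1;i<N;i++) // match against the remaining N-1 strings
  {
    j=0,k=0,mx=0;
    while(j<60&&k<n)
    {
      if(p[k]==s[i][j]) // characters match: advance both
      {
        k++;
        j++;
      }
      else
      {
        if(k==0) // already at the start of the pattern: advance the text
          j++;
        else
          k=next[k-1]+1; // fall back to the longest border of p[0..k-1]
      }
      if(mx<k) // longest prefix matched so far in s[i]
        mx=k;
    }
    if(max>mx)
      max=mx;
  }
  return max;
}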
int main()
{
  int t;
  scanf("%d",&t);
  int i;
  int len;
  while(t--)
  {
    len=0;
    scanf("%d",&N);
    for(i=0;i<N;i++)
    {
      scanf("%s",s[i]);
    }
    for(i=0;i<58;i++) // suffix start positions 0..57, so each suffix has at least 3 characters
    {
      strcpy(p,s[0]+i);
      p[60-i]='\0';
      int mx=kmp(60-i);
      if(len<mx) // strictly longer common substring found
      {
        strncpy(ans,s[0]+i,mx);
        ans[mx]='\0';
        len=mx;
      }
      else if(len==mx) // same length: keep the lexicographically smaller one
      {
        p[mx]='\0';
        if(strcmp(p,ans)<0)
        {
          strcpy(ans,p);
          ans[mx]='\0';
        }
      }
    }
    if(len>=3)
      printf("%s\n",ans);
    else
      printf("no significant commonalities\n");
  }
  return 0;
}
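
The failure table in the listing above uses a slightly unusual convention: each entry stores the index of the last character of the longest border, with -1 meaning "no border", which is why the fallback step is `next[k-1]+1`. The following standalone sketch mirrors that convention so it can be tried on short strings. The names `build_next` and `longest_prefix_kmp` and the 64-entry limit are assumptions made for this illustration, not part of the original code.

```c
#include <assert.h>
#include <string.h>

/* next_out[j] = index of the last character of the longest proper border
   of p[0..j], or -1 if p[0..j] has no border. */
void build_next(const char *p, int n, int *next_out)
{
    int k = -1;
    next_out[0] = -1;
    for (int j = 1; j < n; j++) {
        while (k > -1 && p[j] != p[k + 1])
            k = next_out[k];
        if (p[j] == p[k + 1])
            k++;
        next_out[j] = k;
    }
}

/* Length of the longest prefix of p that occurs somewhere in text,
   found with a single KMP pass over text. */
int longest_prefix_kmp(const char *p, const char *text)
{
    int n = (int)strlen(p);
    int text_len = (int)strlen(text);
    int next_tab[64];
    assert(n >= 1 && n <= 64);
    build_next(p, n, next_tab);

    int j = 0, k = 0, best = 0;
    while (j < text_len && k < n) {
        if (p[k] == text[j]) {
            k++;
            j++;
        } else if (k == 0) {
            j++;                      /* nothing matched: move on in the text */
        } else {
            k = next_tab[k - 1] + 1;  /* reuse the border of p[0..k-1] */
        }
        if (k > best)
            best = k;
    }
    return best;
}
```

For the pattern "ABAB" the table is -1, -1, 0, 1: the border "AB" of "ABAB" ends at index 1, so after a mismatch at position 4 matching would resume at position 2.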

Original article: https://www.cnblogs.com/cglongge/p/9053106.html

Time: 2024-08-30 04:20:58
