POJ 3294 Life Forms (suffix array: longest substring common to more than half of the k strings)

Life Forms

Time Limit: 5000MS   Memory Limit: 65536K
Total Submissions: 11178   Accepted: 3085

Description

You may have wondered why most extraterrestrial life forms resemble humans, differing by superficial traits such as height, colour, wrinkles, ears, eyebrows and the like. A few bear no human resemblance; these typically have geometric or amorphous shapes like cubes, oil slicks or clouds of dust.

The answer is given in the 146th episode of Star Trek - The Next Generation, titled The Chase. It turns out that the vast majority of the quadrant's life forms ended up with a large fragment of common DNA.

Given the DNA sequences of several life forms represented as strings of letters, you are to find the longest substring that is shared by more than half of them.

Input

Standard input contains several test cases. Each test case begins with 1 ≤ n ≤ 100, the number of life forms. n lines follow; each contains a string of lower case letters representing the DNA sequence of a life form. Each DNA sequence contains at least one and not more than 1000 letters. A line containing 0 follows the last test case.

Output

For each test case, output the longest string or strings shared by more than half of the life forms. If there are many, output all of them in alphabetical order. If there is no solution with at least one letter, output "?". Leave an empty line between test cases.

Sample Input

3
abcdefg
bcdefgh
cdefghi
3
xxx
yyy
zzz
0

Sample Output

bcdefg
cdefgh

?

Source

Waterloo Local Contest, 2006.9.30

The task is to find the longest substring that occurs in more than k/2 of the k strings.
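To make the requirement concrete, here is a minimal brute-force sketch of the same task. It is not the suffix-array solution from this post; the function name `bruteForceLifeForms` is made up for illustration, and the approach is only practical for tiny inputs. It can serve as a reference when cross-checking a faster solution on small random cases.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): returns every longest
// substring that occurs in more than half of the strings, in alphabetical
// order. An empty result corresponds to the "?" output.
std::vector<std::string> bruteForceLifeForms(const std::vector<std::string>& dna) {
    assert(!dna.empty());
    const std::size_t k = dna.size();
    std::size_t maxLen = 0;
    for (const std::string& s : dna) maxLen = std::max(maxLen, s.size());

    // Try lengths from longest to shortest; the first length with a hit wins.
    for (std::size_t len = maxLen; len >= 1; --len) {
        // std::set keeps the candidates sorted and removes duplicates.
        std::set<std::string> candidates;
        for (const std::string& s : dna)
            for (std::size_t i = 0; i + len <= s.size(); ++i)
                candidates.insert(s.substr(i, len));

        std::vector<std::string> found;
        for (const std::string& c : candidates) {
            std::size_t cnt = 0;  // each string is counted at most once
            for (const std::string& s : dna)
                if (s.find(c) != std::string::npos) ++cnt;
            if (2 * cnt > k) found.push_back(c);  // strictly more than half
        }
        if (!found.empty()) return found;
    }
    return {};
}
```

Calling it on the first sample (`abcdefg`, `bcdefgh`, `cdefghi`) yields `bcdefg` and `cdefgh`, and on the second sample it yields an empty list.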

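This is my 200th POJ problem, and it was a slog. It should have been a simple one, but I spent a whole afternoon on Time Limit Exceeded verdicts. The cause was that I had declared the vis array too large; after shrinking it, switching it to bool saved another 800 ms.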
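The cost comes from clearing vis with memset at every group boundary inside judge(), so the array size multiplies into the running time. One alternative, which is not what the accepted code does, is a stamp array: each group gets a new number, and an id counts as visited only if its stamp equals the current group number, so no clearing is needed. The sketch below assumes a hypothetical `DistinctCounter` type that is not part of the original solution.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: counts distinct ids per group with an O(1) reset.
struct DistinctCounter {
    std::vector<int> stamp;  // stamp[id] == group means id was already counted
    int group = 1;           // current group number; stamps start at 0
    int count = 0;           // distinct ids seen in the current group

    explicit DistinctCounter(int ids) : stamp(ids, 0) { assert(ids > 0); }

    // Start a new group without touching the stamp array.
    void startGroup() { ++group; count = 0; }

    // Record that `id` occurs in the current group.
    void add(int id) {
        if (stamp[id] != group) {
            stamp[id] = group;
            ++count;
        }
    }
};
```

In judge(), `add` would replace the `if(!vis[j])` blocks and `startGroup` would replace the `cnt=0; memset(...)` pair. The group number is an int, which is enough here because the number of groups per test case is bounded by the total text length times the number of binary-search steps.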

Accepted code

Problem: 3294		User: kxh1995
Memory: 6196K		Time: 1266MS
Language: C++		Result: Accepted
#include<stdio.h>
#include<string.h>
#include<algorithm>
#include<iostream>
#define min(a,b) (a>b?b:a)
#define max(a,b) (a>b?a:b)
#define N 1000005
using namespace std;
char str[1000010];
int sa[1000010],Rank[1000010],rank2[1000010],height[1000010],c[1000010],*x,*y,s[1000010],k;
void cmp(int n,int sz)
{
    int i;
    memset(c,0,sizeof(c));
    for(i=0;i<n;i++)
        c[x[y[i]]]++;
    for(i=1;i<sz;i++)
        c[i]+=c[i-1];
    for(i=n-1;i>=0;i--)
        sa[--c[x[y[i]]]]=y[i];
}
void build_sa(int *s,int n,int sz)
{
    x=Rank,y=rank2;
    int i,j;
    for(i=0;i<n;i++)
        x[i]=s[i],y[i]=i;
    cmp(n,sz);
    int len;
    for(len=1;len<n;len<<=1)
    {
        int yid=0;
        for(i=n-len;i<n;i++)
        {
            y[yid++]=i;
        }
        for(i=0;i<n;i++)
            if(sa[i]>=len)
                y[yid++]=sa[i]-len;
        cmp(n,sz);
        swap(x,y);
        x[sa[0]]=yid=0;
        for(i=1;i<n;i++)
        {
            if(y[sa[i-1]]==y[sa[i]]&&sa[i-1]+len<n&&sa[i]+len<n&&y[sa[i-1]+len]==y[sa[i]+len])
                x[sa[i]]=yid;
            else
                x[sa[i]]=++yid;
        }
        sz=yid+1;
        if(sz>=n)
            break;
    }
    for(i=0;i<n;i++)
        Rank[i]=x[i];
}
void getHeight(int *s,int n)
{
    int k=0;
    for(int i=0;i<n;i++)
    {
        if(Rank[i]==0)
            continue;
        k=max(0,k-1);
        int j=sa[Rank[i]-1];
        while(s[i+k]==s[j+k])
            k++;
        height[Rank[i]]=k;
    }
}
int len[110],anssize,ans[1000010];
bool vis[105];
int judge(int n,int mid)
{
	int i,j;
	int cnt=0;
	int size=0;
	memset(vis,0,sizeof(vis));
	for(i=1;i<n;i++)
	{
		if(height[i]>=mid)
		{
			for(j=1;j<=k;j++)
			{
				if(sa[i]>len[j-1]&&sa[i]<len[j])
				{
					if(!vis[j])
					{
						cnt++;
						vis[j]=1;
					}
				}
				if(sa[i-1]>len[j-1]&&sa[i-1]<len[j])
				{
					if(!vis[j])
					{
						cnt++;
						vis[j]=1;
					}
				}
			}
		}
		else
		{
			if(cnt>k/2)
				ans[++size]=sa[i-1];
			cnt=0;
			memset(vis,0,sizeof(vis));
		}
	}
	if(cnt>k/2)
		ans[++size]=sa[n-1]; // last group ends at the last suffix; sa[n] would be out of range
	if(size)
	{
		anssize=size;
		return 1;
	}
	return 0;
}
int main()
{
	//int k;
	int flag=0;
	while(scanf("%d",&k)==1&&k) // stop on the terminating 0 and also on EOF
	{
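		int i,ll=0,j;
		int	num=0;
		len[0]=-1; // virtual separator before index 0, so judge() also counts the suffix starting at position 0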
		for(i=1;i<=k;i++)
		{
			scanf("%s",str+ll);
			for(;str[ll];ll++)
				s[ll]=str[ll];
			s[ll]='z'+i; // unique separator above 'z' (at most 222 < 255); '#'+i collides with 'a'..'z' once i>=62
			len[++num]=ll;
			ll++;
		}
		s[ll-1]=0;
		build_sa(s,ll,255);
		getHeight(s,ll-1);
		int l=0,r=ll;
		while(l<=r)
		{
			int mid=(l+r)>>1;
			if(judge(ll,mid))
			{
				l=mid+1;
			}
			else
				r=mid-1;
		}
		if(flag)
			printf("\n");
		flag=1;
		if(l<2)
		{
			printf("?\n");
		}
		else
		{
			for(i=1;i<=anssize;i++)
			{
				for(j=0;j<l-1;j++)
				{
					printf("%c",str[ans[i]+j]);
				}
				printf("\n");
			}
		}
	}
}
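In judge() above, every suffix is matched against all k ranges in len[] to find out which string it came from. A possible refinement, shown only as a sketch, is to precompute that mapping once per test case so the lookup becomes a single array access. The name `buildOwner` and the use of std::vector are assumptions for illustration; the layout mirrors the one used in main(), where each string is followed by exactly one separator slot.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: owner[p] is the 1-based index of the string that
// position p of the concatenated text belongs to, or 0 for a separator.
std::vector<int> buildOwner(const std::vector<std::string>& dna) {
    assert(!dna.empty());
    std::vector<int> owner;
    for (std::size_t j = 0; j < dna.size(); ++j) {
        // All characters of string j map to id j + 1.
        owner.insert(owner.end(), dna[j].size(), static_cast<int>(j) + 1);
        // One slot for the separator (or the final sentinel) after the string.
        owner.push_back(0);
    }
    return owner;
}
```

With such a table, the inner `for(j=1;j<=k;j++)` loop in judge() reduces to reading `owner[sa[i]]` and `owner[sa[i-1]]` and skipping the value 0.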

Copyright notice: this is an original article by the blog author and may not be reposted without the author's permission.

Time: 2024-10-06 08:12:26
