poj 3294 (classic suffix array template)

Life Forms

Time Limit: 5000MS   Memory Limit: 65536K
Total Submissions: 9820   Accepted: 2708

Description

You may have wondered why most extraterrestrial life forms resemble humans, differing by superficial traits such as height, colour, wrinkles, ears, eyebrows and the like. A few bear no human resemblance; these typically have geometric or amorphous shapes like cubes, oil slicks or clouds of dust.

The answer is given in the 146th episode of Star Trek - The Next Generation, titled The Chase. It turns out that the vast majority of the quadrant's life forms ended up with a large fragment of common DNA.

Given the DNA sequences of several life forms represented as strings of letters, you are to find the longest substring that is shared by more than half of them.

Input

Standard input contains several test cases. Each test case begins with 1 ≤ n ≤ 100, the number of life forms. n lines follow; each contains a string of lower case letters representing the DNA sequence of a life form. Each DNA sequence contains at least one and not more than 1000 letters. A line containing 0 follows the last test case.
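With up to 100 strings of up to 1000 letters each, one way to process them together (and the layout the AC code below builds in `num[]`) is to concatenate everything into one integer array with a distinct separator after each string, so no common prefix can run across a string boundary. The sketch below assumes that same encoding: letters map to 101..126, the value i follows the i-th string, and a final 0 ends the array. The names `Concat`, `buildConcat` and the `owner` array are hypothetical and not part of the original code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper mirroring the AC code's encoding:
// 'a'..'z' -> 101..126, separator i after string i (1-based), final 0.
struct Concat {
    std::vector<int> num;    // encoded characters
    std::vector<int> owner;  // 1-based owning string per position, 0 = separator/terminator
};

Concat buildConcat(const std::vector<std::string>& dna) {
    Concat c;
    for (std::size_t i = 0; i < dna.size(); ++i) {
        for (char ch : dna[i]) {
            c.num.push_back(ch - 'a' + 101);
            c.owner.push_back(static_cast<int>(i) + 1);
        }
        c.num.push_back(static_cast<int>(i) + 1);  // unique separator for string i+1
        c.owner.push_back(0);
    }
    c.num.push_back(0);  // terminator, smaller than every other value
    c.owner.push_back(0);
    return c;
}
```

For example, `{"ab", "c"}` encodes to `101 102 1 103 2 0`. Because every separator is unique, two suffixes can only share a prefix made of letters from the strings they start in.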

Output

For each test case, output the longest string or strings shared by more than half of the life forms. If there are many, output all of them in alphabetical order. If there is no solution with at least one letter, output "?". Leave an empty line between test cases.
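For small inputs the specification can be checked directly. The sketch below is a hypothetical brute-force helper `bruteForce` (far too slow for the real limits): it enumerates every substring of every string, counts how many strings contain it, and keeps the longest ones shared by more than half, in alphabetical order, returning "?" when nothing qualifies.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical brute-force reference, roughly cubic in the string length: tiny inputs only.
std::vector<std::string> bruteForce(const std::vector<std::string>& dna) {
    std::map<std::string, int> count;  // substring -> number of strings containing it
    for (const std::string& s : dna) {
        std::set<std::string> seen;    // distinct substrings of this one string
        for (std::size_t i = 0; i < s.size(); ++i)
            for (std::size_t len = 1; i + len <= s.size(); ++len)
                seen.insert(s.substr(i, len));
        for (const std::string& sub : seen) ++count[sub];
    }
    const int T = static_cast<int>(dna.size());
    std::size_t best = 0;
    std::vector<std::string> result;
    for (const auto& kv : count) {      // std::map iterates in alphabetical order
        if (2 * kv.second <= T) continue;  // not shared by more than half
        if (kv.first.size() > best) { best = kv.first.size(); result.clear(); }
        if (kv.first.size() == best) result.push_back(kv.first);
    }
    if (result.empty()) result.push_back("?");
    return result;
}
```

It reproduces both sample outputs, and with n = 1 it returns the whole string, since one life form is already "more than half". Its main use is cross-checking a faster solution on random small cases.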

Sample Input

3
abcdefg
bcdefgh
cdefghi
3
xxx
yyy
zzz
0

Sample Output

bcdefg
cdefgh

?

Source

Waterloo Local Contest, 2006.9.30

Accepted (AC) code:

#include<iostream>
#include<string>
#include<cstring>
#include<algorithm>
#include<cstdio>
#include<cmath>
using namespace std;
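const int Max = 200010;
int num[Max];                           // concatenated input: letters, separators, terminator
int sa[Max], rnk[Max], height[Max];     // rnk instead of rank: avoids clashing with std::rank (C++11)
int wa[Max], wb[Max], wv[Max], wd[Max];
int pos[105];                           // pos[i]: index of the separator after string i; pos[0] = -1
int ans[Max];                           // ans[0]: count; ans[1..]: start of each answer (may exceed 105 entries)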
int cmp(int *r, int a, int b, int l){
    return r[a] == r[b] && r[a+l] == r[b+l];
}
void da(int *r, int n, int m){          //  prefix-doubling suffix array; r: input array, n: total length (including the trailing 0), m: alphabet size
    int i, j, p, *x = wa, *y = wb, *t;
    for(i = 0; i < m; i ++) wd[i] = 0;
    for(i = 0; i < n; i ++) wd[x[i]=r[i]] ++;
    for(i = 1; i < m; i ++) wd[i] += wd[i-1];
    for(i = n-1; i >= 0; i --) sa[-- wd[x[i]]] = i;
    for(j = 1, p = 1; p < n; j *= 2, m = p){
        for(p = 0, i = n-j; i < n; i ++) y[p ++] = i;
        for(i = 0; i < n; i ++) if(sa[i] >= j) y[p ++] = sa[i] - j;
        for(i = 0; i < n; i ++) wv[i] = x[y[i]];
        for(i = 0; i < m; i ++) wd[i] = 0;
        for(i = 0; i < n; i ++) wd[wv[i]] ++;
        for(i = 1; i < m; i ++) wd[i] += wd[i-1];
        for(i = n-1; i >= 0; i --) sa[-- wd[wv[i]]] = y[i];
        for(t = x, x = y, y = t, p = 1, x[sa[0]] = 0, i = 1; i < n; i ++){
            x[sa[i]] = cmp(y, sa[i-1], sa[i], j) ? p - 1: p ++;
        }
    }
}
void calHeight(int *r, int n){           //  compute height[i] = LCP of suffixes sa[i-1] and sa[i]
    int i, j, k = 0;
    for(i = 1; i <= n; i ++) rnk[sa[i]] = i;
    for(i = 0; i < n; height[rnk[i ++]] = k){
        for(k ? k -- : 0, j = sa[rnk[i]-1]; r[i+k] == r[j+k]; k ++);
    }
}
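// judge(mid): is some substring of length mid shared by more than T/2 strings?
// Suffixes linked by consecutive height >= mid form one group; string j occupies the
// index range (pos[j-1], pos[j]), and vis[] marks which strings the current group touches.
// On success, the last suffix of every qualifying group is stored in ans[1..ans[0]].
int judge(int mid,int mx,int T){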
    int sum,t;
    int vis[105];
    sum=t=0; memset(vis,0,sizeof(vis));
    for(int i=1;i<=mx;i++){
        if(height[i]>=mid){
            for(int j=1;j<=T;j++){
                if(sa[i]>pos[j-1] && sa[i]<pos[j]){
                    if(!vis[j]){
                        sum++;
                        vis[j]=1;
                    }
                }
                if(sa[i-1]>pos[j-1] && sa[i-1]<pos[j]){
                    if(!vis[j]){
                        sum++;
                        vis[j]=1;
                    }
                }
            }
        }
        else{
            if(sum>T/2)
                ans[++t]=sa[i-1];
            sum=0;
            memset(vis,0,sizeof(vis));
        }
    }
    if(sum>T/2)
        ans[++t]=sa[mx];
    if(t){
        ans[0]=t;
        return 1;
    }
    return 0;
}
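int main(){
    int T;
    int sign=0;
    while(scanf("%d",&T)!=EOF){
        if(T==0)
            break;
        // uses the global num[Max]: 100*1000 letters + 100 separators + 1 terminator
        // would overflow a local num[100010]
        int len=0; ans[0]=0;
        pos[0]=-1;                          // string 1 starts at index 0, so its lower bound is -1
        for(int i=1;i<=T;i++){
            char str[1005]="\0";
            scanf("%s",str);
            int ls=strlen(str);
            for(int j=0;j<ls;j++)
                num[len++]=str[j]-'a'+101;  // letters -> 101..126
            pos[i]=len;                     // index of the separator after string i
            num[len++]=i;                   // unique separator 1..T
        }
        num[len]=0;                         // terminator, smaller than every other value
        if(sign++)
            printf("\n");
        if(T==1){                           // one life form: the whole string is the answer
            for(int j=0;j<pos[1];j++)
                printf("%c",num[j]+'a'-101);
            printf("\n");
            continue;
        }
        da(num,len+1,150);
        calHeight(num,len);
        int L=1,R=len;
        while(L<=R){                        // binary search on the answer length
            int mid=(L+R)>>1;
            if(judge(mid,len,T))
                L=mid+1;
            else
                R=mid-1;
        }
        if(L-1==0)
            printf("?\n");
        else{
            for(int i=1;i<=ans[0];i++){
                for(int j=ans[i];j<ans[i]+L-1;j++)
                    printf("%c",num[j]+'a'-101);
                printf("\n");
            }
        }
    }
    return 0;
}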
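The idea behind the AC code, restated as a self-contained sketch: binary-search the answer length; for a candidate length, split the height array into runs of consecutive suffixes with height >= length and count how many distinct strings each run touches. This is not the author's code. The hypothetical `lifeForms` builds the suffix array with O(n log^2 n) `std::sort` doubling instead of the radix-sort `da`, finds each position's string through an `owner` array instead of looping over `pos[]`, and also counts a lone suffix when it still has enough letters, which covers n = 1.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Sketch only: assumes lowercase input; not the original implementation.
std::vector<std::string> lifeForms(const std::vector<std::string>& dna) {
    const int T = static_cast<int>(dna.size());
    std::vector<int> s, owner;  // encoded text and owning string (-1 = separator)
    int maxLen = 0;
    for (int i = 0; i < T; ++i) {
        for (char ch : dna[i]) { s.push_back(ch - 'a' + 101); owner.push_back(i); }
        s.push_back(i + 1); owner.push_back(-1);  // unique separator after string i
        maxLen = std::max(maxLen, static_cast<int>(dna[i].size()));
    }
    const int n = static_cast<int>(s.size());

    // Suffix array by prefix doubling: sort by (rank[i], rank[i+k]).
    std::vector<int> sa(n), rk(s), tmp(n);
    for (int i = 0; i < n; ++i) sa[i] = i;
    for (int k = 1;; k <<= 1) {
        auto key = [&](int i) { return std::make_pair(rk[i], i + k < n ? rk[i + k] : -1); };
        std::sort(sa.begin(), sa.end(), [&](int a, int b) { return key(a) < key(b); });
        tmp[sa[0]] = 0;
        for (int i = 1; i < n; ++i)
            tmp[sa[i]] = tmp[sa[i - 1]] + (key(sa[i - 1]) < key(sa[i]) ? 1 : 0);
        rk = tmp;
        if (rk[sa[n - 1]] == n - 1) break;  // all ranks distinct
    }

    // Kasai: height[r] = LCP of suffixes sa[r-1] and sa[r].
    std::vector<int> height(n, 0);
    for (int i = 0, h = 0; i < n; ++i) {
        if (rk[i] == 0) { h = 0; continue; }
        int j = sa[rk[i] - 1];
        while (i + h < n && j + h < n && s[i + h] == s[j + h]) ++h;
        height[rk[i]] = h;
        if (h > 0) --h;
    }

    // left[p] = number of letters from p up to the next separator.
    std::vector<int> left(n, 0);
    for (int p = n - 2; p >= 0; --p)
        left[p] = owner[p] < 0 ? 0 : left[p + 1] + 1;

    // Starts of the length-len substrings shared by more than T/2 strings.
    auto check = [&](int len) {
        std::vector<int> starts, mark(T, -1);  // mark[t]: last run that counted string t
        int run = -1, distinct = 0;
        for (int r = 0; r < n; ++r) {
            if (r == 0 || height[r] < len) {   // sa[r] begins a new run
                if (r > 0 && 2 * distinct > T) starts.push_back(sa[r - 1]);
                ++run; distinct = 0;
            }
            int p = sa[r], o = owner[p];
            if (o >= 0 && left[p] >= len && mark[o] != run) { mark[o] = run; ++distinct; }
        }
        if (2 * distinct > T) starts.push_back(sa[n - 1]);
        return starts;
    };

    int bestLen = 0;
    std::vector<int> best;
    for (int lo = 1, hi = maxLen; lo <= hi;) {  // feasibility is monotone in the length
        int mid = (lo + hi) / 2;
        std::vector<int> st = check(mid);
        if (!st.empty()) { bestLen = mid; best = st; lo = mid + 1; }
        else hi = mid - 1;
    }
    if (bestLen == 0) return {"?"};
    std::vector<std::string> out;
    for (int p : best) {
        std::string t;
        for (int q = p; q < p + bestLen; ++q) t += static_cast<char>(s[q] - 101 + 'a');
        out.push_back(t);
    }
    return out;
}
```

Because the runs are visited in suffix-array order and each run holds exactly the suffixes sharing one length-L prefix, the answers come out distinct and already in alphabetical order, which is why the AC code can print `ans[]` without sorting.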
Time: 2024-10-17 09:36:02
