POJ3294---Life Forms (suffix array, binary search + grouping suffixes)

Description

You may have wondered why most extraterrestrial life forms resemble humans, differing by superficial traits such as height, colour, wrinkles, ears, eyebrows and the like. A few bear no human resemblance; these typically have geometric or amorphous shapes like cubes, oil slicks or clouds of dust.

The answer is given in the 146th episode of Star Trek - The Next Generation, titled The Chase. It turns out that the vast majority of the quadrant’s life forms ended up with a large fragment of common DNA.

Given the DNA sequences of several life forms represented as strings of letters, you are to find the longest substring that is shared by more than half of them.

Input

Standard input contains several test cases. Each test case begins with 1 ≤ n ≤ 100, the number of life forms. n lines follow; each contains a string of lower case letters representing the DNA sequence of a life form. Each DNA sequence contains at least one and not more than 1000 letters. A line containing 0 follows the last test case.

Output

For each test case, output the longest string or strings shared by more than half of the life forms. If there are many, output all of them in alphabetical order. If there is no solution with at least one letter, output “?”. Leave an empty line between test cases.

Sample Input

3
abcdefg
bcdefgh
cdefghi
3
xxx
yyy
zzz
0

Sample Output

bcdefg
cdefgh

?

Source

Waterloo Local Contest, 2006.9.30

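Concatenate all the strings into one, separating them with characters that do not occur in the input; the separators must all be different from each other, so that no common prefix of two suffixes can extend across a string boundary. Then binary-search the answer length k, split the sorted suffixes into groups whose adjacent LCP (height) is at least k, and check whether the suffixes in some group come from more than half of the strings.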

/*************************************************************************
    > File Name: POJ3294.cpp
    > Author: ALex
    > Created Time: Fri 03 Apr 2015 21:07:09
 ************************************************************************/

#include <functional>
#include <algorithm>
#include <iostream>
#include <fstream>
#include <cstring>
#include <cstdio>
#include <cmath>
#include <cstdlib>
#include <queue>
#include <stack>
#include <map>
#include <bitset>
#include <set>
#include <vector>

using namespace std;

const double pi = acos(-1.0);
const int inf = 0x3f3f3f3f;
const double eps = 1e-15;
typedef long long LL;
typedef pair <int, int> PLL;

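int pos[110100];   // pos[p] = index (1..n) of the string that position p belongs to, -1 for a separator
char str[1300];    // buffer for one input string, later for one answer string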

class SuffixArray
{
    public:
        static const int N = 110100;
        int init[N];
        int X[N];
        int Y[N];
        int Rank[N];
        int sa[N];
        int height[N];
        int buc[N];
        bool vis[1200];
        int size;
        set <string> st;

        void clear()
        {
            size = 0;
        }

        void insert(int n)
        {
            init[size++] = n;
        }

        bool cmp(int *r, int a, int b, int l)
        {
            return (r[a] == r[b] && r[a + l] == r[b + l]);
        }

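        // Prefix-doubling (radix sort) suffix array over init[0..size]; init[size] = 0 is a
        // sentinel smaller than every other symbol, so sa[0] == size.
        void getsa(int m = 256)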
        {
            init[size] = 0;
            int l, p, *x = X, *y = Y, n = size + 1;
            for (int i = 0; i < m; ++i)
            {
                buc[i] = 0;
            }
            for (int i = 0; i < n; ++i)
            {
                ++buc[x[i] = init[i]];
            }
            for (int i = 1; i < m; ++i)
            {
                buc[i] += buc[i - 1];
            }
            for (int i = n - 1; i >= 0; --i)
            {
                sa[--buc[x[i]]] = i;
            }
            for (l = 1, p = 1; l <= n && p < n; m = p, l *= 2)
            {
                p = 0;
                for (int i = n - l; i < n; ++i)
                {
                    y[p++] = i;
                }
                for (int i = 0; i < n; ++i)
                {
                    if (sa[i] >= l)
                    {
                        y[p++] = sa[i] - l;
                    }
                }
                for (int i = 0; i < m; ++i)
                {
                    buc[i] = 0;
                }
                for (int i = 0; i < n; ++i)
                {
                    ++buc[x[y[i]]];
                }
                for (int i = 1; i < m; ++i)
                {
                    buc[i] += buc[i - 1];
                }
                for (int i = n - 1; i >= 0; --i)
                {
                    sa[--buc[x[y[i]]]] = y[i];
                }
                int i;
                for (swap(x, y), x[sa[0]] = 0, p = 1, i = 1; i < n; ++i)
                {
                    x[sa[i]] = cmp(y, sa[i - 1], sa[i], l) ? p - 1 : p++;
                }
            }
        }

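        // height[i] = LCP of the suffixes sa[i] and sa[i + 1]
        // (one slot earlier than the common convention height[i] = LCP(sa[i - 1], sa[i])).
        void getheight()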
        {
            int h = 0, n = size;
            for (int i = 0; i <= n; ++i)
            {
                Rank[sa[i]] = i;
            }
            height[0] = 0;
            for (int i = 0; i < n; ++i)
            {
                if (h > 0)
                {
                    --h;
                }
                int j = sa[Rank[i] - 1];
                for (; i + h < n && j + h < n && init[i + h] == init[j + h]; ++h);
                height[Rank[i] - 1] = h;
            }
        }

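        // Is there a run of consecutive suffixes, with every adjacent LCP >= k,
        // whose suffixes come from more than n / 2 different strings?
        bool check(int k, int n)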
        {
            int cnt = 1;
            memset(vis, 0, sizeof(vis));
            vis[pos[sa[1]]] = 1;
            for (int i = 1; i < size; ++i)
            {
                if (height[i] >= k)
                {
                    if (pos[sa[i + 1]] != -1 && !vis[pos[sa[i + 1]]])
                    {
                        ++cnt;
                        vis[pos[sa[i + 1]]] = 1;
                    }
                }
                else
                {
                    if (cnt > n / 2)
                    {
                        return 1;
                    }
                    memset(vis, 0, sizeof(vis));
                    cnt = 1;
                    if (pos[sa[i + 1]] != -1)
                    {
                        vis[pos[sa[i + 1]]] = 1;
                    }
                }
            }
            return 0;
        }

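        // Binary-search the longest valid length, then print the common prefix of every
        // qualifying group; groups are visited in suffix-array order, so the output is alphabetical.
        void solve(int n)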
        {
            int l = 1, r = size, mid;
            int ans = 0;
            while (l <= r)
            {
                mid = (l + r) >> 1;
                if (check(mid, n))
                {
                    ans = mid;
                    l = mid + 1;
                }
                else
                {
                    r = mid - 1;
                }
            }
            if (!ans)
            {
                printf("?\n");
            }
            else
            {
                st.clear();
                int cnt = 1;
                memset(vis, 0, sizeof(vis));
                vis[pos[sa[1]]] = 1;
                for (int i = 0; i < size; ++i)
                {
                    if (height[i] >= ans)
                    {
                        if (!vis[pos[sa[i + 1]]])
                        {
                            ++cnt;
                            vis[pos[sa[i + 1]]] = 1;
                        }
                        for (int j = sa[i + 1]; j < sa[i + 1] + ans; ++j)
                        {
                            str[j - sa[i + 1]] = (char)init[j];
                        }
                        str[ans] = '\0';
                        st.insert(str);
                    }
                    else if (height[i] < ans)
                    {
                        if (cnt > n / 2)
                        {
                            set <string> :: iterator it;
                            for (it = st.begin(); it != st.end(); ++it)
                            {
                                printf("%s\n", it -> c_str());
                            }
                        }
                        st.clear();
                        cnt = 1;
                        memset(vis, 0, sizeof(vis));
                        if (pos[sa[i + 1]] != -1)  // separator suffixes belong to no string
                        {
                            vis[pos[sa[i + 1]]] = 1;
                        }
                    }
                }
            }
        }
}SA;

int main()
{
    int n;
    bool flag = 0;
    while (scanf("%d", &n) == 1 && n)  // stop at the terminating 0, or at EOF
    {
        int maxs = 0;
        SA.clear();
        int cnt = 0;
        for (int i = 1; i <= n; ++i)
        {
            scanf("%s", str);
            int len = strlen(str);
            for (int j = 0; j < len; ++j)
            {
                SA.insert((int)str[j]);
                maxs = max(maxs, (int)str[j]);
                pos[cnt++] = i;
            }
            SA.insert((int)('z') + i);  // separator for string i: distinct, and larger than every letter
            pos[cnt++] = -1;
        }
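        if (flag)
        {
            printf("\n");
        }
        else
        {
            flag = 1;
        }
        if (n == 1)
        {
            // A lone string is shared by 1 > 1/2 of the life forms, so the whole string is the answer.
            // Without this case, check() accepts every k, because a singleton group has cnt = 1 > n / 2 = 0.
            printf("%s\n", str);
            continue;
        }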
        SA.getsa();
        SA.getheight();
        SA.solve(n);
    }
    return 0;
}
Time: 2024-10-18 10:51:15
