POJ3294--Life Forms: suffix array + binary search on the answer, longest common substring of more than k strings

Life Forms

Time Limit: 5000MS   Memory Limit: 65536K
Total Submissions: 10800   Accepted: 2967

Description

You may have wondered why most extraterrestrial life forms resemble humans, differing by superficial traits such as height, colour, wrinkles, ears, eyebrows and the like. A few bear no human resemblance; these typically have geometric or amorphous shapes like cubes, oil slicks or clouds of dust.

The answer is given in the 146th episode of Star Trek - The Next Generation, titled The Chase. It turns out that the vast majority of the quadrant's life forms ended up with a large fragment of common DNA.

Given the DNA sequences of several life forms represented as strings of letters, you are to find the longest substring that is shared by more than half of them.

Input

Standard input contains several test cases. Each test case begins with 1 ≤ n ≤ 100, the number of life forms. n lines follow; each contains a string of lower case letters representing the DNA sequence of a life form. Each DNA sequence contains at least one and not more than 1000 letters. A line containing 0 follows the last test case.

Output

For each test case, output the longest string or strings shared by more than half of the life forms. If there are many, output all of them in alphabetical order. If there is no solution with at least one letter, output "?". Leave an empty line between test cases.

Sample Input

3
abcdefg
bcdefgh
cdefghi
3
xxx
yyy
zzz
0

Sample Output

bcdefg
cdefgh

?

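Problem: given n strings, find the longest substring that occurs in more than n/2 of them. If there are several, output them in lexicographic order.

Approach: first concatenate all the strings, separating them with pairwise distinct characters (using the same separator everywhere cost me a long run of WAs); here I use integers outside the alphabet as separators. Then build sa and lcp in turn. We can binary search on the length of the answer: for a length x, walk the suffixes in sa order and split them into groups (cut wherever lcp[i] < x), then count for each group how many different input strings its suffixes come from. If that count exceeds n/2, length x is feasible. No extra sorting is needed for the lexicographic order, because lcp is scanned in sa order, so the answers come out sorted from smallest to largest.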
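The grouping step can be illustrated on its own with a small sketch. This is not the author's code: `shared_of_length` and `common_prefix` are hypothetical helpers, the suffixes are sorted naively instead of with a suffix array, and each suffix is cut at the end of its own string, which has the same effect as the distinct separators. It assumes x >= 1 and a C++ compiler with the standard library, and is only meant for small inputs.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Length of the longest common prefix of a and b.
int common_prefix(const std::string& a, const std::string& b)
{
    std::size_t h = 0;
    while (h < a.size() && h < b.size() && a[h] == b[h])
        h++;
    return (int)h;
}

// Hypothetical helper: every substring of length x (x >= 1) that occurs in
// more than n/2 of the input strings, in lexicographic order.
std::vector<std::string> shared_of_length(const std::vector<std::string>& strs, int x)
{
    int n = (int)strs.size();
    std::vector<std::pair<std::string, int> > suf;   // (suffix text, owner id)
    for (int id = 0; id < n; id++)
        for (std::size_t p = 0; p < strs[id].size(); p++)
            suf.push_back(std::make_pair(strs[id].substr(p), id));
    std::sort(suf.begin(), suf.end());               // naive stand-in for sa

    std::vector<std::string> out;
    std::size_t i = 0;
    while (i < suf.size())
    {
        std::set<int> owners;                        // distinct strings seen in this group
        owners.insert(suf[i].second);
        std::size_t j = i + 1;
        // Extend the group while neighbouring suffixes share at least x characters.
        while (j < suf.size() && common_prefix(suf[j - 1].first, suf[j].first) >= x)
        {
            owners.insert(suf[j].second);
            j++;
        }
        if ((int)suf[i].first.size() >= x && (int)owners.size() > n / 2)
            out.push_back(suf[i].first.substr(0, x));
        i = j;                                       // the next group starts at the cut
    }
    return out;
}
```

On the first sample, calling `shared_of_length` with x = 6 yields the two answers in sorted order, and x = 7 yields nothing, which is the monotone behaviour the binary search relies on.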
#include <cstdio>
#include <cstring>
#include <algorithm>
using namespace std;

// At most 100 strings of at most 1000 letters, plus one separator per string.
const int M = 100 * 1001 + 10;

int s[M];   // concatenated text; the separators are M+1, M+2, ...
// rnk instead of rank, which can clash with std::rank on newer compilers.
int sa[M], tmp[M], rnk[M], lcp[M], k, len;

// Compare suffixes i and j by the rank pair (rnk[i], rnk[i+k]).
bool cmp(int i, int j)
{
    if (rnk[i] != rnk[j])
        return rnk[i] < rnk[j];
    int x = i + k <= len ? rnk[i + k] : -1;
    int y = j + k <= len ? rnk[j + k] : -1;
    return x < y;
}

// Prefix doubling. sa[0] is always the empty suffix starting at index len.
void build_sa()
{
    for (int i = 0; i <= len; i++)
    {
        sa[i] = i;
        rnk[i] = i < len ? s[i] : -1;
    }
    for (k = 1; k <= len; k *= 2)
    {
        sort(sa, sa + len + 1, cmp);
        tmp[sa[0]] = 0;
        for (int i = 1; i <= len; i++)
            tmp[sa[i]] = tmp[sa[i - 1]] + (cmp(sa[i - 1], sa[i]) ? 1 : 0);
        for (int i = 0; i <= len; i++)
            rnk[i] = tmp[i];
    }
}

// lcp[i] = length of the common prefix of suffixes sa[i-1] and sa[i], for 1 <= i <= len.
void Get_Lcp()
{
    for (int i = 0; i <= len; i++)   // was i < len, which skipped the empty suffix
        rnk[sa[i]] = i;
    int h = 0;
    lcp[0] = 0;
    for (int i = 0; i < len; i++)
    {
        int j = sa[rnk[i] - 1];
        if (h > 0)
            h--;
        for (; i + h < len && j + h < len; h++)
            if (s[i + h] != s[j + h])
                break;
        lcp[rnk[i]] = h;
    }
}

int vis[110], pos[M];   // pos[i]: 1-based id of the string owning s[i], 0 for separators
int ans[M], tot;

// Is there a substring of length x that occurs in more than n/2 strings?
// On success ans[0..tot) holds one start position per such substring, in sa order.
bool solve(int x, int n)
{
    int cnt = 0;
    bool open = false;    // does the current group contain at least two suffixes?
    bool found = false;
    memset(vis, 0, sizeof(vis));
    // Start at 1 (index 0 would read sa[-1]) and never read lcp[len+1].
    for (int i = 1; i <= len + 1; i++)
    {
        if (i <= len && lcp[i] >= x)
        {
            // sa[i-1] and sa[i] share at least x characters: same group
            int id = pos[sa[i - 1]];
            if (!vis[id])
            {
                vis[id] = 1;
                cnt++;
            }
            open = true;
            continue;
        }
        if (open)
        {
            // the group ends at sa[i-1]; count its last suffix as well
            if (!vis[pos[sa[i - 1]]])
                cnt++;
            if (cnt > n / 2)
            {
                if (!found)
                    tot = 0;
                found = true;
                ans[tot++] = sa[i - 1];
            }
            memset(vis, 0, sizeof(vis));
        }
        cnt = 0;
        open = false;
    }
    return found;
}

char cacaca[1100];
int main()
{
    int n;
    bool first = true;
    while (scanf("%d", &n) == 1 && n)
    {
        if (!first)
            printf("\n");   // empty line between test cases
        first = false;
        tot = 0;
        len = 0;
        for (int i = 0; i < n; i++)
        {
            scanf("%s", cacaca);
            int sub_len = strlen(cacaca);
            for (int j = 0; j < sub_len; j++)
            {
                pos[len] = i + 1;
                s[len++] = cacaca[j];
            }
            pos[len] = 0;
            s[len++] = M + i + 1;   // a different separator after every string
        }
        if (n == 1)
        {
            // a single string is shared by more than half of the life forms
            printf("%s\n", cacaca);
            continue;
        }
        build_sa();
        Get_Lcp();

        // Invariant: length ua is feasible (or ua == 0), length ub is not.
        int ua = 0, ub = 1001;
        while (ua + 1 < ub)
        {
            int mid = (ua + ub) >> 1;
            if (solve(mid, n))
                ua = mid;
            else
                ub = mid;
        }
        if (ua == 0)
        {
            printf("?\n");
            continue;
        }
        // ans still holds the result of the last successful solve(), i.e. for length ua
        for (int i = 0; i < tot; i++)
        {
            for (int j = ans[i]; j < ans[i] + ua; j++)
                putchar(s[j]);
            printf("\n");
        }
    }
    return 0;
}
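For cross-checking a solution like the one above on small hand-made inputs, a brute-force reference can help. The sketch below is an assumption of mine rather than part of the original post: `brute_force` is a hypothetical name, and it simply enumerates all substrings from the longest length downwards, so it is far too slow for the real limits.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical reference solver: the longest substrings that occur in more
// than n/2 of the strings, in lexicographic order. An empty result
// corresponds to the "?" output.
std::vector<std::string> brute_force(const std::vector<std::string>& strs)
{
    int n = (int)strs.size();
    std::size_t longest = 0;
    for (int id = 0; id < n; id++)
        longest = std::max(longest, strs[id].size());

    for (std::size_t L = longest; L >= 1; L--)
    {
        // substring of length L -> number of distinct strings containing it
        std::map<std::string, int> owners;
        for (int id = 0; id < n; id++)
        {
            std::set<std::string> seen;   // deduplicate repeats inside one string
            for (std::size_t p = 0; p + L <= strs[id].size(); p++)
                seen.insert(strs[id].substr(p, L));
            for (std::set<std::string>::const_iterator it = seen.begin(); it != seen.end(); ++it)
                owners[*it]++;
        }
        std::vector<std::string> out;     // std::map iterates keys in sorted order
        for (std::map<std::string, int>::const_iterator it = owners.begin(); it != owners.end(); ++it)
            if (it->second > n / 2)
                out.push_back(it->first);
        if (!out.empty())
            return out;
    }
    return std::vector<std::string>();
}
```

Deduplicating inside each string matters here for the same reason the main solution keeps a vis array: a substring repeated within one life form must still count only once.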
 
Date: 2024-10-22 22:06:08
