POJ 3294 Life Forms [Longest Common Substring, Enhanced: Suffix Array && Binary Search]

Problem: http://poj.org/problem?id=3294

Life Forms

Time Limit: 5000MS   Memory Limit: 65536K
Total Submissions: 18549   Accepted: 5454

Description

You may have wondered why most extraterrestrial life forms resemble humans, differing by superficial traits such as height, colour, wrinkles, ears, eyebrows and the like. A few bear no human resemblance; these typically have geometric or amorphous shapes like cubes, oil slicks or clouds of dust.

The answer is given in the 146th episode of Star Trek - The Next Generation, titled The Chase. It turns out that in the vast majority of the quadrant's life forms ended up with a large fragment of common DNA.

Given the DNA sequences of several life forms represented as strings of letters, you are to find the longest substring that is shared by more than half of them.

Input

Standard input contains several test cases. Each test case begins with 1 ≤ n ≤ 100, the number of life forms. n lines follow; each contains a string of lower case letters representing the DNA sequence of a life form. Each DNA sequence contains at least one and not more than 1000 letters. A line containing 0 follows the last test case.

Output

For each test case, output the longest string or strings shared by more than half of the life forms. If there are many, output all of them in alphabetical order. If there is no solution with at least one letter, output "?". Leave an empty line between test cases.

Sample Input

3
abcdefg
bcdefgh
cdefghi
3
xxx
yyy
zzz
0

Sample Output

bcdefg
cdefgh

?

Source

Waterloo Local Contest, 2006.9.30

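Problem summary:

Given N strings, find the longest substring that occurs in more than N/2 of them. If several substrings tie for the longest length, output all of them.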
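To pin down what is being asked, here is a minimal brute-force sketch that is only practical for tiny inputs. The function name `bruteForce` is made up for this illustration and is not part of the original solution; it tries every length from longest to shortest and keeps the candidates that occur in more than half of the strings.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Brute-force reference (hypothetical helper): returns every longest substring
// that occurs in more than half of the input strings, in alphabetical order.
// Returns an empty vector when no substring of length >= 1 qualifies.
std::vector<std::string> bruteForce(const std::vector<std::string>& strs) {
    assert(!strs.empty());
    const std::size_t need = strs.size() / 2 + 1;  // "more than half"
    std::size_t maxLen = 0;
    for (const std::string& s : strs) maxLen = std::max(maxLen, s.size());

    for (std::size_t len = maxLen; len >= 1; --len) {
        std::set<std::string> found;               // std::set keeps the answers sorted
        for (const std::string& s : strs) {
            for (std::size_t i = 0; i + len <= s.size(); ++i) {
                const std::string cand = s.substr(i, len);
                if (found.count(cand)) continue;   // already accepted
                std::size_t hits = 0;              // number of strings containing cand
                for (const std::string& t : strs)
                    if (t.find(cand) != std::string::npos) ++hits;
                if (hits >= need) found.insert(cand);
            }
        }
        if (!found.empty())
            return std::vector<std::string>(found.begin(), found.end());
    }
    return {};
}
```

On the first sample this yields "bcdefg" and "cdefgh"; on the second it yields nothing, which corresponds to printing "?". A checker like this can be useful for comparing against the suffix-array solution on small random inputs.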

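Approach:

As before, binary search on the answer length. The key is that the check has two conditions:

① Whether the number of occurrences exceeds N/2. Group the suffixes by height and count within each group.

② The occurrences being counted must not only be non-overlapping, they must also come from different original strings. (All strings are merged into one, so the merged string is partitioned by original string, and two occurrences only count separately if they lie in different partitions.)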
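The two conditions can be sketched end to end as follows. This is a simplified illustration, not the submitted code: it builds the suffix array by plain sorting (far too slow for the real limits), counts distinct sources with a `std::set`, and the names `longestSharedLength`, `owner`, `remain` and `feasible` are assumptions introduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <set>
#include <string>
#include <vector>

// Length of the longest substring occurring in more than half of strs (0 if none).
int longestSharedLength(const std::vector<std::string>& strs) {
    assert(!strs.empty());
    const int N = static_cast<int>(strs.size());
    std::vector<int> text, owner, remain;
    int maxLen = 0;
    for (int s = 0; s < N; ++s) {
        const int len = static_cast<int>(strs[s].size());
        maxLen = std::max(maxLen, len);
        for (int k = 0; k < len; ++k) {
            text.push_back(N + static_cast<unsigned char>(strs[s][k]));  // letters sort above separators
            owner.push_back(s);          // which string this position came from
            remain.push_back(len - k);   // letters left before the separator
        }
        text.push_back(s);               // unique separator for string s
        owner.push_back(s);
        remain.push_back(0);
    }
    const int n = static_cast<int>(text.size());

    // Naive suffix array: sort the suffix start positions lexicographically.
    std::vector<int> sa(n);
    std::iota(sa.begin(), sa.end(), 0);
    std::sort(sa.begin(), sa.end(), [&](int a, int b) {
        return std::lexicographical_compare(text.begin() + a, text.end(),
                                            text.begin() + b, text.end());
    });

    // height[i] = longest common prefix of the suffixes ranked i-1 and i.
    std::vector<int> height(n, 0);
    for (int i = 1; i < n; ++i) {
        int a = sa[i - 1], b = sa[i], k = 0;
        while (a + k < n && b + k < n && text[a + k] == text[b + k]) ++k;
        height[i] = k;
    }

    // Does some substring of length `limit` occur in more than N/2 strings?
    auto feasible = [&](int limit) {
        std::set<int> seen;  // distinct source strings in the current group
        for (int i = 0; i < n; ++i) {
            if (i == 0 || height[i] < limit) seen.clear();          // a new group starts
            if (remain[sa[i]] >= limit) seen.insert(owner[sa[i]]);  // suffix is long enough
            if (static_cast<int>(seen.size()) > N / 2) return true;
        }
        return false;
    };

    int lo = 0, hi = maxLen;  // feasible(0) always holds
    while (lo < hi) {
        const int mid = (lo + hi + 1) / 2;
        if (feasible(mid)) lo = mid; else hi = mid - 1;
    }
    return lo;
}
```

Because every separator is unique, a common prefix can never run across a string boundary, so a group with height >= limit only contains genuine occurrences of one substring. The binary search is valid because any prefix of a shared substring is itself shared.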

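This is an enhanced version of the binary-search problem on non-overlapping repeated substrings. Many solutions online use a brute-force O(n) scan to decide whether the substrings come from different strings, which blows up the complexity.

The key optimization for this problem is to make that check cheap.

The trick: insert a separator marker between strings when merging them; afterwards an O(1) mark per suffix is enough to tell whether the partition requirement is met.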
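One way to realize the O(1) mark is a timestamp array, sketched below. The names `buildOwner` and `DistinctCounter` are hypothetical and chosen only for this illustration: `owner[p]` records which string position p of the merged text came from, and `mark_[s] == stamp_` means string s has already been counted in the current height group.

```cpp
#include <cassert>
#include <string>
#include <vector>

// owner[p] = index of the string that position p of the merged text belongs to.
// Each string contributes its letters plus one separator position.
std::vector<int> buildOwner(const std::vector<std::string>& strs) {
    std::vector<int> owner;
    for (std::size_t s = 0; s < strs.size(); ++s)
        owner.insert(owner.end(), strs[s].size() + 1, static_cast<int>(s));
    return owner;
}

// Counts distinct source strings inside one group with O(1) work per suffix.
class DistinctCounter {
public:
    explicit DistinctCounter(int numStrings)
        : mark_(numStrings, -1), stamp_(0), count_(0) {}

    // Starting a new group is O(1): bump the stamp instead of clearing mark_.
    void newGroup() { ++stamp_; count_ = 0; }

    // Registers one suffix owner and returns the distinct count so far.
    int add(int owner) {
        assert(owner >= 0 && owner < static_cast<int>(mark_.size()));
        if (mark_[owner] != stamp_) {  // first time this string shows up in the group
            mark_[owner] = stamp_;
            ++count_;
        }
        return count_;
    }

private:
    std::vector<int> mark_;  // mark_[s] == stamp_ : string s already counted
    int stamp_;
    int count_;
};
```

While scanning the suffix array, the check would call `newGroup()` whenever height drops below the candidate length and `add(owner[sa[i]])` for every suffix, comparing the returned count with N/2. An explicit owner table also works when the strings have different lengths, where dividing a position by a fixed block length would not.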

To output the substrings, it is enough to save the sa values that satisfy the condition.
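As a small illustration of that last step, the sketch below turns saved start positions into the answer strings. The helper name `collectAnswers` and the use of a plain `std::string` for the merged text are assumptions made for this example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: given the merged text, the saved suffix start positions
// (one per qualifying group) and the answer length, cut out the substrings.
std::vector<std::string> collectAnswers(const std::string& merged,
                                        const std::vector<int>& starts,
                                        int len) {
    std::vector<std::string> out;
    for (int p : starts) {
        assert(p >= 0 && p + len <= static_cast<int>(merged.size()));
        out.push_back(merged.substr(p, len));
    }
    return out;
}
```

Since groups are visited in suffix-array order, the saved positions already come out in alphabetical order of their substrings, so no extra sort is needed before printing.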

AC code:

#include <cstdio>
#include <string>
#include <vector>
#include <iostream>
#include <algorithm>
using namespace std;

const int MAXN = 3e5 + 10;
int r[MAXN];                                  // merged text, one integer per character
int wa[MAXN], wb[MAXN], wv[MAXN], tmp[MAXN];  // scratch arrays for da()
int sa[MAXN];  // index range 1~n, value range 0~n-1 (sa[0] is the sentinel suffix)

int cmp(int *r, int a, int b, int l)
{
    return r[a] == r[b] && r[a + l] == r[b + l];
}

// Prefix-doubling suffix array. n includes the trailing sentinel 0; all values are < m.
void da(int *r, int *sa, int n, int m)
{
    int i, j, p, *x = wa, *y = wb, *ws = tmp;
    for (i = 0; i < m; i++) ws[i] = 0;
    for (i = 0; i < n; i++) ws[x[i] = r[i]]++;
    for (i = 1; i < m; i++) ws[i] += ws[i - 1];
    for (i = n - 1; i >= 0; i--) sa[--ws[x[i]]] = i;
    for (j = 1, p = 1; p < n; j *= 2, m = p)
    {
        for (p = 0, i = n - j; i < n; i++) y[p++] = i;
        for (i = 0; i < n; i++)
            if (sa[i] >= j) y[p++] = sa[i] - j;
        for (i = 0; i < n; i++) wv[i] = x[y[i]];
        for (i = 0; i < m; i++) ws[i] = 0;
        for (i = 0; i < n; i++) ws[wv[i]]++;
        for (i = 1; i < m; i++) ws[i] += ws[i - 1];
        for (i = n - 1; i >= 0; i--) sa[--ws[wv[i]]] = y[i];
        for (swap(x, y), p = 1, x[sa[0]] = 0, i = 1; i < n; i++)
            x[sa[i]] = cmp(y, sa[i - 1], sa[i], j) ? p - 1 : p++;
    }
}

int Rank[MAXN];   // index range 0~n-1, value range 1~n
int height[MAXN]; // index from 1: LCP of the suffixes ranked i-1 and i (height[1] = 0)
void calheight(int *r, int *sa, int n)
{
    int i, j, k = 0;
    for (i = 1; i <= n; ++i) Rank[sa[i]] = i;
    for (i = 0; i < n; height[Rank[i++]] = k)
        for (k ? k-- : 0, j = sa[Rank[i] - 1]; r[i + k] == r[j + k]; ++k);
}

int N;               // number of strings in the current test case
string tp;
vector<int> ans_id;  // one suffix start per group that satisfies the condition
int belong[MAXN];    // belong[p]: index of the string that position p came from
int f[MAXN], kase;   // f[s] == kase: string s is already counted in the current group

// Is there a substring of length `limit` that occurs in more than N/2 strings?
// Assumes N >= 2, so a group of a single suffix can never qualify.
bool check(int limit, int n)
{
    bool flag = false;
    int cnt = 0;
    ans_id.clear();
    for (int i = 1; i <= n; i++) {
        if (i == 1 || height[i] < limit) {  // group by height: a new group starts
            ++kase;                         // fresh group label, no need to clear f
            cnt = 0;
        }
        int id = belong[sa[i]];
        if (cnt >= 0 && f[id] != kase) {    // a string not yet seen in this group
            f[id] = kase;
            if (++cnt > N / 2) {
                flag = true;
                ans_id.push_back(sa[i]);
                cnt = -1;                   // report each group only once
            }
        }
    }
    return flag;
}

int main()
{
    bool first = true;
    while (scanf("%d", &N) == 1 && N) {
        int n_len = 0, max_len = 0, ans = 0;
        kase = 0;
        fill(f, f + N, 0);                  // drop marks left by the previous test case
        for (int i = 0; i < N; i++) {
            cin >> tp;
            int ssize = tp.size();
            max_len = max(max_len, ssize);
            for (int k = 0; k < ssize; k++) {
                belong[n_len] = i;
                r[n_len++] = tp[k] + 100;   // letters map to 197..222
            }
            belong[n_len] = i;
            r[n_len++] = i + 1;             // unique separator in 1..N
        }
        n_len--;
        r[n_len] = 0;                       // the last separator becomes the sentinel

        if (!first) puts("");
        first = false;
        if (N == 1) {                       // a single string is shared by more than half of one
            puts(tp.c_str());
            continue;
        }

        da(r, sa, n_len + 1, 277);
        calheight(r, sa, n_len);

        int L = 0, R = max_len, mid;
        while (L <= R) {
            mid = (L + R) >> 1;
            if (check(mid, n_len)) {
                ans = mid;
                L = mid + 1;
            }
            else R = mid - 1;
        }
        check(ans, n_len);                  // rebuild ans_id for the final length

        if (ans == 0) puts("?");
        else {
            for (size_t i = 0; i < ans_id.size(); i++) {
                for (int k = 0; k < ans; k++)
                    putchar(r[ans_id[i] + k] - 100);
                puts("");
            }
        }
    }
    return 0;
}

422ms 3300k

Original article: https://www.cnblogs.com/ymzjj/p/10693387.html

Time: 2024-08-06 01:15:53
