Suffix array (longest repeated substring occurring at least k times, overlaps allowed): POJ 3882


Stammering Aliens

Time Limit: 3000MS     Memory Limit: 0KB     64bit IO Format: %lld & %llu


Description

Dr. Ellie Arroway has established contact with an extraterrestrial civilization. However, all efforts to decode their messages have failed so far because, as luck would have it, they have stumbled upon a race of stuttering aliens! Her team has found out that, in every long enough message, the most important words appear repeated a certain number of times as a sequence of consecutive characters, even in the middle of other words. Furthermore, sometimes they use contractions in an obscure manner. For example, if they need to say bab twice, they might just send the message babab, which has been abbreviated because the second b of the first word can be reused as the first b of the second one.

Thus, the message contains possibly overlapping repetitions of the same words over and over again. As a result, Ellie turns to you, S.R. Hadden, for help in identifying the gist of the message.

Given an integer m, and a string s, representing the message, your task is to find the longest substring of s that appears at least m times. For example, in the message baaaababababbababbab, the length-5 word babab is contained 3 times, namely at positions 5, 7 and 12 (where indices start at zero). No substring appearing 3 or more times is longer (see the first example from the sample input). On the other hand, no substring appears 11 times or more (see example 2).

In case there are several solutions, the substring with the rightmost occurrence is preferred (see example 3).

Input

The input contains several test cases. Each test case consists of a line with an integer m (m ≥ 1), the minimum number of repetitions, followed by a line containing a string s of length between m and 40 000, inclusive. All characters in s are lowercase characters from 'a' to 'z'. The last test case is denoted by m = 0 and must not be processed.

Output

Print one line of output for each test case. If there is no solution, output none; otherwise, print two integers in a line, separated by a space. The first integer denotes the maximum length of a substring appearing at least m times; the second integer gives the rightmost possible starting position of such a substring.

Sample Input

3
baaaababababbababbab
11
baaaababababbababbab
3
cccccc
0

Sample Output

5 12
none
4 2

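Problem summary: given an integer m (stored in the variable n in the code below) and a string, find the length of the longest substring that occurs at least m times, overlaps allowed, together with the rightmost starting index of such a substring. If no such substring exists, output none.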
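
As a reference point for the task described above, here is a minimal brute-force sketch in C++. The function name `bruteForce` and the `{0, -1}` return value standing for "none" are assumptions made for this illustration, not part of the original solution. At roughly O(n^3) it is only suitable for cross-checking small inputs such as the samples.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Brute-force reference (hypothetical helper, not the author's code).
// Returns {length, rightmost start} of the longest substring that occurs
// at least m times (overlaps allowed), or {0, -1} if none exists.
std::pair<int, int> bruteForce(const std::string& s, int m) {
    assert(m >= 1);
    const int n = static_cast<int>(s.size());
    for (int len = n; len >= 1; --len) {                  // longest length first
        for (int start = n - len; start >= 0; --start) {  // rightmost start first
            int count = 0;
            // Count every position where s[start, start+len) occurs.
            for (int pos = 0; pos + len <= n; ++pos)
                if (s.compare(pos, len, s, start, len) == 0) ++count;
            if (count >= m) return {len, start};
        }
    }
    return {0, -1};
}
```

Calling `bruteForce("cccccc", 3)` yields the pair (4, 2), matching the third sample. Because the scan tries the longest length first and the rightmost start first, the first hit already satisfies the tie-breaking rule.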

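Approach: a basic application of the suffix array. Binary-searching the answer and grouping suffixes by height gives the maximum length without trouble; it was finding the starting index that held me up for a while...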
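
The binary search plus height grouping can be sketched compactly as follows. This is an illustration under stated assumptions, not the author's listing: the helper names (`buildSuffixArray`, `buildHeight`, `rightmostStart`, `solve`) are hypothetical, the suffix array is built with a naive `std::sort` rather than the doubling algorithm, no sentinel character is appended, and `{0, -1}` stands for "none".

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Naive suffix array: sort start positions by comparing suffixes directly.
// O(n^2 log n) in the worst case; a stand-in for the doubling algorithm.
std::vector<int> buildSuffixArray(const std::string& s) {
    std::vector<int> sa(s.size());
    std::iota(sa.begin(), sa.end(), 0);
    std::sort(sa.begin(), sa.end(), [&](int a, int b) {
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    return sa;
}

// height[i] = longest common prefix of suffixes sa[i-1] and sa[i]; height[0] = 0.
std::vector<int> buildHeight(const std::string& s, const std::vector<int>& sa) {
    const int n = static_cast<int>(s.size());
    std::vector<int> rank(n), height(n, 0);
    for (int i = 0; i < n; ++i) rank[sa[i]] = i;
    for (int i = 0, k = 0; i < n; ++i) {
        if (rank[i] == 0) { k = 0; continue; }
        const int j = sa[rank[i] - 1];
        while (i + k < n && j + k < n && s[i + k] == s[j + k]) ++k;
        height[rank[i]] = k;
        if (k > 0) --k;  // the next suffix shares at least k-1 characters
    }
    return height;
}

// For a fixed length len, split the sorted suffixes into maximal runs whose
// adjacent height is >= len. A run with at least m suffixes is a substring of
// length len occurring at least m times; return the largest start seen, or -1.
int rightmostStart(const std::vector<int>& sa, const std::vector<int>& height,
                   int len, int m) {
    const int n = static_cast<int>(sa.size());
    int best = -1;
    for (int i = 0; i < n;) {
        int j = i, groupMax = sa[i];
        while (j + 1 < n && height[j + 1] >= len) {
            ++j;
            groupMax = std::max(groupMax, sa[j]);
        }
        // The length check only matters for single-suffix runs (m == 1).
        if (j - i + 1 >= m && n - groupMax >= len) best = std::max(best, groupMax);
        i = j + 1;
    }
    return best;
}

// Binary search on the answer: feasibility is monotone in the length.
std::pair<int, int> solve(const std::string& s, int m) {
    assert(m >= 1);
    const std::vector<int> sa = buildSuffixArray(s);
    const std::vector<int> height = buildHeight(s, sa);
    int lo = 1, hi = static_cast<int>(s.size()), bestLen = 0, bestPos = -1;
    while (lo <= hi) {
        const int mid = lo + (hi - lo) / 2;
        const int pos = rightmostStart(sa, height, mid, m);
        if (pos >= 0) { bestLen = mid; bestPos = pos; lo = mid + 1; }
        else hi = mid - 1;
    }
    return {bestLen, bestPos};
}
```

The starting index comes from taking the maximum of `sa` over each qualifying run, and it is recorded only for the largest feasible length. Keeping an index from a shorter length that happened to be checked earlier would give a wrong position.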

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MS(x, y) memset(x, y, sizeof(x))
const int MAXN = 40000+10;

int wa[MAXN],wb[MAXN],wv[MAXN],ws[MAXN];
int rank[MAXN],r[MAXN],sa[MAXN],height[MAXN];
char str[MAXN];

int cmp(int *r, int a, int b, int l)
{
	return r[a] == r[b] && r[a+l] == r[b+l];
}

void da(int *r, int *sa, int n, int m)
{
	int i, j, p, *x = wa, *y = wb, *t;

	for(i=0; i<m; i++) ws[i] = 0;
	for(i=0; i<n; i++) ws[x[i] = r[i]]++;
	for(i=1; i<m; i++) ws[i] += ws[i-1];
	for(i=n-1; i>=0; i--) sa[--ws[x[i]]] = i;

	for(j=1,p=1; p<n; j<<=1, m=p){

		for(p=0,i=n-j; i<n; i++) y[p++] = i;
		for(i=0; i<n; i++) if(sa[i] >= j) y[p++] = sa[i] - j;

		for(i=0; i<n; i++) wv[i] = x[y[i]];
		for(i=0; i<m; i++) ws[i] = 0;
		for(i=0; i<n; i++) ws[wv[i]]++;
		for(i=1; i<m; i++) ws[i] += ws[i-1];
		for(i=n-1; i>=0; i--) sa[--ws[wv[i]]] = y[i];

		for(t=x,x=y,y=t,p=1,x[sa[0]]=0,i=1; i<n; i++)
			x[sa[i]] = cmp(y, sa[i-1], sa[i], j) ? p-1 : p++;

	}
	return;
}

void calheight(int *r, int *sa, int n)
{
	int i, j, k = 0;
	for(i=1; i<n; i++) rank[sa[i]] = i;
	for(i=0; i<n-1; height[rank[i++]] = k)
		for(k ? k-- : 0,j=sa[rank[i]-1]; r[i+k] == r[j+k]; k++);
	return;
}

int main()
{
	//freopen("in.txt", "r", stdin);
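	int n;// required number of repetitions (m in the problem statement)
	while(scanf("%d", &n) == 1 && n)// stop on m = 0 or on end of input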
	{
		MS(rank, 0);
		MS(sa, 0);
		MS(wa, 0);
		MS(wb, 0);
		MS(ws, 0);
		MS(wv, 0);
		MS(r, 0);
		MS(height, 0);
		scanf("%s", str);
		int len = strlen(str);
		if(1 == n){
			printf("%d 0\n", len);
			continue;
		}
		int maxn = 0;
		for(int i=0; i<len; i++){
			r[i] = str[i] - 'a' + 1;
			if(r[i] > maxn) maxn = r[i];
		}
		r[len++] = 0;// append a sentinel smaller than every character
		da(r, sa, len, maxn+1);
		calheight(r, sa, len);
#if 0
		printf("rank  : ");
		for(int i=0; i<len; i++)
			printf(" %d", rank[i]);
		printf("\n");

		printf("sa    : ");
		for(int i=0; i<len; i++)
			printf(" %d", sa[i]);
		printf("\n");

		printf("height: ");
		for(int i=0; i<len; i++)
			printf(" %d", height[i]);
		printf("\n");
#endif
		int left = 0, right = len-1;
		int mlen = 0, max1 = 0, max2 = 0, max3 = 0;
		int beg = 0, end = 0, ok;
		while(left <= right)
		{
			ok = 0;
			int mid = left + (right - left)/2;// binary search on the answer
			max1 = max2 = 0;

			for(int i=2; i<len; i++){
				if(height[i] >= mid){// track where the current group begins and ends
					if(!beg) beg = i;
					end = i;
				}
				if((beg && end) && (i == len - 1 || height[i] < mid)){
					if(end - beg + 2 >= n){// group is large enough to qualify
						max1 = 0;
						for(int k=beg-1; k<=end; k++)// rightmost starting index within this group
							if(sa[k] > max1) max1 = sa[k];
						mlen = mid;
						if(max1 > max2) max2 = max1;
						ok = 1;
					}
					beg = end = 0;
				}
			}

			if(ok) max3 = max2;
			if(ok) left = mid + 1;
			else right = mid - 1;
		}
		if(mlen) printf("%d %d\n", mlen, max3);
		else printf("none\n");
	}
}
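
The `da` routine above is the radix-sort form of prefix doubling. For readers who find it dense, the same idea can be sketched more readably with `std::sort` on pairs of ranks. This is a minimal sketch, assuming an O(n log^2 n) running time is acceptable; the function name `suffixArrayDoubling` is hypothetical, and unlike the listing above it appends no sentinel.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Prefix doubling with comparison sorting (illustrative only).
// After the round with step k, rnk[i] orders suffixes by their first 2k characters.
std::vector<int> suffixArrayDoubling(const std::string& s) {
    const int n = static_cast<int>(s.size());
    if (n == 0) return {};
    std::vector<int> sa(n), rnk(n), tmp(n);
    std::iota(sa.begin(), sa.end(), 0);
    for (int i = 0; i < n; ++i) rnk[i] = static_cast<unsigned char>(s[i]);
    for (int k = 1;; k <<= 1) {
        // Sort key: (rank of the first half, rank of the second half).
        // A suffix too short to have a second half gets -1, so it sorts first.
        auto key = [&](int i) {
            return std::make_pair(rnk[i], i + k < n ? rnk[i + k] : -1);
        };
        std::sort(sa.begin(), sa.end(),
                  [&](int a, int b) { return key(a) < key(b); });
        // Re-rank: equal keys share a rank, a larger key gets the next rank.
        tmp[sa[0]] = 0;
        for (int i = 1; i < n; ++i)
            tmp[sa[i]] = tmp[sa[i - 1]] + (key(sa[i - 1]) < key(sa[i]) ? 1 : 0);
        rnk = tmp;
        if (rnk[sa[n - 1]] == n - 1) break;  // all ranks distinct: fully sorted
    }
    return sa;
}
```

For `"banana"` this produces the order 5, 3, 1, 0, 4, 2, that is a, ana, anana, banana, na, nana. The counting sorts in `da` replace each `std::sort` call with two linear passes, which is where the O(n log n) bound comes from.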
Time: 2024-10-12 13:29:10
