SPOJ Repeats (Suffix Array + RMQ)

REPEATS - Repeats


A string s is called a (k,l)-repeat if s is obtained by concatenating k>=1 times some seed string t with length l>=1. For example, the string

s = abaabaabaaba

is a (4,3)-repeat with t = aba as its seed string. That is, the seed string t is 3 characters long, and the whole string s is obtained by repeating t 4 times.

Write a program for the following task: Your program is given a long string u consisting of characters ‘a’ and/or ‘b’ as input. Your program must find some (k,l)-repeat that occurs as substring within u with k as large as possible. For example, the input string

u = babbabaabaabaabab

contains the underlined (4,3)-repeat s starting at position 5. Since u contains no other contiguous substring with more than 4 repeats, your program must output the maximum k.

Input

The first line of the input contains H, the number of test cases (H <= 20). H test cases follow. The first line of each test case is n, the length of the input string (n <= 50000). The next n lines contain the input string, one character (either ‘a’ or ‘b’) per line, in order.

Output

For each test case, you should write exactly one integer k on a line: the maximized repeat count.

Example

Input:
1
17
b
a
b
b
a
b
a
a
b
a
a
b
a
a
b
a
b

Output:
4

since a (4, 3)-repeat is found starting at the 5th character of the input string.


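Problem link: SPOJ Repeats

The paper is rather vague on this problem: it suddenly starts matching forward, then also matching backward, without explaining how the matching is actually done. In the end I wrote the code by following this blog post: link

Here is my own understanding of why $LCP(i,i+L)/L+1$ is the number of repetitions.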

First, take a string $str$ built from a repeating period, with length $len$ and smallest period length $k$. Then for every $0 \le i \le len-1-k$ we have $str[i]==str[k+i]$.
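The period property above can be checked directly. The following is a minimal sketch; `hasPeriod` and `fullCopies` are hypothetical helper names introduced only for illustration, and the check works for any period $k$, not just the smallest one.

```cpp
#include <cassert>
#include <string>

// True if str[i] == str[i + k] for every 0 <= i <= len - 1 - k,
// i.e. k is a period of str (not necessarily the smallest one).
bool hasPeriod(const std::string& str, int k) {
    assert(k >= 1);
    int len = (int)str.size();
    for (int i = 0; i + k < len; ++i)
        if (str[i] != str[i + k]) return false;
    return true;
}

// Number of complete copies of the length-k seed contained in str,
// or 0 if k is not a period of str.
int fullCopies(const std::string& str, int k) {
    return hasPeriod(str, k) ? (int)str.size() / k : 0;
}
```

For the example string from the problem statement, `fullCopies("abaabaabaaba", 3)` gives 4, matching the (4,3)-repeat with seed `aba`.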

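Now back to the LCP. Let $lcp$ be the longest common prefix of the two suffixes starting at $i$ and $i+L$, where $L$ is the seed length being enumerated and $i$ is the current position. Then clearly $S[i+j]==S[i+L+j]$ for $0 \le j \le lcp-1$. This has the same shape as the definition above with $k=L$, so $len-1-k=lcp-1$, which simplifies to $len=lcp+k$. Extending forward only, the number of repetitions is therefore $(lcp+k)/k=lcp/k+1$ (integer division).

That is only the best result from extending forward. What if a few characters just before $i$ also match, and together with the leftover $lcp\%L$ characters they make up another full $L$? For that, at least $L-lcp\%L$ characters have to be added in front, so we test this minimal position $i-(L-lcp\%L)$: if starting there completes one more $L$ together with the leftover part, then extending backward gains a repetition.

One might ask why we do not look even further back and try to gain 2, 3, 4 or more extra copies of $L$. That should be unnecessary: if that many copies could be gained in front, they would already have been counted in the forward $lcp$ of an earlier position. Throughout, make sure the indices are valid; extending back to a negative position is not allowed. Finally, the answer must be initialized to 1, because a single repetition is always possible, so the enumeration only has to look for two or more repetitions (the seed length itself is still enumerated from $L=1$, as in the code).

Code: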
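The forward and backward steps can be sketched with a plain character-by-character LCP in place of the suffix array and RMQ. This is only an illustration of the enumeration, far too slow for $n = 50000$; `naiveLcp` and `maxRepeatNaive` are hypothetical names, and the backward shift follows the index used in the solution code, $i-(L-lcp\%L)$.

```cpp
#include <algorithm>
#include <string>

// Longest common prefix of the suffixes of u starting at a and b, by direct comparison.
int naiveLcp(const std::string& u, int a, int b) {
    int n = (int)u.size(), k = 0;
    while (a + k < n && b + k < n && u[a + k] == u[b + k]) ++k;
    return k;
}

// Maximum k such that some (k, L)-repeat occurs in u.
int maxRepeatNaive(const std::string& u) {
    int n = (int)u.size();
    int ans = n > 0 ? 1 : 0;                      // one copy is always possible
    for (int L = 1; L < n; ++L) {                 // candidate seed length
        for (int i = 0; i + L < n; i += L) {      // only positions that are multiples of L
            int lcp = naiveLcp(u, i, i + L);
            int reps = lcp / L + 1;               // extending forward only
            int j = i - (L - lcp % L);            // shift left to complete one more seed
            if (j >= 0)
                reps = std::max(reps, naiveLcp(u, j, j + L) / L + 1);
            ans = std::max(ans, reps);
        }
    }
    return ans;
}
```

On the sample string `babbabaabaabaabab`, the case $L=3$, $i=6$ gives $lcp=7$, so 3 repetitions going forward; shifting to $j=6-(3-1)=4$ gives $lcp=9$ and hence 4 repetitions, which is the expected answer.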

#include <stdio.h>
#include <algorithm>
#include <utility>
using namespace std;
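const int N = 50010;
int wa[N], wb[N], cnt[N], sa[N]; // work buffers, counting-sort buckets, suffix array
int ran[N], height[N];           // ran[i]: rank of suffix i; height[r]: LCP of sa[r-1] and sa[r]
char s[N];                       // input string, terminated by '\0'

// True if the suffixes a and b have equal (first key, second key) pairs for offset d.
inline int cmp(int r[], int a, int b, int d)
{
    return r[a] == r[b] && r[a + d] == r[b + d];
}
// Prefix-doubling suffix array construction over s[0..n-1] with alphabet size m.
void DA(int n, int m)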
{
    int i;
    int *x = wa, *y = wb;
    for (i = 0; i < m; ++i)
        cnt[i] = 0;
    for (i = 0; i < n; ++i)
        ++cnt[x[i] = s[i]];
    for (i = 1; i < m; ++i)
        cnt[i] += cnt[i - 1];
    for (i = n - 1; i >= 0; --i)
        sa[--cnt[x[i]]] = i;
    for (int k = 1; k <= n; k <<= 1)
    {
        int p = 0;
        for (i = n - k; i < n; ++i)
            y[p++] = i;
        for (i = 0; i < n; ++i)
            if (sa[i] >= k)
                y[p++] = sa[i] - k;
        for (i = 0; i < m; ++i)
            cnt[i] = 0;
        for (i = 0; i < n; ++i)
            ++cnt[x[y[i]]];
        for (i = 1; i < m; ++i)
            cnt[i] += cnt[i - 1];
        for (i = n - 1; i >= 0; --i)
            sa[--cnt[x[y[i]]]] = y[i];
        swap(x, y);
        x[sa[0]] = 0;
        p = 1;
        for (i = 1; i < n; ++i)
            x[sa[i]] = cmp(y, sa[i - 1], sa[i], k) ? p - 1 : p++;
        m = p;
        if (m >= n)
            break;
    }
}
// Computes ran[] and height[] for the n real suffixes (sa[0] is the sentinel suffix).
void gethgt(int n)
{
    int i, k = 0;
    for (i = 1; i <= n; ++i)
        ran[sa[i]] = i;
    for (i = 0; i < n; ++i)
    {
        if (k)
            --k;
        int j = sa[ran[i] - 1];
        while (s[j + k] == s[i + k])
            ++k;
        height[ran[i]] = k;
    }
}
namespace SG
{
    int dp[N][17];
    void init(int l, int r)
    {
        int i, j;
        for (i = l; i <= r; ++i)
            dp[i][0] = height[i];
        for (j = 1; l + (1 << j) - 1 <= r; ++j)
        {
            for (i = l; i + (1 << j) - 1 <= r; ++i)
                dp[i][j] = min(dp[i][j - 1], dp[i + (1 << (j - 1))][j - 1]);
        }
    }
    int ask(int l, int r)
    {
        int len = r - l + 1;
        int k = 0;
        while (1 << (k + 1) <= len)
            ++k;
        return min(dp[l][k], dp[r - (1 << k) + 1][k]);
    }
    // LCP of the suffixes starting at l and r: minimum of height over (ran[l], ran[r]].
    int LCP(int l, int r, int len)
    {
        l = ran[l], r = ran[r];
        if (l > r)
            swap(l, r);
        if (l == r)
            return len - sa[l];
        return ask(l + 1, r);
    }
}
int main(void)
{
    int T, len, i;
    scanf("%d", &T);
    while (T--)
    {
        scanf("%d", &len);
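        for (i = 0; i < len; ++i)
            scanf("%s", s + i); // one character per line; the last read leaves s[len] = '\0'
        DA(len + 1, 130); // include the terminating 0 as a sentinel suffix
        gethgt(len);
        SG::init(1, len);
        int ans = 1; // a single copy is always possible
        for (int L = 1; L < len; ++L) // L: candidate seed length
        {
            for (i = 0; i + L < len; i += L) // only positions that are multiples of L
            {
                int lcp = SG::LCP(i, i + L, len);
                int reps = lcp / L + 1; // repetitions found by extending forward from i
                int j = i - (L - lcp % L); // shift left so the leftover lcp % L completes one more seed
                if (j >= 0)
                    reps = max(reps, SG::LCP(j , j + L, len) / L + 1);
                ans = max(ans, reps);
            }
        }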
        printf("%d\n", ans);
    }
    return 0;
}
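The `SG::LCP` function above relies on the fact that the LCP of two suffixes equals the minimum of `height` over the rank range between them. The sketch below checks this on a small string. It assumes a deliberately naive construction (sorting suffixes directly, 0-based ranks, no sentinel suffix), so its indexing differs from the solution code; `NaiveSA`, `directLcp` and `lcpViaHeight` are hypothetical names.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Suffix array built by plain sorting, O(n^2 log n); for illustration only.
struct NaiveSA {
    std::string s;
    std::vector<int> sa, rnk, height; // height[r] = LCP of suffixes sa[r-1] and sa[r]

    explicit NaiveSA(const std::string& str) : s(str) {
        int n = (int)s.size();
        sa.resize(n); rnk.resize(n); height.assign(n, 0);
        std::iota(sa.begin(), sa.end(), 0);
        std::sort(sa.begin(), sa.end(), [&](int a, int b) {
            return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
        });
        for (int r = 0; r < n; ++r) rnk[sa[r]] = r;
        for (int r = 1; r < n; ++r) height[r] = directLcp(sa[r - 1], sa[r]);
    }

    // LCP of suffixes a and b by direct character comparison.
    int directLcp(int a, int b) const {
        int n = (int)s.size(), k = 0;
        while (a + k < n && b + k < n && s[a + k] == s[b + k]) ++k;
        return k;
    }

    // LCP of suffixes a and b as the minimum of height over (rnk[a], rnk[b]].
    int lcpViaHeight(int a, int b) const {
        if (a == b) return (int)s.size() - a;
        int l = rnk[a], r = rnk[b];
        if (l > r) std::swap(l, r);
        return *std::min_element(height.begin() + l + 1, height.begin() + r + 1);
    }
};
```

The solution replaces the linear `min_element` scan with a sparse table, so each query costs $O(1)$ after $O(n \log n)$ preprocessing.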
Time: 2024-10-01 07:39:14
