hdu 3518 Boring counting (suffix array)

Boring counting

Time Limit: 2000/1000 MS (Java/Others)    Memory Limit: 65536/32768 K (Java/Others)

Problem Description

035 is now facing a tough problem: his English teacher gives him a string consisting of n lowercase letters, and he must figure out how many substrings appear at least twice, where those appearances must not overlap each other.

Take "aaaa" as an example. "a" appears four times, and "aa" appears twice without overlapping. However, "aaa" cannot appear more than once without overlapping: we can get "aaa" from [0-2] (positions start at 0) and [1-3], but the intervals [0-2] and [1-3] overlap each other, so "aaa" is not counted. Therefore, the answer is 2 ("a" and "aa").

Input

The input consists of several test cases and ends with a line containing "#". Each test case is a string of lowercase letters whose length n does not exceed 1000 (n <= 1000).

Output

For each test case, output an integer ans, the answer for that test case. You'd better use int64 to avoid unnecessary trouble.

Sample Input

aaaa
ababcabb
aaaaaa
#

Sample Output

2
3
3

Problem: count the distinct substrings that occur at least twice in the string without the occurrences overlapping.
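To pin down the definition before bringing in suffix arrays, here is a brute-force reference counter. It is only a sketch for short strings, and the name `bruteBoringCount` is hypothetical (not part of the original solution): it collects every distinct substring that has occurrences starting at i and j with j >= i + len.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical brute-force reference: counts distinct substrings that occur
// at two positions i < j with j - i >= len (i.e. without overlapping).
// Roughly O(n^4) in the worst case, so only use it on short strings.
long long bruteBoringCount(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::set<std::string> found;  // distinct qualifying substrings
    for (int len = 1; 2 * len <= n; ++len) {
        for (int i = 0; i + len <= n; ++i) {
            std::string t = s.substr(i, len);
            if (found.count(t)) continue;  // already counted
            // A second copy must start at or after i + len to avoid overlap.
            for (int j = i + len; j + len <= n; ++j) {
                if (s.compare(j, len, t) == 0) {
                    found.insert(t);
                    break;
                }
            }
        }
    }
    return static_cast<long long>(found.size());
}
```

On the three sample strings it returns 2, 3 and 3, matching the sample output, which makes it a convenient cross-check for a faster solution.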

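Analysis: if a non-overlapping repeated substring of length L exists, then one of every length L' < L also exists (take the length-L' prefixes of the same two occurrences; they start at least L > L' apart). So the feasible lengths are contiguous, forming the range 1..Lmax.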
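A brute-force sketch of this monotonicity, with hypothetical helper names not taken from the original code: `hasDisjointRepeat` checks one length L directly, and `longestDisjointRepeat` scans L upward and relies on monotonicity to stop at the first length that fails.

```cpp
#include <cassert>
#include <string>

// True if some substring of length L (L >= 1) starts at positions p < q
// with q - p >= L, i.e. it has two non-overlapping occurrences.
bool hasDisjointRepeat(const std::string& s, int L) {
    const int n = static_cast<int>(s.size());
    for (int p = 0; p + L <= n; ++p)
        for (int q = p + L; q + L <= n; ++q)
            if (s.compare(p, L, s, q, L) == 0) return true;
    return false;
}

// If (p, q) witnesses length L, the same p and q witness every L' < L
// (the length-L' prefixes match and q - p >= L > L'). Hence the feasible
// lengths are exactly 1..Lmax, and the scan may stop at the first failure.
int longestDisjointRepeat(const std::string& s) {
    int L = 0;
    while (hasDisjointRepeat(s, L + 1)) ++L;
    return L;
}
```

For "aaaaaa" the scan stops after L = 3, consistent with the three substrings "a", "aa" and "aaa" counted in the sample.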

Because LCP(sa[i], sa[j]) = min(height[i+1..j]) for i < j (a range-minimum query, RMQ), if some k satisfies height[k] < L, then every pair i, j with i < k <= j has LCP(sa[i], sa[j]) < L.
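One way to convince yourself of the identity LCP(sa[i], sa[j]) = min(height[i+1..j]) is to test it exhaustively on small strings. This sketch builds the suffix array naively by sorting, with no sentinel (so sa[0] is a real suffix here, unlike the program below, which places a sentinel suffix at sa[0]), and compares every pair against direct character comparison. All function names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <numeric>
#include <string>
#include <vector>

// Longest common prefix of the suffixes starting at a and b, by direct comparison.
int lcpDirect(const std::string& s, int a, int b) {
    const int n = static_cast<int>(s.size());
    int k = 0;
    while (a + k < n && b + k < n && s[a + k] == s[b + k]) ++k;
    return k;
}

// Naive suffix array: sort start positions by comparing whole suffixes.
std::vector<int> naiveSuffixArray(const std::string& s) {
    std::vector<int> sa(s.size());
    std::iota(sa.begin(), sa.end(), 0);
    std::sort(sa.begin(), sa.end(), [&](int a, int b) {
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    return sa;
}

// Checks LCP(sa[i], sa[j]) == min(height[i+1..j]) for every pair i < j.
bool lcpEqualsRangeMin(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::vector<int> sa = naiveSuffixArray(s);
    std::vector<int> height(n, 0);  // height[i] = LCP(sa[i-1], sa[i]); height[0] unused
    for (int i = 1; i < n; ++i) height[i] = lcpDirect(s, sa[i - 1], sa[i]);
    for (int i = 0; i < n; ++i) {
        int mn = INT_MAX;
        for (int j = i + 1; j < n; ++j) {
            mn = std::min(mn, height[j]);  // running min of height[i+1..j]
            if (lcpDirect(s, sa[i], sa[j]) != mn) return false;
        }
    }
    return true;
}
```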

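In other words, two suffixes whose common prefix has length at least L can never lie on opposite sides of a height "valley" k (height[k] < L), so these valleys split the suffix array into contiguous groups.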
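The grouping can be made concrete with a small sketch that splits the sorted suffixes at every valley height[k] < L and keeps only groups with at least two suffixes. It uses a naive sort-based suffix array so the block stands alone; `groupsAtLeast` is a hypothetical helper, not part of the original code.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// Splits the suffix array of s at every k with height[k] < L and returns the
// start positions (in suffix-array order) of each group with >= 2 suffixes.
std::vector<std::vector<int>> groupsAtLeast(const std::string& s, int L) {
    const int n = static_cast<int>(s.size());
    std::vector<int> sa(n);
    std::iota(sa.begin(), sa.end(), 0);
    std::sort(sa.begin(), sa.end(), [&](int a, int b) {
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    auto lcp = [&](int a, int b) {
        int k = 0;
        while (a + k < n && b + k < n && s[a + k] == s[b + k]) ++k;
        return k;
    };
    std::vector<std::vector<int>> groups;
    std::vector<int> cur;
    for (int i = 0; i < n; ++i) {
        if (i > 0 && lcp(sa[i - 1], sa[i]) >= L) {
            cur.push_back(sa[i]);                        // height[i] >= L: same group
        } else {
            if (cur.size() >= 2) groups.push_back(cur);  // valley: close the group
            cur.assign(1, sa[i]);
        }
    }
    if (cur.size() >= 2) groups.push_back(cur);
    return groups;
}
```

For "ababcabb" and L = 2 there is a single group, the suffixes starting at 0, 5 and 2 (in suffix-array order), all beginning with "ab".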

Solution: enumerate the substring length L.

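For each L, scan the height array. Within each maximal run of consecutive entries with height >= L, take the smallest and largest starting positions (values of sa) of the suffixes in that group; call them l and r.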

If l + L <= r, the occurrences starting at l and r do not overlap, so the answer increases by 1.
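The per-length step can be sketched as follows. This version swaps in a naive sort-based suffix array and direct LCP computation instead of the doubling construction used in the program below; that is fine for illustration but slower. `countForLength` and `boringCount` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// Number of runs with height >= L whose leftmost and rightmost starting
// positions l, r satisfy l + L <= r (two non-overlapping occurrences).
int countForLength(const std::vector<int>& sa, const std::vector<int>& height, int L) {
    const int n = static_cast<int>(sa.size());
    int ans = 0;
    int l = n, r = -1;  // min/max start position in the current run
    for (int i = 1; i < n; ++i) {
        if (height[i] >= L) {
            l = std::min(l, std::min(sa[i - 1], sa[i]));
            r = std::max(r, std::max(sa[i - 1], sa[i]));
        } else {
            if (r != -1 && l + L <= r) ++ans;  // run ended: check for overlap
            l = n;
            r = -1;
        }
    }
    if (r != -1 && l + L <= r) ++ans;  // last run
    return ans;
}

long long boringCount(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::vector<int> sa(n);
    std::iota(sa.begin(), sa.end(), 0);
    std::sort(sa.begin(), sa.end(), [&](int a, int b) {
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    std::vector<int> height(n, 0);  // height[i] = LCP(sa[i-1], sa[i])
    for (int i = 1; i < n; ++i) {
        int k = 0;
        while (sa[i - 1] + k < n && sa[i] + k < n && s[sa[i - 1] + k] == s[sa[i] + k]) ++k;
        height[i] = k;
    }
    long long total = 0;
    for (int L = 1; 2 * L <= n; ++L) total += countForLength(sa, height, L);
    return total;
}
```

On the samples, "ababcabb" contributes 2 groups at L = 1 ("a" and "b") and 1 group at L = 2 ("ab"), for a total of 3.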

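There is no double counting: all suffixes in one run of height >= L share the same prefix of length L, so each run corresponds to exactly one substring of length L, and any other repeated substring of length L lies in a different run of height >= L.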
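The no-double-counting argument can also be checked by brute force: form the runs of height >= L, take the length-L prefix of each run, and confirm that every member of a run shares it while different runs get different prefixes. `groupPrefixesAreDistinct` is a hypothetical helper built on a naive suffix array.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <set>
#include <string>
#include <vector>

// For one length L: every run of height >= L must share a single length-L
// prefix, and distinct runs must have distinct prefixes.
bool groupPrefixesAreDistinct(const std::string& s, int L) {
    const int n = static_cast<int>(s.size());
    std::vector<int> sa(n);
    std::iota(sa.begin(), sa.end(), 0);
    std::sort(sa.begin(), sa.end(), [&](int a, int b) {
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    auto lcp = [&](int a, int b) {
        int k = 0;
        while (a + k < n && b + k < n && s[a + k] == s[b + k]) ++k;
        return k;
    };
    std::set<std::string> seen;  // prefixes of runs closed so far
    int start = 0;               // first index of the current run
    for (int i = 1; i <= n; ++i) {
        if (i < n && lcp(sa[i - 1], sa[i]) >= L) continue;  // run continues
        if (i - start >= 2) {                               // run sa[start..i-1]
            std::string p = s.substr(sa[start], L);
            for (int k = start; k < i; ++k)
                if (s.compare(sa[k], L, p) != 0) return false;  // members disagree
            if (!seen.insert(p).second) return false;           // prefix reused
        }
        start = i;
    }
    return true;
}
```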

#include<cstdio>
#include<cstring>
#include<algorithm>
using namespace std;
const int N = 1010;
char str[N];
// rk (originally named rank) is renamed because `rank` is ambiguous with
// std::rank under `using namespace std;` in C++11 and later.
int *rk, r[N], sa[N], height[N];
int wa[N], wb[N], wm[N];
// Two suffixes keep the same rank iff both halves of length l have equal old ranks.
bool comp(int *r, int a, int b, int l)
{
    return r[a] == r[b] && r[a+l] == r[b+l];
}
// Prefix-doubling suffix array with radix (counting) sort.
// r[0..n-1] must end with a unique smallest value (sentinel); m is the alphabet size.
void get_sa(int *r, int *sa, int n, int m)
{
    int *x = wa, *y = wb, *t, i, j, p;
    // Initial counting sort by single characters.
    for(i = 0; i < m; ++i) wm[i] = 0;
    for(i = 0; i < n; ++i) wm[x[i] = r[i]]++;
    for(i = 1; i < m; ++i) wm[i] += wm[i-1];
    for(i = n-1; i >= 0; --i) sa[--wm[x[i]]] = i;
    for(i = 0, j = 1, p = 0; p < n; j <<= 1, m = p) {
        // y: suffixes ordered by their second key (rank of position i + j).
        for(p = 0, i = n - j; i < n; ++i) y[p++] = i;
        for(i = 0; i < n; ++i) if(sa[i] >= j) y[p++] = sa[i] - j;
        // Stable counting sort by the first key.
        for(i = 0; i < m; ++i) wm[i] = 0;
        for(i = 0; i < n; ++i) wm[x[y[i]]]++;
        for(i = 1; i < m; ++i) wm[i] += wm[i-1];
        for(i = n-1; i >= 0; --i) sa[--wm[x[y[i]]]] = y[i];
        // Recompute ranks; p ends up as the number of distinct ranks.
        for(t = x, x = y, y = t, i = p = 1, x[sa[0]] = 0; i < n; ++i) {
            x[sa[i]] = comp(y, sa[i], sa[i-1], j) ? p-1 : p++;
        }
    }
    rk = x;
}
// height[i] = LCP of suffixes sa[i-1] and sa[i], computed in O(n).
void get_height(int *r, int *sa, int n)
{
    for(int i = 0, j = 0, k = 0; i < n; height[rk[i++]] = k) {
        for(k ? --k : 0, j = sa[rk[i]-1]; r[i+k] == r[j+k]; ++k);
    }
}
int main()
{
    while(~scanf("%s",str)) {
        if(str[0] == '#') break;
        int len = strlen(str);
        for(int i = 0; i < len; i++)
            r[i] = str[i];
        r[len] = 0; // sentinel: must be smaller than every character that can occur
        get_sa(r, sa, len+1, 256);
        get_height(r, sa, len);
        int ans = 0, minid, maxid;
        for(int i = 1; i <= len/2; ++i) { // a substring longer than len/2 cannot occur twice without overlapping
            minid = N, maxid = -1;
            for(int j = 1; j <= len; ++j) {
                if(height[j] >= i) {
                    // sa[j-1] and sa[j] share a prefix of length >= i: extend the group.
                    if(sa[j-1] < minid) minid = sa[j-1];
                    if(sa[j-1] > maxid) maxid = sa[j-1];
                    if(sa[j] < minid) minid = sa[j];
                    if(sa[j] > maxid) maxid = sa[j];
                }
                else {
                    // Group ends: count it if its leftmost and rightmost occurrences do not overlap.
                    if(maxid != -1 && minid + i <= maxid) ans++;
                    minid = N, maxid = -1;
                }
            }
            if(maxid != -1 && minid + i <= maxid) ans++; // last group
        }
        printf("%d\n", ans);
    }
    return 0;
}
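get_height in the program is Kasai's algorithm: visiting suffixes in text order, the LCP with the preceding suffix in sorted order drops by at most one from suffix i to suffix i + 1, so k is decremented instead of reset and the total work is O(n). A standalone sketch with std::vector follows; the name `kasaiHeight` is hypothetical, and it assumes a valid suffix array of s without a sentinel.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Kasai's algorithm: height[i] = LCP(sa[i-1], sa[i]) for i >= 1, height[0] = 0.
// Runs in O(n) given the suffix array sa of s (no sentinel appended).
std::vector<int> kasaiHeight(const std::string& s, const std::vector<int>& sa) {
    const int n = static_cast<int>(s.size());
    std::vector<int> rk(n), height(n, 0);
    for (int i = 0; i < n; ++i) rk[sa[i]] = i;
    int k = 0;
    for (int i = 0; i < n; ++i) {             // suffixes in text order
        if (rk[i] == 0) { k = 0; continue; }  // no predecessor in sorted order
        int j = sa[rk[i] - 1];                // preceding suffix in sorted order
        while (i + k < n && j + k < n && s[i + k] == s[j + k]) ++k;
        height[rk[i]] = k;
        if (k > 0) --k;                       // LCP shrinks by at most 1 for suffix i+1
    }
    return height;
}
```

For "ababcabb" with suffix array {0, 5, 2, 7, 1, 6, 3, 4}, it yields {0, 2, 2, 0, 1, 1, 1, 0}.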


Time: 2024-08-04 22:23:19
