[Note] Suffix Array

Suffix Array

Code

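#include <algorithm>
#include <utility>

const int N = 100010;           // assumed maximum length of the input
int n, m;                       // n: input length, m: current number of distinct ranks
int a[N], sa[N], tax[N], h[N];  // a[1..n]: input with 1 <= a[i] <= 127, a[n+1] = 0
int rnk[2 * N], tmp[2 * N];     // doubled because ranks are read at sa[i] + w

// Radix step: tmp lists positions already sorted by the second key;
// stably sort them by the first key rnk and write the result into sa.
void rsort() {
    for (int i = 1; i <= m; ++i) tax[i] = 0;
    for (int i = 1; i <= n; ++i) ++tax[rnk[i]];
    for (int i = 1; i <= m; ++i) tax[i] += tax[i-1];
    for (int i = n; i >= 1; --i) sa[tax[rnk[tmp[i]]]--] = tmp[i];
}

// Builds sa (sa[r] = start of the suffix ranked r), rnk (its inverse)
// and the height array h, all 1-indexed.
void ssort() {
    for (int i = 1; i <= n; ++i) rnk[i] = a[i], tmp[i] = i;
    m = 127;
    rsort();
    for (int w = 1, p = 0; p < n; w <<= 1) {
        // Order by second key: suffixes with no second half come first.
        p = 0;
        for (int i = 1; i <= w; ++i) tmp[++p] = n - w + i;
        for (int i = 1; i <= n; ++i) if (sa[i] > w) tmp[++p] = sa[i] - w;
        rsort();                 // then stably by first key
        std::swap(rnk, tmp);     // tmp now holds the old ranks
        rnk[sa[1]] = p = 1;
        for (int i = 2; i <= n; ++i) {
            rnk[sa[i]] = (tmp[sa[i]] == tmp[sa[i-1]]
            && tmp[sa[i]+w] == tmp[sa[i-1]+w]) ? p : ++p;
        }
        m = p;                   // p == n means all ranks are distinct
    }
    // Height array: h[r] = LCP of the suffixes ranked r-1 and r.
    for (int i = 1, k = 0; i <= n; ++i) {
        if (rnk[i] == 1) { h[1] = k = 0; continue; }  // no predecessor (sa[0] is not a suffix)
        int j = sa[rnk[i] - 1];
        while (a[i+k] == a[j+k]) ++k;
        h[rnk[i]] = k;
        if (k) --k;
    }
}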

Applications

hihoCoder has a good problem set on suffix arrays and suffix automata (the "重复旋律" / Repeated Melody series).

Longest substring occurring at least K times (overlaps allowed)

(hiho1403)

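The answer is the maximum, over all windows of K-1 consecutive entries of h, of the minimum inside the window (for K >= 2). K occurrences of a substring correspond to K suffixes that are adjacent in sa order, and the longest prefix they all share is the minimum of the K-1 height values between them.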
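A minimal sketch of this window-minimum idea follows. It assumes the input is a std::string (the hiho problem uses a sequence of integers) and uses 0-indexed arrays. To stay self-contained it builds the suffix array by sorting suffixes directly, which is O(n^2 log n) and only meant for illustration, not as a replacement for the doubling code above. The names `suffixArray`, `heightArray` and `longestRepeatK` are hypothetical. The window minimum is kept in a monotonic deque, so the scan over h is O(n).

```cpp
#include <algorithm>
#include <deque>
#include <string>
#include <vector>

// Hypothetical helper: naive O(n^2 log n) suffix array, sorting start positions.
std::vector<int> suffixArray(const std::string& s) {
    std::vector<int> sa(s.size());
    for (int i = 0; i < (int)s.size(); ++i) sa[i] = i;
    std::sort(sa.begin(), sa.end(), [&](int x, int y) {
        return s.compare(x, std::string::npos, s, y, std::string::npos) < 0;
    });
    return sa;
}

// Hypothetical helper (Kasai): h[r] = LCP of suffixes ranked r-1 and r, h[0] = 0.
std::vector<int> heightArray(const std::string& s, const std::vector<int>& sa) {
    int n = s.size();
    std::vector<int> rnk(n), h(n, 0);
    for (int i = 0; i < n; ++i) rnk[sa[i]] = i;
    for (int i = 0, k = 0; i < n; ++i) {
        if (rnk[i] == 0) { k = 0; continue; }
        int j = sa[rnk[i] - 1];
        while (i + k < n && j + k < n && s[i + k] == s[j + k]) ++k;
        h[rnk[i]] = k;
        if (k) --k;
    }
    return h;
}

// Longest substring occurring at least K times (overlaps allowed):
// maximum over windows of K-1 consecutive h values of the window minimum.
int longestRepeatK(const std::string& s, int K) {
    int n = s.size();
    if (K <= 1) return n;  // the whole string occurs once
    std::vector<int> sa = suffixArray(s), h = heightArray(s, sa);
    int w = K - 1, best = 0;
    std::deque<int> dq;    // indices into h whose values increase front to back
    for (int i = 1; i < n; ++i) {
        while (!dq.empty() && h[dq.back()] >= h[i]) dq.pop_back();
        dq.push_back(i);
        if (dq.front() <= i - w) dq.pop_front();       // left the window [i-w+1, i]
        if (i >= w) best = std::max(best, h[dq.front()]);
    }
    return best;
}
```

For example, `longestRepeatK("banana", 2)` gives 3 ("ana"), and with K = 3 it gives 1.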

Longest non-overlapping repeated substring

(hiho1407)

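Binary search on the answer k. A candidate k is feasible if h contains a run of consecutive values >= k (i.e. some substring of length k occurs more than once) such that, among the suffixes in that group, the largest and smallest starting positions differ by at least k (i.e. two of the occurrences do not overlap).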

bool check(int x) {
    int mn = N + 10, mx = 0;
    for (int i = 2, flag = 0; i <= n; ++i) {  // h[1] has no predecessor
        if (h[i] >= x) {
            if (!flag) { // mark: the suffix ranked i-1 also belongs to the group
                mx = std::max(mx, sa[i-1]);
                mn = std::min(mn, sa[i-1]);
            }
            mx = std::max(mx, sa[i]);
            mn = std::min(mn, sa[i]);
            flag = 1;
            // Test here so that a group reaching i == n is not missed.
            if (mx - mn >= x) return true;
        } else if (flag) {
            flag = 0;
            mn = N + 10;
            mx = 0;
        }
    }
    return false;
}

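Note the part marked `mark`: since h[i] is the LCP of the suffixes ranked i-1 and i, a run of values h[l..r] >= x describes the group of suffixes ranked l-1 through r, so the suffix ranked l-1 has to be added to the group when the run starts.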
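Combining the check with the binary search might look like the following sketch. It is 0-indexed and takes a std::string. It again sorts suffixes directly instead of using the doubling code, purely to keep the block self-contained. `canRepeat` and `longestNonOverlapping` are hypothetical names. The gap is tested as soon as a suffix joins the current group, so a group that ends at the last rank is handled too.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: naive O(n^2 log n) suffix array, sorting start positions.
std::vector<int> suffixArray(const std::string& s) {
    std::vector<int> sa(s.size());
    for (int i = 0; i < (int)s.size(); ++i) sa[i] = i;
    std::sort(sa.begin(), sa.end(), [&](int x, int y) {
        return s.compare(x, std::string::npos, s, y, std::string::npos) < 0;
    });
    return sa;
}

// Hypothetical helper (Kasai): h[r] = LCP of suffixes ranked r-1 and r, h[0] = 0.
std::vector<int> heightArray(const std::string& s, const std::vector<int>& sa) {
    int n = s.size();
    std::vector<int> rnk(n), h(n, 0);
    for (int i = 0; i < n; ++i) rnk[sa[i]] = i;
    for (int i = 0, k = 0; i < n; ++i) {
        if (rnk[i] == 0) { k = 0; continue; }
        int j = sa[rnk[i] - 1];
        while (i + k < n && j + k < n && s[i + k] == s[j + k]) ++k;
        h[rnk[i]] = k;
        if (k) --k;
    }
    return h;
}

// Does some substring of length x occur twice without overlapping?
bool canRepeat(const std::vector<int>& sa, const std::vector<int>& h, int x) {
    int n = sa.size(), mn = 0, mx = 0;
    bool inGroup = false;  // currently inside a run of h values >= x
    for (int r = 1; r < n; ++r) {
        if (h[r] >= x) {
            if (!inGroup) { mn = mx = sa[r - 1]; inGroup = true; }  // rank r-1 joins
            mn = std::min(mn, sa[r]);
            mx = std::max(mx, sa[r]);
            if (mx - mn >= x) return true;
        } else {
            inGroup = false;
        }
    }
    return false;
}

int longestNonOverlapping(const std::string& s) {
    std::vector<int> sa = suffixArray(s), h = heightArray(s, sa);
    int lo = 0, hi = s.size() / 2;  // two disjoint copies need 2x <= n
    while (lo < hi) {
        int mid = (lo + hi + 1) / 2;
        if (canRepeat(sa, h, mid)) lo = mid; else hi = mid - 1;
    }
    return lo;
}
```

Feasibility is monotone: if length k works, then the length-(k-1) prefixes of the same two occurrences also work. That is why the binary search is valid. For "banana" this returns 2 ("an" at positions 1 and 3), while "ana" is rejected because its two copies overlap.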

Longest common substring

(hiho1415)

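Concatenate the two strings with the separator '#' (a character that occurs in neither string). The longest common substring of the two strings is then the largest h[i] over all i such that sa[i] and sa[i-1] lie in different strings.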
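A minimal sketch of the concatenation trick, assuming '#' appears in neither input. The suffix array is again built by sorting suffixes directly (illustration only), and `longestCommonSubstring` is a hypothetical name. Because '#' occurs only once, no common prefix of a suffix from s and a suffix from t can run past the separator.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: naive O(n^2 log n) suffix array, sorting start positions.
std::vector<int> suffixArray(const std::string& s) {
    std::vector<int> sa(s.size());
    for (int i = 0; i < (int)s.size(); ++i) sa[i] = i;
    std::sort(sa.begin(), sa.end(), [&](int x, int y) {
        return s.compare(x, std::string::npos, s, y, std::string::npos) < 0;
    });
    return sa;
}

// Hypothetical helper (Kasai): h[r] = LCP of suffixes ranked r-1 and r, h[0] = 0.
std::vector<int> heightArray(const std::string& s, const std::vector<int>& sa) {
    int n = s.size();
    std::vector<int> rnk(n), h(n, 0);
    for (int i = 0; i < n; ++i) rnk[sa[i]] = i;
    for (int i = 0, k = 0; i < n; ++i) {
        if (rnk[i] == 0) { k = 0; continue; }
        int j = sa[rnk[i] - 1];
        while (i + k < n && j + k < n && s[i + k] == s[j + k]) ++k;
        h[rnk[i]] = k;
        if (k) --k;
    }
    return h;
}

int longestCommonSubstring(const std::string& s, const std::string& t) {
    std::string u = s + '#' + t;  // assumes '#' occurs in neither s nor t
    std::vector<int> sa = suffixArray(u), h = heightArray(u, sa);
    int m = s.size(), best = 0;
    for (int r = 1; r < (int)u.size(); ++r) {
        bool prevInS = sa[r - 1] < m, curInS = sa[r] < m;  // suffix starts inside s?
        if (prevInS != curInS) best = std::max(best, h[r]);
    }
    return best;
}
```

For example, "banana" and "ananas" share "anana", so the result is 5.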

Substring with the most consecutive repetitions

(hiho1419)

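Enumerate the length l of the repeated unit and the start position p. The number of consecutive repetitions starting at p is \(\lfloor \mathrm{lcp}(p, p+l)/l \rfloor + 1\), where lcp(i, j) is the length of the longest common prefix of the suffixes starting at i and j. With O(1) lcp queries this takes \(O(n^2)\).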

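To speed this up, enumerate p only in steps of l (p = 1, 1+l, 1+2l, ...); any run of at least two copies spans 2l characters, so it contains such a p within its first l positions. For an enumerated p, let R = lcp(p, p+l). Then a skipped start x = p-d with 0 < d < l has a repetition count of at most \(\lfloor R/l \rfloor + 2\), i.e. at most one more than p itself.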

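For \(p-(l-R \bmod l) < x < p\), we have \(d = p-x < l-R \bmod l\), so \(\lfloor (R+d)/l \rfloor = \lfloor R/l \rfloor\) and x cannot beat p. For \(p-l < x < p-(l-R \bmod l)\), x cannot beat \(x_0 = p-(l-R \bmod l)\): if x reaches \(\lfloor R/l \rfloor + 2\), the same periodic run read from \(x_0\) still has an lcp of at least \(R + l - R \bmod l\), so \(x_0\) reaches it too. Hence, for each enumerated p it suffices to evaluate p and \(x_0 = p-l+R \bmod l\) (when \(R \bmod l \ne 0\) and \(x_0 \ge 1\)). Summing n/l over all l gives \(O(n \log n)\) lcp queries.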

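// lcp(i, j): LCP of the suffixes starting at i and j, i.e. the minimum of h over
// ranks min(rnk[i], rnk[j]) + 1 .. max(rnk[i], rnk[j]), O(1) with a sparse table.
// ans should start at 1 (every nonempty string occurs once).
for (int l = 1; l <= n; ++l) {
    for (int i = 1; i + l <= n; i += l) {
        int R = lcp(i, i + l);
        ans = std::max(ans, R / l + 1);
        int x = i - l + R % l;  // x0 = p - (l - R mod l)
        if (R % l && x >= 1) {  // x0 must be a valid 1-indexed position
            ans = std::max(ans, lcp(x, x + l) / l + 1);
        }
    }
}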
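A self-contained sketch of the whole method follows. It is 0-indexed, so the shifted start is valid when x0 >= 0. It makes several assumptions. The suffix array is built by sorting suffixes directly (illustration only). lcp is answered by a sparse table over h. The names `LcpOracle` and `maxRepeats` are hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: naive O(n^2 log n) suffix array, sorting start positions.
std::vector<int> suffixArray(const std::string& s) {
    std::vector<int> sa(s.size());
    for (int i = 0; i < (int)s.size(); ++i) sa[i] = i;
    std::sort(sa.begin(), sa.end(), [&](int x, int y) {
        return s.compare(x, std::string::npos, s, y, std::string::npos) < 0;
    });
    return sa;
}

// Hypothetical helper (Kasai): h[r] = LCP of suffixes ranked r-1 and r, h[0] = 0.
std::vector<int> heightArray(const std::string& s, const std::vector<int>& sa) {
    int n = s.size();
    std::vector<int> rnk(n), h(n, 0);
    for (int i = 0; i < n; ++i) rnk[sa[i]] = i;
    for (int i = 0, k = 0; i < n; ++i) {
        if (rnk[i] == 0) { k = 0; continue; }
        int j = sa[rnk[i] - 1];
        while (i + k < n && j + k < n && s[i + k] == s[j + k]) ++k;
        h[rnk[i]] = k;
        if (k) --k;
    }
    return h;
}

// Hypothetical helper: O(1) LCP queries via a sparse table over h.
struct LcpOracle {
    std::vector<int> rnk;
    std::vector<std::vector<int>> st;  // st[k][r] = min of h[r .. r + 2^k - 1]
    explicit LcpOracle(const std::string& s) {
        int n = s.size();
        std::vector<int> sa = suffixArray(s);
        st.push_back(heightArray(s, sa));
        rnk.assign(n, 0);
        for (int i = 0; i < n; ++i) rnk[sa[i]] = i;
        for (int k = 1; (1 << k) <= n; ++k) {
            std::vector<int> row(n - (1 << k) + 1);
            for (int r = 0; r + (1 << k) <= n; ++r)
                row[r] = std::min(st[k - 1][r], st[k - 1][r + (1 << (k - 1))]);
            st.push_back(row);
        }
    }
    // LCP of the suffixes starting at i and j, with i != j.
    int lcp(int i, int j) const {
        int a = std::min(rnk[i], rnk[j]) + 1, b = std::max(rnk[i], rnk[j]);
        int k = 0;
        while ((2 << k) <= b - a + 1) ++k;  // k = floor(log2(b - a + 1))
        return std::min(st[k][a], st[k][b - (1 << k) + 1]);
    }
};

// Largest number of consecutive repetitions of any substring of s.
int maxRepeats(const std::string& s) {
    int n = s.size();
    if (n == 0) return 0;
    LcpOracle o(s);
    int ans = 1;
    for (int l = 1; l < n; ++l) {
        for (int i = 0; i + l < n; i += l) {
            int R = o.lcp(i, i + l);
            ans = std::max(ans, R / l + 1);
            int x0 = i - (l - R % l);  // earliest start that can gain one more copy
            if (R % l && x0 >= 0) ans = std::max(ans, o.lcp(x0, x0 + l) / l + 1);
        }
    }
    return ans;
}
```

For "xababab", the step at i = 2 gives R = 3. It then shifts to x0 = 1, where lcp(1, 3) = 4 yields the three copies of "ab".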

Number of distinct substrings

\(\frac{1}{2}n(n+1)-\sum_{i=1}^n h[i]\)
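Why the formula holds: every substring is a prefix of some suffix. Walking the suffixes in sa order, the suffix ranked i contributes all of its prefixes except the first h[i], which it shares with the suffix ranked i-1. Any prefix it shares with an earlier-ranked suffix is also shared with that neighbour. A minimal sketch, assuming a std::string input, 0-indexed arrays and a naive suffix sort for self-containment. `countDistinctSubstrings` is a hypothetical name.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: naive O(n^2 log n) suffix array, sorting start positions.
std::vector<int> suffixArray(const std::string& s) {
    std::vector<int> sa(s.size());
    for (int i = 0; i < (int)s.size(); ++i) sa[i] = i;
    std::sort(sa.begin(), sa.end(), [&](int x, int y) {
        return s.compare(x, std::string::npos, s, y, std::string::npos) < 0;
    });
    return sa;
}

// Hypothetical helper (Kasai): h[r] = LCP of suffixes ranked r-1 and r, h[0] = 0.
std::vector<int> heightArray(const std::string& s, const std::vector<int>& sa) {
    int n = s.size();
    std::vector<int> rnk(n), h(n, 0);
    for (int i = 0; i < n; ++i) rnk[sa[i]] = i;
    for (int i = 0, k = 0; i < n; ++i) {
        if (rnk[i] == 0) { k = 0; continue; }
        int j = sa[rnk[i] - 1];
        while (i + k < n && j + k < n && s[i + k] == s[j + k]) ++k;
        h[rnk[i]] = k;
        if (k) --k;
    }
    return h;
}

// n(n+1)/2 substrings in total, minus the prefixes shared with the previous suffix.
long long countDistinctSubstrings(const std::string& s) {
    std::vector<int> sa = suffixArray(s), h = heightArray(s, sa);
    long long n = s.size(), total = n * (n + 1) / 2;
    for (int x : h) total -= x;
    return total;
}
```

For "banana" the heights sum to 6, so the count is 21 - 6 = 15.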

Original post: https://www.cnblogs.com/wyxwyx/p/suffixarray.html

Time: 2024-08-30 05:58:04
