UVALive 6869 Repeated Substrings

Repeated Substrings

Time Limit: 3000MS Memory Limit: Unknown 64bit IO Format: %lld & %llu

Description

String analysis often arises in applications from biology and chemistry, such as the study of DNA and protein molecules. One interesting problem is to find how many substrings are repeated (at least twice) in a long string. In this problem, you will write a program to find the total number of repeated substrings in a string of at most 100 000 alphabetic characters. Any unique substring that occurs more than once is counted. As an example, if the string is “aabaab”, there are 5 repeated substrings: “a”, “aa”, “aab”, “ab”, “b”. If the string is “aaaaa”, the repeated substrings are “a”, “aa”, “aaa”, “aaaa”. Note that repeated occurrences of a substring may overlap (e.g. “aaaa” in the second case).

Input

The input consists of at most 10 cases. The first line contains a positive integer, specifying the number of
cases to follow. Each of the following lines contains a nonempty string of up to 100 000 alphabetic characters.

Output

For each line of input, output one line containing the number of unique substrings that are repeated. You
may assume that the correct answer fits in a signed 32-bit integer.

Sample Input

3
aabaab
aaaaa
AaAaA

Sample Output

5
4
5
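
The sample answers can be checked by hand or with a brute force. The sketch below is not part of the original post: the function name countRepeatedBrute is made up for illustration, and the approach (storing every substring in a map) is only usable on very short strings, nowhere near the 100 000-character limit.

```cpp
#include <map>
#include <string>

// Hypothetical helper: counts the distinct substrings of s that occur
// at least twice (overlapping occurrences are allowed).
// It enumerates every substring, so it is meant for tiny inputs only.
int countRepeatedBrute(const std::string& s) {
    std::map<std::string, int> occurrences;
    for (std::size_t start = 0; start < s.size(); ++start)
        for (std::size_t len = 1; start + len <= s.size(); ++len)
            ++occurrences[s.substr(start, len)];

    int repeated = 0;
    for (const auto& entry : occurrences)
        if (entry.second >= 2) ++repeated;
    return repeated;
}
```

On the three sample strings this gives 5, 4 and 5, matching the sample output. Note that 'A' and 'a' are different characters, which is why "AaAaA" yields 5 ("A", "a", "Aa", "aA", "AaA").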

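Solution: an application of the suffix array's LCP array. Here lcp[i] is the length of the longest common prefix of the suffixes ranked i-1 and i in sorted order. Whenever lcp[i] > lcp[i-1], add lcp[i] - lcp[i-1] to the answer.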

#include <bits/stdc++.h>
using namespace std;
const int maxn = 100010;
// rk: rank array (also used as a work buffer inside da), wb/wv/wd: radix-sort buffers
int rk[maxn],wb[maxn],wv[maxn],wd[maxn],lcp[maxn];
// True if the 2k-length keys starting at i and j are equal.
bool cmp(int *r,int i,int j,int k) {
    return r[i] == r[j] && r[i+k] == r[j+k];
}
// Prefix-doubling suffix array construction.
// r: input with a unique smallest sentinel at r[n-1]; m: alphabet size.
void da(int *r,int *sa,int n,int m) {
    int i,k,p,*x = rk,*y = wb;
    // Counting sort by the first character.
    for(i = 0; i < m; ++i) wd[i] = 0;
    for(i = 0; i < n; ++i) wd[x[i] = r[i]]++;
    for(i = 1; i < m; ++i) wd[i] += wd[i-1];
    for(i = n-1; i >= 0; --i) sa[--wd[x[i]]] = i;

    for(p = k = 1; p < n; k <<= 1,m = p) {
        // y: suffixes ordered by their second key (the rank at offset k).
        for(p = 0,i = n-k; i < n; ++i) y[p++] = i;
        for(i = 0; i < n; ++i) if(sa[i] >= k) y[p++] = sa[i] - k;
        for(i = 0; i < n; ++i) wv[i] = x[y[i]];

        // Stable counting sort by the first key.
        for(i = 0; i < m; ++i) wd[i] = 0;
        for(i = 0; i < n; ++i) wd[wv[i]]++;
        for(i = 1; i < m; ++i) wd[i] += wd[i-1];
        for(i = n-1; i >= 0; --i) sa[--wd[wv[i]]] = y[i];

        // Recompute ranks; p becomes the number of distinct ranks.
        swap(x,y);
        x[sa[0]] = 0;
        for(p = i = 1; i < n; ++i)
            x[sa[i]] = cmp(y,sa[i-1],sa[i],k)?p-1:p++;
    }
}
// Kasai-style LCP computation: lcp[i] = LCP of the suffixes at sa[i-1] and sa[i].
// n is the string length without the sentinel; sa[0] is the sentinel suffix.
void calcp(int *r,int *sa,int n) {
    for(int i = 1; i <= n; ++i) rk[sa[i]] = i;
    int h = 0;
    for(int i = 0; i < n; ++i) {
        if(h > 0) h--;
        for(int j = sa[rk[i]-1]; i+h < n && j+h < n; h++)
            if(r[i+h] != r[j+h]) break;
        lcp[rk[i]] = h;
    }
}
int r[maxn],sa[maxn];
char str[maxn];
int main() {
    int cs,ret;
    scanf("%d",&cs);
    while(cs--) {
        scanf("%s",str);
        int len = strlen(str);
        for(int i = 0; str[i]; ++i)
            r[i] = str[i];
        ret = r[len] = 0; // append the sentinel and reset the answer
        da(r,sa,len+1,128);
        calcp(r,sa,len);
        // lcp[1] compares against the sentinel suffix and is always 0.
        for(int i = 2; i <= len; ++i)
            if(lcp[i] > lcp[i-1]) ret += lcp[i] - lcp[i-1];
        printf("%d\n",ret);
    }
    return 0;
}
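
To see why summing the increases of the LCP array counts each repeated substring once, it can help to replay the idea with a deliberately naive suffix sort. The sketch below is an illustration under assumptions, not the solution above: the name countRepeatedByLcp is hypothetical, suffixes are sorted by direct string comparison (far too slow for the real limits), and there is no sentinel, so the LCP before the first suffix is simply taken as 0. The reasoning, as I understand it: a prefix of length l of the suffix at rank i is repeated and not yet counted exactly when lcp[i-1] < l <= lcp[i], because the prefixes of length at most lcp[i-1] already appeared at the previous rank.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical helper: counts distinct repeated substrings using the
// "sum of LCP increases" rule on a naively built suffix array.
int countRepeatedByLcp(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::vector<int> sa(n);
    std::iota(sa.begin(), sa.end(), 0);
    // Naive suffix sort by comparing the suffixes character by character.
    std::sort(sa.begin(), sa.end(), [&](int a, int b) {
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });

    int total = 0;
    int prev = 0; // LCP of the previous adjacent pair (0 before the first pair)
    for (int i = 1; i < n; ++i) {
        const int a = sa[i - 1], b = sa[i];
        int h = 0; // LCP of the suffixes ranked i-1 and i
        while (a + h < n && b + h < n && s[a + h] == s[b + h]) ++h;
        if (h > prev) total += h - prev; // only the increase is new
        prev = h;
    }
    return total;
}
```

For "aabaab" the sorted suffixes are aab, aabaab, ab, abaab, b, baab, with adjacent LCPs 3, 1, 2, 0, 1. The increases are 3 (from 0 to 3), 1 (from 1 to 2) and 1 (from 0 to 1), which sum to 5.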

Time: 2024-07-30 20:24:42
