http://acm.hdu.edu.cn/showproblem.php?pid=1686
Oulipo
Time Limit: 3000/1000 MS (Java/Others) Memory Limit: 32768/32768 K (Java/Others)
Problem Description
The French author Georges Perec (1936–1982) once wrote a book, La disparition, without the letter ‘e’. He was a member of the Oulipo group. A quote from the book:
Tout avait l’air normal, mais tout s’affirmait faux. Tout avait l’air normal, d’abord, puis surgissait l’inhumain, l’affolant. Il aurait voulu savoir où s’articulait l’association qui l’unissait au roman : sur son tapis, assaillant à tout instant son imagination, l’intuition d’un tabou, la vision d’un mal obscur, d’un quoi vacant, d’un non-dit : la vision, l’avision d’un oubli commandant tout, où s’abolissait la raison : tout avait l’air normal mais…
Perec would probably have scored high (or rather, low) in the following contest. People are asked to write a perhaps even meaningful text on some subject with as few occurrences of a given “word” as possible. Our task is to provide the jury with a program that counts these occurrences, in order to obtain a ranking of the competitors. These competitors often write very long texts with nonsense meaning; a sequence of 500,000 consecutive ‘T’s is not unusual. And they never use spaces.
So we want to quickly find out how often a word, i.e., a given string, occurs in a text. More formally: given the alphabet {‘A’, ‘B’, ‘C’, …, ‘Z’} and two finite strings over that alphabet, a word W and a text T, count the number of occurrences of W in T. All the consecutive characters of W must exactly match consecutive characters of T. Occurrences may overlap.
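To make the definition concrete, here is a minimal brute-force sketch that tries every start position in T and counts a hit whenever W matches there, so overlapping occurrences are counted naturally. The name countNaive is hypothetical and not part of the problem or the solution. It is only a reference for checking answers on small inputs: in the worst case it performs about |W|·|T| character comparisons, which is far too slow for the limits given in the Input section.

```cpp
#include <cstddef>
#include <string>

// Hypothetical reference helper: counts occurrences of w in t, overlaps
// included, by comparing w against every possible start position in t.
std::size_t countNaive(const std::string& w, const std::string& t)
{
    if (w.empty() || w.size() > t.size()) return 0;
    std::size_t count = 0;
    for (std::size_t start = 0; start + w.size() <= t.size(); ++start)
    {
        // compare() returns 0 when the |w| characters at 'start' equal w
        if (t.compare(start, w.size(), w) == 0) ++count;
    }
    return count;
}
```

For example, countNaive("AZA", "AZAZAZA") counts the matches starting at positions 0, 2 and 4.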
Input
The first line of the input file contains a single number: the number of test cases to follow. Each test case has the following format:
One line with the word W, a string over {‘A’, ‘B’, ‘C’, …, ‘Z’}, with 1 ≤ |W| ≤ 10,000 (here |W| denotes the length of the string W).
One line with the text T, a string over {‘A’, ‘B’, ‘C’, …, ‘Z’}, with |W| ≤ |T| ≤ 1,000,000.
Output
For every test case in the input file, the output should contain a single number, on a single line: the number of occurrences of the word W in the text T.
Sample Input
3
BAPC
BAPC
AZA
AZAZAZA
VERDI
AVERDXIVYERDIAN
Sample Output
1
3
0
#include <cstdio>
#include <cstring>

int n, m, nxt[10005], t;
char b[10005], a[1000005];   // b: the word W, a: the text T

// This problem extends basic KMP to count multiple matches.
// After one full match of the word we must jump to the best position
// and keep searching. The KMP idea still applies: positions that are
// already known to match are not compared again.
//   text:  xxxxxxxabbaab*xxxxxx
//   word:         abbaaba          (its last character sits on *)
//   after:              abbaaba    (position after the jump)
// The jump position comes from the next array: nxt[m] is the length of the
// longest proper prefix of the word that is also a suffix of it, so after a
// full match we continue with l = nxt[m], exactly as on a mismatch.

// Build the next array for b: nxt[0] = -1, and for j >= 1 nxt[j] is the
// length of the longest proper prefix of b[0..j-1] that is also its suffix.
void buildnxt()
{
    int j = 0, k = -1;
    m = strlen(b);
    nxt[0] = -1;
    while (j < m)
    {
        if (k == -1 || b[j] == b[k])
        {
            j++;
            k++;
            nxt[j] = k;
        }
        else k = nxt[k];
    }
}

// Count the occurrences of b in a, overlaps included.
int kmp()
{
    int k = 0, l = 0, cou = 0;   // k: index in the text, l: index in the word
    n = strlen(a);
    while (k < n)
    {
        if (l == -1 || a[k] == b[l])
        {
            k++;
            l++;
        }
        else l = nxt[l];
        if (l == m)
        {
            cou++;
            l = nxt[m];   // reuse the matched border instead of restarting at 0
        }
    }
    return cou;
}

int main()
{
    if (scanf("%d", &t) != 1) return 0;
    while (t--)
    {
        // The strings contain no spaces, so %s reads one whole line each.
        scanf("%s", b);
        scanf("%s", a);
        buildnxt();
        printf("%d\n", kmp());
    }
    return 0;
}
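As a worked example of how the next array drives the jump, the sketch below recomputes the same table without globals so it can be inspected directly. The names buildNext and countOverlapping are hypothetical, and this is an illustrative variant of the technique under the same nxt[0] = -1 convention, not the solution's exact code. For the word abbaaba from the comment diagram it yields -1 0 0 0 1 1 2 1: the last entry is 1, so after a full match the search resumes with one character already matched.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: next[0] = -1, and for j >= 1 next[j] is the length of
// the longest proper prefix of w[0..j-1] that is also a suffix of it.
std::vector<int> buildNext(const std::string& w)
{
    const int m = static_cast<int>(w.size());
    std::vector<int> next(m + 1, 0);
    next[0] = -1;
    int j = 0, k = -1;
    while (j < m)
    {
        if (k == -1 || w[j] == w[k])
        {
            ++j;
            ++k;
            next[j] = k;
        }
        else k = next[k];
    }
    return next;
}

// Hypothetical helper: counts occurrences of w in t, overlaps included.
int countOverlapping(const std::string& w, const std::string& t)
{
    if (w.empty()) return 0;
    const std::vector<int> next = buildNext(w);
    const int m = static_cast<int>(w.size());
    const int n = static_cast<int>(t.size());
    int k = 0, l = 0, count = 0;   // k: index in t, l: index in w
    while (k < n)
    {
        if (l == -1 || t[k] == w[l]) { ++k; ++l; }
        else l = next[l];
        if (l == m)
        {
            ++count;
            l = next[m];   // fall back to the longest border after a match
        }
    }
    return count;
}
```

Because each text character is advanced past once and the fallback steps are bounded by the number of advances, the scan stays linear in |W| + |T| even on inputs such as a long run of a single letter.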