1293 - Document Analyzer
PDF (English) | Statistics | Forum |
Time Limit: 3 second(s) | Memory Limit: 32 MB |
You work in a leading software development company. As youare great in coding, most of the critical tasks are allotted for you. You likethe challenge and you feel excited to solve those problems.
Recently your company is developing a project named DocumentAnalyzer. In this project you are assigned a task; of course a critical task.The task is that you are given a document consisting of lowercase letters,numbers and punctuations.
You have to analyze the document and separate thewords first. Words are consecutive sequences of lower case letters. Afterlisting the words, in the order same as they occurred in the document, you haveto number them from
1, 2, ..., n. After that you have to find the range pand
q (p ≤ q) such that all kinds of words occur between pand
q (inclusive). If there are multiple such solutions you have to findthe one where the difference of
p and q is smallest. If stillthere is a tie, then find the solution where
p is smallest.
Input
Input starts with an integer T (≤ 15),denoting the number of test cases.
Each case will be denoted by one or more lines denoting adocument. Each line contains no more than
100 characters. A documentwill contain either lowercase letters or numbers or punctuations. The last lineof a document will contain the word
‘END‘ which is of course not thepart of the document. You can assume that a document will contain between
1and 50000 words (inclusive). Words may contain up to
10characters. And a document can contain up to 5000 lines.
Output
For each case, print the case number and
p and qas described above.
Sample Input |
Output for Sample Input |
3 1. a case is a case, 2. case is not a case~ END a b c d e END [email protected]#$a^%a a a b b----b b++12b END |
Case 1: 6 9 Case 2: 1 5 Case 3: 5 6 |
Note
Dataset is huge, use faster I/O methods.
题目链接:http://lightoj.com/volume_showproblem.php?problem=1293
题目大意:求最小的覆盖了所有英文字符串的区间的端点值
题目分析:先将单独的英文单词或字母map成离散的数字,然后就是两点法,先确定最小的右端点使得区间包含全部不重复的数字记录此时区间的左右端点,然后移动左端点,若此时区间不包含全部数字则再移动右端点,这样反复移动左右端点并更新区间端点值,直到再向后找不到合法区间为止
#include <cstdio> #include <string> #include <cstring> #include <map> using namespace std; int const MAX = 500005; int const INF = 0x3fffffff; char s[MAX]; bool hash[MAX]; int a[MAX]; map <string, int> mp; int main() { int T; scanf("%d", &T); for(int ca = 1; ca <= T; ca++) { mp.clear(); map <int, int> h; memset(hash, false, sizeof(hash)); memset(a, 0, sizeof(a)); int cnt = 0, num = 0; while(gets(s)) { string tmp = ""; if(strcmp(s, "END") == 0) break; int len = strlen(s); for(int i = 0; i <= len; i++) { if(s[i] >= 'a' && s[i] <= 'z') tmp += s[i]; else { if(tmp != "") { if(!hash[mp[tmp]]) { num ++; mp[tmp] = id++; hash[mp[tmp]] = true; } a[cnt++] = mp[tmp]; tmp = ""; } } } } int st = 0, ed = 0, num2 = 0; int ansst, ansed, mi = INF; while(true) { while(ed < cnt && num2 < num) if(h[a[ed++]] ++ == 0) num2 ++; if(num2 < num) break; if(ed - st < mi) { ansed = ed; ansst = st; mi = ed - st; } if(-- h[a[st++]] == 0) num2 --; } printf("Case %d: %d %d\n", ca, ansst + 1, ansed); } }