Lucy and Question Marks
Long ago, Lucy wrote some sentences in her textbook. She has recently found those notes, but so much time has passed that some letters have become difficult to read. Her notes are given to you as a string
with question marks in place of the letters that are impossible to read.
Lucy remembers that those sentences definitely made some sense, so now she wants to restore them. She thinks that the best way to restore them is to replace all the question marks with Latin letters in such a way that the total number of
occurrences of all the strings from her dictionary in the result is maximal. A word may occur in her dictionary two or more times; in that case, count it as many times as it occurs in the dictionary.
You will be given the string itself and the dictionary. Please output the maximal possible number of occurrences of dictionary words and the lexicographically minimal string with this number of occurrences.
Input
The first line of the input contains an integer T denoting the number of test cases. The description of T test cases follows.
The first line of every test case consists of two integers N and M - the length of the string written by Lucy and the number of words in the dictionary. The second line of the test case consists of the string itself
- N characters, each of which is either a question mark or a lowercase Latin letter.
Then M lines follow. Each line consists of a single string of lowercase Latin letters - a word from the dictionary.
Output
For each test case, output two lines. The first line should contain the maximal number of occurrences. The second line should contain the lexicographically minimal string with the maximal number of occurrences of the words from the
dictionary.
Example
Input:
3
7 4
???????
ab
ba
aba
x
5 3
?ac??
bacd
cde
xa
8 2
?a?b?c?d
ecxd
zzz
Output:
9
abababa
2
bacde
1
aaabecxd
Scoring
Subtask 1 (16 points): T = 50, 1 <= N <= 8, 1 <= M <= 10. Only the characters a, b and c and question marks occur in the string. Only the characters a, b, and c occur
in the dictionary words. All the words in the dictionary consist of no more than 10 letters.
Subtask 2 (32 points): T = 50, 1 <= N <= 100, 1 <= M <= 100. Only the characters a, b and question marks occur in the string. Only the characters a and b occur in the dictionary words. All the
words in the dictionary consist of no more than 10 letters.
Subtask 3 (52 points): T = 10, 1 <= N <= 1000, 1 <= M <= 1000. Total length of all the dictionary strings will not exceed 1000.
The time limit for the last subtask is 2 sec. For the first two subtasks it is 1 sec.
QMARKS - Editorial
Problem Link:
Difficulty:
Easy-Medium
Pre-requisites:
Aho-Corasick, DP
Explanation:
In order to pass the first subtask, it's sufficient to implement an exponential-time brute-force solution. To go further, some knowledge of the Aho-Corasick algorithm is required. A lot of articles on Aho-Corasick can be found
on the net.
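A brute force along these lines might look like the sketch below. It is only an illustration, not the setter's code: the names `countOccurrences`, `enumerate` and `bruteForce`, as well as the `alphabet` parameter, are assumptions introduced here. It tries every replacement of the question marks and counts overlapping occurrences naively, so it is exponential in the number of question marks.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Counts (possibly overlapping) occurrences of every dictionary word in text.
// A word listed several times in the dictionary is counted once per entry.
int countOccurrences(const std::string& text, const std::vector<std::string>& dict) {
    int total = 0;
    for (const std::string& w : dict) {
        if (w.empty() || w.size() > text.size()) continue;
        for (std::size_t pos = 0; pos + w.size() <= text.size(); ++pos)
            if (text.compare(pos, w.size(), w) == 0) ++total;
    }
    return total;
}

// Recursively replaces each '?' by every letter of `alphabet` (hypothetical
// parameter) and keeps the best count, breaking ties by the smaller string.
void enumerate(std::string& cur, std::size_t i, const std::vector<std::string>& dict,
               const std::string& alphabet, int& best, std::string& bestStr) {
    if (i == cur.size()) {
        int c = countOccurrences(cur, dict);
        if (c > best || (c == best && cur < bestStr)) {
            best = c;
            bestStr = cur;
        }
        return;
    }
    if (cur[i] != '?') {
        enumerate(cur, i + 1, dict, alphabet, best, bestStr);
        return;
    }
    for (char ch : alphabet) {
        cur[i] = ch;
        enumerate(cur, i + 1, dict, alphabet, best, bestStr);
    }
    cur[i] = '?';  // restore the placeholder for the caller
}

// Returns {maximal number of occurrences, lexicographically minimal string}.
std::pair<int, std::string> bruteForce(std::string s, const std::vector<std::string>& dict,
                                       const std::string& alphabet = "abc") {
    int best = -1;
    std::string bestStr;
    enumerate(s, 0, dict, alphabet, best, bestStr);
    return {best, bestStr};
}
```

With at most 8 characters and a three-letter alphabet, as in the first subtask, this checks at most 3^8 = 6561 candidate strings per test case.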
Let's solve the inverse problem first. Suppose you have a set of strings D and a string S, and you need to calculate the total number of occurrences of all the strings from D in S.
This is a standard problem for the Aho-Corasick algorithm. The standard solution builds a trie from the set of strings D with O(total length of all the strings from D) nodes. Then suffix links are calculated, and with the help of
suffix links it's possible to calculate, for every node of the trie, the number of strings that end at this node or at any of its suffixes. The next step is turning the trie into an automaton with O(states*alphabet) transitions. After this, you have an automaton on which you
can make N steps in order to calculate the number of occurrences of all the required substrings. This is a brief description of the solution to the inverse problem. A more detailed description can be found in almost any Aho-Corasick tutorial, because
this "inverse" problem is actually a well-known one.
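The counting procedure described above could be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the editorial: the struct `AhoCorasick` and its members `next`, `fail`, `hits`, `addWord`, `build` and `countAll` are names chosen here, the alphabet is assumed to be 'a' to 'z', and the links are computed with a breadth-first pass instead of the lazy recursion used in the solutions listed later.

```cpp
#include <array>
#include <queue>
#include <string>
#include <vector>

struct AhoCorasick {
    std::vector<std::array<int, 26>> next;  // trie edges; a full transition table after build()
    std::vector<int> fail;                  // suffix links
    std::vector<int> hits;                  // words ending at the node or at any of its suffixes

    AhoCorasick() { newNode(); }            // node 0 is the root

    int newNode() {
        std::array<int, 26> row;
        row.fill(-1);                       // -1 marks a missing edge
        next.push_back(row);
        fail.push_back(0);
        hits.push_back(0);
        return static_cast<int>(next.size()) - 1;
    }

    void addWord(const std::string& w) {
        int v = 0;
        for (char ch : w) {
            int c = ch - 'a';
            if (next[v][c] == -1) {
                int u = newNode();          // create first: push_back may reallocate
                next[v][c] = u;
            }
            v = next[v][c];
        }
        ++hits[v];                          // duplicates in the dictionary add up
    }

    // Computes suffix links and completes the transitions in BFS order, so the
    // suffix-link target of a node is always finished before the node itself.
    void build() {
        std::queue<int> q;
        for (int c = 0; c < 26; ++c) {
            if (next[0][c] == -1) next[0][c] = 0;
            else q.push(next[0][c]);
        }
        while (!q.empty()) {
            int v = q.front();
            q.pop();
            hits[v] += hits[fail[v]];
            for (int c = 0; c < 26; ++c) {
                int u = next[v][c];
                if (u == -1) {
                    next[v][c] = next[fail[v]][c];
                } else {
                    fail[u] = next[fail[v]][c];
                    q.push(u);
                }
            }
        }
    }

    // One automaton step per character of text; text must contain only 'a'..'z'.
    long long countAll(const std::string& text) const {
        long long total = 0;
        int v = 0;
        for (char ch : text) {
            v = next[v][ch - 'a'];
            total += hits[v];
        }
        return total;
    }
};
```

For the first sample, adding the words ab, ba, aba and x, calling `build()`, and then `countAll("abababa")` gives 9, which matches the expected output.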
Now, how do we solve the original problem? There is a DP solution. As mentioned before, there will be O(total length of strings from D) states in the automaton. So it's possible to have a DP state of the form
(number of letters already processed, current position in the automaton). The transition is then quite straightforward: if the current symbol is a question mark, there are 26 possible choices. Otherwise, the choice is unique, because only the given
letter can be used. This way you can get the maximal number of occurrences.
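Written out as a recurrence, the DP might look as follows. The notation is an assumption introduced here for illustration: \delta(v, c) is the automaton transition, w(v) is the number of dictionary words that end at state v or at any of its suffixes, S_k is the k-th character of the string, and f(i, v) is the best total for the part of the string after the first i characters when the automaton is in state v (with w(v) included).

```latex
\[
f(N, v) = w(v), \qquad
f(i, v) = w(v) + \max_{c \in C_{i+1}} f\bigl(i + 1,\ \delta(v, c)\bigr) \quad (0 \le i < N),
\]
\[
C_k =
\begin{cases}
\{\texttt{a}, \dots, \texttt{z}\} & \text{if } S_k = \texttt{?}, \\
\{S_k\} & \text{otherwise,}
\end{cases}
\qquad
\text{answer} = f(0, \mathrm{root}).
\]
```

Since w(root) = 0, the answer is the sum of w over the N states visited. There are (N + 1) * states values with up to 26 transitions each, which is about 2.6 * 10^7 elementary steps for N = 1000 and roughly 1000 states.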
In order to restore the string itself, you can act greedily. Iterate through the symbols of the string S, starting from the first one. If the current character is a letter, there's only one choice.
Otherwise, iterate through all the possible characters, namely 'a' to 'z', and choose the transition to the state with the maximal DP value (if there are several such transitions, choose the one with the minimal character). This
works if your DP state is (the size of the current suffix, the position in the automaton): adding a symbol is just a transition from one suffix to another, shorter one, so the DP already contains all the necessary information about the
remaining part of the string.
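Putting the pieces together, the suffix DP and the greedy restoration could be sketched as a single function. This is an illustrative sketch under stated assumptions rather than the reference code: the function name `restore` and the local names `go`, `fail`, `w` and `f` are chosen here, the automaton is built with a BFS pass, and the input is assumed to contain only 'a' to 'z' and '?'.

```cpp
#include <algorithm>
#include <array>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Returns {maximal number of occurrences, lexicographically minimal string}.
std::pair<int, std::string> restore(const std::string& s,
                                    const std::vector<std::string>& dict) {
    // Trie: node 0 is the root, -1 marks a missing edge.
    std::array<int, 26> empty;
    empty.fill(-1);
    std::vector<std::array<int, 26>> go(1, empty);
    std::vector<int> w(1, 0);  // w[v]: words ending at v or at any suffix of v
    for (const std::string& word : dict) {
        int v = 0;
        for (char ch : word) {
            int c = ch - 'a';
            if (go[v][c] == -1) {
                go[v][c] = static_cast<int>(go.size());
                go.push_back(empty);
                w.push_back(0);
            }
            v = go[v][c];
        }
        ++w[v];
    }
    // Suffix links and full transition table, computed in BFS order.
    const int states = static_cast<int>(go.size());
    std::vector<int> fail(states, 0);
    std::queue<int> bfs;
    for (int c = 0; c < 26; ++c) {
        if (go[0][c] == -1) go[0][c] = 0;
        else bfs.push(go[0][c]);
    }
    while (!bfs.empty()) {
        int v = bfs.front();
        bfs.pop();
        w[v] += w[fail[v]];
        for (int c = 0; c < 26; ++c) {
            int u = go[v][c];
            if (u == -1) go[v][c] = go[fail[v]][c];
            else { fail[u] = go[fail[v]][c]; bfs.push(u); }
        }
    }
    // Range of letters allowed at position i (0-based).
    const int n = static_cast<int>(s.size());
    auto lo = [&](int i) { return s[i] == '?' ? 0 : s[i] - 'a'; };
    auto hi = [&](int i) { return s[i] == '?' ? 25 : s[i] - 'a'; };
    // f[i][v]: best total for s[i..n-1] when the automaton is in state v after
    // the first i characters, with w[v] included. Row n is therefore w itself.
    std::vector<std::vector<int>> f(n + 1, w);
    for (int i = n - 1; i >= 0; --i)
        for (int v = 0; v < states; ++v) {
            int best = -1;
            for (int c = lo(i); c <= hi(i); ++c)
                best = std::max(best, f[i + 1][go[v][c]]);
            f[i][v] = w[v] + best;
        }
    // Greedy restoration: the smallest letter whose target state keeps the optimum.
    std::string out = s;
    int v = 0;
    for (int i = 0; i < n; ++i) {
        int pick = lo(i);
        for (int c = lo(i); c <= hi(i); ++c)
            if (f[i + 1][go[v][c]] > f[i + 1][go[v][pick]]) pick = c;
        out[i] = static_cast<char>('a' + pick);
        v = go[v][pick];
    }
    return {f[0][0], out};
}
```

The strict comparison in the last loop is what yields the lexicographically minimal answer: a later letter replaces the current pick only if it leads to a strictly better value.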
Setter's Solution:
Can be found here
Tester's Solution:
Can be found here
#include <iostream>
#include <cstring>
#include <cstdio>
#include <algorithm>
using namespace std;

int T, n, m, i, num, q, ls, j, trie[1005][26], enwei[1005], G[1005][26], dp[1005][1005], c, choi, Link[1005], pv[1005], pch[1005], ew[1111];
char a[1005], s[1005];

int getlink(int k);
int Go(int k, int j);

int getlink(int k) { // standard suffix link calculation (node 1 is the root)
    if (Link[k] == 0) {
        if (k == 1 || pv[k] == 1) Link[k] = 1;
        else Link[k] = Go(getlink(pv[k]), pch[k]);
    }
    return Link[k];
}

int Go(int k, int j) { // Aho-Corasick automaton transition
    if (G[k][j] == 0) {
        if (trie[k][j] != 0) G[k][j] = trie[k][j];
        else G[k][j] = k == 1 ? 1 : Go(getlink(k), j);
    }
    return G[k][j];
}

int main() {
    scanf("%d", &T);
    for (; T; T--) {
        scanf("%d%d", &n, &m);
        for (i = 1; i <= n; i++) { // read the string, skipping whitespace
            a[i] = getchar();
            while ((a[i] < 'a' || a[i] > 'z') && (a[i] != '?')) a[i] = getchar();
        }
        num = 1;
        for (i = 1; i <= m; i++) {
            scanf("%1004s", s); // replaces gets(), which is no longer part of C++; %s skips the line break
            ls = strlen(s);
            q = 1;
            for (j = 0; j < ls; j++)
                if (!trie[q][s[j] - 'a']) { // building the trie
                    trie[q][s[j] - 'a'] = ++num; // new transition
                    pv[num] = q; pch[num] = s[j] - 'a'; // parent vertex and character for the node
                    q = num;
                } else q = trie[q][s[j] - 'a'];
            ++enwei[q]; // number of strings that end in this node
        }
        for (i = 1; i <= num; i++) { // number of strings that end in the node or in any of its suffixes
            j = i; ew[j] = 0;
            while (j > 1) {
                ew[i] += enwei[j];
                j = getlink(j);
            }
        }
        for (i = 1; i <= num; i++) enwei[i] = ew[i];
        for (i = 0; i <= n; i++) for (j = 1; j <= num; j++) dp[i][j] = -1000000000; // dp initialization
        // dp[i][j] - answer for the substring [i + 1; N] when the current node of the automaton is j
        for (j = 1; j <= num; j++) dp[n][j] = enwei[j];
        for (i = n - 1; i >= 0; i--)
            for (j = 1; j <= num; j++) { // dp calculation
                if (a[i + 1] == '?')
                    for (c = 0; c < 26; c++) dp[i][j] = max(dp[i][j], enwei[j] + dp[i + 1][Go(j, c)]);
                else
                    dp[i][j] = max(dp[i][j], enwei[j] + dp[i + 1][Go(j, a[i + 1] - 'a')]);
            }
        printf("%d\n", dp[0][1]); // optimal result: nothing is processed yet and we start in the first node (like in the standard algo)
        for (q = 1, i = 1; i <= n; i++) {
            if (a[i] != '?') choi = a[i] - 'a'; // there's only one option
            else {
                choi = 0;
                for (j = 0; j < 26; j++) if (dp[i][Go(q, j)] > dp[i][Go(q, choi)]) choi = j; // otherwise take the best option, smallest letter on ties
            }
            putchar('a' + choi);
            q = Go(q, choi);
        }
        puts("");
        for (i = 1; i <= num; i++) { // reset the automaton for the next test case
            enwei[i] = Link[i] = pv[i] = pch[i] = ew[i] = 0;
            for (j = 0; j < 26; j++) trie[i][j] = G[i][j] = 0;
        }
    }
    return 0;
}
#include <cstdio>
#include <cstring>
#include <algorithm>
using namespace std;

const int inf = 1e8;
int i, j, n, m, v, cnt;
char a[1033];
int t[1033][26];            // trie edges (node 1 is the root)
int pch[1033], pv[1033];    // character on the edge from the parent, and the parent itself
int terminal[1033];         // number of words ending exactly at the node
int reach[1033];            // number of words ending at the node or at any of its suffixes
int suflink[1033];          // suffix links; renamed from "link", which can clash with link() from <unistd.h>
int mem[1033][26];          // memoized automaton transitions
int f[1003][1003];
char q;

int go(int v, char c);

int get_link(int v) {
    if (suflink[v] == 0) {
        if (v == 1 || pv[v] == 1) suflink[v] = 1;
        else suflink[v] = go(get_link(pv[v]), pch[v]);
    }
    return suflink[v];
}

int go(int v, char c) {
    if (mem[v][c] == 0) {
        if (t[v][c] != 0) mem[v][c] = t[v][c];
        else if (v == 1) mem[v][c] = 1;
        else mem[v][c] = go(get_link(v), c);
    }
    return mem[v][c];
}

int main() {
    int tc;
    scanf("%d", &tc);
    while (tc--) {
        memset(mem, 0, sizeof(mem));
        memset(t, 0, sizeof(t));
        memset(suflink, 0, sizeof(suflink));
        memset(terminal, 0, sizeof(terminal));
        memset(reach, 0, sizeof(reach));
        scanf("%d%d\n", &n, &m);
        for (i = 1; i <= n; i++) a[i] = getchar();
        scanf("\n");
        cnt = 1;
        for (i = 1; i <= m; i++) { // read each word character by character and add it to the trie
            q = getchar();
            v = 1;
            while (q != '\n') {
                q -= 'a';
                if (t[v][q] == 0) {
                    cnt++;
                    t[v][q] = cnt;
                    pch[cnt] = q;
                    pv[cnt] = v;
                }
                v = t[v][q];
                q = getchar();
            }
            terminal[v]++;
        }
        for (i = 1; i <= n; i++) for (j = 1; j <= cnt; j++) f[i][j] = -inf;
        for (i = 1; i <= cnt; i++) { // walk the suffix links to accumulate reach[i]
            v = i;
            while (v > 1) {
                reach[i] += terminal[v];
                v = get_link(v);
            }
        }
        // f[i][j] - best total for the characters i..n when the automaton is in node j
        for (i = 1; i <= cnt; i++) f[n + 1][i] = reach[i];
        for (i = n; i > 0; i--)
            for (j = 1; j <= cnt; j++) {
                if (a[i] != '?') f[i][j] = f[i + 1][go(j, a[i] - 'a')] + reach[j];
                else {
                    for (q = 0; q < 26; q++) f[i][j] = max(f[i][j], f[i + 1][go(j, q)] + reach[j]);
                }
            }
        printf("%d\n", f[1][1]);
        v = 1;
        for (i = 1; i <= n; i++) { // greedy restoration, smallest letter on ties
            if (a[i] == '?') {
                q = 'a';
                int best = f[i + 1][go(v, 0)];
                for (char c = 1; c < 26; c++)
                    if (f[i + 1][go(v, c)] > best) {
                        best = f[i + 1][go(v, c)];
                        q = c + 'a';
                    }
            } else q = a[i];
            printf("%c", q);
            v = go(v, q - 'a');
        }
        printf("\n");
    }
}