Aho-Corasick automaton + DP: replace the '?' characters in a string to maximize the number of dictionary matches (CodeChef: Lucy and Question Marks)

Lucy and Question Marks

Long ago Lucy wrote some sentences in her textbook. She recently found those notes, but because so much time has passed, some letters have become difficult to read. Her notes are given to you as a string with question marks in place of the letters that are impossible to read.

Lucy remembers that those sentences definitely made some sense, so now she wants to restore them. She thinks that the best way to restore them is to replace all the question marks with Latin letters in such a way that the total number of occurrences of all the strings from her dictionary in the result is maximal. A word may occur in her dictionary two or more times; in this case, count every occurrence of that word as many times as it appears in the dictionary.

You will be given the string itself and the dictionary. Please output the maximal possible number of occurrences of dictionary words and the lexicographically smallest string with this number of occurrences.

Input

The first line of the input contains an integer T denoting the number of test cases. The description of T test cases follows.

The first line of every test case contains two integers N and M: the length of the string written by Lucy and the number of words in the dictionary. The second line of the test case contains the string itself: N characters, each of which is either a question mark or a lowercase Latin letter.

Then M lines follow. Each line consists of a single string of lowercase Latin letters: a word from the dictionary.

Output

For each test case, output two lines. The first line should contain the maximal number of occurrences. The second line should contain the lexicographically smallest string with the maximal number of occurrences of the words from the dictionary.

Example

Input:
3
7 4
???????
ab
ba
aba
x
5 3
?ac??
bacd
cde
xa
8 2
?a?b?c?d
ecxd
zzz

Output:
9
abababa
2
bacde
1
aaabecxd

Scoring

Subtask 1 (16 points): T = 50, 1 <= N <= 8, 1 <= M <= 10. Only the characters a, b and c and question marks occur in the string. Only the characters a, b and c occur in the dictionary words. All the words in the dictionary consist of no more than 10 letters.

Subtask 2 (32 points): T = 50, 1 <= N <= 100, 1 <= M <= 100. Only the characters a and b and question marks occur in the string. Only the characters a and b occur in the dictionary words. All the words in the dictionary consist of no more than 10 letters.

Subtask 3 (52 points): T = 10, 1 <= N <= 1000, 1 <= M <= 1000. Total length of all the dictionary strings will not exceed 1000.

The time limit for the last subtask is 2 sec. For the first two subtasks it is 1 sec.

QMARKS - Editorial

Problem Link:

Practice

Contest

Difficulty:

Easy-Medium

Pre-requisites:

Aho-Corasick, DP

Explanation:

To pass the first subtask, it's sufficient to implement an exponential-time brute-force solution. To go further, some knowledge of the Aho-Corasick algorithm is required. Many articles on Aho-Corasick can be found on the net.
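For the first subtask, the brute force mentioned above can be sketched as follows. This is a minimal illustration, not the setter's or tester's code: `countNaive` and `bruteForce` are hypothetical names, and the caller is assumed to pass the allowed letters in increasing order (for example "abc" for subtask 1). Assignments are enumerated in lexicographic order and only a strictly larger count replaces the current best, so ties keep the smallest string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Count every occurrence of every dictionary word in s; a word listed twice
// in the dictionary is simply counted twice.
long long countNaive(const std::string& s, const std::vector<std::string>& dict) {
    long long total = 0;
    for (const std::string& w : dict)
        for (std::size_t i = 0; i + w.size() <= s.size(); ++i)
            if (s.compare(i, w.size(), w) == 0) ++total;
    return total;
}

// Hypothetical brute force: try every replacement of the '?' characters with
// letters from `alphabet` (assumed sorted), as an odometer whose first '?' is
// the most significant digit, so candidates come in lexicographic order.
std::pair<long long, std::string> bruteForce(const std::string& s,
                                             const std::vector<std::string>& dict,
                                             const std::string& alphabet) {
    assert(!alphabet.empty());
    std::vector<std::size_t> holes;              // positions of the question marks
    for (std::size_t i = 0; i < s.size(); ++i)
        if (s[i] == '?') holes.push_back(i);
    std::vector<std::size_t> digit(holes.size(), 0);
    std::string cur = s, bestStr;
    long long best = -1;
    while (true) {
        for (std::size_t k = 0; k < holes.size(); ++k) cur[holes[k]] = alphabet[digit[k]];
        long long c = countNaive(cur, dict);
        if (c > best) { best = c; bestStr = cur; }  // strict '>' keeps the smallest on ties
        std::size_t k = holes.size();            // advance the odometer from the last '?'
        while (k > 0 && digit[k - 1] + 1 == alphabet.size()) digit[--k] = 0;
        if (k == 0) break;                       // every combination has been tried
        ++digit[k - 1];
    }
    return {best, bestStr};
}
```

With 3 letters and N <= 8 this tries at most 3^8 = 6561 strings, which is why it only suits the first subtask.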

Let's solve the "inverse" problem first. Suppose you have a set of strings D and a string S, and you need to calculate the total number of occurrences of all the strings from D in S.
This is a standard problem for the Aho-Corasick algorithm. The standard solution builds a trie from the set of strings D with O(total length of all the strings from D) nodes. Then suffix links are calculated, and with their help it's possible to calculate, for every node of the trie, the number of strings that end in that node or in any of its suffixes. The next step is turning the trie into an automaton with O(states*alphabet) transitions. After this, you have an automaton on which you can make N steps (one per character of S) to calculate the number of occurrences of all the required strings. This is a brief description of the solution to the inverse problem. A more detailed description can be found in almost any Aho-Corasick tutorial, because this "inverse" problem is actually a well-known one.
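The counting ("inverse") problem can be sketched in C++ as below. This is an illustrative sketch, not the setter's or tester's code: the `AhoCorasick` struct and its member names are assumptions, node 0 is the root, and suffix links are computed with a BFS by depth rather than with the memoized recursion used in the solutions further down.

```cpp
#include <array>
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Sketch of an Aho-Corasick automaton over 'a'..'z'; node 0 is the root.
struct AhoCorasick {
    std::vector<std::array<int, 26>> next;  // trie edges; after build(), full transitions
    std::vector<int> link;                  // suffix links
    std::vector<long long> out;             // words ending here or at any suffix (after build())

    AhoCorasick() { newNode(); }

    int newNode() {
        std::array<int, 26> row;
        row.fill(-1);
        next.push_back(row);
        link.push_back(0);
        out.push_back(0);
        return static_cast<int>(next.size()) - 1;
    }

    void add(const std::string& w) {
        int v = 0;
        for (char ch : w) {
            assert(ch >= 'a' && ch <= 'z');
            int c = ch - 'a';
            if (next[v][c] == -1) {
                int u = newNode();               // push_back may reallocate, so assign afterwards
                next[v][c] = u;
            }
            v = next[v][c];
        }
        ++out[v];                                // each copy of a repeated word counts separately
    }

    // BFS by depth: suffix links, accumulated counts, and missing edges
    // replaced by automaton transitions. Call once after all add() calls.
    void build() {
        std::queue<int> q;
        for (int c = 0; c < 26; ++c) {
            if (next[0][c] == -1) next[0][c] = 0;
            else q.push(next[0][c]);             // depth-1 nodes keep link 0 (the root)
        }
        while (!q.empty()) {
            int v = q.front();
            q.pop();
            out[v] += out[link[v]];              // link[v] is shallower, so already final
            for (int c = 0; c < 26; ++c) {
                int u = next[v][c];
                if (u == -1) next[v][c] = next[link[v]][c];
                else { link[u] = next[link[v]][c]; q.push(u); }
            }
        }
    }

    // One automaton step per character of the text (N steps in total).
    long long countOccurrences(const std::string& text) const {
        long long total = 0;
        int v = 0;
        for (char ch : text) {
            v = next[v][ch - 'a'];
            total += out[v];
        }
        return total;
    }
};
```

For the first sample answer, adding "ab", "ba", "aba", "x", calling `build()` and then `countOccurrences("abababa")` gives 9.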

Now, how do we solve the original problem? There is a DP solution. As mentioned before, there are O(total length of strings from D) states in the automaton, so it's possible to have a DP state of the form (number of letters already processed, current node of the automaton), whose value is the maximal number of occurrences collected so far. The transition is quite straightforward: if the current symbol is a question mark, there are 26 possible choices; otherwise the choice is unique, since you cannot use any symbol other than the current one. This way you get the maximal number of occurrences in O(N * states * 26) time.
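A sketch of the forward DP described above, assuming the same lowercase alphabet; `buildAutomaton` and `maxOccurrences` are hypothetical helpers, not the posted solutions. It keeps only two layers of the (letters processed, node) table, since the maximum alone does not need earlier layers.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>
#include <queue>
#include <string>
#include <vector>

// Hypothetical helper: builds the goto table `go` (node 0 is the root) and
// out[v] = number of dictionary words ending in node v or in any of its suffixes.
static void buildAutomaton(const std::vector<std::string>& dict,
                           std::vector<std::array<int, 26>>& go,
                           std::vector<long long>& out) {
    std::array<int, 26> empty;
    empty.fill(-1);
    go.assign(1, empty);
    out.assign(1, 0LL);
    for (const std::string& w : dict) {          // insert every word into the trie
        int v = 0;
        for (char ch : w) {
            assert(ch >= 'a' && ch <= 'z');
            int c = ch - 'a';
            if (go[v][c] == -1) {
                go[v][c] = static_cast<int>(go.size());
                go.push_back(empty);
                out.push_back(0);
            }
            v = go[v][c];
        }
        ++out[v];                                // repeated words count once per copy
    }
    std::vector<int> link(go.size(), 0);
    std::queue<int> q;
    for (int c = 0; c < 26; ++c) {
        if (go[0][c] == -1) go[0][c] = 0;        // missing root edges loop back to the root
        else q.push(go[0][c]);                   // depth-1 nodes link to the root
    }
    while (!q.empty()) {                         // BFS: shallower nodes are finished first
        int v = q.front();
        q.pop();
        out[v] += out[link[v]];
        for (int c = 0; c < 26; ++c) {
            int u = go[v][c];
            if (u == -1) go[v][c] = go[link[v]][c];
            else { link[u] = go[link[v]][c]; q.push(u); }
        }
    }
}

// Forward DP: cur[v] = best number of occurrences after the letters read so
// far, standing in node v (NEG marks unreachable nodes).
long long maxOccurrences(const std::string& s, const std::vector<std::string>& dict) {
    std::vector<std::array<int, 26>> go;
    std::vector<long long> out;
    buildAutomaton(dict, go, out);
    const long long NEG = std::numeric_limits<long long>::min() / 2;
    std::vector<long long> cur(go.size(), NEG), nxt(go.size(), NEG);
    cur[0] = 0;                                  // nothing read yet, at the root
    for (char ch : s) {
        std::fill(nxt.begin(), nxt.end(), NEG);
        for (std::size_t v = 0; v < go.size(); ++v) {
            if (cur[v] == NEG) continue;
            for (int c = 0; c < 26; ++c) {
                if (ch != '?' && c != ch - 'a') continue;  // a fixed letter allows one choice
                int u = go[v][c];
                nxt[u] = std::max(nxt[u], cur[v] + out[u]); // add words ending at this letter
            }
        }
        std::swap(cur, nxt);
    }
    return *std::max_element(cur.begin(), cur.end());
}
```

For example, `maxOccurrences("???????", {"ab", "ba", "aba", "x"})` should return 9. The rolling layers are enough for the count, but restoring the string needs the suffix form discussed next.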

To restore the string itself, you can act greedily. Iterate through the symbols of the string S, starting from the first one. If the current character is a letter, there is only one choice. Otherwise, iterate through all the possible characters, 'a' to 'z', and choose the transition to the state with the maximal DP value (if there are several such transitions, choose the one with the minimal character). This works if your DP state is (the length of the remaining suffix, the position in the automaton): adding a symbol is just a transition from one suffix to another, shorter one, so the DP value already contains all the necessary information about the remaining part of the string.
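The suffix form of the DP together with the greedy restoration might look like this. Again a sketch with hypothetical names (`buildAutomaton`, `solve`): `best[i][v]` is the value of the remaining suffix starting at letter i when the automaton is in node v, and a word is counted when its last letter is read, whereas the solutions below add the count of the current node instead; both give the same totals.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: goto table `go` (node 0 = root) and out[v] = words
// ending in node v or in any of its suffixes.
static void buildAutomaton(const std::vector<std::string>& dict,
                           std::vector<std::array<int, 26>>& go,
                           std::vector<long long>& out) {
    std::array<int, 26> empty;
    empty.fill(-1);
    go.assign(1, empty);
    out.assign(1, 0LL);
    for (const std::string& w : dict) {
        int v = 0;
        for (char ch : w) {
            assert(ch >= 'a' && ch <= 'z');
            int c = ch - 'a';
            if (go[v][c] == -1) {
                go[v][c] = static_cast<int>(go.size());
                go.push_back(empty);
                out.push_back(0);
            }
            v = go[v][c];
        }
        ++out[v];
    }
    std::vector<int> link(go.size(), 0);
    std::queue<int> q;
    for (int c = 0; c < 26; ++c) {
        if (go[0][c] == -1) go[0][c] = 0;
        else q.push(go[0][c]);
    }
    while (!q.empty()) {                         // BFS by depth
        int v = q.front();
        q.pop();
        out[v] += out[link[v]];
        for (int c = 0; c < 26; ++c) {
            int u = go[v][c];
            if (u == -1) go[v][c] = go[link[v]][c];
            else { link[u] = go[link[v]][c]; q.push(u); }
        }
    }
}

// Hypothetical solve(): best[i][v] = most occurrences that letters i..n-1 can
// still add when the automaton is in node v before reading letter i.
std::pair<long long, std::string> solve(const std::string& s,
                                        const std::vector<std::string>& dict) {
    std::vector<std::array<int, 26>> go;
    std::vector<long long> out;
    buildAutomaton(dict, go, out);
    const std::size_t n = s.size(), states = go.size();
    std::vector<std::vector<long long>> best(n + 1, std::vector<long long>(states, 0));
    for (std::size_t i = n; i-- > 0;)            // fill from the shortest suffix up
        for (std::size_t v = 0; v < states; ++v) {
            long long b = std::numeric_limits<long long>::min();
            for (int c = 0; c < 26; ++c) {
                if (s[i] != '?' && c != s[i] - 'a') continue;
                int u = go[v][c];
                b = std::max(b, out[u] + best[i + 1][u]);  // words ending at letter i, then the rest
            }
            best[i][v] = b;
        }
    std::string t = s;                           // greedy restore from the root
    int v = 0;
    for (std::size_t i = 0; i < n; ++i) {
        int pick = -1;                           // smallest letter that keeps the optimum
        for (int c = 0; c < 26 && pick == -1; ++c) {
            if (s[i] != '?' && c != s[i] - 'a') continue;
            int u = go[v][c];
            if (out[u] + best[i + 1][u] == best[i][v]) pick = c;
        }
        t[i] = static_cast<char>('a' + pick);
        v = go[v][pick];
    }
    return {best[0][0], t};
}
```

On the three samples it should return (9, "abababa"), (2, "bacde") and (1, "aaabecxd"). Scanning letters in increasing order and stopping at the first one that preserves `best[i][v]` is what makes the result lexicographically smallest.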

Setter's Solution:

Can be found here

Tester's Solution:

Can be found here

#include <iostream>
#include <cstring>
#include <cstdio>
#include <algorithm>
using namespace std;

int T,n,m,i,num,q,ls,j,trie[1005][26],enwei[1005],G[1005][26],dp[1005][1005],c,choi,Link[1005],pv[1005],pch[1005],ew[1111];
char a[1005],s[1005];

int getlink(int k);
int Go(int k,int j);

int getlink(int k){ // suffix link standard calculation
	if(Link[k]==0)
		if(k==1||pv[k]==1)Link[k]=1;else Link[k]=Go(getlink(pv[k]),pch[k]);
	return Link[k];
}

int Go(int k,int j){ // Aho-Corasick's automaton transition
	if(G[k][j]==0)
		if(trie[k][j]!=0)
			G[k][j]=trie[k][j];
		else
			G[k][j]=k==1?1:Go(getlink(k),j);
	return G[k][j];
}

int main (int argc, char * const argv[]) {
	scanf("%d",&T);
	for(;T;T--){
		scanf("%d%d",&n,&m);
		for(i=1;i<=n;i++){
			a[i]=getchar();
			while((a[i]<'a'||a[i]>'z')&&(a[i]!='?'))a[i]=getchar();
		}
		num=1;
		for(i=1;i<=m;i++){
			scanf("%s",s);ls=strlen(s); // read one dictionary word (gets() was removed in C11/C++14)
			q=1;
			for(j=0;j<ls;j++)if(!trie[q][s[j]-'a']){ // building the trie
				trie[q][s[j]-'a']=++num; // new transition
				pv[num]=q;pch[num]=s[j]-'a'; // parent vertex and character for the node
				q=num;
			}else q=trie[q][s[j]-'a'];
			++enwei[q]; // number of strings that end in this node
		}
		for(i=1;i<=num;i++){ // number of strings that end in the node or in any of its suffixes
			j=i;ew[j]=0;
			while(j>1){
				ew[i]+=enwei[j];
				j=getlink(j);
			}
		}
		for(i=1;i<=num;i++)enwei[i]=ew[i];
		for(i=0;i<=n;i++)for(j=1;j<=num;j++)dp[i][j]=-1000000000; // dp initialization
		// dp[i][j]: words ending in node j (reached after i characters) plus the best total for characters i+1..n
		for(j=1;j<=num;j++)dp[n][j]=enwei[j];
		for(i=n-1;i>=0;i--)for(j=1;j<=num;j++){ // dp calculation
			if(a[i+1]=='?')
				for(c=0;c<26;c++)dp[i][j]=max(dp[i][j],enwei[j]+dp[i+1][Go(j,c)]);else
								 dp[i][j]=max(dp[i][j],enwei[j]+dp[i+1][Go(j,a[i+1]-'a')]);
		}
		printf("%d\n",dp[0][1]); // optimal result: no characters processed yet, starting from the root (node 1) as in the standard algo
		for(q=1,i=1;i<=n;i++){
			if(a[i]!='?')choi=a[i]-'a'; // a fixed letter: only one option
			else{ // a question mark: take the most optimal letter, the smallest one on ties (strict '>')
				choi=0;
				for(j=0;j<26;j++)if(dp[i][Go(q,j)]>dp[i][Go(q,choi)])choi=j;
			}
			putchar('a'+choi);
			q=Go(q,choi);
		}
		puts("");
		for(i=1;i<=num;i++){
			enwei[i]=Link[i]=pv[i]=pch[i]=ew[i]=0;
			for(j=0;j<26;j++)trie[i][j]=G[i][j]=0;
		}
	}
    return 0;
}

#include <cstdio>
#include <memory.h>
#include <cmath>
#include <iostream>
#include <algorithm>
#include <string>

using namespace std;

const int inf = 1e8;

int i, j, n, m, v, cnt;
char a[1033];
int t[1033][26];
int pch[1033], pv[1033];
int terminal[1033];
int reach[1033], link[1033];
int mem[1033][26];
int f[1003][1003];
char q;
int go(int v, char c);
int get_link(int v)
{
	//printf("%d\n", v);
	if (link[v] == 0)
		if (v == 1 || pv[v] == 1) link[v] = 1;
		else link[v] = go(get_link(pv[v]), pch[v]);
	return link[v];
}

int go(int v, char c)
{
	if (mem[v][c] == 0)
		if (t[v][c] != 0) mem[v][c] = t[v][c];
		else if (v == 1) mem[v][c] = 1;
		else mem[v][c] = go(get_link(v), c);
	return mem[v][c];
}
int main()
{
//	freopen("input.txt", "r", stdin);
//	freopen("output.txt", "w", stdout);
	int tc;
	scanf("%d", &tc);
	while (tc--)
	{
		memset(mem, 0, sizeof(mem));
		memset(t, 0, sizeof(t));
		memset(link, 0, sizeof(link));
		memset(terminal, 0, sizeof(terminal));
		memset(reach, 0, sizeof(reach));
		scanf("%d%d\n", &n, &m);
		for (i = 1; i <= n; i++)
			a[i] = getchar();
		scanf("\n");
		int cnt = 1, v;
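		for (i = 1; i <= m; i++)
		{
			char w[1033];
			scanf("%s", w); // read one word; unlike a getchar() loop, this stops safely at EOF or "\r\n"
			v = 1;
			for (int k = 0; w[k]; k++)
			{
				int ch = w[k] - 'a';
				if (t[v][ch] == 0) // no edge yet: create a new trie node
				{
					cnt++;
					t[v][ch] = cnt;
					pch[cnt] = ch;
					pv[cnt] = v;
				}
				v = t[v][ch];
			}
			terminal[v]++; // one more dictionary word ends in node v
		}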
		for (i = 1; i <=n; i++)
			for (j = 1; j <= cnt; j++)
				f[i][j] = - inf;
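		for (i = 1; i <= cnt; i++) // reach[i]: words ending in node i or in any of its suffixes
		{
			v = i;
			while(v > 1)
			{
				reach[i] += terminal[v];
				v = get_link(v);
			}
		}
		// f[i][j]: reach[j] plus the best total for characters i..n, starting from node j
		for (i = 1; i <= cnt; i++)
			f[n + 1][i] = reach[i];
		for (i = n; i > 0; i--)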
			for (j = 1; j <= cnt; j++)
			{
				if (a[i] != '?') f[i][j]=f[i + 1][go(j, a[i] - 'a')] + reach[j];
				else
				{
					for (q = 0; q < 26; q++)
						f[i][j] = max(f[i][j], f[i + 1][go(j, q)] + reach[j]);
				}
			}

		printf("%d\n", f[1][1]);
		v = 1; // rebuild the answer greedily from the root, keeping the smallest letter on ties
		for (i = 1; i <= n; i++)
		{
			if (a[i] == '?')
			{
				q = 'a';
				int best = f[i + 1][go(v, 0)];
				for (char c = 1; c < 26; c++)
					if (f[i + 1][go(v, c)] > best)
					{
						best = f[i + 1][go(v, c)];
						q = c + 'a';
					}
			}
			else q = a[i];
			printf("%c", q);
			v = go(v, q - 'a');
		}
		printf("\n");
	}
}

Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission.

Time: 2024-10-03 04:59:43
