POJ 3691 & HDU 2457 DNA repair (AC automaton, DP)

http://poj.org/problem?id=3691

http://acm.hdu.edu.cn/showproblem.php?pid=2457


DNA repair

Time Limit: 2000MS   Memory Limit: 65536K
Total Submissions: 5690   Accepted: 2669

Description

Biologists finally invent techniques of repairing DNA that contains segments causing kinds of inherited diseases. For the sake of simplicity, a DNA is represented as a string containing characters 'A', 'G', 'C' and 'T'. The repairing techniques are simply to change some characters to eliminate all segments causing diseases. For example, we can repair a DNA "AAGCAG" to "AGGCAC" to eliminate the initial causing disease segments "AAG", "AGC" and "CAG" by changing two characters. Note that the repaired DNA can still contain only characters 'A', 'G', 'C' and 'T'.

You are to help the biologists to repair a DNA by changing least number of characters.

Input

The input consists of multiple test cases. Each test case starts with a line containing one integer N (1 ≤ N ≤ 50), which is the number of DNA segments causing inherited diseases.

The following N lines give N non-empty strings of length not greater than 20 containing only characters in "AGCT", which are the DNA segments causing inherited disease.

The last line of the test case is a non-empty string of length not greater than 1000 containing only characters in "AGCT", which is the DNA to be repaired.

The last test case is followed by a line containing a single zero.

Output

For each test case, print a line containing the test case number (beginning with 1) followed by the number of characters which need to be changed. If it's impossible to repair the given DNA, print -1.

Sample Input

2
AAA
AAG
AAAG
2
A
TG
TGAATG
4
A
G
C
T
AGT
0

Sample Output

Case 1: 1
Case 2: 4
Case 3: -1

Source

2008 Asia Hefei Regional Contest Online by USTC

Problem:

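Given N pattern strings and one text string, find the minimum number of characters of the text that must be changed so that the text no longer contains any pattern as a substring.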
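
To make the task concrete, here is a minimal brute-force sketch that simply tries every candidate string over "AGCT" and keeps the one closest to the text. It is not the method used in this post and is only practical for very short texts (the cap of 12 characters is an arbitrary assumption). The names `containsAny` and `bruteForceRepair` are hypothetical helpers introduced for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: true if `s` contains any forbidden segment as a substring.
bool containsAny(const std::string& s, const std::vector<std::string>& patterns) {
    for (const std::string& p : patterns)
        if (s.find(p) != std::string::npos) return true;
    return false;
}

// Hypothetical reference solver: enumerates all 4^n strings of the same length
// as `text` and returns the smallest number of differing positions among the
// strings that contain no pattern, or -1 if no such string exists.
int bruteForceRepair(const std::vector<std::string>& patterns, const std::string& text) {
    assert(text.size() <= 12);  // assumption: keep the exponential search small
    const std::string bases = "AGCT";
    const std::size_t n = text.size();

    long long total = 1;
    for (std::size_t i = 0; i < n; ++i) total *= 4;

    int best = -1;
    std::string cand(n, 'A');
    for (long long code = 0; code < total; ++code) {
        // Decode `code` as a base-4 number into a candidate string.
        long long rest = code;
        int changes = 0;
        for (std::size_t i = 0; i < n; ++i) {
            cand[i] = bases[rest % 4];
            rest /= 4;
            if (cand[i] != text[i]) ++changes;
        }
        if (containsAny(cand, patterns)) continue;
        if (best == -1 || changes < best) best = changes;
    }
    return best;
}
```

Calling `bruteForceRepair({"AAA", "AAG"}, "AAAG")` corresponds to the first sample case. A checker like this can be handy for comparing against a faster solution on small random inputs.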

Analysis:

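Build an AC automaton (Aho-Corasick automaton) from the N patterns, then walk the text through it while treating word nodes as unreachable. A word node is any node at which a pattern ends, including nodes whose failure link leads to such a node, because the string spelled by that node then has a pattern as a suffix.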
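
The construction can be sketched on its own as follows. This is a minimal illustration under assumptions, not the author's code: the `Automaton` struct, `baseIndex` and `buildAutomaton` are hypothetical names, and the structure uses `std::vector` instead of fixed arrays. The point it shows is the propagation of the forbidden flag along failure links.

```cpp
#include <array>
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Hypothetical container for the automaton (names are assumptions).
struct Automaton {
    std::vector<std::array<int, 4>> next;  // transitions, completed into a trie graph
    std::vector<int> fail;                 // failure links
    std::vector<bool> bad;                 // true if the node must never be entered
};

// Maps a base to 0..3; the order is arbitrary but must be used consistently.
int baseIndex(char c) {
    assert(c == 'A' || c == 'T' || c == 'C' || c == 'G');
    if (c == 'A') return 0;
    if (c == 'T') return 1;
    if (c == 'C') return 2;
    return 3;
}

Automaton buildAutomaton(const std::vector<std::string>& patterns) {
    Automaton a;
    a.next.push_back(std::array<int, 4>{});  // node 0 is the root
    a.fail.push_back(0);
    a.bad.push_back(false);

    // Insert every pattern into the trie; 0 means "no child yet".
    for (const std::string& p : patterns) {
        int u = 0;
        for (char ch : p) {
            int c = baseIndex(ch);
            if (a.next[u][c] == 0) {
                int fresh = static_cast<int>(a.next.size());
                a.next.push_back(std::array<int, 4>{});
                a.fail.push_back(0);
                a.bad.push_back(false);
                a.next[u][c] = fresh;
            }
            u = a.next[u][c];
        }
        a.bad[u] = true;  // a pattern ends here
    }

    // BFS by depth: set failure links, fill missing edges, propagate `bad`.
    std::queue<int> q;
    for (int c = 0; c < 4; ++c)
        if (a.next[0][c] != 0) q.push(a.next[0][c]);
    while (!q.empty()) {
        int x = q.front();
        q.pop();
        for (int c = 0; c < 4; ++c) {
            int u = a.next[x][c];
            if (u == 0) {  // missing edge: reuse the failure state's edge
                a.next[x][c] = a.next[a.fail[x]][c];
                continue;
            }
            a.fail[u] = a.next[a.fail[x]][c];
            // If a suffix of this node's string is a pattern, the node is bad too.
            a.bad[u] = a.bad[u] || a.bad[a.fail[u]];
            q.push(u);
        }
    }
    return a;
}
```

For the patterns "AGC" and "G", the node spelling "AG" becomes forbidden even though no pattern ends there, because its failure link points at the node for "G".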

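Let dp[i][j] be the minimum number of changed characters after processing the first i characters of the text and ending at node j of the AC automaton. The transition is dp[i][j]=min(dp[i][j],dp[i-1][last]+add), where last is a predecessor of j and add is 1 if the i-th character has to be changed to take that edge and 0 otherwise. Since row i depends only on row i-1, a rolling array can be used to save space.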
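
Written out in full, the recurrence might look like the following. The notation is an assumption introduced here for illustration: $\delta(u,k)$ is the automaton transition from node $u$ on letter $k$, $T_i$ is the $i$-th character of the text, $n$ is the text length, and $[\cdot]$ is 1 when the condition holds and 0 otherwise.

$$
dp[0][0] = 0, \qquad
dp[i][v] = \min_{\substack{u,\,k \\ \delta(u,k) = v}} \bigl( dp[i-1][u] + [\,k \neq T_i\,] \bigr)
\quad \text{for every non-word node } v
$$

$$
\text{answer} = \min_{v} \, dp[n][v], \qquad \text{or } -1 \text{ if every } dp[n][v] \text{ is } \infty
$$

All other entries start at $\infty$, and word nodes are never assigned a finite value.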

/*
 *
 * Author : fcbruce <[email protected]>
 *
 * Time : Tue 18 Nov 2014 11:17:49 AM CST
 *
 */
#include <cstdio>
#include <iostream>
#include <sstream>
#include <cstdlib>
#include <algorithm>
#include <ctime>
#include <cctype>
#include <cmath>
#include <string>
#include <cstring>
#include <stack>
#include <queue>
#include <list>
#include <vector>
#include <map>
#include <set>
#define sqr(x) ((x)*(x))
#define LL long long
#define itn int
#define INF 0x3f3f3f3f
#define PI 3.1415926535897932384626
#define eps 1e-10

#ifdef _WIN32
  #define lld "%I64d"
#else
  #define lld "%lld"
#endif

#define maxm
#define maxn 1024

using namespace std;

int q[maxn];

const int maxsize = 4;
struct Acauto
{
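  int ch[maxn][maxsize];    // transitions; completed into a trie graph by get_fail()
  bool val[maxn];           // true if entering this node completes a forbidden segment
  int last[maxn],nex[maxn]; // nex: failure link; last is written for depth-1 nodes only and never read
  int sz;                   // number of nodes in use
  int dp[2][maxn];          // rolling DP rows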

  Acauto()
  {
    memset(ch[0],0,sizeof ch[0]);
    val[0]=false;
    sz=1;
  }

  void clear()
  {
    memset(ch[0],0,sizeof ch[0]);
    val[0]=false;
    sz=1;
  }

  int idx(const char c)
  {
    if (c=='A') return 0;
    if (c=='T') return 1;
    if (c=='C') return 2;
    return 3;
  }

  void insert(const char *s)
  {
    int u=0;
    for (int i=0;s[i]!='\0';i++)
    {
      int c=idx(s[i]);
      if (ch[u][c]==0)
      {
        memset(ch[sz],0,sizeof ch[sz]);
        val[sz]=false;
        ch[u][c]=sz++;
      }
      u=ch[u][c];
    }
    val[u]=true;
  }

  void get_fail()
  {
    int f=0,r=-1;
    nex[0]=0;
    for (int c=0;c<maxsize;c++)
    {
      int u=ch[0][c];
      if (u!=0)
      {
        nex[u]=0;
        q[++r]=u;
        last[u]=0;
      }
    }

    while (f<=r)
    {
      int x=q[f++];
      for (int c=0;c<maxsize;c++)
      {
        int u=ch[x][c];
        if (u==0)
        {
          ch[x][c]=ch[nex[x]][c];
          continue;
        }
        q[++r]=u;
        int v=nex[x];
        nex[u]=ch[v][c];
        // a node whose suffix (via the failure link) is a pattern is forbidden too
        val[u]|=val[nex[u]];
      }
    }
  }

  int DP(const char *T)
  {
    memset(dp,0x3f,sizeof dp);
    dp[0][0]=0;
    int x=1;
    for (int i=0;T[i]!='\0';i++,x^=1)
    {
      memset(dp[x],0x3f,sizeof dp[x]);
      int c=idx(T[i]);
      for (int j=0;j<sz;j++)
      {
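        if (dp[x^1][j]==INF) continue;   // state j is unreachable after i characters
        for (int k=0;k<4;k++)
        {
          if (val[ch[j][k]]) continue;   // never step into a forbidden node
          int add=k==c?0:1;              // cost 1 if T[i] must be changed to letter k
          dp[x][ch[j][k]]=min(dp[x][ch[j][k]],dp[x^1][j]+add);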
        }
      }
    }

    int MIN=INF;
    for (int i=0;i<sz;i++)
      MIN=min(MIN,dp[x^1][i]);
    if (MIN==INF) MIN=-1;
    return MIN;
  }
}acauto;

char DNA[1024];

int main()
{
#ifdef FCBRUCE
  freopen("/home/fcbruce/code/t","r",stdin);
#endif // FCBRUCE

  int n,__=0;

  while (scanf("%d",&n)==1 && n!=0)  // also stop on EOF or malformed input
  {
    acauto.clear();
    for (int i=0;i<n;i++)
    {
      scanf("%s",DNA);
      acauto.insert(DNA);
    }

    acauto.get_fail();

    scanf("%s",DNA);

    printf("Case %d: %d\n",++__,acauto.DP(DNA));
  }

  return 0;
}
Time: 2024-08-24 17:24:09
