POJ2778---DNA Sequence (AC automaton + matrix)

Description

It's well known that a DNA sequence is a sequence that contains only A, C, T and G, and it's very useful to analyze a segment of a DNA sequence. For example, if an animal's DNA sequence contains the segment ATC, it may mean that the animal has a genetic disease. So far scientists have found several such segments; the problem is how many kinds of DNA sequences of a species don't contain those segments.

Suppose that the DNA sequences of a species consist only of A, C, T and G, and that the length of each sequence is a given integer n.

Input

The first line contains two integers m (0 <= m <= 10) and n (1 <= n <= 2000000000). Here, m is the number of genetic disease segments, and n is the length of the sequences.

Each of the next m lines contains a DNA genetic disease segment; the length of each segment is not larger than 10.

Output

An integer, the number of DNA sequences, mod 100000.

Sample Input

4 3
AT
AC
AG
AA

Sample Output

36

Source

POJ Monthly–2006.03.26,dodo

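A fairly simple automaton DP: each node of the AC automaton is a state, and since n is very large, the transitions are sped up with fast matrix exponentiation.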
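As a small worked illustration (not part of the original post), consider the sample input with forbidden segments AT, AC, AG and AA. Assuming the trie is built as in the code below, the only states that are not forbidden are the root (state 0) and the node for "A" (state 1). From the root, the letter A leads to state 1 and the other three letters lead back to the root; from state 1, every letter completes a forbidden segment. Writing $M_{s,t}$ for the number of letters that move safe state $s$ to safe state $t$, the count of valid strings of length $n$ is the sum of row 0 of $M^n$:

```latex
M = \begin{pmatrix} 3 & 1 \\ 0 & 0 \end{pmatrix}, \qquad
M^2 = \begin{pmatrix} 9 & 3 \\ 0 & 0 \end{pmatrix}, \qquad
M^3 = \begin{pmatrix} 27 & 9 \\ 0 & 0 \end{pmatrix}, \qquad
\sum_{t} (M^3)_{0,t} = 27 + 9 = 36 .
```

This matches the sample output. The code below keeps one row and column per trie node and simply leaves the rows and columns of forbidden nodes at zero, which gives the same sums.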

/*************************************************************************
    > File Name: POJ2778.cpp
    > Author: ALex
    > Mail: [email protected]
    > Created Time: Tuesday, March 10, 2015, 21:21:26
 ************************************************************************/

#include <map>
#include <set>
#include <queue>
#include <stack>
#include <vector>
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <algorithm>
using namespace std;

const double pi = acos(-1);
const int inf = 0x3f3f3f3f;
const double eps = 1e-15;
typedef long long LL;
typedef pair <int, int> PLL;

const int mod = 100000;
const int MAX_NODE = 110;
const int CHILD_NUM = 4;

struct MARTIX
{
    LL mat[MAX_NODE][MAX_NODE];
};

MARTIX mul(MARTIX a, MARTIX b, int L)
{
    MARTIX c;
    for (int i = 0; i < L; ++i)
    {
        for (int j = 0; j < L; ++j)
        {
            c.mat[i][j] = 0;
            for (int k = 0; k < L; ++k)
            {
                c.mat[i][j] += a.mat[i][k] * b.mat[k][j];
                c.mat[i][j] %= mod;
            }
        }
    }
    return c;
}

MARTIX fastpow(MARTIX ret, int n, int L)
{
    MARTIX ans;
    for (int i = 0; i < L; ++i)
    {
        for (int j = 0; j < L; ++j)
        {
            ans.mat[i][j] = (i == j);
        }
    }
    while (n)
    {
        if (n & 1)
        {
            ans = mul(ans, ret, L);
        }
        n >>= 1;
        ret = mul(ret, ret, L);
    }
    return ans;
}

struct AC_Automation
{
    int next[MAX_NODE][CHILD_NUM];
    int fail[MAX_NODE];
    int end[MAX_NODE];
    int root, L;

    int newnode()
    {
        for (int i = 0; i < CHILD_NUM; ++i)
        {
            next[L][i] = -1;
        }
        end[L++] = 0;
        return L - 1;
    }

    void init()
    {
        L = 0;
        root = newnode();
    }

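    // Map a nucleotide letter to a child index in [0, CHILD_NUM).
    int ID(char c)
    {
        if (c == 'A')
        {
            return 0;
        }
        if (c == 'G')
        {
            return 1;
        }
        if (c == 'C')
        {
            return 2;
        }
        if (c == 'T')
        {
            return 3;
        }
        return 0; // not reached for valid input; avoids falling off the end of a non-void function
    }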

    void Build_Trie(char buf[])
    {
        int now = root;
        int len = strlen(buf);
        for (int i = 0; i < len; ++i)
        {
            if (next[now][ID(buf[i])] == -1)
            {
                next[now][ID(buf[i])] = newnode();
            }
            now = next[now][ID(buf[i])];
        }
        end[now] = 1;
    }

    void Build_AC()
    {
        queue <int> qu;
        fail[root] = root;
        for (int i = 0; i < CHILD_NUM; ++i)
        {
            if (next[root][i] == -1)
            {
                next[root][i] = root;
            }
            else
            {
                fail[next[root][i]] = root;
                qu.push(next[root][i]);
            }
        }
        while (!qu.empty())
        {
            int now = qu.front();
            qu.pop();
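            // A node is also forbidden if its fail link points to a forbidden node,
            // i.e. some pattern is a suffix of the string this node represents.
            if (end[fail[now]])
            {
                end[now] = 1;
            }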
            for (int i = 0; i < CHILD_NUM; ++i)
            {
                if (next[now][i] == -1)
                {
                    next[now][i] = next[fail[now]][i];
                }
                else
                {
                    fail[next[now][i]] = next[fail[now]][i];
                    qu.push(next[now][i]);
                }
            }
        }
    }

    void solve(int n)
    {
        MARTIX c;
        for (int i = 0; i < L; ++i)
        {
            for (int j = 0; j < L; ++j)
            {
                c.mat[i][j] = 0;
            }
        }
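        // c.mat[i][j] = number of letters that move safe state i to safe state j;
        // rows and columns of forbidden states stay zero.
        for (int i = 0; i < L; ++i)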
        {
            if(end[i])
            {
                continue;
            }
            for (int j = 0; j < CHILD_NUM; ++j)
            {
                if (end[next[i][j]])
                {
                    continue;
                }
                ++c.mat[i][next[i][j]];
            }
        }
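        // x = c^n: x.mat[0][i] counts the valid strings of length n that end in state i,
        // starting from the root (state 0).
        MARTIX x = fastpow(c, n, L);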
        LL ans = 0;
        for (int i = 0; i < L; ++i)
        {
            if (!end[i])
            {
                ans += x.mat[0][i];
                ans %= mod;
            }
        }
        printf("%lld\n", ans);
    }
}AC;

char buf[20];

int main ()
{
    int m, n;
    while (~scanf("%d%d", &m, &n))
    {
        AC.init();
        for (int i = 1; i <= m; ++i)
        {
            scanf("%s", buf);
            AC.Build_Trie(buf);
        }
        AC.Build_AC();
        AC.solve(n);
    }
    return 0;
}
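One way to gain confidence in the matrix solution is to compare it with direct enumeration for small n. The following is a minimal sketch, not part of the original solution: `bruteForceCount` is a hypothetical helper name, and the cap of n <= 12 is an arbitrary assumption that keeps the 4^n enumeration cheap.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical cross-check helper (not from the original post): counts the
// strings of length n over {A, C, G, T} that contain none of the forbidden
// segments by enumerating all 4^n candidates. Only usable for small n.
long long bruteForceCount(const std::vector<std::string>& forbidden, int n)
{
    assert(n >= 0 && n <= 12); // assumed cap so that 4^n stays small

    const char alphabet[4] = {'A', 'C', 'G', 'T'};
    long long total = 1;
    for (int i = 0; i < n; ++i)
    {
        total *= 4;
    }

    long long safe = 0;
    std::string s(n, 'A');
    for (long long code = 0; code < total; ++code)
    {
        // Decode the counter as a base-4 number, one letter per digit.
        long long x = code;
        for (int i = 0; i < n; ++i)
        {
            s[i] = alphabet[x % 4];
            x /= 4;
        }
        // The string is valid only if no forbidden segment occurs in it.
        bool ok = true;
        for (const std::string& p : forbidden)
        {
            if (s.find(p) != std::string::npos)
            {
                ok = false;
                break;
            }
        }
        if (ok)
        {
            ++safe;
        }
    }
    return safe % 100000; // same modulus as the problem statement
}
```

Calling `bruteForceCount({"AT", "AC", "AG", "AA"}, 3)` reproduces the sample case, so its result can be compared against what `AC.solve(3)` prints for the same segments.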
Time: 2024-08-06 11:25:55
