Algorithms: String Matching with Finite Automata

Preface

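The previous article covered the Rabin-Karp string matching algorithm. This one covers the finite automaton (Finite Automata) string matching algorithm, whose key step is computing the transition function: given the current state k and a character x, it yields the next state. The next state is the length of the longest prefix of the pattern pat that is also a suffix of pat[0...k-1]x (note: string indices start at 0). Matching then feeds the text through the automaton one character at a time; whenever the state equals the pattern length, the pattern occurs in the text ending at the current character. For the underlying theory, see Introduction to Algorithms; it is not repeated here.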
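To make the definition concrete, here is a minimal C++ sketch that computes the next state literally from the definition above: it builds the string pat[0...q-1]x and tries prefix lengths from the longest to the shortest. The function name `nextStateByDefinition` is hypothetical (it is not part of the original program), and the code favours clarity over speed.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper (not part of the original program): computes the
// transition function delta(q, x) directly from its definition, i.e. the
// length of the longest prefix of pat that is also a suffix of pat[0..q-1]x.
int nextStateByDefinition(const std::string &pat, int q, char x)
{
    const int m = static_cast<int>(pat.size());
    assert(q >= 0 && q <= m);  // valid states are 0..m

    const std::string read = pat.substr(0, q) + x;  // the string pat[0..q-1]x

    // A candidate prefix can be neither longer than pat nor longer than read
    for (int k = std::min(m, q + 1); k > 0; --k)
    {
        // Does pat[0..k-1] equal the last k characters of read?
        if (read.compare(read.size() - k, k, pat, 0, k) == 0)
            return k;
    }
    return 0;  // only the empty prefix matches
}
```

For pat = "abababaca", `nextStateByDefinition(pat, 7, 'b')` returns 6: after reading "abababa", a 'b' breaks the match at 'c', but the last six characters read, "ababab", are still a prefix of the pattern.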

Finite automaton string matching

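If the pattern pat has length m, the states are 0, 1, ..., m (m+1 states in total), and the initial state is 0. NO_OF_CHARS is the size of the input alphabet $\Sigma$. In the source code below, computing the transition function (the preprocessing) takes $O(m^3 |\Sigma|)$ time: the table has $(m+1)|\Sigma|$ entries and getNextState may spend $O(m^2)$ on each one. Matching a text of length n takes $O(n)$ time, since each text character costs exactly one table lookup. The algorithm can be improved with ideas from the KMP algorithm covered in a later article: computing the transition function more cleverly reduces the preprocessing to $O(m|\Sigma|)$.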
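The KMP-based improvement mentioned above can be sketched as follows. This is a sketch of the well-known $O(m|\Sigma|)$ construction, not the author's code: row i of the table is a copy of the row for a "fallback" state (the state reached after reading pat[1...i-1], which equals the KMP failure value for state i), with only the transition on pat[i] overridden to i+1. The names `buildTransitionTableFast`, `Table` and `ALPHABET` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>

const int ALPHABET = 256;  // one column per byte value, like NO_OF_CHARS
using Table = std::vector<std::array<int, ALPHABET>>;

// Hypothetical helper: builds the (m+1) x ALPHABET transition table in
// O(m * ALPHABET) time instead of searching for every entry separately.
Table buildTransitionTableFast(const std::string &pat)
{
    const int m = static_cast<int>(pat.size());
    Table tf(m + 1);
    for (auto &row : tf)
        row.fill(0);  // by default every transition returns to state 0
    if (m == 0)
        return tf;

    tf[0][static_cast<unsigned char>(pat[0])] = 1;

    // fallback = state reached after reading pat[1..i-1], i.e. the
    // KMP failure value for state i
    int fallback = 0;
    for (int i = 1; i <= m; ++i)
    {
        assert(fallback < i);  // fallback always names a shorter prefix
        // every character other than pat[i] leads where it leads from fallback
        tf[i] = tf[fallback];
        if (i < m)
        {
            const unsigned char c = static_cast<unsigned char>(pat[i]);
            tf[i][c] = i + 1;            // the next pattern character advances
            fallback = tf[fallback][c];  // now the state after reading pat[1..i]
        }
    }
    return tf;
}
```

For "abababaca" this yields, for example, tf[7]['b'] == 6 and tf[9]['b'] == 2, the same values the definition gives, while doing only a constant amount of work per table entry.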

Figure: execution of the automaton for the pattern P = abababaca.
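Since the figure is not reproduced here, the following hypothetical sketch records the automaton's state after each text character for P = abababaca; a state equal to m = 9 marks an occurrence ending at that position. `delta` and `traceStates` are illustrative names, and `delta` recomputes each transition from the definition instead of using a precomputed table, which is only reasonable for short demo inputs.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: the state reached from state q on character x,
// computed from the definition (longest prefix of pat that is a suffix
// of pat[0..q-1]x).
int delta(const std::string &pat, int q, char x)
{
    assert(q >= 0 && q <= static_cast<int>(pat.size()));
    const std::string read = pat.substr(0, q) + x;
    int k = static_cast<int>(std::min(pat.size(), read.size()));
    while (k > 0 && read.compare(read.size() - k, k, pat, 0, k) != 0)
        --k;  // shrink the candidate prefix until it matches
    return k;
}

// Feeds txt through the automaton for pat and records the state after each
// character; states[i] == pat.size() means an occurrence ends at index i.
std::vector<int> traceStates(const std::string &pat, const std::string &txt)
{
    std::vector<int> states;
    int q = 0;  // initial state
    for (char c : txt)
    {
        q = delta(pat, q, c);
        states.push_back(q);
    }
    return states;
}
```

For example, on the text "ababababaca" the recorded states are 1 2 3 4 5 6 7 6 7 8 9: the eighth character drops the state from 7 to 6 rather than to 0, and the final 9 reports a match starting at index 10 - 9 + 1 = 2.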

Source code

#include <iostream>
#include <string>
#include <vector>

using namespace std;

const int NO_OF_CHARS = 256; // size of the input alphabet (all byte values)

// Returns the next state for the current state and input character x:
// the length of the longest prefix of pat that is also a suffix of
// pat[0..state-1]x.
int getNextState(const string &pat, int M, int state, int x)
{
    // If x is the next character of the pattern, simply advance the state.
    // Cast to unsigned char so bytes >= 128 compare correctly with x (0..255).
    if (state < M && x == static_cast<unsigned char>(pat[state]))
        return state + 1;

    int ns, i; // ns stores the result, i.e. the next state

    // ns finally contains the length of the longest prefix of pat that is
    // also a suffix of "pat[0..state-1]x".
    // Start from the largest possible value and stop at the first prefix
    // that is also a suffix.
    for (ns = state; ns > 0; ns--)
    {
        if (static_cast<unsigned char>(pat[ns-1]) == x)
        {
            // compare pat[0..ns-2] with the last ns-1 characters read
            for (i = 0; i < ns-1; i++)
            {
                if (pat[i] != pat[state-ns+1+i])
                    break;
            }
            if (i == ns-1)
                return ns;
        }
    }

    return 0;
}

/* Builds the transition-function table TF that represents the finite
   automaton for the given pattern */
void compute_Transition_Function(const string &pat, int M, vector<vector<int>> &TF)
{
    for (int state = 0; state <= M; ++state)
        for (int x = 0; x < NO_OF_CHARS; ++x) // for each character x of the input alphabet
            TF[state][x] = getNextState(pat, M, state, x);
}

/* Prints all occurrences of pat in txt */
void Finite_Automata_search(const string &pat, const string &txt)
{
    int M = pat.length();
    int N = txt.length();

    // (M+1) x NO_OF_CHARS transition table. A variable-length array such as
    // int TF[M+1][NO_OF_CHARS] is a compiler extension, not standard C++.
    vector<vector<int>> TF(M + 1, vector<int>(NO_OF_CHARS));

    compute_Transition_Function(pat, M, TF); // compute the transition function

    // Run txt through the automaton
    int state = 0; // initial state
    for (int i = 0; i < N; i++)
    {
        // cast to unsigned char so bytes >= 128 do not produce negative indices
        state = TF[state][static_cast<unsigned char>(txt[i])];
        if (state == M)
            cout << "pattern found at index " << i - M + 1 << endl;
    }
}

int main()
{
    string txt = "Finite Automata Algorithm: Finite Automata";
    string pat = "Auto";
    Finite_Automata_search(pat, txt); // reports indices 7 and 34
    return 0;
}

References:

Introduction to Algorithms (Cormen, Leiserson, Rivest, Stein)

http://www.geeksforgeeks.org/searching-for-patterns-set-5-finite-automata/

http://my.oschina.net/amince/blog/182210

Time: 2024-11-07 08:11:21
