前言
上篇文章介绍《Rabin-Karp字符串匹配算法》,这里介绍有限自动机(Finite
Automata)字符串匹配算法,有限自动机(Finite Automata)字符串匹配算法最主要的是计算出转移函数。即给定一个当前状态k和一个字符x,计算下一个状态;计算方法为:找出模式pat的最长前缀prefix,同时也是pat[0...k-1]x(注意:字符串下标是从0开始)的后缀,则prefix的长度即为下一个状态。匹配的过程是比较输入文本子串和模式串的状态值,若相等则存在,若不想等则不存在。有关理论知识参考《算法导论》,这里不对理论知识进行讲解。
有限自动机字符串匹配算法
若模式串pat的长度为m,则状态值为0-m,即有m+1个状态,初始状态为0。其中numbers=NO_OF_CHARS为输入字符表的个数,从以下的源码可以知道,计算转移函数(即预处理)的时间复杂度为,匹配时间复杂度为。该算法可以根据后面介绍的KMP算法进行改进,对求解转移函数的过程进行改进可以得到比较好的时间复杂度。
以下是模式串为P=abababaca的自动机执行过程
源码实现
#include<iostream> #include<string> using namespace std; const int NO_OF_CHARS = 256;//the numbers of input alphabet int getNextState(const string &pat, int M, int state, int x) { // If the character c is same as next character in pattern, // then simply increment state if (state < M && x == pat[state]) return state+1; int ns, i; // ns stores the result which is next state // ns finally contains the longest prefix which is also suffix // in "pat[0..state-1]c" // Start from the largest possible value and stop when you find // a prefix which is also suffix for (ns = state; ns > 0; ns--) { if(pat[ns-1] == x) { for(i = 0; i < ns-1; i++) { if (pat[i] != pat[state-ns+1+i]) break; } if (i == ns-1) return ns; } } return 0; } /* This function builds the TF table which represents Finite Automata for a given pattern */ void compute_Transition_Function(const string &pat, int M, int TF[][NO_OF_CHARS]) { int state, x; for (state = 0; state <= M; ++state) for (x = 0; x < NO_OF_CHARS; ++x)//for each charater c in the inout alphabet table TF[state][x] = getNextState(pat, M, state, x); } /* Prints all occurrences of pat in txt */ void Finite_Automata_search(const string &pat, const string &txt) { int M = pat.length(); int N = txt.length(); int TF_len = M+1; //this is supported by C++11 int TF[TF_len][NO_OF_CHARS];//the state of transform table, stores the states. compute_Transition_Function(pat, M, TF);//compute the state of Transition Function // Process txt over FA. int state=0;//inite the state for (int i = 0; i < N; i++) { state = TF[state][txt[i]]; if (state == M) cout<<"patterb found at index is:"<<i-M+1<<endl; } } int main() { string txt = "Finite Automata Algorithm: Finite Automata"; string pat = "Auto"; Finite_Automata_search(pat, txt); system("pause"); return 0; }
参考资料:
《算法导论》
http://www.geeksforgeeks.org/searching-for-patterns-set-5-finite-automata/
http://my.oschina.net/amince/blog/182210
时间: 2024-11-07 08:11:21