Longest Common Substring

Problem Statement

Given two strings $s_1$ and $s_2$, find their longest common substring (LCS). For example, for $s_1$ = 111001 and $s_2$ = 11011, the longest common substring is 110, with length 3.
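As a baseline for checking results on small inputs, the problem statement can be turned directly into a brute-force search. The sketch below is not part of the original solution; the name bruteForceLcsLength is hypothetical, and it only returns the length.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper (not from the original article): try every pair of
// start positions and extend the match character by character.
// Worst case is O(n1 * n2 * min(n1, n2)).
std::size_t bruteForceLcsLength(const std::string &s1, const std::string &s2)
{
    std::size_t best = 0;
    for (std::size_t i = 0; i < s1.size(); ++i) {
        for (std::size_t j = 0; j < s2.size(); ++j) {
            std::size_t k = 0;
            // Extend while both strings still have characters and they agree.
            while (i + k < s1.size() && j + k < s2.size() && s1[i + k] == s2[j + k])
                ++k;
            best = std::max(best, k);
        }
    }
    return best;
}
```

For the example above, bruteForceLcsLength("111001", "11011") gives 3.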

A concise way to solve this problem is dynamic programming (DP).

Instead of dealing with arbitrary substrings, we first consider only the common substrings that end at a given pair of characters.

Define $dp[i][j]$ = the length of the longest common substring of $s_1[0..i]$ and $s_2[0..j]$ that ends at $s_1[i]$ and $s_2[j]$.

Then the length of the LCS is the maximum value in the array $dp$.

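To compute $dp[i][j]$, we compare $s_1[i]$ with $s_2[j]$. If they are equal, the common substring ending at $s_1[i-1]$ and $s_2[j-1]$ is extended by one character, so $dp[i][j] = dp[i-1][j-1] + 1$; otherwise no common substring ends at this pair and $dp[i][j] = 0$. Thus, for $i, j \ge 1$: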

dp[i][j] = (s1[i] == s2[j] ? (dp[i-1][j-1] + 1) : 0);
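To see the recurrence at work, the following sketch fills the whole table. The function name buildDpTable is hypothetical, and treating a match in the first row or column as length 1 is the base case assumed here, since $dp[i-1][j-1]$ does not exist there.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the full dp table for s1 and s2.
std::vector<std::vector<int> > buildDpTable(const std::string &s1, const std::string &s2)
{
    std::vector<std::vector<int> > dp(s1.size(), std::vector<int>(s2.size(), 0));
    for (std::size_t i = 0; i < s1.size(); ++i) {
        for (std::size_t j = 0; j < s2.size(); ++j) {
            if (s1[i] != s2[j])
                continue;  // mismatch: dp[i][j] stays 0
            // In the first row or column there is no dp[i-1][j-1],
            // so a match there has length 1.
            dp[i][j] = (i == 0 || j == 0) ? 1 : dp[i - 1][j - 1] + 1;
        }
    }
    return dp;
}
```

For $s_1$ = 111001 and $s_2$ = 11011, this gives $dp[3][2] = 3$: the match 110 ends at $s_1[3]$ and $s_2[2]$.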

Since we also want the LCS itself and not only its length, we need a few modifications.

Whenever we find a $dp[i][j]$ larger than the current maxLen, we update maxLen to $dp[i][j]$.

if(dp[i][j] > maxLen)
    maxLen = dp[i][j];

At the same time, we can record the starting index of the new, longer substring. In $s_1$, the LCS starts at the current index $i$ plus 1 minus the length of the LCS, i.e.

if(dp[i][j] > maxLen){
    maxLen = dp[i][j];
    maxIndex = i + 1 - maxLen;
}
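With maxIndex and maxLen recorded, the substring can be cut out of $s_1$ with std::string::substr. The sketch below combines the steps so far; the function name longestCommonSubstringOf is hypothetical, and when several substrings share the maximum length it keeps the first one found in scan order.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns the first longest common substring found,
// or an empty string when s1 and s2 share no character.
std::string longestCommonSubstringOf(const std::string &s1, const std::string &s2)
{
    const int n1 = static_cast<int>(s1.size());
    const int n2 = static_cast<int>(s2.size());
    std::vector<std::vector<int> > dp(n1, std::vector<int>(n2, 0));

    int maxLen = 0;
    int maxIndex = 0;
    for (int i = 0; i < n1; ++i) {
        for (int j = 0; j < n2; ++j) {
            if (s1[i] == s2[j])
                dp[i][j] = (i == 0 || j == 0) ? 1 : dp[i - 1][j - 1] + 1;

            if (dp[i][j] > maxLen) {
                maxLen = dp[i][j];
                maxIndex = i + 1 - maxLen;  // start of the match in s1
            }
        }
    }
    // substr(pos, count): maxLen characters of s1 starting at maxIndex.
    return s1.substr(maxIndex, maxLen);
}
```

For $s_1$ = 111001 and $s_2$ = 11011 this returns 110.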

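Finally, we need to initialize $dp$. The recurrence refers to $dp[i-1][j-1]$, which does not exist in the first row or the first column, so those cells are set directly: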

for(int i = 0; i < s1.length(); ++i)
    dp[i][0] = (s1[i] == s2[0] ? 1 : 0);

for(int j = 0; j < s2.length(); ++j)
    dp[0][j] = (s1[0] == s2[j] ? 1 : 0);
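A common alternative, which is not what this article does, is to pad the table with an extra row and column of zeros so that no separate initialization is needed. In the sketch below the indices are shifted by one, and the name lcsLengthPadded is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical variant: dp[i][j] describes matches ending at s1[i-1] and
// s2[j-1]. Row 0 and column 0 stand for "before the string" and stay 0.
int lcsLengthPadded(const std::string &s1, const std::string &s2)
{
    const std::size_t n1 = s1.size();
    const std::size_t n2 = s2.size();
    std::vector<std::vector<int> > dp(n1 + 1, std::vector<int>(n2 + 1, 0));

    int maxLen = 0;
    for (std::size_t i = 1; i <= n1; ++i) {
        for (std::size_t j = 1; j <= n2; ++j) {
            if (s1[i - 1] == s2[j - 1]) {
                dp[i][j] = dp[i - 1][j - 1] + 1;  // always in range thanks to the padding
                maxLen = std::max(maxLen, dp[i][j]);
            }
        }
    }
    return maxLen;
}
```

The padding costs one extra row and column but removes the two initialization loops and the empty-string special case.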


The complete code is:

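#include <string>
#include <vector>
using namespace std;

// The original listing left the function unnamed; longestCommonSubstring is a name assumed here.
// Outputs: length = LCS length, sIndex = start of the LCS in s1 (-1 if there is none).
void longestCommonSubstring(const string &s1, const string &s2, int &sIndex, int &length)
{
    const int n1 = static_cast<int>(s1.length());
    const int n2 = static_cast<int>(s2.length());

    sIndex = -1;
    length = 0;
    if(0 == n1 || 0 == n2)
        return;

    // initialize dp: row 0 and column 0 are the base cases, all other cells start at 0
    vector<vector<int> > dp(n1, vector<int>(n2, 0));
    for(int i = 0; i < n1; ++i)
        dp[i][0] = (s1[i] == s2[0] ? 1 : 0);
    for(int j = 0; j < n2; ++j)
        dp[0][j] = (s1[0] == s2[j] ? 1 : 0);

    // compute max length and index; the scan starts at 0 so that matches
    // in row 0 and column 0 are also considered
    for(int i = 0; i < n1; ++i){
        for(int j = 0; j < n2; ++j){
            if(i > 0 && j > 0 && s1[i] == s2[j])
                dp[i][j] = dp[i-1][j-1] + 1;

            if(dp[i][j] > length){
                length = dp[i][j];
                sIndex = i + 1 - length;
            }
        }
    }
}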
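The listing above stores the whole $n_1 \times n_2$ table, but each row of $dp$ depends only on the previous row. If only the length is needed, two rows are enough. The sketch below is an optional variation, not part of the original solution; the name lcsLengthTwoRows is hypothetical, and it uses a padded column 0 that always stays 0.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical variant: O(n2) extra memory instead of O(n1 * n2).
// prev holds the previous row of dp and curr the row being filled;
// index j in either row refers to s2[j-1], and index 0 is padding.
int lcsLengthTwoRows(const std::string &s1, const std::string &s2)
{
    std::vector<int> prev(s2.size() + 1, 0);
    std::vector<int> curr(s2.size() + 1, 0);

    int maxLen = 0;
    for (std::size_t i = 1; i <= s1.size(); ++i) {
        for (std::size_t j = 1; j <= s2.size(); ++j) {
            // Same recurrence as before, reading the diagonal from prev.
            curr[j] = (s1[i - 1] == s2[j - 1]) ? prev[j - 1] + 1 : 0;
            maxLen = std::max(maxLen, curr[j]);
        }
        prev.swap(curr);  // the row just filled becomes the previous row
    }
    return maxLen;
}
```

The running time is still $O(n_1 n_2)$; only the memory use changes. To recover the substring as well, the starting index can be tracked in the same way as in the full-table version.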
Time: 2024-10-13 16:18:00
