Dynamic Programming | Set 4 (Longest Common Subsequence)

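First, what is the longest common subsequence (LCS) problem? Given two sequences, find the length of the longest subsequence present in both of them. A subsequence is a sequence that appears in the same relative order, but not necessarily contiguously. For example, “abc”, “abg”, “bdf”, “aeg” and “acefg” are all subsequences of “abcdefg”. A sequence of length n therefore has 2^n possible subsequences, including the empty one, because each element is either chosen or not chosen.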
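
To make the definition concrete, here is a minimal sketch of a subsequence test in C. The function name is_subsequence is hypothetical and does not appear in the original article; it only illustrates the "same relative order, not necessarily contiguous" rule.

```c
/* Returns 1 if s is a subsequence of t, 0 otherwise.
   Hypothetical helper, not part of the original article. */
int is_subsequence(const char *s, const char *t)
{
    /* Walk through t once, advancing in s whenever the current
       characters agree. */
    while (*s != '\0' && *t != '\0') {
        if (*s == *t)
            s++;
        t++;
    }
    /* s is a subsequence exactly when all of it was consumed. */
    return *s == '\0';
}
```

With this sketch, is_subsequence("acefg", "abcdefg") returns 1, while is_subsequence("gfa", "abcdefg") returns 0 because the order is violated.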

Example:

LCS for input Sequences “ABCDGH” and “AEDFHR” is “ADH” of length 3.
LCS for input Sequences “AGGTAB” and “GXTXAYB” is “GTAB” of length 4.

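The naive solution is to generate all subsequences of both given sequences and find the longest one that matches. This solution has exponential time complexity, which is clearly not what we want. Let us see how this problem has both of the important properties of a dynamic programming problem.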
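
The brute-force idea can be sketched roughly as follows. This is a simplified variant, not the article's code: instead of generating the subsequences of both inputs, it enumerates the 2^m subsequences of X with a bitmask and checks each one against Y. The names lcs_brute_force and is_subseq_n are hypothetical, and the sketch assumes strlen(X) is at most 20.

```c
#include <string.h>

/* Returns 1 if s[0..len-1] is a subsequence of t, 0 otherwise. */
static int is_subseq_n(const char *s, int len, const char *t)
{
    int k = 0;
    for (; *t != '\0' && k < len; t++)
        if (s[k] == *t)
            k++;
    return k == len;
}

/* Brute-force LCS length: tries all 2^m subsequences of X.
   Assumption: m = strlen(X) <= 20; returns -1 for longer inputs. */
int lcs_brute_force(const char *X, const char *Y)
{
    int m = (int)strlen(X);
    char cand[32];          /* holds the current candidate subsequence */
    int best = 0;
    unsigned long mask;

    if (m > 20)
        return -1;

    /* Bit i of mask decides whether X[i] is chosen. */
    for (mask = 0; mask < (1UL << m); mask++) {
        int len = 0, i;
        for (i = 0; i < m; i++)
            if (mask & (1UL << i))
                cand[len++] = X[i];
        /* Only candidates longer than the current best need checking. */
        if (len > best && is_subseq_n(cand, len, Y))
            best = len;
    }
    return best;
}
```

The outer loop alone runs 2^m times, which is why this approach is only usable for very short inputs.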

1 Optimal Substructure:

Let the input sequences be X[0..m-1] of length m and Y[0..n-1] of length n, and let L(X[0..m-1], Y[0..n-1]) be the length of the LCS of X and Y. The recursive definition of L(X[0..m-1], Y[0..n-1]) is as follows:

If last characters of both sequences match (or X[m-1] == Y[n-1]) then
L(X[0..m-1], Y[0..n-1]) = 1 + L(X[0..m-2], Y[0..n-2])

If last characters of both sequences do not match (or X[m-1] != Y[n-1]) then
L(X[0..m-1], Y[0..n-1]) = MAX ( L(X[0..m-2], Y[0..n-1]), L(X[0..m-1], Y[0..n-2]) )

Examples:

1) Consider the input strings “AGGTAB” and “GXTXAYB”. Last characters match for the strings. So length of LCS can be written as:
L(“AGGTAB”, “GXTXAYB”) = 1 + L(“AGGTA”, “GXTXAY”)

2) Consider the input strings “ABCDGH” and “AEDFHR”. Last characters do not match for the strings. So length of LCS can be written as:
L(“ABCDGH”, “AEDFHR”) = MAX ( L(“ABCDG”, “AEDFHR”), L(“ABCDGH”, “AEDFH”) )

So the LCS problem has the optimal substructure property: the main problem can be solved using solutions to its subproblems.

2 Overlapping Subproblems:

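Following is a naive recursive implementation of the LCS problem. The implementation simply follows the recursive structure described above: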

/* A Naive recursive implementation of LCS problem */
#include<stdio.h>
#include<stdlib.h>
#include<string.h>  /* for strlen */

int max(int a, int b);

/* Returns length of LCS for X[0..m-1], Y[0..n-1] */
int lcs( char *X, char *Y, int m, int n )
{
   if (m == 0 || n == 0)
     return 0;
   if (X[m-1] == Y[n-1])
     return 1 + lcs(X, Y, m-1, n-1);
   else
     return max(lcs(X, Y, m, n-1), lcs(X, Y, m-1, n));
}

/* Utility function to get max of 2 integers */
int max(int a, int b)
{
    return (a > b)? a : b;
}

/* Driver program to test above function */
int main()
{
  char X[] = "AGGTAB";
  char Y[] = "GXTXAYB";

  int m = strlen(X);
  int n = strlen(Y);

  printf("Length of LCS is %d\n", lcs( X, Y, m, n ) );

  getchar();
  return 0;
}

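The time complexity of the above program is exponential in the worst case, which occurs when no characters of X and Y match, i.e. when the length of the LCS is 0. Every call then branches into two further calls and the recursion can go m+n levels deep, so the number of calls is bounded by O(2^(m+n)); this is often quoted loosely as O(2^n).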
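
To get a feel for that growth, the recursion can be instrumented with a call counter. The sketch below is an illustration only: call_count, lcs_counted and count_lcs_calls are hypothetical names that are not in the original program.

```c
#include <string.h>

static long call_count = 0;  /* hypothetical counter, for illustration */

/* Same recursion as the naive lcs(), but counts every invocation. */
static int lcs_counted(const char *X, const char *Y, int m, int n)
{
    int a, b;

    call_count++;
    if (m == 0 || n == 0)
        return 0;
    if (X[m-1] == Y[n-1])
        return 1 + lcs_counted(X, Y, m-1, n-1);
    a = lcs_counted(X, Y, m, n-1);
    b = lcs_counted(X, Y, m-1, n);
    return (a > b) ? a : b;
}

/* Returns how many calls the naive recursion makes for X and Y. */
long count_lcs_calls(const char *X, const char *Y)
{
    call_count = 0;
    lcs_counted(X, Y, (int)strlen(X), (int)strlen(Y));
    return call_count;
}
```

For two strings with no character in common, count_lcs_calls("AB", "XY") gives 11 and count_lcs_calls("ABC", "XYZ") gives 39, so one extra character per string already more than triples the work.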

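Based on the above implementation, following is a partial recursion tree for the input strings “AXYT” and “AYZX”:

                         lcs("AXYT", "AYZX")
                       /                     \
         lcs("AXY", "AYZX")                lcs("AXYT", "AYZ")
         /              \                  /               \
lcs("AX", "AYZX")  lcs("AXY", "AYZ")  lcs("AXY", "AYZ")  lcs("AXYT", "AY")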

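It is easy to see that lcs(“AXY”, “AYZ”) is computed twice. If we draw the complete recursion tree, we find many more subproblems that are solved again and again. So this problem has the overlapping subproblems property, and recomputation can be avoided by using either Memoization or Tabulation. Following is a tabulated implementation of the LCS problem.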

/* Dynamic Programming implementation of LCS problem */
#include<stdio.h>
#include<stdlib.h>
#include<string.h>  /* for strlen */

int max(int a, int b);

/* Returns length of LCS for X[0..m-1], Y[0..n-1] */
int lcs( char *X, char *Y, int m, int n )
{
   int L[m+1][n+1];
   int i, j;

   /* Following steps build L[m+1][n+1] in bottom up fashion. Note
      that L[i][j] contains length of LCS of X[0..i-1] and Y[0..j-1] */
   for (i=0; i<=m; i++)
   {
     for (j=0; j<=n; j++)
     {
       if (i == 0 || j == 0)
         L[i][j] = 0;

       else if (X[i-1] == Y[j-1])
         L[i][j] = L[i-1][j-1] + 1;

       else
         L[i][j] = max(L[i-1][j], L[i][j-1]);
     }
   }

   /* L[m][n] contains length of LCS for X[0..m-1] and Y[0..n-1] */
   return L[m][n];
}

/* Utility function to get max of 2 integers */
int max(int a, int b)
{
    return (a > b)? a : b;
}

/* Driver program to test above function */
int main()
{
  char X[] = "AGGTAB";
  char Y[] = "GXTXAYB";

  int m = strlen(X);
  int n = strlen(Y);

  printf("Length of LCS is %d\n", lcs( X, Y, m, n ) );

  getchar();
  return 0;
}

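The time complexity of the above implementation is O(mn), which is far better than the worst case of the naive recursive implementation.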
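
The article shows only the Tabulation variant. For comparison, a Memoization (top-down) version might look like the sketch below, which keeps the recursive structure and caches each subproblem so it is computed once. The names lcs_memo and lcs_memo_rec are hypothetical, and the heap-allocated table with -1 as the "not computed" marker is one possible design, not the article's.

```c
#include <stdlib.h>
#include <string.h>

/* Recursive helper. memo is an (m+1) x (n+1) table stored row by row,
   cols is the row width (n+1), and -1 marks "not computed yet". */
static int lcs_memo_rec(const char *X, const char *Y, int m, int n,
                        int *memo, int cols)
{
    int *slot = &memo[m * cols + n];

    if (m == 0 || n == 0)
        return 0;
    if (*slot != -1)
        return *slot;            /* reuse the cached answer */

    if (X[m-1] == Y[n-1]) {
        *slot = 1 + lcs_memo_rec(X, Y, m-1, n-1, memo, cols);
    } else {
        int a = lcs_memo_rec(X, Y, m, n-1, memo, cols);
        int b = lcs_memo_rec(X, Y, m-1, n, memo, cols);
        *slot = (a > b) ? a : b;
    }
    return *slot;
}

/* Returns the LCS length of X and Y, or -1 if allocation fails. */
int lcs_memo(const char *X, const char *Y)
{
    int m = (int)strlen(X);
    int n = (int)strlen(Y);
    int cols = n + 1;
    size_t cells = (size_t)(m + 1) * (size_t)cols;
    int *memo = malloc(cells * sizeof *memo);
    size_t k;
    int result;

    if (memo == NULL)
        return -1;
    for (k = 0; k < cells; k++)
        memo[k] = -1;

    result = lcs_memo_rec(X, Y, m, n, memo, cols);
    free(memo);
    return result;
}
```

Each of the (m+1)(n+1) table entries is filled at most once, so this version also runs in O(mn) time, at the cost of recursion depth up to m+n.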

The above program only returns the length of the LCS. To print the LCS itself, see the article Printing Longest Common Subsequence.
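
As a rough idea of how the subsequence itself can be recovered, the sketch below fills the same table L and then walks back from L[m][n] to L[0][0], collecting matching characters. This is a common backtracking approach and an assumption on my part, not necessarily what the referenced article does; the name lcs_string and the caller-supplied buffer out are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Writes one LCS of X and Y into out and returns its length,
   or -1 if allocation fails. Assumption: out has room for
   min(strlen(X), strlen(Y)) + 1 bytes. */
int lcs_string(const char *X, const char *Y, char *out)
{
    int m = (int)strlen(X);
    int n = (int)strlen(Y);
    int cols = n + 1;
    int i, j, k, len;
    int *L = malloc((size_t)(m + 1) * (size_t)cols * sizeof *L);

    if (L == NULL)
        return -1;

    /* Build the table bottom up, exactly as in the tabulated version. */
    for (i = 0; i <= m; i++) {
        for (j = 0; j <= n; j++) {
            if (i == 0 || j == 0)
                L[i * cols + j] = 0;
            else if (X[i-1] == Y[j-1])
                L[i * cols + j] = L[(i-1) * cols + (j-1)] + 1;
            else if (L[(i-1) * cols + j] >= L[i * cols + (j-1)])
                L[i * cols + j] = L[(i-1) * cols + j];
            else
                L[i * cols + j] = L[i * cols + (j-1)];
        }
    }

    /* Walk back from the bottom-right corner, filling out from the end. */
    len = L[m * cols + n];
    out[len] = '\0';
    i = m; j = n; k = len;
    while (i > 0 && j > 0) {
        if (X[i-1] == Y[j-1]) {
            out[--k] = X[i-1];   /* this character belongs to the LCS */
            i--; j--;
        } else if (L[(i-1) * cols + j] >= L[i * cols + (j-1)]) {
            i--;                 /* the larger value lies in the row above */
        } else {
            j--;                 /* the larger value lies in the column to the left */
        }
    }

    free(L);
    return len;
}
```

For the inputs "AGGTAB" and "GXTXAYB" this writes "GTAB" into out and returns 4. When several longest common subsequences exist, the tie-breaking in the walk decides which one is produced.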

Time: 2024-12-30 20:40:49
