Problem:
Given a string S and a string T, count the number of distinct subsequences of T in S.
A subsequence of a string is a new string formed from the original string by deleting some (possibly none) of the characters without disturbing the relative positions of the remaining characters (i.e., "ACE" is a subsequence of "ABCDE", while "AEC" is not).
Here is an example:
S = "rabbbit", T = "rabbit"
Return 3.
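Before looking at the efficient solution, the problem statement itself can be checked with a brute-force sketch. This is not the approach developed in this post; it simply tries every way of picking len(T) positions from S and counts the picks that spell T. The function name count_brute_force is made up for illustration, and the approach is only practical for short strings.

```python
from itertools import combinations


def count_brute_force(s: str, t: str) -> int:
    """Count the position sets in s whose characters, read in order, spell t."""
    return sum(
        1
        # combinations() yields positions in increasing order,
        # so the relative order of the kept characters is preserved.
        for positions in combinations(range(len(s)), len(t))
        if all(s[p] == c for p, c in zip(positions, t))
    )


print(count_brute_force("rabbbit", "rabbit"))  # prints 3
```

The three matches correspond to dropping each one of the three 'b' characters in "rabbbit".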
We maintain a matrix P[m][n], where m is the length of T plus 1 and n is the length of S plus 1. The matrix P is defined as follows:
In matrix P[m][n], each element P[i][j] is the number of distinct subsequences of T[1…i] in S[1…j] (indexes beginning at 1).
Note that the 1st row (P[0][0…n-1]) of matrix P is all 1, and the 1st column (P[1…m-1][0]) of matrix P is all 0.
The 1st row of P covers the case where T is empty: the empty string occurs exactly once as a subsequence of any prefix of S (delete every character), so these entries are all 1.
The 1st column of P covers the case where S is empty: a non-empty T cannot be a subsequence of an empty string, so these entries are all 0, except P[0][0], which is 1.
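The base cases can be sketched in a few lines of Python. The helper name init_table is hypothetical, and the table is a plain list of lists with one row per prefix of T and one column per prefix of S, as described above.

```python
def init_table(s: str, t: str) -> list[list[int]]:
    """Build P with only the 1st row and 1st column filled in."""
    rows, cols = len(t) + 1, len(s) + 1
    # Start with all zeros; this already covers the 1st column (S empty).
    table = [[0] * cols for _ in range(rows)]
    # 1st row (T empty): exactly one way to match, including P[0][0].
    for j in range(cols):
        table[0][j] = 1
    return table


for row in init_table("abc", "ab"):
    print(row)
# [1, 1, 1, 1]
# [0, 0, 0, 0]
# [0, 0, 0, 0]
```

The zeros outside the 1st column are placeholders at this point; they are overwritten when the rest of the matrix is computed.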
With P defined, let’s see how we can build it step by step.
Assume we are about to calculate P[i][j], and that all elements of P with lower row or column indexes have already been calculated.
1. First we consider the situation where T[i] is not equal to S[j]. In this case, no subsequence matching T[1…i] can end at S[j], so every subsequence of T[1…i] in S[1…j] must also be in S[1…j-1]. For example, T[1…i] is “ace” and S[1…j] is “adceb”. The subsequence we need uses the characters ‘a’, ‘c’ and ‘e’ of “adceb” (positions 1, 3 and 4). It does not use S[j] = ‘b’, so it is also a subsequence of T[1…i] in S[1…j-1], where S[1…j-1] is “adce”.
So when T[i] is not equal to S[j], P[i][j] = P[i][j-1].
2. Now we consider the situation where T[i] is equal to S[j].
As explained previously, P[i][j] is the number of subsequences of T[1…i] in S[1…j] (indexes beginning at 1). We divide these subsequences into 2 sets: the ones not containing S[j] and the ones containing S[j].
For example, T[1…i] is “ace” and S[1…j] is “adcebe”. The subsequence built from positions 1, 3 and 4 of “adcebe” (‘a’, ‘c’ and the first ‘e’) belongs to the 1st set, which, as we defined, doesn’t contain S[j]. The subsequence built from positions 1, 3 and 6 (‘a’, ‘c’ and the last ‘e’) belongs to the 2nd set, which contains S[j] (in this case S[j] is the last character ‘e’).
So in matrix P, given P[i][j], how do we find the number of subsequences in each of these 2 sets?
To begin with, let’s look at the 1st set, where subsequences of T[1…i] in S[1…j] don’t contain S[j]. This is the same as looking for the subsequences of T[1…i] in S[1…j-1], which is exactly the definition of P[i][j-1]. So the number of subsequences in the 1st set is P[i][j-1].
On the other hand, which element of matrix P gives the number of subsequences in the 2nd set? Let’s analyse the previous example again. In that example, S[1…j] is “adcebe” and T[1…i] is “ace”. One of the subsequences of T[1…i] in S[1…j] that belongs to the 2nd set (it must contain S[j]) uses positions 1, 3 and 6 of “adcebe”. Since the last character of T[1…i], T[i] = ‘e’, must correspond to the last character of S[1…j], S[j] = ‘e’, the characters before T[i], which are “ac” (more generally T[1…i-1]), must appear among the characters before S[j], which are “adceb” (S[1…j-1]). Then how many subsequences of T[1…i-1] are in S[1…j-1]? That is already given by the value of P[i-1][j-1].
To sum up, when T[i] is equal to S[j], P[i][j], which counts the subsequences of T[1…i] in S[1…j], is the sum of 2 numbers, one for each set of subsequences. The 1st number is P[i][j-1], the number of subsequences in the 1st set, which do not contain S[j]. The 2nd number is P[i-1][j-1], the number of subsequences in the 2nd set, which do contain S[j]. This gives the formulation:
P[i][j] = P[i][j-1] + P[i-1][j-1]
Combining the 2 situations (T[i] not equal to S[j], and T[i] equal to S[j]), we derive this formulation:
P[i][j] = P[i][j-1]                  // if T[i] != S[j]
P[i][j] = P[i][j-1] + P[i-1][j-1]    // if T[i] == S[j]
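The two cases translate almost word for word into a recursive function. The sketch below is a top-down variant with memoization, not the table-filling procedure this post builds; the names count_top_down and p are made up for illustration. Python strings are indexed from 0, so T[i] and S[j] in the text become t[i - 1] and s[j - 1] in the code.

```python
from functools import lru_cache


def count_top_down(s: str, t: str) -> int:
    """Evaluate P[len(t)][len(s)] recursively, caching each P[i][j]."""

    @lru_cache(maxsize=None)
    def p(i: int, j: int) -> int:
        if i == 0:
            return 1  # 1st row: T is empty
        if j == 0:
            return 0  # 1st column: S is empty, T is not
        total = p(i, j - 1)  # subsequences that do not use S[j]
        if t[i - 1] == s[j - 1]:
            total += p(i - 1, j - 1)  # subsequences that end at S[j]
        return total

    return p(len(t), len(s))


print(count_top_down("adcebe", "ace"))  # prints 2
```

For long strings this version can hit Python's recursion limit, which is one reason to prefer filling the matrix iteratively.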
So, given the 1st row and the 1st column of matrix P and the formulation we summarised, we can construct the matrix P one element after another. The element in the bottom-right corner, P[m-1][n-1], is the answer we need.
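Putting the pieces together, a minimal bottom-up sketch in Python might look like the following. The function name count_distinct_subsequences is an assumption for illustration, and because Python strings are 0-indexed, T[i] and S[j] from the text appear as t[i - 1] and s[j - 1].

```python
def count_distinct_subsequences(s: str, t: str) -> int:
    """Fill matrix P row by row and return its bottom-right element."""
    rows, cols = len(t) + 1, len(s) + 1  # m and n in the text
    table = [[0] * cols for _ in range(rows)]

    # 1st row: an empty T matches every prefix of S exactly once.
    for j in range(cols):
        table[0][j] = 1

    for i in range(1, rows):
        for j in range(1, cols):
            # Subsequences that do not use S[j].
            table[i][j] = table[i][j - 1]
            # Subsequences that end at S[j], only possible on a match.
            if t[i - 1] == s[j - 1]:
                table[i][j] += table[i - 1][j - 1]

    return table[rows - 1][cols - 1]  # P[m-1][n-1]


print(count_distinct_subsequences("rabbbit", "rabbit"))  # prints 3
```

This takes time and memory proportional to m × n. Since each row only depends on the row above it, the table could be reduced to two rows, but the full matrix is kept here to match the description.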