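Leetcode Problems by Category: Combinatorial Algorithms
A combinatorial algorithm, as the term is used here, solves a problem by generating the combinations, permutations, subsets, partitions and similar arrangements of the input data, and then checking each one to see whether it is the solution we want. In the broadest sense the term can cover almost anything, even sorting and the various search algorithms. I came across this classification recently while reading The Algorithm Design Manual. A quick search showed that entire books are devoted to the subject, and that Volume 4 of Knuth's monumental TAOCP is titled Combinatorial Algorithms, so I had clearly been missing something. I therefore worked through all the related Leetcode problems, and this post collects what I learned. Quantity matters, and so does quality!
1. Classification Map
In my opinion, treating combinatorial algorithms as one major category is a very good way to classify problems, and much clearer than the groupings commonly seen online, such as brute force, BFS and DFS. Let us first see where combinatorial algorithms sit within this series and which sub-topics they break down into: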
- Fundamentals
1.1 Array & List: insertion, deletion, rotation and similar operations.
1.2 Stack & Queue: typical applications of stacks.
1.3 Tree: construction, validation, traversal, conversion.
1.4 String: conversion, searching, arithmetic.
- Building Block
2.1 Hashing
2.2 Divide-and-Conquer
2.3 Sorting
2.4 Binary Search
- Advanced:
3.1 Combinatorial Algorithm:
  - Backtracking
  - Combination
  - Subset
  - Permutation
  - Partition
3.2 Greedy Algorithm: typical applications of greedy strategies.
3.3 Dynamic Programming: extensive use of DP to find optimal solutions.
- Misc:
4.1 Math
4.2 Bit Manipulation
4.3 Matrix
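2. Problem-Solving Strategy
Chapters 7 and 14 of The Algorithm Design Manual (the "red book") cover problem-solving strategies for combinatorial algorithms in detail. If that is not enough, see Volume 4A of Knuth's The Art of Computer Programming. Backtracking is the typical technique for implementing combinatorial algorithms by enumerating all possible solutions. Beyond the basic problems, The Algorithm Design Manual also introduces some clever pruning techniques. They are not covered here, since this post sticks to Leetcode to stay on topic.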
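2.1 Recursion, Depth-First Search and Backtracking
Beginners often struggle to tell backtracking, DFS and recursion apart. They can seem like the same thing, but they are not. The relationship becomes clear if we look at recursion first, then DFS/BFS, and finally backtracking.
2.1.1 Recursion
Recursion can implement any algorithm that has a recursive structure (I am writing a separate post on the topic, "Cornerstones of Programming: Recursion"). In terms of mechanism, recursion is simply a way of writing programs that the language offers us, implemented at run time with stack frames. In terms of problem solving, where a loop works through a problem from front to back, recursion works bottom-up: it first obtains the solutions to smaller problems and then combines them step by step into the solution to the larger one. Among Leetcode problems the most typical example is the recursively implemented Divide-and-Conquer strategy, which solves a large class of problems, so recursion is by no means limited to DFS and backtracking.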
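2.1.2 Depth-First / Breadth-First Search (DFS/BFS)
Depth-first search (DFS) and breadth-first search (BFS), by contrast, are graph (and tree) terms: they name specific orders in which the nodes of a graph are visited. DFS is usually most convenient to implement recursively, because the call stack provided by the language maintains the Stack for us. We can of course forgo that convenience and implement DFS explicitly with a loop plus a Stack. BFS generally has to be implemented with a loop plus a Queue. It has no shortcut comparable to recursion, so it is slightly more cumbersome. Among Leetcode problems, the pre-order, in-order and post-order tree traversals are all DFS, while level-order traversal (and variants such as ZigZag) is BFS.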
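To make the contrast concrete, here is a minimal sketch of the three implementation styles described above on a binary tree: recursive DFS, DFS with an explicit stack, and BFS with a queue. The class `TreeTraversals`, its `Node` type and the method names are illustrative assumptions, not code from the original post.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class TreeTraversals {
    // Hypothetical minimal binary tree node used only for this illustration.
    static final class Node {
        final int val;
        final Node left;
        final Node right;

        Node(int val, Node left, Node right) {
            this.val = val;
            this.left = left;
            this.right = right;
        }
    }

    // DFS via recursion: the call stack remembers where to resume.
    static void preorderRecursive(Node node, List<Integer> out) {
        if (node == null) {
            return;
        }
        out.add(node.val);
        preorderRecursive(node.left, out);
        preorderRecursive(node.right, out);
    }

    // The same DFS with an explicit stack instead of the call stack.
    static List<Integer> preorderIterative(Node root) {
        List<Integer> out = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        if (root != null) {
            stack.push(root);
        }
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            out.add(node.val);
            // Push right first so that the left child is popped (visited) first.
            if (node.right != null) {
                stack.push(node.right);
            }
            if (node.left != null) {
                stack.push(node.left);
            }
        }
        return out;
    }

    // BFS: a FIFO queue yields the nodes level by level.
    static List<Integer> levelOrder(Node root) {
        List<Integer> out = new ArrayList<>();
        Deque<Node> queue = new ArrayDeque<>();
        if (root != null) {
            queue.add(root);
        }
        while (!queue.isEmpty()) {
            Node node = queue.poll();
            out.add(node.val);
            if (node.left != null) {
                queue.add(node.left);
            }
            if (node.right != null) {
                queue.add(node.right);
            }
        }
        return out;
    }
}
```

The only structural difference between `preorderIterative` and `levelOrder` is the container discipline: LIFO gives depth-first order, FIFO gives breadth-first order.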
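2.1.3 Backtracking
Finally, backtracking. The Algorithm Design Manual defines it as follows: "Backtracking can be viewed as a depth-first search on an implicit graph". If we represent the execution of a recursion as a tree (or graph), which is a natural way to study recursion, then backtracking constructs solutions by running DFS over an implicit graph. The beginning of Chapter 5 explains what an implicit graph is: we do not generate the whole tree (graph) formed by the recursion up front, but build it bit by bit as the backtracking proceeds, a little like uncovering the map in a game. This is easy to understand, because many problems do not ask for all solutions. Backtracking can stop as soon as a solution is known to exist, so there is no need to walk the whole search space.
The definition is not absolute, though. Why not use BFS? The explanation in The Algorithm Design Manual is that for most problems the execution tree is not very tall, but its width grows exponentially with every level we descend, so DFS is normally chosen to implement backtracking. In addition, because arguments are copied on every recursive call and discarded when the call returns, the backtracking state is easy to manage.
Among Leetcode problems, the exceptions that must be solved with BFS in order to pass are 127-Word Ladder and 130-Surrounded Regions. For the former the reason lies in the problem itself: each step down a level also shrinks the search space, so BFS causes no great trouble. For the latter I found no particular reason; without BFS the solution simply did not pass the online judge. Treat it as a typical exercise in converting DFS backtracking into BFS backtracking.
Typical problems: 79-Word Search and 130-Surrounded Regions.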
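2.2 Typical Sub-problems
The Algorithm Design Manual gives a piece of template code that can be regarded as the standard boilerplate for solving combinatorial problems. A few key steps are customized for each problem:
1) Test for a solution: for example, the deepest level has been reached, or the target condition is met. Depending on the problem, we may return immediately or may need to save the path first.
2) Construct the candidates: before starting a new round of recursive calls, determine the range of candidate values. This is also where pruning happens!
3) Recurse: either declare an explicit parameter k for the recursion depth, or decide when to return implicitly, for example from the size of path or from a decreasing target value.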
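void backtrack(result, path, k, input) {
    if (isSolution(path, k, input)) {        // k == input.length, target == 0 ...
        result.add(copyOf(path));            // save a copy, because path keeps changing
        return;                              // return, return true, return 1 ...
    }
    candidates = construct(path, k, input);  // the place to prune
    for (c : candidates) {
        path.add(c);                         // choose c
        backtrack(result, path, k + 1, input);
        path.removeLast();                   // undo the choice before trying the next one
    }
}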
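The sub-problems below all follow this pattern. Although it is called a template for combinatorial problems, once you understand recursion well there is no need to memorize any template code. Combinatorial algorithms lean heavily on recursion, so their solving strategy is, in short, a specialization of recursive technique. Following the general bottom-up recursive approach, a correct program falls out naturally. The general method is left for the dedicated post on recursion.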
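2.2.1 Combination and Subset
Combinations are very similar to the subset problem and can be counted as a kind of subset problem. That is why The Algorithm Design Manual has no separate section on combinations and files them under subsets. Concretely: for combinations, path holds the selected elements. For subsets, path is a boolean array (as long as the candidate list) whose entries (true or false) record whether the candidate at each position is selected.
Typical problems: 77-Combinations and 78-Subsets.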
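The two representations can be sketched side by side. The following is a minimal illustration, assuming the template from section 2.2: `subsets` keeps a boolean array marking which positions are selected, and `combine` keeps the selected values in a path list. Class and method names are illustrative, not taken from the original post.

```java
import java.util.ArrayList;
import java.util.List;

final class SubsetsAndCombinations {
    // 78-Subsets: chosen[i] records whether nums[i] is in the current subset.
    static List<List<Integer>> subsets(int[] nums) {
        List<List<Integer>> result = new ArrayList<>();
        collectSubsets(nums, new boolean[nums.length], 0, result);
        return result;
    }

    private static void collectSubsets(int[] nums, boolean[] chosen, int k,
                                       List<List<Integer>> result) {
        if (k == nums.length) {
            // Every position is decided: read the subset off the boolean array.
            List<Integer> subset = new ArrayList<>();
            for (int i = 0; i < nums.length; i++) {
                if (chosen[i]) {
                    subset.add(nums[i]);
                }
            }
            result.add(subset);
            return;
        }
        // The candidates for position k are simply "selected" and "not selected".
        chosen[k] = true;
        collectSubsets(nums, chosen, k + 1, result);
        chosen[k] = false;
        collectSubsets(nums, chosen, k + 1, result);
    }

    // 77-Combinations: path holds the numbers selected so far.
    static List<List<Integer>> combine(int n, int k) {
        List<List<Integer>> result = new ArrayList<>();
        collectCombinations(n, k, 1, new ArrayList<>(), result);
        return result;
    }

    private static void collectCombinations(int n, int k, int start, List<Integer> path,
                                            List<List<Integer>> result) {
        if (path.size() == k) {
            result.add(new ArrayList<>(path));
            return;
        }
        // Prune: stop once too few numbers remain to fill the path.
        for (int c = start; c <= n - (k - path.size()) + 1; c++) {
            path.add(c);
            collectCombinations(n, k, c + 1, path, result);
            path.remove(path.size() - 1);
        }
    }
}
```

For `nums = [1,2,3]`, `subsets` produces 8 lists, starting with `[1,2,3]` and ending with `[]`; `combine(4, 2)` produces the 6 pairs in ascending order.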
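2.2.2 Permutation
Permutation problems are slightly more complex. The template above still applies, but permutations also raise a special question that combinations and subsets do not: Next Permutation, that is, computing the k-th permutation directly, or computing the (k+1)-th permutation from the k-th. The Algorithm Design Manual offers two methods: Rank/Unrank (Cantor expansion) and Incremental Change. The former computes directly where each element goes in the k-th permutation, but because it relies on factorials, larger inputs need special handling. The incremental method computes the (k+1)-th permutation from the k-th. For detailed explanations of both, see the solutions by soulmachine (灵魂机器) to the typical problems below. They are clearly written, so I will not go into detail here.
Typical problems: 46-Permutations (generic method), 31-Next Permutation (incremental method), 60-Permutation Sequence (Cantor method).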
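As an illustration of the generic method for 46-Permutations, here is a minimal sketch built on the template from section 2.2. It assumes distinct input numbers, as the problem states; the `used` array and the class name `Permutations` are illustrative choices.

```java
import java.util.ArrayList;
import java.util.List;

final class Permutations {
    static List<List<Integer>> permute(int[] nums) {
        List<List<Integer>> result = new ArrayList<>();
        backtrack(nums, new boolean[nums.length], new ArrayList<>(), result);
        return result;
    }

    private static void backtrack(int[] nums, boolean[] used, List<Integer> path,
                                  List<List<Integer>> result) {
        if (path.size() == nums.length) {
            result.add(new ArrayList<>(path));
            return;
        }
        // Candidates: every element that is not already on the path.
        for (int i = 0; i < nums.length; i++) {
            if (used[i]) {
                continue;
            }
            used[i] = true;
            path.add(nums[i]);
            backtrack(nums, used, path, result);
            path.remove(path.size() - 1);
            used[i] = false;
        }
    }
}
```

With a sorted input such as `[1,2,3]`, this enumeration yields the permutations in lexicographic order.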
2.2.3 Partition
The generic template works here too. The only difference is that the path array stores the indices of the split positions.
Typical problems: 131-Palindrome Partitioning.
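A minimal sketch of this idea for 131-Palindrome Partitioning follows. The list `cuts` plays the role of the path and stores the exclusive end index of every piece chosen so far; the class and helper names are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

final class PalindromePartitioning {
    static List<List<String>> partition(String s) {
        List<List<String>> result = new ArrayList<>();
        backtrack(s, 0, new ArrayList<>(), result);
        return result;
    }

    // cuts holds the exclusive end index of each piece chosen so far.
    private static void backtrack(String s, int start, List<Integer> cuts,
                                  List<List<String>> result) {
        if (start == s.length()) {
            // Turn the recorded split positions back into substrings.
            List<String> pieces = new ArrayList<>();
            int from = 0;
            for (int end : cuts) {
                pieces.add(s.substring(from, end));
                from = end;
            }
            result.add(pieces);
            return;
        }
        // Candidates: every end position that makes s[start, end) a palindrome.
        for (int end = start + 1; end <= s.length(); end++) {
            if (!isPalindrome(s, start, end - 1)) {
                continue;
            }
            cuts.add(end);
            backtrack(s, end, cuts, result);
            cuts.remove(cuts.size() - 1);
        }
    }

    private static boolean isPalindrome(String s, int lo, int hi) {
        while (lo < hi) {
            if (s.charAt(lo++) != s.charAt(hi--)) {
                return false;
            }
        }
        return true;
    }
}
```

Note that the recursion only continues on the remainder to the right of the latest cut; the piece to the left is fixed once it has been accepted.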
3. Problem List
3.1 Backtracking
3.1.1 Depth-First (DFS)
62-Unique Paths (Medium): A robot is located at the top-left corner of a m x n grid (marked ‘Start’ in the diagram below). The robot can only move either down or right at any point in time. The robot is trying to reach the bottom-right corner of the grid (marked ‘Finish’ in the diagram below). How many possible unique paths are there?
63-Unique Paths II (Medium): Follow up for “Unique Paths”: Now consider if some obstacles are added to the grids. How many unique paths would there be? An obstacle and empty space is marked as 1 and 0 respectively in the grid.
For example, There is one obstacle in the middle of a 3x3 grid as illustrated below.
[
[0,0,0],
[0,1,0],
[0,0,0]
]
The total number of unique paths is 2.
Note: m and n will be at most 100.
70-Climbing Stairs (Easy): You are climbing a stair case. It takes n steps to reach to the top. Each time you can either climb 1 or 2 steps. In how many distinct ways can you climb to the top?
79-Word Search (Medium): Given a 2D board and a word, find if the word exists in the grid. The word can be constructed from letters of sequentially adjacent cell, where “adjacent” cells are those horizontally or vertically neighboring. The same letter cell may not be used more than once.
For example, Given board =
[
['A','B','C','E'],
['S','F','C','S'],
['A','D','E','E']
]
word = "ABCCED", -> returns true,
word = "SEE", -> returns true,
word = "ABCB", -> returns false.
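Word Search is a compact example of DFS backtracking with explicit state restoration. The sketch below is one possible implementation, not the author's: it marks a cell as visited by temporarily overwriting it with `'#'`, which assumes that `'#'` never appears in the word.

```java
final class WordSearch {
    static boolean exist(char[][] board, String word) {
        for (int r = 0; r < board.length; r++) {
            for (int c = 0; c < board[r].length; c++) {
                if (dfs(board, word, 0, r, c)) {
                    return true;
                }
            }
        }
        return false;
    }

    // Tries to match word[k..] starting at cell (r, c).
    private static boolean dfs(char[][] board, String word, int k, int r, int c) {
        if (k == word.length()) {
            return true; // every letter has been matched
        }
        if (r < 0 || r >= board.length || c < 0 || c >= board[0].length) {
            return false;
        }
        if (board[r][c] != word.charAt(k)) {
            return false;
        }
        char saved = board[r][c];
        board[r][c] = '#'; // mark as used; assumes '#' is not a letter of word
        boolean found = dfs(board, word, k + 1, r + 1, c)
                || dfs(board, word, k + 1, r - 1, c)
                || dfs(board, word, k + 1, r, c + 1)
                || dfs(board, word, k + 1, r, c - 1);
        board[r][c] = saved; // restore the cell when backtracking
        return found;
    }
}
```

Because the search stops at the first match, only part of the implicit graph is ever explored, which is exactly the point made in section 2.1.3.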
93-Restore IP Addresses (Medium): Given a string containing only digits, restore it by returning all possible valid IP address combinations.
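For example: Given "25525511135", return ["255.255.11.135", "255.255.111.35"]. (Order does not matter)
Hint: There is a trap: the handling of 0. Also, the condition checks around a recursive call can generally be moved into the base case after entering the recursion, but this problem is an exception: substring() is called before recursing, and it throws an exception unless the bounds are checked first. For clarity I still kept the usual structure and did part of the handling in the base case.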
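The two pitfalls in the hint (leading zeros, and checking bounds before `substring()`) can be sketched as follows. This is an illustrative version rather than the author's solution; it assumes the input contains only digits, as the problem states, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class RestoreIp {
    static List<String> restoreIpAddresses(String s) {
        List<String> result = new ArrayList<>();
        backtrack(s, 0, new ArrayList<>(), result);
        return result;
    }

    private static void backtrack(String s, int start, List<String> parts,
                                  List<String> result) {
        if (parts.size() == 4) {
            // A solution only if all characters have been consumed.
            if (start == s.length()) {
                result.add(String.join(".", parts));
            }
            return;
        }
        // The bound start + len <= s.length() is checked BEFORE substring() is called.
        for (int len = 1; len <= 3 && start + len <= s.length(); len++) {
            String part = s.substring(start, start + len);
            if (len > 1 && part.charAt(0) == '0') {
                break; // "0" is a valid part, "01" and "00" are not
            }
            if (Integer.parseInt(part) > 255) {
                break;
            }
            parts.add(part);
            backtrack(s, start + len, parts, result);
            parts.remove(parts.size() - 1);
        }
    }
}
```

For the input `"0000"` the only valid address is `"0.0.0.0"`, which exercises the zero rule directly.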
3.1.2 Breadth-First (BFS)
127-Word Ladder (Medium): Given two words (beginWord and endWord), and a dictionary’s word list, find the length of shortest transformation sequence from beginWord to endWord, such that:
1.Only one letter can be changed at a time
2.Each intermediate word must exist in the word list
For example,
Given:
beginWord = "hit"
endWord = "cog"
wordList = ["hot","dot","dog","lot","log"]
As one shortest transformation is "hit" -> "hot" -> "dot" -> "dog" -> "cog", return its length 5.
Note:
Return 0 if there is no such transformation sequence.
All words have the same length.
All words contain only lowercase alphabetic characters.
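Hint: Unlike other traversals, in this problem every level can delete the words it has visited while descending, without worrying that other nodes on the same level lose the chance to visit them. Think about why. If that were not the case, the BFS would also have to record every traversal path, and since BFS cannot lean on recursion, that would get far too complicated. This is exactly what makes problem 126 hard.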
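The deletion trick from the hint can be sketched as below. A word reached at some level already has its shortest distance, so removing it from the dictionary costs nothing when only the length is asked for. This sketch follows the example in this post, where endWord ("cog") is not in the word list, so endWord is added to the set explicitly; that choice, and the class name, are assumptions of the sketch.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

final class WordLadder {
    static int ladderLength(String beginWord, String endWord, Collection<String> wordList) {
        Set<String> unvisited = new HashSet<>(wordList);
        unvisited.add(endWord); // assumption: endWord counts as reachable, as in the example
        unvisited.remove(beginWord);

        Deque<String> queue = new ArrayDeque<>();
        queue.add(beginWord);
        int length = 1; // number of words on the path, beginWord included
        while (!queue.isEmpty()) {
            // Process exactly one level per iteration of the outer loop.
            for (int i = queue.size(); i > 0; i--) {
                String word = queue.poll();
                if (word.equals(endWord)) {
                    return length;
                }
                char[] chars = word.toCharArray();
                for (int pos = 0; pos < chars.length; pos++) {
                    char original = chars[pos];
                    for (char ch = 'a'; ch <= 'z'; ch++) {
                        if (ch == original) {
                            continue;
                        }
                        chars[pos] = ch;
                        // remove() returns true only the first time a word is reached,
                        // so each word is enqueued at most once.
                        if (unvisited.remove(new String(chars))) {
                            queue.add(new String(chars));
                        }
                    }
                    chars[pos] = original;
                }
            }
            length++;
        }
        return 0;
    }
}
```

Finding all shortest paths (problem 126) cannot delete words this eagerly, because a word may be reached from several parents on the same level.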
126-Word Ladder II (Hard): Given two words (beginWord and endWord), and a dictionary’s word list, find all shortest transformation sequence(s) from beginWord to endWord, such that:
1.Only one letter can be changed at a time
2.Each intermediate word must exist in the word list
For example,
Given:
beginWord = "hit"
endWord = "cog"
wordList = ["hot","dot","dog","lot","log"]
Return
[
["hit","hot","dot","dog","cog"],
["hit","hot","lot","log","cog"]
]
Note:
All words have the same length.
All words contain only lowercase alphabetic characters.
130-Surrounded Regions (Medium): Given a 2D board containing ‘X’ and ‘O’, capture all regions surrounded by ‘X’. A region is captured by flipping all ‘O’s into ‘X’s in that surrounded region.
For example,
X X X X
X O O X
X X O X
X O X X
After running your function, the board should be:
X X X X
X X X X
X X X X
X O X X
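Hint: I implemented this carefully with DFS and expected it to pass first time, only to get stuck on the large data set. This example shows how to implement backtracking with BFS, and so gives a deeper understanding of what backtracking really is across different implementations. The solution now passes the judge, but its performance is still poor, since there is surely room for improvement in how the Queue and the path are recorded. As an exercise in understanding BFS, though, passing is enough.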
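For reference, here is a sketch of one common BFS formulation of this problem. It is not the author's solution, which is not shown in this post: instead of testing each region, it starts from the 'O' cells on the border, marks everything reachable from them with a temporary `'#'`, and then flips whatever was not marked. The class name and the marker character are assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class SurroundedRegions {
    static void solve(char[][] board) {
        if (board.length == 0 || board[0].length == 0) {
            return;
        }
        int rows = board.length;
        int cols = board[0].length;
        Deque<int[]> queue = new ArrayDeque<>();

        // Seed the queue with every 'O' on the border; those can never be captured.
        for (int r = 0; r < rows; r++) {
            mark(board, r, 0, queue);
            mark(board, r, cols - 1, queue);
        }
        for (int c = 0; c < cols; c++) {
            mark(board, 0, c, queue);
            mark(board, rows - 1, c, queue);
        }

        // BFS with an explicit queue: no deep recursion, whatever the board size.
        int[][] directions = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
        while (!queue.isEmpty()) {
            int[] cell = queue.poll();
            for (int[] d : directions) {
                mark(board, cell[0] + d[0], cell[1] + d[1], queue);
            }
        }

        // Unmarked 'O' cells are surrounded; marked cells are restored to 'O'.
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                if (board[r][c] == 'O') {
                    board[r][c] = 'X';
                } else if (board[r][c] == '#') {
                    board[r][c] = 'O';
                }
            }
        }
    }

    // Marks (r, c) as safe and enqueues it, if it is an in-bounds, unmarked 'O'.
    private static void mark(char[][] board, int r, int c, Deque<int[]> queue) {
        if (r < 0 || r >= board.length || c < 0 || c >= board[0].length
                || board[r][c] != 'O') {
            return;
        }
        board[r][c] = '#';
        queue.add(new int[] {r, c});
    }
}
```

Marking a cell at the moment it is enqueued, rather than when it is dequeued, keeps each cell from entering the queue more than once.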
3.2 Combination
17-Letter Combinations of a Phone Number (Medium): Given a digit string, return all possible letter combinations that the number could represent. A mapping of digit to letters (just like on the telephone buttons) is given below.
Input: Digit string "23"
Output: ["ad", "ae", "af", "bd", "be", "bf", "cd", "ce", "cf"].
77-Combinations (Medium): Given two integers n and k, return all possible combinations of k numbers out of 1 … n.
For example, If n = 4 and k = 2, a solution is:
[
[2,4],
[3,4],
[2,3],
[1,2],
[1,3],
[1,4]
]
22-Generate Parentheses (Medium): Given n pairs of parentheses, write a function to generate all combinations of well-formed parentheses.
For example, given n = 3, a solution set is:
[
"((()))",
"(()())",
"(())()",
"()(())",
"()()()"
]
Hint: Simply concatenating the solutions of the sub-problems is not enough; the merge step in this problem is fairly involved.
3.3 Subset
78-Subsets (Medium): Given a set of distinct integers, nums, return all possible subsets.
Note: The solution set must not contain duplicate subsets.
For example, If nums = [1,2,3], a solution is:
[
[3],
[1],
[2],
[1,2,3],
[1,3],
[2,3],
[1,2],
[]
]
90-Subsets II (Medium): Given a collection of integers that might contain duplicates, nums, return all possible subsets.
Note: The solution set must not contain duplicate subsets.
For example, If nums = [1,2,2], a solution is:
[
[2],
[1],
[1,2,2],
[2,2],
[1,2],
[]
]
39-Combination Sum (Medium): Given a set of candidate numbers (C) and a target number (T), find all unique combinations in C where the candidate numbers sums to T. The same repeated number may be chosen from C unlimited number of times.
Note: All numbers (including target) will be positive integers. The solution set must not contain duplicate combinations.
For example, given candidate set [2, 3, 6, 7] and target 7,
A solution set is:
[
[7],
[2, 2, 3]
]
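Hint: Despite the name Combination, this is really a Subset problem. Because repetition is allowed, the bool array that records presence in the Subset problem has to be extended to an int array of occurrence counts. I made several typical mistakes on this extended problem:
- Because we are looking for a subset whose sum is target and repetition is allowed, the recursion can terminate before k reaches the last candidate. Without this early pruning the online judge times out.
- When target == 0 and the path is saved to the final result, never modify the state array. An accidental while (present[i]-- > 0) would corrupt the execution after backtracking. Keep this in mind!
- A small trick: checking target == 0 would otherwise require traversing the state array every time. Passing the remaining target as a recursion argument avoids that cost.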
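The count-array approach described in the hint can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the author's code: `counts[i]` records how many times `candidates[i]` is used, the remaining target is passed down as an argument, and the candidates are assumed to be distinct positive integers as the problem states.

```java
import java.util.ArrayList;
import java.util.List;

final class CombinationSum {
    static List<List<Integer>> combinationSum(int[] candidates, int target) {
        List<List<Integer>> result = new ArrayList<>();
        backtrack(candidates, new int[candidates.length], 0, target, result);
        return result;
    }

    // counts[i] is the number of times candidates[i] is used, for i < k.
    private static void backtrack(int[] candidates, int[] counts, int k, int remaining,
                                  List<List<Integer>> result) {
        if (remaining == 0) {
            // Solved: stop here, even if k has not reached the last candidate.
            List<Integer> combination = new ArrayList<>();
            for (int i = 0; i < k; i++) {
                // Read-only pass: counts itself must not be modified here.
                for (int n = 0; n < counts[i]; n++) {
                    combination.add(candidates[i]);
                }
            }
            result.add(combination);
            return;
        }
        if (k == candidates.length) {
            return;
        }
        // Use candidates[k] 0, 1, 2, ... times while the remainder stays non-negative.
        for (int times = 0; times * candidates[k] <= remaining; times++) {
            counts[k] = times;
            backtrack(candidates, counts, k + 1, remaining - times * candidates[k], result);
        }
        counts[k] = 0; // leave the state clean for the caller
    }
}
```

The loop bound `times * candidates[k] <= remaining` is what keeps the remainder from going negative, and the `remaining == 0` test needs no scan of `counts` because the remainder arrives as a parameter.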
40-Combination Sum II (Medium): Given a collection of candidate numbers (C) and a target number (T), find all unique combinations in C where the candidate numbers sums to T. Each number in C may only be used once in the combination.
Note: All numbers (including target) will be positive integers. The solution set must not contain duplicate combinations.
For example, given candidate set [10, 1, 2, 7, 6, 1, 5] and target 8,
A solution set is:
[
[1, 7],
[1, 2, 5],
[2, 6],
[1, 1, 6]
]
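Hint: This problem is more demanding on performance than Combination Sum (39). Besides pruning at target == 0, you must also prune at target < 0 (the statement says that all numbers, including target, are positive), otherwise the solution times out. I had previously pruned wrongly in 112-Path Sum, where the values on a path can be negative, so at first I did not dare to add the extra pruning here. When to prune, and how, are skills in themselves!
3.4 Permutation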
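A sketch of the pruning discussed in the hint, using the path representation instead of a count array. It sorts a copy of the input first, which is an addition of this sketch: sorting lets the loop stop as soon as a candidate would push the remainder below zero, and lets equal neighbours be skipped to avoid duplicate combinations. Names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CombinationSum2 {
    static List<List<Integer>> combinationSum2(int[] candidates, int target) {
        int[] sorted = candidates.clone();
        Arrays.sort(sorted);
        List<List<Integer>> result = new ArrayList<>();
        backtrack(sorted, 0, target, new ArrayList<>(), result);
        return result;
    }

    private static void backtrack(int[] sorted, int start, int remaining, List<Integer> path,
                                  List<List<Integer>> result) {
        if (remaining == 0) {
            result.add(new ArrayList<>(path));
            return;
        }
        for (int i = start; i < sorted.length; i++) {
            if (sorted[i] > remaining) {
                break; // the remainder would go negative, and so would every later one
            }
            if (i > start && sorted[i] == sorted[i - 1]) {
                continue; // same value at the same depth: would repeat a combination
            }
            path.add(sorted[i]);
            backtrack(sorted, i + 1, remaining - sorted[i], path, result);
            path.remove(path.size() - 1);
        }
    }
}
```

The `break` is only valid because all numbers are positive; with negative values, as in 112-Path Sum, a later element could still bring the sum back to the target.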
30-Substring with Concatenation of All Words (Hard): You are given a string, s, and a list of words, words, that are all of the same length. Find all starting indices of substring(s) in s that is a concatenation of each word in words exactly once and without any intervening characters.
For example, given: s: "barfoothefoobarman", words: ["foo", "bar"]
You should return the indices: [0,9]. (order does not matter).
46-Permutations (Medium): Given a collection of distinct numbers, return all possible permutations.
For example, [1,2,3] have the following permutations:
[
[1,2,3],
[1,3,2],
[2,1,3],
[2,3,1],
[3,1,2],
[3,2,1]
]
47-Permutations II (Medium): Given a collection of numbers that might contain duplicates, return all possible unique permutations.
For example, [1,1,2] have the following unique permutations:
[
[1,1,2],
[1,2,1],
[2,1,1]
]
31-Next Permutation (Medium): Implement next permutation, which rearranges numbers into the lexicographically next greater permutation of numbers. If such arrangement is not possible, it must rearrange it as the lowest possible order (ie, sorted in ascending order). The replacement must be in-place, do not allocate extra memory.
Here are some examples. Inputs are in the left-hand column and its corresponding outputs are in the right-hand column.
1,2,3 → 1,3,2
3,2,1 → 1,2,3
1,1,5 → 1,5,1
Hint: Because elements may be equal, the searches for positions i and j in the first and second steps must take equality into account.
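The incremental method and the equality pitfall from the hint can be sketched as follows. The step numbering in the comments is this sketch's own, and the class and helper names are illustrative.

```java
final class NextPermutation {
    static void nextPermutation(int[] nums) {
        // Step 1: find the rightmost i with nums[i] < nums[i + 1].
        // Using >= here means that equal neighbours do not count as an ascent.
        int i = nums.length - 2;
        while (i >= 0 && nums[i] >= nums[i + 1]) {
            i--;
        }
        if (i >= 0) {
            // Step 2: find the rightmost j with nums[j] strictly greater than nums[i].
            // Using <= here skips elements equal to nums[i].
            int j = nums.length - 1;
            while (nums[j] <= nums[i]) {
                j--;
            }
            swap(nums, i, j);
        }
        // Step 3: the suffix is in descending order; reverse it to make it ascending.
        // When no ascent exists (i == -1) this reverses the whole array.
        reverse(nums, i + 1, nums.length - 1);
    }

    private static void swap(int[] nums, int a, int b) {
        int tmp = nums[a];
        nums[a] = nums[b];
        nums[b] = tmp;
    }

    private static void reverse(int[] nums, int lo, int hi) {
        while (lo < hi) {
            swap(nums, lo++, hi--);
        }
    }
}
```

With strict comparisons replaced by non-strict ones in either search, an input such as 1,1,5 would be handled incorrectly, which is the case the hint warns about.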
60-Permutation Sequence (Medium): The set [1,2,3,…,n] contains a total of n! unique permutations. By listing and labeling all of the permutations in order, We get the following sequence (ie, for n = 3):
"123"
"132"
"213"
"231"
"312"
"321"
Given n and k, return the kth permutation sequence.
Note: Given n will be between 1 and 9 inclusive.
Hint: This uses the so-called Cantor encoding. The problem feels more like a math game...
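A minimal sketch of the unranking idea: with the digits kept in ascending order, the zero-based rank divided by (n-1)! selects the first digit, the remainder divided by (n-2)! selects the second, and so on. It assumes 1 <= n <= 9 and 1 <= k <= n!, so every factorial fits in an int; the class name is illustrative.

```java
import java.util.ArrayList;
import java.util.List;

final class PermutationSequence {
    static String getPermutation(int n, int k) {
        List<Integer> digits = new ArrayList<>();
        int factorial = 1; // becomes (n - 1)! after the loop
        for (int d = 1; d <= n; d++) {
            digits.add(d);
            if (d < n) {
                factorial *= d;
            }
        }

        int rank = k - 1; // switch to a zero-based rank
        StringBuilder sb = new StringBuilder();
        for (int remaining = n - 1; remaining >= 0; remaining--) {
            // Each choice of the next digit accounts for `factorial` permutations.
            int index = rank / factorial;
            sb.append(digits.remove(index)); // remove by index, keeps the rest sorted
            rank %= factorial;
            if (remaining > 0) {
                factorial /= remaining; // (remaining)! becomes (remaining - 1)!
            }
        }
        return sb.toString();
    }
}
```

For n = 3 and k = 3 this yields "213", the third entry in the listing above.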
89-Gray Code (Medium): The gray code is a binary numeral system where two successive values differ in only one bit. Given a non-negative integer n representing the total number of bits in the code, print the sequence of gray code. A gray code sequence must begin with 0.
For example, given n = 2, return [0,1,3,2]. Its gray code sequence is:
00 - 0
01 - 1
11 - 3
10 - 2
Note: For a given n, a gray code sequence is not uniquely defined.
For example, [0,2,3,1] is also a valid gray code sequence according to the above definition.
3.5 Partition
131-Palindrome Partitioning (Medium): Given a string s, partition s such that every substring of the partition is a palindrome. Return all possible palindrome partitioning of s.
For example, given s = "aab",
Return
[
["aa","b"],
["a","a","b"]
]
Hint: Recurse on one side only; otherwise it becomes divide and conquer!
132-Palindrome Partitioning II (Hard): Given a string s, partition s such that every substring of the partition is a palindrome. Return the minimum cuts needed for a palindrome partitioning of s.
For example, given s = "aab", Return 1 since the palindrome partitioning ["aa","b"] could be produced using 1 cut.