It can be solved by dynamic programming. Here are some useful links:
Tutorial and Code: http://www.cs.cornell.edu/~wdtseng/icpc/notes/dp3.pdf
A practice problem: http://people.csail.mit.edu/bdean/6.046/dp/ (then click Balanced Partition)
What's more, note that if the problem is very large (say, 5 million numbers), you won't want to use DP, because the table it needs grows with the total sum and becomes too big. In that case you can use a Monte Carlo style algorithm:
1. divide the n numbers into two groups randomly (or use your own method at this step if you like);
2. choose one number from each group;
3. if swapping these two numbers decreases the difference between the two sums, swap them;
4. repeat from step 2 until no swap has occurred for a long time.
Don't expect this method to always find the best answer, but it is the only way I know to solve this problem at very large scale within reasonable time and memory.
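The four steps above could be sketched roughly as follows. This is only an illustration: the function name randomSwapPartition, the patience parameter (how many failed attempts count as "a long time") and the fixed seed are assumptions of this sketch, not part of the description above.

```cpp
#include <cstddef>
#include <cstdlib>
#include <random>
#include <utility>
#include <vector>

// Randomised swap heuristic (a sketch, not an exact solver).
// Fills groupA and groupB and returns |sum(groupA) - sum(groupB)|.
// `patience` and `seed` are hypothetical tuning parameters.
long long randomSwapPartition(const std::vector<long long>& values,
                              std::vector<long long>& groupA,
                              std::vector<long long>& groupB,
                              int patience = 100000,
                              unsigned seed = 12345) {
    std::mt19937 rng(seed);
    groupA.clear();
    groupB.clear();
    long long sumA = 0, sumB = 0;

    // Step 1: random initial split.
    for (long long v : values) {
        if (rng() & 1u) { groupA.push_back(v); sumA += v; }
        else            { groupB.push_back(v); sumB += v; }
    }
    // Swapping needs one element on each side.
    if (groupA.empty() || groupB.empty()) return std::llabs(sumA - sumB);

    std::uniform_int_distribution<std::size_t> pickA(0, groupA.size() - 1);
    std::uniform_int_distribution<std::size_t> pickB(0, groupB.size() - 1);

    int idle = 0;  // consecutive attempts without a swap
    while (idle < patience && sumA != sumB) {
        // Step 2: choose one number from each group.
        std::size_t i = pickA(rng), j = pickB(rng);
        long long delta = groupB[j] - groupA[i];
        long long newA = sumA + delta, newB = sumB - delta;
        // Step 3: swap only if the difference of the sums shrinks.
        if (std::llabs(newA - newB) < std::llabs(sumA - sumB)) {
            std::swap(groupA[i], groupB[j]);
            sumA = newA;
            sumB = newB;
            idle = 0;
        } else {
            ++idle;  // Step 4: stop after `patience` failures in a row.
        }
    }
    return std::llabs(sumA - sumB);
}
```

Because only swaps are made, the group sizes chosen by the random split never change, so a poor initial split limits how good the result can be.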
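====== Find a subset of a set of numbers whose sum is half of the total (or any other value) =======
Dynamic Programming: Partition
Suppose the array C[] is very large, say n > 10000. First compute the total sum, call it N, and create a boolean array T[] of size N+1 with every entry set to false. During the processing below, every value s that can be formed as a subset sum gets T[s] set to true. If T[N/2] ends up true, a subset with the required sum exists.
1. Take each C[i] in turn and combine it with every j for which T[j] is already true to form a new sum, then set T at that new index to true.
2. The j loop must run from right to left, so that entries set during the current pass are never revisited, which would count C[i] more than once.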
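#include <vector>
using namespace std;

bool T[10240];  // T[s] is true if some subset sums to s; requires N < 10240

bool partition( vector< int > C ) {
    // compute the total sum
    int n = C.size();
    int N = 0;
    for( int i = 0; i < n; i++ ) N += C[i];

    // initialize the table
    T[0] = true;
    for( int i = 1; i <= N; i++ ) T[i] = false;

    // process the numbers one by one; j runs right to left so that
    // each C[i] is used at most once
    for( int i = 0; i < n; i++ )
        for( int j = N - C[i]; j >= 0; j-- )
            if( T[j] ) T[j + C[i]] = true;

    // an exact half exists only when the total is even
    return N % 2 == 0 && T[N / 2];
}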
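Optimizations:
1. There is no need to start from the right end every time; keep track of the rightmost index that j can have reached so far.
2. There is no need to compute up to N; N/2 is enough, because if a sum x is reachable then N-x is reachable too.
3. Sort C, so that the true indices of T grow slowly at first and quickly later.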
#include <algorithm>
#include <vector>
using namespace std;

bool T[10240];  // T[s] is true if some subset sums to s; requires N < 10240

bool partition( vector< int > C ) {
    // compute the total sum and sort C
    int n = C.size();
    int N = 0;
    for( int i = 0; i < n; i++ ) N += C[i];
    sort( C.begin(), C.end() );

    // initialize the table
    T[0] = true;
    for( int i = 1; i <= N; i++ ) T[i] = false;
    int R = 0;  // bound on the rightmost true entry, capped at N / 2

    // process the numbers one by one
    for( int i = 0; i < n; i++ ) {
        for( int j = R; j >= 0; j-- )
            if( T[j] ) T[j + C[i]] = true;
        R = min( N / 2, R + C[i] );
    }

    // an exact half exists only when the total is even
    return N % 2 == 0 && T[N / 2];
}
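====== If every array member is available in unlimited quantity, the problem becomes counting the combinations that form a given value =======
For example, how many ways are there to form the number M?
In this case j is simply traversed from left to right, because here the entries of T are meant to be reused so that the same C[i] can be added again and again. Also, T is no longer a boolean array but an int array, initialized to 0 except for T[0] = 1. Each time the sum j + C[i] is hit, T[j] is added to T[j + C[i]], and at the end T[M] is the number of combinations. (In the code the target M is passed as the parameter N.)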
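#include <vector>
using namespace std;

int T[10240];  // T[s] = number of combinations that sum to s; requires N < 10240

// C holds positive values; N is the target sum
int coins( vector< int > C, int N ) {
    int n = C.size();

    // initialize the table: the empty combination is the one way to form 0
    T[0] = 1;
    for( int i = 1; i <= N; i++ ) T[i] = 0;

    // process the numbers one by one; j runs left to right so that
    // each C[i] may be reused any number of times
    for( int i = 0; i < n; i++ )
        for( int j = 0; j + C[i] <= N; j++ )
            T[j + C[i]] += T[j];

    return T[N];
}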
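====== If array members repeat, but only a limited number of times, still asking about the combinations =======
For example, each C[i] is available D[i] times. A third loop over k is then needed to limit the number of repetitions, and the direction of j goes back to right-to-left. Note that the code for this case keeps T as a boolean table and answers the yes/no partition question rather than returning a count.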
#include <algorithm>
#include <vector>
using namespace std;

bool T[10240];  // T[s] is true if sum s can be formed; requires N < 10240

bool partition( vector< int > C, vector< int > D ) {
    // compute the total sum (value)
    int n = C.size();
    int N = 0;
    for( int i = 0; i < n; i++ ) N += C[i] * D[i];

    // initialize the table
    T[0] = true;
    for( int i = 1; i <= N; i++ ) T[i] = false;
    int R = 0;  // bound on the rightmost true entry, capped at N / 2

    // process the numbers one by one
    for( int i = 0; i < n; i++ ) {
        for( int j = R; j >= 0; j-- )
            if( T[j] )
                // use C[i] between 1 and D[i] times
                for( int k = 1; k <= D[i] && j + k * C[i] <= N / 2; k++ )
                    T[j + k * C[i]] = true;
        R = min( N / 2, R + C[i] * D[i] );
    }

    // an exact half exists only when the total is even
    return N % 2 == 0 && T[N / 2];
}
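====== If the problem restricts the size of the subset, asking for the number of combinations =======
In this case build a two-dimensional table, k by n, where the k axis is the number of elements used and the n axis is the sum, then use each row to generate the next one.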
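A minimal sketch of such a table, assuming non-negative integers and that each element may be used at most once. The name countSubsetsOfSize and its parameters are made up for this illustration.

```cpp
#include <vector>

// Counts the subsets of `values` that have exactly `k` elements and sum `target`.
// T[c][s] = number of ways to pick c elements with sum s.
// Assumes every value is non-negative.
long long countSubsetsOfSize(const std::vector<int>& values, int k, int target) {
    if (k < 0 || target < 0) return 0;
    std::vector<std::vector<long long>> T(
        k + 1, std::vector<long long>(target + 1, 0));
    T[0][0] = 1;  // one way to pick nothing with sum 0

    for (int v : values) {
        // Rows are visited from k down to 1, so row c - 1 still holds the
        // counts from before this element and the element is used only once.
        for (int c = k; c >= 1; --c)
            for (int s = target; s >= v; --s)
                T[c][s] += T[c - 1][s - v];
    }
    return T[k][target];
}
```

For instance, with values {1, 2, 3, 4, 5}, k = 2 and target = 5 the two subsets are {1, 4} and {2, 3}.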
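====== For the problem in the title =======
Exact solution
Let the array be C[] and its sum N.
1. Sort C[].
2. Build a one-dimensional array T[x], x < N+1, with every entry initialized to -1, and set T[0] to 0.
3. Go through C in order of increasing i; for each C[i]:
4. Traverse T[] with j going from N/2 down to 0; for each T[j], if T[j] >= 0 and T[j+C[i]] < 0, set T[j+C[i]] to i.
5. When the traversal is finished, look for the reachable entry (T[M] >= 0) closest to T[N/2], say T[M]. Then |N-2M| is the minimum absolute difference.
6. Recover the subset: take the value a of T[M], then the value b of T[M - C[a]], then the value c of T[M - C[a] - C[b]], and so on until the remaining sum reaches 0. The elements C[a], C[b], C[c], ... form the combination.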
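The exact procedure above might look like this in code. It is a sketch under the assumption that all numbers are positive integers with a total small enough to allocate the table. The struct PartitionResult and the function name minDifferencePartition are invented for the example.

```cpp
#include <algorithm>
#include <vector>

// Smallest achievable |sum(A) - sum(B)| together with one of the two subsets.
struct PartitionResult {
    int difference;
    std::vector<int> subset;
};

// Assumes positive integers whose total N fits comfortably in memory.
PartitionResult minDifferencePartition(std::vector<int> C) {
    std::sort(C.begin(), C.end());
    int N = 0;
    for (int v : C) N += v;

    // T[s] = index of the element that first made sum s reachable, -1 otherwise.
    std::vector<int> T(N + 1, -1);
    T[0] = 0;  // sentinel for the empty sum

    for (int i = 0; i < static_cast<int>(C.size()); ++i)
        for (int j = N / 2; j >= 0; --j)  // right to left: C[i] is used once
            if (T[j] >= 0 && T[j + C[i]] < 0)
                T[j + C[i]] = i;

    // Largest reachable sum M that does not exceed N / 2.
    int M = N / 2;
    while (T[M] < 0) --M;

    // Walk back through the recorded indices to recover the subset.
    PartitionResult result;
    result.difference = N - 2 * M;
    for (int s = M; s > 0; s -= C[T[s]])
        result.subset.push_back(C[T[s]]);
    return result;
}
```

Searching only at or below N/2 is enough here, since any reachable sum x above N/2 has the mirror sum N-x below it.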
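Approximate solution when performance matters
Let the array be C[], and set a threshold M, for example the size of C[].
1. Sort C[].
2. Split C[] in order into two equal halves, C1[] and C2[].
3. Going through them in the same order, try the following three operations in turn:
a) move C1[i] to C2[]
b) move C2[j] to C1[]
c) swap C1[i] and C2[j]
If the new C1[] and C2[] have a smaller difference between their sums, repeat this step;
otherwise increase i and j and repeat this step.
4. If M consecutive attempts of the previous step produce no change, the adjustment is finished.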
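One possible reading of these steps as code is shown below. The details are assumptions of this sketch: i and j advance together and wrap around, the threshold M is taken to be the size of C[], and the name localSearchPartition is invented here.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <utility>
#include <vector>

// Move/swap local search (a heuristic sketch, not guaranteed optimal).
// Fills C1 and C2 and returns |sum(C1) - sum(C2)|.
long long localSearchPartition(std::vector<long long> C,
                               std::vector<long long>& C1,
                               std::vector<long long>& C2) {
    // Steps 1 and 2: sort, then split into two halves.
    std::sort(C.begin(), C.end());
    const std::size_t half = C.size() / 2;
    C1.assign(C.begin(), C.begin() + half);
    C2.assign(C.begin() + half, C.end());
    long long s1 = 0, s2 = 0;
    for (long long v : C1) s1 += v;
    for (long long v : C2) s2 += v;

    const std::size_t M = C.size();  // threshold, assumed equal to the size of C
    std::size_t i = 0, j = 0, idle = 0;
    while (idle < M && s1 != s2) {
        const long long diff = std::llabs(s1 - s2);
        if (!C1.empty()) i %= C1.size();
        if (!C2.empty()) j %= C2.size();
        const long long a = C1.empty() ? 0 : C1[i];
        const long long b = C2.empty() ? 0 : C2[j];
        bool improved = true;
        if (!C1.empty() && std::llabs(s1 - s2 - 2 * a) < diff) {
            // a) move C1[i] to C2 (erase is linear; acceptable for a sketch)
            C2.push_back(a);
            C1.erase(C1.begin() + i);
            s1 -= a; s2 += a;
        } else if (!C2.empty() && std::llabs(s1 - s2 + 2 * b) < diff) {
            // b) move C2[j] to C1
            C1.push_back(b);
            C2.erase(C2.begin() + j);
            s1 += b; s2 -= b;
        } else if (!C1.empty() && !C2.empty() &&
                   std::llabs(s1 - s2 + 2 * (b - a)) < diff) {
            // c) swap C1[i] and C2[j]
            std::swap(C1[i], C2[j]);
            s1 += b - a; s2 -= b - a;
        } else {
            improved = false;
        }
        if (improved) idle = 0;           // step 3: repeat at the same position
        else { ++i; ++j; ++idle; }        // step 4: count attempts without change
    }
    return std::llabs(s1 - s2);
}
```

Every accepted operation strictly reduces the difference, so the loop always terminates, but it can stop at a local minimum that the exact DP would beat.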