Quasi-Newton methods are among the most effective methods for solving nonlinear optimization problems. They were proposed in the 1950s by W. C. Davidon, a physicist at Argonne National Laboratory in the United States. Davidon's algorithm was regarded at the time as one of the most creative inventions in nonlinear optimization. Soon afterwards, R. Fletcher and M. J. D. Powell demonstrated that the new algorithm was far faster and more reliable than existing methods, and the field of nonlinear optimization advanced almost overnight. Over the following twenty years quasi-Newton methods developed vigorously, producing a large number of variant formulas and hundreds of related papers.
Like steepest descent methods, quasi-Newton methods require only the gradient of the objective function at each iteration. By measuring the changes in gradients, they construct a model of the objective function that is good enough to produce superlinear convergence. As a result they greatly outperform steepest descent, especially on difficult problems. Moreover, because quasi-Newton methods do not require second-derivative information, they are sometimes more efficient than Newton's method. Today, optimization software includes a wide range of quasi-Newton algorithms for solving unconstrained, constrained, and large-scale optimization problems.
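As an illustration of such library support, SciPy's general-purpose optimizer exposes the BFGS quasi-Newton method (a close relative of the DFP method discussed below) through a single call. A minimal sketch, applied to the same test function used later in this article (the starting point is an arbitrary choice of mine):

```python
import numpy as np
from scipy.optimize import minimize


# f(x, y) = (x - 2)^2 + (y - 1)^2 + 1, minimized at (2, 1)
def f(v):
    return (v[0] - 2) ** 2 + (v[1] - 1) ** 2 + 1


# BFGS approximates the inverse Hessian from gradient differences,
# just like DFP; SciPy estimates the gradient numerically if none is given.
result = minimize(f, x0=np.array([0.0, 0.0]), method="BFGS")
print(result.x)  # close to [2. 1.]
```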
The basic idea of quasi-Newton methods is as follows. First, construct a quadratic model of the objective function at the current iterate $x_k$:

$$m_k(p) = f_k + \nabla f_k^T p + \frac{1}{2} p^T B_k p,$$

where $B_k$ is a symmetric positive definite matrix. We take the minimizer of this quadratic model as the search direction,

$$p_k = -B_k^{-1} \nabla f_k,$$

and obtain the new iterate $x_{k+1} = x_k + \alpha_k p_k$, where the step length $\alpha_k$ is required to satisfy the Wolfe conditions. This iteration resembles Newton's method; the difference is that the approximate Hessian $B_k$ replaces the true Hessian. The crux of a quasi-Newton method is therefore how the matrix $B_k$ is updated at each iteration. Now suppose we have reached a new iterate $x_{k+1}$ and build a new quadratic model:

$$m_{k+1}(p) = f_{k+1} + \nabla f_{k+1}^T p + \frac{1}{2} p^T B_{k+1} p.$$

We choose $B_{k+1}$ so as to reuse as much information from the previous step as possible. Specifically, we require

$$\nabla m_{k+1}(-\alpha_k p_k) = \nabla f_k,$$

which yields the secant equation

$$B_{k+1} \alpha_k p_k = \nabla f_{k+1} - \nabla f_k.$$
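The Wolfe conditions mentioned above can be enforced with a simple bisection search: shrink the step when sufficient decrease fails, grow it when the curvature condition fails. The sketch below is an illustrative helper of my own (the function name and the constants c1, c2 are conventional choices, not from the original article):

```python
import numpy as np


def wolfe_line_search(f, grad, x, p, c1=1e-4, c2=0.9, max_iter=50):
    """Find a step length alpha satisfying the (weak) Wolfe conditions."""
    lo, hi, alpha = 0.0, np.inf, 1.0
    fx = f(x)
    slope = grad(x) @ p  # directional derivative; must be negative
    for _ in range(max_iter):
        if f(x + alpha * p) > fx + c1 * alpha * slope:
            hi = alpha   # step too long: sufficient decrease fails
        elif grad(x + alpha * p) @ p < c2 * slope:
            lo = alpha   # step too short: curvature condition fails
        else:
            return alpha # both Wolfe conditions hold
        alpha = (lo + hi) / 2 if np.isfinite(hi) else 2 * alpha
    return alpha


# Try it on f(x, y) = (x - 2)^2 + (y - 1)^2 + 1 along the steepest-descent direction.
f = lambda v: (v[0] - 2) ** 2 + (v[1] - 1) ** 2 + 1
g = lambda v: np.array([2 * (v[0] - 2), 2 * (v[1] - 1)])
x0 = np.array([0.0, 0.0])
alpha = wolfe_line_search(f, g, x0, -g(x0))
```

Starting from the unit step, the search halves once and accepts alpha = 0.5, which lands exactly on the minimizer of this quadratic.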
The DFP Method

Write

$$\delta_k = x_{k+1} - x_k, \qquad y_k = \nabla f_{k+1} - \nabla f_k,$$

and let $G_k$ denote the approximation to the inverse Hessian. The DFP formula is

$$G_{k+1} = G_k + \frac{\delta_k \delta_k^T}{\delta_k^T y_k} - \frac{G_k y_k y_k^T G_k}{y_k^T G_k y_k}.$$
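A useful sanity check on this formula: the updated matrix maps $y_k$ back to $\delta_k$, i.e. the secant equation $G_{k+1} y_k = \delta_k$ holds exactly after one update. A minimal numeric sketch (the helper name and sample vectors are mine):

```python
import numpy as np


def dfp_update(G, delta, y):
    """One DFP update of the inverse-Hessian approximation G, given
    delta = x_{k+1} - x_k and y = grad_{k+1} - grad_k (column vectors)."""
    return (G
            + delta @ delta.T / (delta.T @ y)
            - G @ y @ y.T @ G / (y.T @ G @ y))


G = np.eye(2)                     # any symmetric positive definite start
delta = np.array([[1.0], [0.5]])  # step taken
y = np.array([[0.8], [0.3]])      # gradient change; note delta^T y > 0
G_new = dfp_update(G, delta, y)
# G_new @ y equals delta: the secant equation is satisfied
```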
An implementation is shown below:
```python
#!/usr/bin/python
# -*- coding: utf-8 -*-
import random

import numpy as np


# f(x, y) = (x - 2)^2 + (y - 1)^2 + 1
def solution(grad_func):
    rate = 0.3  # fixed step size; a Wolfe line search would be preferable
    # Gk must be symmetric positive definite
    Gk = np.diag([random.uniform(0.0001, 100), random.uniform(0.0001, 100)])
    x = random.uniform(-10000, 10000)
    y = random.uniform(-10000, 10000)
    point = np.array([[x, y]]).transpose()  # column vector
    grad_last = grad_func(point[0][0], point[1][0]).transpose()
    if np.linalg.norm(grad_last) < 1e-6:
        return get_point_coordinate(point)
    for index in range(10000):
        pk = -Gk @ grad_last  # search direction
        point = point + rate * pk
        grad = grad_func(point[0][0], point[1][0]).transpose()
        print("grad", grad.transpose())
        if np.linalg.norm(grad) < 1e-6:
            break
        delta_k = rate * pk      # delta_k = x_{k+1} - x_k
        y_k = grad - grad_last   # y_k = g_{k+1} - g_k
        Pk = delta_k @ delta_k.transpose() / (delta_k.transpose() @ y_k)
        Qk = Gk @ y_k @ y_k.transpose() @ Gk / (y_k.transpose() @ Gk @ y_k)
        Gk = Gk + Pk - Qk        # DFP update
        grad_last = grad
    print("times of iterate : %s" % index)
    return get_point_coordinate(point)


def get_point_coordinate(point):
    return point[0][0], point[1][0]


if __name__ == "__main__":
    x, y = solution(lambda a, b: np.array([[2 * (a - 2), 2 * (b - 1)]]))
    print("minimum point of f(x,y) = (x-2)^2+(y-1)^2 + 1 : (%s,%s)" % (x, y))
```
A sample run produces:
```
grad [[-165661.89194611 -109405.68739563]]
grad [[-100881.92784944  -98677.83692405]]
grad [[-86031.40619151   24003.06422296]]
grad [[-58462.97031009   16589.21632892]]
grad [[-40924.16152279   11612.14537593]]
... (intermediate iterations omitted) ...
grad [[ -1.71217625e-06   4.85826461e-07]]
grad [[ -1.19852338e-06   3.40078523e-07]]
grad [[ -8.38966364e-07   2.38054966e-07]]
times of iterate : 73
minimum point of f(x,y) = (x-2)^2+(y-1)^2 + 1 : (1.99999958052,1.00000011903)
```
The computed solution is very close to the true minimizer (2, 1).
Date: 2024-08-05 21:36:13