optim.py cs231n

If you find any errors, please point them out; much appreciated.

import numpy as np

"""
This file implements various first-order update rules that are commonly used for
training neural networks. Each update rule accepts current weights and the
gradient of the loss with respect to those weights and produces the next set of
weights. Each update rule has the same interface:

def update(w, dw, config=None):

Inputs:
  - w: A numpy array giving the current weights.
  - dw: A numpy array of the same shape as w giving the gradient of the
    loss with respect to w.
  - config: A dictionary containing hyperparameter values such as learning rate,
    momentum, etc. If the update rule requires caching values over many
    iterations, then config will also hold these cached values.

Returns:
  - next_w: The next point after the update.
  - config: The config dictionary to be passed to the next iteration of the
    update rule.

NOTE: For most update rules, the default learning rate will probably not perform
well; however, the default values of the other hyperparameters should work well
for a variety of different problems.

For efficiency, update rules may perform in-place updates, mutating w and
setting next_w equal to w.
"""

def sgd(w, dw, config=None):
  """
  Performs vanilla stochastic gradient descent.

  config format:
  - learning_rate: Scalar learning rate.
  """
  if config is None: config = {}
  config.setdefault('learning_rate', 1e-2)
  w -= config['learning_rate'] * dw
  return w, config
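A quick sanity check of the interface (a minimal sketch, not part of the assignment file; the toy arrays w and dw are made up):

import numpy as np

w = np.array([1.0, -2.0, 3.0])
dw = np.array([0.5, 0.5, -1.0])
next_w, config = sgd(w, dw, {'learning_rate': 0.1})
print(next_w)  # [ 0.95 -2.05  3.1 ] -- one step opposite the gradient
print(config)  # {'learning_rate': 0.1} -- passed back in on the next call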

def sgd_momentum(w, dw, config=None):
  """
  Performs stochastic gradient descent with momentum.

  config format:
  - learning_rate: Scalar learning rate.
  - momentum: Scalar between 0 and 1 giving the momentum value.
    Setting momentum = 0 reduces to sgd.
  - velocity: A numpy array of the same shape as w and dw used to store a moving
    average of the gradients.
  """
  if config is None: config = {}
  config.setdefault('learning_rate', 1e-2)
  config.setdefault('momentum', 0.9)
  v = config.get('velocity', np.zeros_like(w))

  v = config['momentum'] * v - config['learning_rate'] * dw
  next_w = w + v
  config['velocity'] = v

  return next_w, config
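Because the velocity lives in config, it persists across calls; with a constant gradient the step size grows each iteration. A small sketch with made-up toy data:

import numpy as np

w = np.zeros(3)
dw = np.ones(3)
config = None
for _ in range(3):
  w, config = sgd_momentum(w, dw, config)
# With the defaults (learning_rate=1e-2, momentum=0.9), the velocity after
# three steps is -1e-2 * (1 + 0.9 + 0.81) = -0.0271 per component.
print(config['velocity'])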

def rmsprop(x, dx, config=None):
  """
  Uses the RMSProp update rule, which uses a moving average of squared gradient
  values to set adaptive per-parameter learning rates.

  config format:
  - learning_rate: Scalar learning rate.
  - decay_rate: Scalar between 0 and 1 giving the decay rate for the squared
    gradient cache.
  - epsilon: Small scalar used for smoothing to avoid dividing by zero.
  - cache: Moving average of second moments of gradients.
  """
  if config is None: config = {}
  config.setdefault('learning_rate', 1e-2)
  config.setdefault('decay_rate', 0.99)
  config.setdefault('epsilon', 1e-8)
  config.setdefault('cache', np.zeros_like(x))

  cache = config['decay_rate'] * config['cache'] + (1 - config['decay_rate']) * dx ** 2
  next_x = x - config['learning_rate'] * dx / np.sqrt(cache + config['epsilon'])
  config['cache'] = cache

  return next_x, config
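Because each parameter is divided by the square root of its own cache entry, the effective step size adapts per parameter. In the sketch below (made-up toy gradients), the two components differ in gradient magnitude by 100x yet move by nearly the same amount:

import numpy as np

x = np.zeros(2)
dx = np.array([10.0, 0.1])
config = None
for _ in range(5):
  x, config = rmsprop(x, dx, config)
print(x)  # both components have taken essentially equal-sized steps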

def adam(x, dx, config=None):
  """
  Uses the Adam update rule, which incorporates moving averages of both the
  gradient and its square and a bias correction term.

  config format:
  - learning_rate: Scalar learning rate.
  - beta1: Decay rate for moving average of first moment of gradient.
  - beta2: Decay rate for moving average of second moment of gradient.
  - epsilon: Small scalar used for smoothing to avoid dividing by zero.
  - m: Moving average of gradient.
  - v: Moving average of squared gradient.
  - t: Iteration number.
  """
  if config is None: config = {}
  config.setdefault('learning_rate', 1e-3)
  config.setdefault('beta1', 0.9)
  config.setdefault('beta2', 0.999)
  config.setdefault('epsilon', 1e-8)
  config.setdefault('m', np.zeros_like(x))
  config.setdefault('v', np.zeros_like(x))
  config.setdefault('t', 0)
  config['t'] += 1
  # This rule is the most comprehensive one, combining the advantages of the methods above.
  m = config['beta1'] * config['m'] + (1 - config['beta1']) * dx       # moving average of the gradient
  v = config['beta2'] * config['v'] + (1 - config['beta2']) * dx ** 2  # moving average of the squared gradient
  config['m'] = m
  config['v'] = v
  m = m / (1 - config['beta1'] ** config['t'])  # bias correction for the first moment
  v = v / (1 - config['beta2'] ** config['t'])  # bias correction for the second moment

  next_x = x - config['learning_rate'] * m / np.sqrt(v + config['epsilon'])

  return next_x, config
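The bias correction matters most early on: at t = 1 it rescales m and v away from their zero initialization, so the very first update has magnitude roughly learning_rate no matter how large or small the gradient is. A minimal sketch with toy data:

import numpy as np

x = np.zeros(3)
dx = np.array([100.0, 1.0, 0.01])
next_x, config = adam(x, dx)
print(next_x)       # each component moved by about -1e-3 (= -learning_rate)
print(config['t'])  # 1, the step counter used for the bias correction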

  
