Building a Simple CNN

The required part of the second assignment consists of three questions: Q1: Two-layer Neural Network, Q2: Modular Neural Network, and Q3: ConvNet on CIFAR-10.



Q1: Two-layer Neural Network

This part implements a two-layer neural network, including the forward pass and backpropagation, as well as several improved variants of gradient descent.

First look at neural_net.py, which defines two functions: init_two_layer_model() and two_layer_net(). The latter is to be completed in three parts: the forward pass, the loss computation, and the gradient computation.

import numpy as np
import matplotlib.pyplot as plt

def init_two_layer_model(input_size, hidden_size, output_size):
  """
  Initialize the weights and biases for a two-layer fully connected neural
  network. The net has an input dimension of D, a hidden layer dimension of H,
  and performs classification over C classes. Weights are initialized to small
  random values and biases are initialized to zero.

  Inputs:
  - input_size: The dimension D of the input data
  - hidden_size: The number of neurons H in the hidden layer
  - output_size: The number of classes C

  Returns:
  A dictionary mapping parameter names to arrays of parameter values. It has
  the following keys:
  - W1: First layer weights; has shape (D, H)
  - b1: First layer biases; has shape (H,)
  - W2: Second layer weights; has shape (H, C)
  - b2: Second layer biases; has shape (C,)
  """
  # initialize a model
  model = {}
  model['W1'] = 0.00001 * np.random.randn(input_size, hidden_size)
  model['b1'] = np.zeros(hidden_size)
  model['W2'] = 0.00001 * np.random.randn(hidden_size, output_size)
  model['b2'] = np.zeros(output_size)
  return model

def two_layer_net(X, model, y=None, reg=0.0):
  """
  Compute the loss and gradients for a two layer fully connected neural network.
  The net has an input dimension of D, a hidden layer dimension of H, and
  performs classification over C classes. We use a softmax loss function and L2
  regularization on the weight matrices. The two layer net should use a ReLU
  nonlinearity after the first affine layer.

  Inputs:
  - X: Input data of shape (N, D). Each X[i] is a training sample.
  - model: Dictionary mapping parameter names to arrays of parameter values.
    It should contain the following:
    - W1: First layer weights; has shape (D, H)
    - b1: First layer biases; has shape (H,)
    - W2: Second layer weights; has shape (H, C)
    - b2: Second layer biases; has shape (C,)
  - y: Vector of training labels. y[i] is the label for X[i], and each y[i] is
    an integer in the range 0 <= y[i] < C. This parameter is optional; if it
    is not passed then we only return scores, and if it is passed then we
    instead return the loss and gradients.
  - reg: Regularization strength.

  Returns:
  If y is not passed, return a matrix scores of shape (N, C) where scores[i, c]
  is the score for class c on input X[i].

  If y is passed, instead return a tuple of:
  - loss: Loss (data loss and regularization loss) for this batch of training
    samples.
  - grads: Dictionary mapping parameter names to gradients of those parameters
    with respect to the loss function. This should have the same keys as model.
  """

  # unpack variables from the model dictionary
  W1,b1,W2,b2 = model['W1'], model['b1'], model['W2'], model['b2']
  N, D = X.shape

  # compute the forward pass
  scores = None
  #############################################################################
  # TODO: Perform the forward pass, computing the class scores for the input. #
  # Store the result in the scores variable, which should be an array of      #
  # shape (N, C).                                                             #
  #############################################################################
  N1 = np.dot(X, W1) + b1
  H1 = np.maximum(0, N1)
  scores = np.dot(H1, W2) + b2  #  output layer without activation function
  #############################################################################
  #                              END OF YOUR CODE                             #
  #############################################################################

  # If the targets are not given then jump out, we're done
  if y is None:
    return scores

  # compute the loss
  loss = None
  #############################################################################
  # TODO: Finish the forward pass, and compute the loss. This should include  #
  # both the data loss and L2 regularization for W1 and W2. Store the result  #
  # in the variable loss, which should be a scalar. Use the Softmax           #
  # classifier loss. So that your results match ours, multiply the            #
  # regularization loss by 0.5                                                #
  #############################################################################
  probs = np.exp(scores)
  probs /= np.sum(probs, axis = 1, keepdims = True)
  loss = -np.sum(np.log(probs[np.arange(N), y]))/N+0.5*reg*np.sum(W1*W1)+0.5*reg*np.sum(W2*W2)
  #############################################################################
  #                              END OF YOUR CODE                             #
  #############################################################################

  # compute the gradients
  grads = {}
  #############################################################################
  # TODO: Compute the backward pass, computing the derivatives of the weights #
  # and biases. Store the results in the grads dictionary. For example,       #
  # grads['W1'] should store the gradient on W1, and be a matrix of same size #
  #############################################################################
  dscores = probs.copy()                              # N*C size
  dscores[np.arange(N), y] -= 1
  grads['W2'] = (np.dot(H1.transpose(), dscores))/N + reg*W2  # H*C size
  grads['b2'] = np.sum(dscores, axis=0) / N
  dH1 = (np.dot(dscores, W2.transpose())) / N         # N*H

  delta_relu = N1.copy()        # N*H
  delta_relu[delta_relu>=0] = 1
  delta_relu[delta_relu<0] = 0
  grads['W1'] = np.dot(X.transpose(), dH1*delta_relu)+ reg*W1 # D*H
  grads['b1'] = np.sum(dH1*delta_relu, axis = 0)
  #############################################################################
  #                              END OF YOUR CODE                             #
  #############################################################################

  return loss, grads

The forward pass uses the ReLU activation function:

# compute the forward pass
  scores = None
  #############################################################################
  # TODO: Perform the forward pass, computing the class scores for the input. #
  # Store the result in the scores variable, which should be an array of      #
  # shape (N, C).                                                             #
  #############################################################################
  N1 = np.dot(X, W1) + b1
  H1 = np.maximum(0, N1)
  scores = np.dot(H1, W2) + b2  #  output layer without activation function
  #############################################################################
  #                              END OF YOUR CODE                             #
  #############################################################################

The loss computation, with L2 regularization added:

# compute the loss
  loss = None
  #############################################################################
  # TODO: Finish the forward pass, and compute the loss. This should include  #
  # both the data loss and L2 regularization for W1 and W2. Store the result  #
  # in the variable loss, which should be a scalar. Use the Softmax           #
  # classifier loss. So that your results match ours, multiply the            #
  # regularization loss by 0.5                                                #
  #############################################################################
  probs = np.exp(scores)
  probs /= np.sum(probs, axis = 1, keepdims = True)
  loss = -np.sum(np.log(probs[np.arange(N), y]))/N+0.5*reg*np.sum(W1*W1)+0.5*reg*np.sum(W2*W2)
  #############################################################################
  #                              END OF YOUR CODE                             #
  #############################################################################
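
One caveat on the implementation above: np.exp(scores) can overflow when the scores are large. Softmax is invariant to shifting each row by a constant, so a numerically safer variant (a minimal sketch; with the tiny initial weights used here the plain version also works) is:

import numpy as np

def stable_softmax(scores):
  # Subtract the row-wise max before exponentiating; since softmax is
  # invariant to a per-row constant shift, the probabilities are unchanged.
  shifted = scores - np.max(scores, axis=1, keepdims=True)
  probs = np.exp(shifted)
  probs /= np.sum(probs, axis=1, keepdims=True)
  return probs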

The gradient computation. The derivative of the ReLU activation max(x, 0) is the (discrete) indicator delta(x) = 1 if x >= 0, else 0:

# compute the gradients
  grads = {}
  #############################################################################
  # TODO: Compute the backward pass, computing the derivatives of the weights #
  # and biases. Store the results in the grads dictionary. For example,       #
  # grads['W1'] should store the gradient on W1, and be a matrix of same size #
  #############################################################################
  dscores = probs.copy()                              # N*C size
  dscores[np.arange(N), y] -= 1
  grads['W2'] = (np.dot(H1.transpose(), dscores))/N + reg*W2  # H*C size
  grads['b2'] = np.sum(dscores, axis=0) / N
  dH1 = (np.dot(dscores, W2.transpose())) / N         # N*H

  delta_relu = N1.copy()        # N*H
  delta_relu[delta_relu>=0] = 1
  delta_relu[delta_relu<0] = 0
  grads['W1'] = np.dot(X.transpose(), dH1*delta_relu)+ reg*W1 # D*H
  grads['b1'] = np.sum(dH1*delta_relu, axis = 0)
  #############################################################################
  #                              END OF YOUR CODE                             #
  #############################################################################
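
Before training, it is worth sanity-checking the analytic gradients against centered finite differences. A minimal sketch, assuming init_two_layer_model and two_layer_net above are in scope (the numeric_gradient helper below is my own, not the assignment's gradient-check utility):

import numpy as np

def numeric_gradient(f, x, h=1e-5):
  # Centered differences: df/dx_i ~ (f(x + h*e_i) - f(x - h*e_i)) / (2h)
  grad = np.zeros_like(x)
  it = np.nditer(x, flags=['multi_index'])
  while not it.finished:
    idx = it.multi_index
    old = x[idx]
    x[idx] = old + h; fp = f(x)
    x[idx] = old - h; fm = f(x)
    x[idx] = old
    grad[idx] = (fp - fm) / (2 * h)
    it.iternext()
  return grad

# Check dW1 on a tiny random problem.
model = init_two_layer_model(input_size=4, hidden_size=5, output_size=3)
X = np.random.randn(6, 4)
y = np.random.randint(3, size=6)
loss, grads = two_layer_net(X, model, y, reg=0.1)
num_dW1 = numeric_gradient(
    lambda W: two_layer_net(X, dict(model, W1=W), y, reg=0.1)[0], model['W1'])
print(np.max(np.abs(num_dW1 - grads['W1'])))   # should be tiny, roughly 1e-8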

Next, open classifier_trainer.py, which contains a class named ClassifierTrainer implementing the training procedure for a classifier.

import numpy as np

class ClassifierTrainer(object):
  """ The trainer class performs SGD with momentum on a cost function """
  def __init__(self):
    self.step_cache = {} # for storing velocities in momentum update

  def train(self, X, y, X_val, y_val,
            model, loss_function,
            reg=0.0,
            learning_rate=1e-2, momentum=0, learning_rate_decay=0.95,
            update='momentum', sample_batches=True,
            num_epochs=30, batch_size=100, acc_frequency=None,
            verbose=False):
    """
    Optimize the parameters of a model to minimize a loss function. We use
    training data X and y to compute the loss and gradients, and periodically
    check the accuracy on the validation set.

    Inputs:
    - X: Array of training data; each X[i] is a training sample.
    - y: Vector of training labels; y[i] gives the label for X[i].
    - X_val: Array of validation data
    - y_val: Vector of validation labels
    - model: Dictionary that maps parameter names to parameter values. Each
      parameter value is a numpy array.
    - loss_function: A function that can be called in the following ways:
      scores = loss_function(X, model, reg=reg)
      loss, grads = loss_function(X, model, y, reg=reg)
    - reg: Regularization strength. This will be passed to the loss function.
    - learning_rate: Initial learning rate to use.
    - momentum: Parameter to use for momentum updates.
    - learning_rate_decay: The learning rate is multiplied by this after each
      epoch.
    - update: The update rule to use. One of 'sgd', 'momentum', or 'rmsprop'.
    - sample_batches: If True, use a minibatch of data for each parameter update
      (stochastic gradient descent); if False, use the entire training set for
      each parameter update (gradient descent).
    - num_epochs: The number of epochs to take over the training data.
    - batch_size: The number of training samples to use at each iteration.
    - acc_frequency: If set to an integer, we compute the training and
      validation set error after every acc_frequency iterations.
    - verbose: If True, print status after each epoch.

    Returns a tuple of:
    - best_model: The model that got the highest validation accuracy during
      training.
    - loss_history: List containing the value of the loss function at each
      iteration.
    - train_acc_history: List storing the training set accuracy at each epoch.
    - val_acc_history: List storing the validation set accuracy at each epoch.
    """

    N = X.shape[0]

    if sample_batches:
      iterations_per_epoch = N / batch_size # using SGD
    else:
      iterations_per_epoch = 1 # using GD
    num_iters = num_epochs * iterations_per_epoch
    epoch = 0
    best_val_acc = 0.0
    best_model = {}
    loss_history = []
    train_acc_history = []
    val_acc_history = []
    for it in xrange(num_iters):
      if it % 10 == 0:  print 'starting iteration ', it

      # get batch of data
      if sample_batches:
        batch_mask = np.random.choice(N, batch_size)
        X_batch = X[batch_mask]
        y_batch = y[batch_mask]
      else:
        # no SGD used, full gradient descent
        X_batch = X
        y_batch = y

      # evaluate cost and gradient
      cost, grads = loss_function(X_batch, model, y_batch, reg)
      loss_history.append(cost)

      # perform a parameter update
      for p in model:
        # compute the parameter step
        if update == 'sgd':
          dx = -learning_rate * grads[p]
        elif update == 'momentum':
          if not p in self.step_cache:
            self.step_cache[p] = np.zeros(grads[p].shape)
          #dx = np.zeros_like(grads[p]) # you can remove this after
          #####################################################################
          # TODO: implement the momentum update formula and store the step    #
          # update into variable dx. You should use the variable              #
          # step_cache[p] and the momentum strength is stored in momentum.    #
          # Don't forget to also update the step_cache[p].                    #
          #####################################################################
          dx = momentum * self.step_cache[p] - learning_rate*grads[p]
          self.step_cache[p] = dx
          #####################################################################
          #                      END OF YOUR CODE                             #
          #####################################################################
        elif update == 'rmsprop':
          decay_rate = 0.99 # you could also make this an option
          if not p in self.step_cache:
            self.step_cache[p] = np.zeros(grads[p].shape)
          #dx = np.zeros_like(grads[p]) # you can remove this after
          #####################################################################
          # TODO: implement the RMSProp update and store the parameter update #
          # dx. Don't forget to also update step_cache[p]. Use smoothing 1e-8 #
          #####################################################################
          self.step_cache[p] = decay_rate * self.step_cache[p] + (1 - decay_rate)*grads[p]*grads[p]
          rms = np.sqrt(self.step_cache[p] + 1e-8)
          dx = - learning_rate*grads[p]/rms
          #####################################################################
          #                      END OF YOUR CODE                             #
          #####################################################################
        else:
          raise ValueError('Unrecognized update type "%s"' % update)

        # update the parameters
        model[p] += dx

      # every epoch perform an evaluation on the validation set
      first_it = (it == 0)
      epoch_end = (it + 1) % iterations_per_epoch == 0
      acc_check = (acc_frequency is not None and it % acc_frequency == 0)
      if first_it or epoch_end or acc_check:
        if it > 0 and epoch_end:
          # decay the learning rate
          learning_rate *= learning_rate_decay
          epoch += 1

        # evaluate train accuracy
        if N > 1000:
          train_mask = np.random.choice(N, 1000)
          X_train_subset = X[train_mask]
          y_train_subset = y[train_mask]
        else:
          X_train_subset = X
          y_train_subset = y
        scores_train = loss_function(X_train_subset, model)
        y_pred_train = np.argmax(scores_train, axis=1)
        train_acc = np.mean(y_pred_train == y_train_subset)
        train_acc_history.append(train_acc)

        # evaluate val accuracy
        scores_val = loss_function(X_val, model)
        y_pred_val = np.argmax(scores_val, axis=1)
        val_acc = np.mean(y_pred_val ==  y_val)
        val_acc_history.append(val_acc)

        # keep track of the best model based on validation accuracy
        if val_acc > best_val_acc:
          # make a copy of the model
          best_val_acc = val_acc
          best_model = {}
          for p in model:
            best_model[p] = model[p].copy()

        # print progress if needed
        if verbose:
          print ('Finished epoch %d / %d: cost %f, train: %f, val %f, lr %e'
                 % (epoch, num_epochs, cost, train_acc, val_acc, learning_rate))

    if verbose:
      print 'finished optimization. best validation accuracy: %f' % (best_val_acc, )
    # return the best model and the training history statistics
    return best_model, loss_history, train_acc_history, val_acc_history

This file really centers on a single function:

def train(self, X, y, X_val, y_val,
            model, loss_function,
            reg=0.0,
            learning_rate=1e-2, momentum=0, learning_rate_decay=0.95,
            update='momentum', sample_batches=True,
            num_epochs=30, batch_size=100, acc_frequency=None,
            verbose=False):

Apart from the data and hyperparameters, the most important inputs to train are the model and the loss_function. Different network architectures come with all sorts of loss functions and gradient forms, so they are factored out of the trainer; the trainer itself then only has to perform generic gradient updates. Two update rules need to be implemented here: momentum and rmsprop.

# perform a parameter update
      for p in model:
        # compute the parameter step
        if update == 'sgd':
          dx = -learning_rate * grads[p]
        elif update == 'momentum':
          if not p in self.step_cache:
            self.step_cache[p] = np.zeros(grads[p].shape)
          #dx = np.zeros_like(grads[p]) # you can remove this after
          #####################################################################
          # TODO: implement the momentum update formula and store the step    #
          # update into variable dx. You should use the variable              #
          # step_cache[p] and the momentum strength is stored in momentum.    #
          # Don't forget to also update the step_cache[p].                    #
          #####################################################################
          dx = momentum * self.step_cache[p] - learning_rate*grads[p]
          self.step_cache[p] = dx
          #####################################################################
          #                      END OF YOUR CODE                             #
          #####################################################################
        elif update == 'rmsprop':
          decay_rate = 0.99 # you could also make this an option
          if not p in self.step_cache:
            self.step_cache[p] = np.zeros(grads[p].shape)
          #dx = np.zeros_like(grads[p]) # you can remove this after
          #####################################################################
          # TODO: implement the RMSProp update and store the parameter update #
          # dx. Don't forget to also update step_cache[p]. Use smoothing 1e-8 #
          #####################################################################
          self.step_cache[p] = decay_rate * self.step_cache[p] + (1 - decay_rate)*grads[p]*grads[p]
          rms = np.sqrt(self.step_cache[p] + 1e-8)
          dx = - learning_rate*grads[p]/rms
          #####################################################################
          #                      END OF YOUR CODE                             #
          #####################################################################
        else:
          raise ValueError('Unrecognized update type "%s"' % update)
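
With these update rules in place, training the two-layer net from Q1 reduces to a single call. A minimal sketch (the import paths and the preprocessed, flattened CIFAR-10 arrays X_train, y_train, X_val, y_val are assumptions matching the assignment notebook's setup, and the hyperparameters are only starting points):

# Assumed setup: X_train/X_val are flattened to shape (N, 3072) and zero-centered,
# as in the assignment notebook; the import paths may differ in your layout.
from cs231n.classifier_trainer import ClassifierTrainer
from cs231n.classifiers.neural_net import init_two_layer_model, two_layer_net

model = init_two_layer_model(32 * 32 * 3, 100, 10)   # D, H, C for CIFAR-10
trainer = ClassifierTrainer()
best_model, loss_history, train_acc, val_acc = trainer.train(
    X_train, y_train, X_val, y_val,
    model, two_layer_net,
    reg=1.0, learning_rate=1e-5, momentum=0.9, learning_rate_decay=0.95,
    update='momentum', sample_batches=True,
    num_epochs=5, batch_size=100, verbose=True)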

Regarding the learning rate, see one of Hinton's slides:

In the momentum method, instead of stepping along the steepest-descent direction, the update blends the current gradient into the previous step direction; it builds up speed along directions with a consistent gradient.
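
Written out, the two update rules implemented above are, with learning rate $\eta$, momentum coefficient $\mu$, decay rate $\rho = 0.99$, smoothing $\epsilon = 10^{-8}$ and gradient $g_t$:

Momentum: $$v_t = \mu\, v_{t-1} - \eta\, g_t, \qquad \theta_t = \theta_{t-1} + v_t$$

RMSProp: $$c_t = \rho\, c_{t-1} + (1-\rho)\, g_t^2, \qquad \theta_t = \theta_{t-1} - \frac{\eta\, g_t}{\sqrt{c_t + \epsilon}}$$

Here $v$ and $c$ are exactly what the code keeps in step_cache[p].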



Q2: Modular Neural Network

In layers.py we implement the forward pass and backward pass for each layer of a convolutional network: the affine layer, the ReLU layer, the convolution layer, and the max-pool layer.

affine layer:

def affine_forward(x, w, b):
  """
  Computes the forward pass for an affine (fully-connected) layer.

  The input x has shape (N, d_1, ..., d_k) where x[i] is the ith input.
  We multiply this against a weight matrix of shape (D, M) where
  D = \prod_i d_i

  Inputs:
  x - Input data, of shape (N, d_1, ..., d_k)
  w - Weights, of shape (D, M)
  b - Biases, of shape (M,)

  Returns a tuple of:
  - out: output, of shape (N, M)
  - cache: (x, w, b)
  """
  out = None
  #############################################################################
  # TODO: Implement the affine forward pass. Store the result in out. You     #
  # will need to reshape the input into rows.                                 #
  #############################################################################
  N = x.shape[0]
  out = np.dot(x.reshape(N, -1), w) + b
  #############################################################################
  #                             END OF YOUR CODE                              #
  #############################################################################
  cache = (x, w, b)
  return out, cache

def affine_backward(dout, cache):
  """
  Computes the backward pass for an affine layer.

  Inputs:
  - dout: Upstream derivative, of shape (N, M)
  - cache: Tuple of:
    - x: Input data, of shape (N, d_1, ... d_k)
    - w: Weights, of shape (D, M)

  Returns a tuple of:
  - dx: Gradient with respect to x, of shape (N, d1, ..., d_k)
  - dw: Gradient with respect to w, of shape (D, M)
  - db: Gradient with respect to b, of shape (M,)
  """
  x, w, b = cache
  dx, dw, db = None, None, None
  #############################################################################
  # TODO: Implement the affine backward pass.                                 #
  #############################################################################
  dim = x.shape
  dx = np.dot(dout, w.transpose()).reshape(dim)
  dw = np.dot((x.reshape(dim[0], -1)).transpose(), dout)
  db = np.sum(dout, axis=0)
  #############################################################################
  #                             END OF YOUR CODE                              #
  #############################################################################
  return dx, dw, db
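
A quick way to see the reshape behaviour is to push a 4-D input through the pair and check the shapes (a toy example, assuming affine_forward and affine_backward above are in scope):

import numpy as np

x = np.random.randn(2, 3, 4, 4)            # N=2 inputs of shape (3, 4, 4), so D = 48
w = np.random.randn(48, 10)
b = np.random.randn(10)

out, cache = affine_forward(x, w, b)       # out.shape == (2, 10)
dout = np.random.randn(*out.shape)
dx, dw, db = affine_backward(dout, cache)  # dx.shape == (2, 3, 4, 4), matching x
print(out.shape, dx.shape, dw.shape, db.shape)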

relu layer:

def relu_forward(x):
  """
  Computes the forward pass for a layer of rectified linear units (ReLUs).

  Input:
  - x: Inputs, of any shape

  Returns a tuple of:
  - out: Output, of the same shape as x
  - cache: x
  """
  out = None
  #############################################################################
  # TODO: Implement the ReLU forward pass.                                    #
  #############################################################################
  out = np.maximum(0, x)
  #############################################################################
  #                             END OF YOUR CODE                              #
  #############################################################################
  cache = x
  return out, cache

def relu_backward(dout, cache):
  """
  Computes the backward pass for a layer of rectified linear units (ReLUs).

  Input:
  - dout: Upstream derivatives, of any shape
  - cache: Input x, of same shape as dout

  Returns:
  - dx: Gradient with respect to x
  """
  dx, x = None, cache
  #############################################################################
  # TODO: Implement the ReLU backward pass.                                   #
  #############################################################################
  dx = dout.copy()    # copy so that the upstream gradient dout is not modified in place
  dx[cache < 0] = 0
  #############################################################################
  #                             END OF YOUR CODE                              #
  #############################################################################
  return dx

convolution layer:

  • To write the convolution in linear form, extract patches from the input x (one channel x[i, j, :, :] at a time) into a matrix of size (out_H*out_W) × (HH*WW), where each row is one patch of the filter size HH×WW. Denote this patch-extraction step by $\phi(\textbf{x})$ (a minimal sketch of it appears right after these notes).
  • Convolution: $\textrm{vec}\,\textbf{y} = \textrm{vec}(\phi(\textbf{x})F)$, where F is the vectorized filter (F is written as a matrix because it can hold K filters, one per column).
  • Derivatives of the convolution: $$\frac{dz}{dF} = \phi(\textbf{x})^T \frac{dz}{dY}, \qquad \frac{dz}{dX} = \phi^*\!\left(\frac{dz}{dY}F^T\right)$$ where $\phi^*$ scatters the patch-shaped gradients back to their input positions.

Note that $Y \in \mathbb{R}^{(out_H \cdot out_W) \times K}$ and $F \in \mathbb{R}^{(HH \cdot WW) \times K}$.
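
A minimal single-channel sketch of the patch-extraction operator $\phi$ (my own illustration of the idea; the assignment's fast layers use an im2col of this kind, but this is not their actual code):

import numpy as np

def phi(x, HH, WW, stride=1):
  # Collect every HH x WW patch of a single-channel (already padded) image x
  # into the rows of a matrix of shape (out_H * out_W, HH * WW).
  H, W = x.shape
  out_H = (H - HH) // stride + 1
  out_W = (W - WW) // stride + 1
  cols = np.zeros((out_H * out_W, HH * WW))
  for i in range(out_H):
    for j in range(out_W):
      patch = x[i * stride:i * stride + HH, j * stride:j * stride + WW]
      cols[i * out_W + j] = patch.ravel()
  return cols

# With F of shape (HH*WW, K), the K output maps are phi(x).dot(F); reshaping each
# column back to (out_H, out_W) gives exactly the vec(phi(x) F) form above.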

def conv_forward_naive(x, w, b, conv_param):
  """
  A naive implementation of the forward pass for a convolutional layer.

  The input consists of N data points, each with C channels, height H and width
  W. We convolve each input with F different filters, where each filter spans
  all C channels and has height HH and width WW.

  Input:
  - x: Input data of shape (N, C, H, W)
  - w: Filter weights of shape (F, C, HH, WW)
  - b: Biases, of shape (F,)
  - conv_param: A dictionary with the following keys:
    - 'stride': The number of pixels between adjacent receptive fields in the
      horizontal and vertical directions.
    - 'pad': The number of pixels that will be used to zero-pad the input.

  Returns a tuple of:
  - out: Output data.
  - cache: (x, w, b, conv_param)
  """
  out = None
  #############################################################################
  # TODO: Implement the convolutional forward pass.                           #
  # Hint: you can use the function np.pad for padding.                        #
  #############################################################################
  N, C, H, W = x.shape
  F, _, HH, WW = w.shape
  stride = conv_param['stride']
  p = conv_param['pad']
  x_padded = np.pad(x, ((0, 0), (0, 0), (p, p), (p, p)), mode='constant')

  # assign memory for the convolved features
  out_H = (H + p*2 - HH)/stride+1
  out_W = (W + p*2 - WW)/stride+1
  out = np.zeros((N, F, out_H, out_W))

  for img_num in xrange(N):
    for ftr_num in xrange(F):
      conv_img = np.zeros((out_H, out_W))
      for cnl_num in xrange(C):
        img = x_padded[img_num, cnl_num, :, :]
        flt = w[ftr_num, cnl_num, :, :]
        for conv_row in xrange(out_H):
          row_start = conv_row * stride
          row_end = row_start + HH
          for conv_col in xrange(out_W):
            col_start = conv_col * stride
            col_end = col_start + WW
            conv_img[conv_row, conv_col] += np.sum(img[row_start:row_end, col_start:col_end]*flt)
      out[img_num, ftr_num, :, :] = conv_img + b[ftr_num]
  #############################################################################
  #                             END OF YOUR CODE                              #
  #############################################################################
  cache = (x, w, b, conv_param)
  return out, cache
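
As a sanity check on the output size out_H = (H + 2*pad - HH)/stride + 1: for CIFAR-10-sized inputs, 5x5 filters with pad 2 and stride 1 give a "same"-sized output (a quick check, assuming conv_forward_naive above is in scope):

import numpy as np

x = np.random.randn(2, 3, 32, 32)                       # N=2 CIFAR-sized images
w = np.random.randn(7, 3, 5, 5)                         # F=7 filters of size 5x5
b = np.random.randn(7)
out, _ = conv_forward_naive(x, w, b, {'stride': 1, 'pad': 2})
print(out.shape)                                        # (2, 7, 32, 32): (32 + 4 - 5)/1 + 1 = 32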

def conv_backward_naive(dout, cache):
  """
  A naive implementation of the backward pass for a convolutional layer.

  Inputs:
  - dout: Upstream derivatives.
  - cache: A tuple of (x, w, b, conv_param) as in conv_forward_naive

  Returns a tuple of:
  - dx: Gradient with respect to x
  - dw: Gradient with respect to w
  - db: Gradient with respect to b
  """
  dx, dw, db = None, None, None
  #############################################################################
  # TODO: Implement the convolutional backward pass.                          #
  #############################################################################
  x, w, b, conv_param = cache   # unpack the cache
  N, C, H, W = x.shape
  F, _, HH, WW = w.shape
  _, _, out_H, out_W = dout.shape
  stride = conv_param['stride']
  p = conv_param['pad']
  x_padded = np.pad(x, ((0, 0), (0, 0), (p, p), (p, p)), mode='constant')

  # dw => F*C*HH*WW
  dw = np.zeros(w.shape)
  for ftr_num in xrange(F):
    for cnl_num in xrange(C):
      w_hat = np.zeros((HH, WW))
      for img_num in xrange(N):
        img = x_padded[img_num, cnl_num, :, :]
        delta = dout[img_num, ftr_num, :, :]
        # using delta to element-wisely multiply a sampled input img in size out_H*out_W
        for w_row in xrange(HH):
          for w_col in xrange(WW):
            tmp = img[w_row:w_row + out_H*stride:stride, w_col:w_col + out_W*stride:stride]
            w_hat[w_row, w_col] += np.sum(tmp*delta)
      dw[ftr_num, cnl_num, :, :] += w_hat
  # dx => N*C*H*W
  dx = np.zeros(x.shape)
  for img_num in xrange(N):
    for cnl_num in xrange(C):
      for ftr_num in xrange(F):
        x_hat = np.dot(dout[img_num, ftr_num, :, :].reshape(-1, 1), w[ftr_num, cnl_num, : ,:].reshape(1, -1))
        x_hat = x_hat.reshape(out_H, out_W, HH, WW)
        #print x_hat.shape
        dx_hat = np.zeros((H+2*p, W+2*p))     # temporally store the padded input
        #print dx_hat.shape
        for conv_row in xrange(out_H):
          row_start = conv_row * stride
          row_end = row_start + HH
          for conv_col in xrange(out_W):
            col_start = conv_col * stride
            col_end = col_start + WW
            dx_hat[row_start:row_end, col_start:col_end] += x_hat[conv_row, conv_col, :, :]
        dx[img_num, cnl_num, :, :] += dx_hat[p:p+H, p:p+W]   # unlike dx_hat[p:-p, p:-p], this also works when p == 0

  db = np.sum(dout, axis = (0, 2, 3))   # F*1 size
  #############################################################################
  #                             END OF YOUR CODE                              #
  #############################################################################
  return dx, dw, db

max pool layer:

  • The pooling layer takes an input x of size N*C*H*W, takes the maximum over each pool_height*pool_width block with step size stride, and produces an output out of size N*C*((H-pool_height)/stride + 1)*((W-pool_width)/stride + 1).
  • For one channel of one input (x[i,j,:,:]), the pooling computation can be written as $$\textrm{vec}(\textbf{y}) = S(x)\,\textrm{vec}(\textbf{x}), \quad S(x) \in \{0,1\}^{H_1W_1 \times HW}$$
  • The corresponding derivative of the pooling layer is $$\frac{dz}{d\,\textrm{vec}\,\textbf{x}} = S(x)^T \frac{dz}{d\,\textrm{vec}\,\textbf{y}}$$

In practice we never form S(x) explicitly: during the backward pass we simply accumulate each entry of dout back into the input position that produced the corresponding max (a small numeric check follows the code below).

def max_pool_forward_naive(x, pool_param):
  """
  A naive implementation of the forward pass for a max pooling layer.

  Inputs:
  - x: Input data, of shape (N, C, H, W)
  - pool_param: dictionary with the following keys:
    - 'pool_height': The height of each pooling region
    - 'pool_width': The width of each pooling region
    - 'stride': The distance between adjacent pooling regions

  Returns a tuple of:
  - out: Output data
  - cache: (x, pool_param)
  """
  out = None
  #############################################################################
  # TODO: Implement the max pooling forward pass                              #
  #############################################################################
  N, C, H, W = x.shape
  pool_height = pool_param['pool_height']
  pool_width = pool_param['pool_width']
  stride = pool_param['stride']

  # assign memory for the max_pooling features
  out_H = (H-pool_height)/stride + 1
  out_W = (W-pool_width)/stride + 1
  out = np.zeros((N, C, out_H, out_W))

  for img_num in xrange(N):
    for cnl_num in xrange(C):
      for pool_row in xrange(out_H):
        row_start = pool_row * stride
        row_end = row_start + pool_height

        for pool_col in xrange(out_W):
          col_start = pool_col * stride
          col_end = col_start + pool_width

          patch = x[img_num, cnl_num, row_start:row_end, col_start:col_end]
          out[img_num, cnl_num, pool_row, pool_col] = patch.max()

  #############################################################################
  #                             END OF YOUR CODE                              #
  #############################################################################
  cache = (x, pool_param)
  return out, cache

def max_pool_backward_naive(dout, cache):
  """
  A naive implementation of the backward pass for a max pooling layer.

  Inputs:
  - dout: Upstream derivatives
  - cache: A tuple of (x, pool_param) as in the forward pass.

  Returns:
  - dx: Gradient with respect to x
  """
  dx = None
  #############################################################################
  # TODO: Implement the max pooling backward pass                             #
  #############################################################################
  x, pool_param = cache
  N, C, H, W = x.shape
  pool_height = pool_param['pool_height']
  pool_width = pool_param['pool_width']
  stride = pool_param['stride']
  _, _, out_H, out_W = dout.shape

  dx = np.zeros_like(x)
  for img_num in xrange(N):
    for cnl_num in xrange(C):
      # processing for each element in dout
      for pool_row in xrange(out_H):
        row_start = pool_row * stride
        row_end = row_start + pool_height
        for pool_col in xrange(out_W):
          col_start = pool_col * stride
          col_end = col_start + pool_width

          patch = x[img_num, cnl_num, row_start:row_end, col_start:col_end]
          ind = np.unravel_index(patch.argmax(), patch.shape) # ind is the index of the max value in patch
          dx[img_num, cnl_num, row_start+ind[0], col_start+ind[1]] += dout[img_num, cnl_num, pool_row, pool_col]

  #############################################################################
  #                             END OF YOUR CODE                              #
  #############################################################################
  return dx
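
A tiny check of the pair (assuming the two functions above are in scope): with 2x2 pooling and stride 2, each block contributes one output value, and the backward pass routes the upstream gradient back to that block's argmax:

import numpy as np

x = np.arange(16, dtype=float).reshape(1, 1, 4, 4)       # one 4x4 channel
pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}

out, cache = max_pool_forward_naive(x, pool_param)
print(out[0, 0])      # [[ 5.  7.]
                      #  [13. 15.]] -- the max of each 2x2 block

dout = np.ones_like(out)
dx = max_pool_backward_naive(dout, cache)
print(dx[0, 0])       # ones exactly at the argmax positions (5, 7, 13, 15), zeros elsewhere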


Q3: ConvNet on CIFAR-10

The third part uses the modules from part two to build a two-layer CNN and runs experiments on CIFAR-10. The code:

import numpy as np

from cs231n.layers import *
from cs231n.fast_layers import *
from cs231n.layer_utils import *

def two_layer_convnet(X, model, y=None, reg=0.0):
  """
  Compute the loss and gradient for a simple two-layer ConvNet. The architecture
  is conv-relu-pool-affine-softmax, where the conv layer uses stride-1 "same"
  convolutions to preserve the input size; the pool layer uses non-overlapping
  2x2 pooling regions. We use L2 regularization on both the convolutional layer
  weights and the affine layer weights.

  Inputs:
  - X: Input data, of shape (N, C, H, W)
  - model: Dictionary mapping parameter names to parameters. A two-layer Convnet
    expects the model to have the following parameters:
    - W1, b1: Weights and biases for the convolutional layer
    - W2, b2: Weights and biases for the affine layer
  - y: Vector of labels of shape (N,). y[i] gives the label for the point X[i].
  - reg: Regularization strength.

  Returns:
  If y is None, then returns:
  - scores: Matrix of scores, where scores[i, c] is the classification score for
    the ith input and class c.

  If y is not None, then returns a tuple of:
  - loss: Scalar value giving the loss.
  - grads: Dictionary with the same keys as model, mapping parameter names to
    their gradients.
  """

  # Unpack weights
  W1, b1, W2, b2 = model['W1'], model['b1'], model['W2'], model['b2']
  N, C, H, W = X.shape

  # We assume that the convolution is "same", so that the data has the same
  # height and width after performing the convolution. We can then use the
  # size of the filter to figure out the padding.
  conv_filter_height, conv_filter_width = W1.shape[2:]
  assert conv_filter_height == conv_filter_width, 'Conv filter must be square'
  assert conv_filter_height % 2 == 1, 'Conv filter height must be odd'
  assert conv_filter_width % 2 == 1, 'Conv filter width must be odd'
  conv_param = {'stride': 1, 'pad': (conv_filter_height - 1) / 2}
  pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}

  # Compute the forward pass
  a1, cache1 = conv_relu_pool_forward(X, W1, b1, conv_param, pool_param)
  scores, cache2 = affine_forward(a1, W2, b2)

  if y is None:
    return scores

  # Compute the backward pass
  data_loss, dscores = softmax_loss(scores, y)

  # Compute the gradients using a backward pass
  da1, dW2, db2 = affine_backward(dscores, cache2)
  dX,  dW1, db1 = conv_relu_pool_backward(da1, cache1)

  # Add regularization
  dW1 += reg * W1
  dW2 += reg * W2
  reg_loss = 0.5 * reg * sum(np.sum(W * W) for W in [W1, W2])

  loss = data_loss + reg_loss
  grads = {'W1': dW1, 'b1': db1, 'W2': dW2, 'b2': db2}

  return loss, grads
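
The conv_relu_pool_forward / conv_relu_pool_backward helpers come from layer_utils.py (backed by fast_layers in practice). Conceptually they just chain the Q2 primitives; a rough sketch of the idea using the naive layers (not the assignment's actual layer_utils code):

def conv_relu_pool_forward_sketch(x, w, b, conv_param, pool_param):
  # conv -> ReLU -> max pool, keeping each layer's cache for the backward pass
  a, conv_cache = conv_forward_naive(x, w, b, conv_param)
  r, relu_cache = relu_forward(a)
  out, pool_cache = max_pool_forward_naive(r, pool_param)
  return out, (conv_cache, relu_cache, pool_cache)

def conv_relu_pool_backward_sketch(dout, cache):
  # Unwind the three layers in reverse order.
  conv_cache, relu_cache, pool_cache = cache
  dr = max_pool_backward_naive(dout, pool_cache)
  da = relu_backward(dr, relu_cache)
  dx, dw, db = conv_backward_naive(da, conv_cache)
  return dx, dw, db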

def init_two_layer_convnet(weight_scale=1e-3, bias_scale=0, input_shape=(3, 32, 32),
                           num_classes=10, num_filters=32, filter_size=5):
  """
  Initialize the weights for a two-layer ConvNet.

  Inputs:
  - weight_scale: Scale at which weights are initialized. Default 1e-3.
  - bias_scale: Scale at which biases are initialized. Default is 0.
  - input_shape: Tuple giving the input shape to the network; default is
    (3, 32, 32) for CIFAR-10.
  - num_classes: The number of classes for this network. Default is 10
    (for CIFAR-10)
  - num_filters: The number of filters to use in the convolutional layer.
  - filter_size: The width and height for convolutional filters. We assume that
    all convolutions are "same", so we pick padding to ensure that data has the
    same height and width after convolution. This means that the filter size
    must be odd.

  Returns:
  A dictionary mapping parameter names to numpy arrays containing:
    - W1, b1: Weights and biases for the convolutional layer
    - W2, b2: Weights and biases for the fully-connected layer.
  """
  C, H, W = input_shape
  assert filter_size % 2 == 1, 'Filter size must be odd; got %d' % filter_size

  model = {}
  model['W1'] = weight_scale * np.random.randn(num_filters, C, filter_size, filter_size)
  model['b1'] = bias_scale * np.random.randn(num_filters)
  model['W2'] = weight_scale * np.random.randn(num_filters * H * W / 4, num_classes)
  model['b2'] = bias_scale * np.random.randn(num_classes)
  return model
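
Training then reuses the same ClassifierTrainer from Q1, only with two_layer_convnet as the loss function. A sketch (the CIFAR-10 tensors X_train of shape (N, 3, 32, 32), y_train, X_val, y_val are assumed to be loaded as in the notebook, ClassifierTrainer is assumed imported, and the hyperparameters are only starting points):

model = init_two_layer_convnet(weight_scale=1e-2, filter_size=5, num_filters=32)
trainer = ClassifierTrainer()
best_model, loss_history, train_acc, val_acc = trainer.train(
    X_train, y_train, X_val, y_val,
    model, two_layer_convnet,
    reg=0.001, learning_rate=1e-4, momentum=0.9, learning_rate_decay=0.95,
    update='momentum', sample_batches=True,
    num_epochs=1, batch_size=50, acc_frequency=50, verbose=True)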

