Machine Learning: An Introduction to TensorFlow

The idea behind TensorFlow is simple: you define a computation graph in Python, and TensorFlow then runs that graph efficiently using optimized C++ code.

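As the figure above shows, the function $f(x,y)=x^2y+y+2$ is computed as a graph whose nodes are operations and input/output variables. Because the computation is expressed this way, TensorFlow can split the graph and distribute it across devices, which makes it possible to train on datasets with very large numbers of features and instances.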

The figure above shows a computation spread across multiple GPUs. TensorFlow has the following advantages:

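  • Runs on many platforms: Windows, Linux, macOS, iOS, and Android
  • Provides a simple Python API
  • Has many higher-level libraries built on top of it
  • Is extensible
  • Has a high-performance C++ implementation
  • Provides many convenient nodes for computing cost functions, with automatic differentiation
  • Ships with TensorBoard, a powerful visualization tool
  • Offers cloud computing support
  • Has an active developer community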

The following are the open source deep learning libraries in common use today:

Creating Your First Graph and Running It in a Session

Let's write TensorFlow code that implements $f(x,y)=x^2y+y+2$:

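import numpy as np
import tensorflow as tf

def reset_graph(seed=42):
    # Helper used throughout: start from a fresh default graph with fixed seeds
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

reset_graph()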

x = tf.Variable(3, name='x')
y = tf.Variable(4, name='y')
f = x * x * y + y + 2

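Note that at this point TensorFlow has not actually created the variables or computed anything; it has only built the computation graph. To run the computation, call the code below, and TensorFlow will automatically place the operations on a CPU or GPU.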

sess = tf.Session()
sess.run(x.initializer)
sess.run(y.initializer)
result = sess.run(f)
print(result)
sess.close()

You can also use Python's with statement to simplify the code:

with tf.Session() as sess:
    x.initializer.run()
    y.initializer.run()
    result = f.eval()

"""
42
"""

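Inside the with block, the session is set as the default session. Calling x.initializer.run() is equivalent to calling tf.get_default_session().run(x.initializer), and f.eval() is equivalent to calling tf.get_default_session().run(f). This makes the code easier to read, and the session is closed automatically at the end of the block.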

A graph may contain many variables, so TensorFlow provides a shortcut for initializing all of them at once:

init = tf.global_variables_initializer()

with tf.Session() as sess:
    init.run()
    result1 = f.eval()

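A TensorFlow program is typically split into two phases: first build the computation graph (the construction phase), then run it (the execution phase).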
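To make the two phases concrete, here is a minimal pure-Python sketch of "build the graph first, run it later". It is only an illustration: the `Node` class and the `as_node` and `run` helpers are hypothetical names invented for this example and have nothing to do with TensorFlow's real implementation.

```python
# Minimal sketch of "build first, run later" (illustrative only; not TensorFlow's API).

class Node:
    """A graph node: an operation plus the nodes it depends on."""
    def __init__(self, op, inputs=(), value=None):
        self.op = op            # "const", "add", or "mul"
        self.inputs = inputs    # upstream nodes
        self.value = value      # only used by "const" nodes

    def __add__(self, other):
        return Node("add", (self, as_node(other)))

    def __mul__(self, other):
        return Node("mul", (self, as_node(other)))


def as_node(obj):
    """Wrap plain Python numbers as constant nodes."""
    return obj if isinstance(obj, Node) else Node("const", value=obj)


def run(node):
    """Execution phase: recursively evaluate a node and its dependencies."""
    if node.op == "const":
        return node.value
    left, right = (run(n) for n in node.inputs)
    return left + right if node.op == "add" else left * right


# Construction phase: nothing is computed here, only the graph is built.
x = as_node(3)
y = as_node(4)
f = x * x * y + y + 2

# Execution phase.
result = run(f)
print(result)  # 42
```

The expression `x * x * y + y + 2` only creates `Node` objects; the arithmetic happens when `run(f)` is called, which is the same split as between building a graph and calling `sess.run()`.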

Managing Graphs

Any node you create is automatically added to the default graph. We can verify this in code:

reset_graph()

x1 = tf.Variable(1)
x1.graph is tf.get_default_graph()

'''
True
'''

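In most cases this is fine, but if you need to manage several graphs, you will want to add variables to different graphs. To do so, create a new Graph and use a with block to make it the default graph temporarily.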

graph = tf.Graph()
with graph.as_default():
    x2 = tf.Variable(2)
x2.graph is graph

'''
True
'''

x2.graph is tf.get_default_graph()

'''
False
'''
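The "temporary default" behaviour can be pictured as a stack of graphs. The sketch below is a plain-Python illustration under that assumption; the `Graph` class, `_default_stack`, and `variable` helper here are hypothetical stand-ins, and TensorFlow's internals differ.

```python
# Illustrative sketch of a "default object" stack; TensorFlow's internals differ.
from contextlib import contextmanager

class Graph:
    def __init__(self, name):
        self.name = name
        self.nodes = []

    @contextmanager
    def as_default(self):
        _default_stack.append(self)   # becomes the default inside the with block
        try:
            yield self
        finally:
            _default_stack.pop()      # previous default is restored on exit

_default_stack = [Graph("global")]

def get_default_graph():
    return _default_stack[-1]

def variable(value):
    """Create a node and register it in whichever graph is currently the default."""
    graph = get_default_graph()
    graph.nodes.append(value)
    return graph                      # return the owning graph, like `.graph`

g1 = variable(1)
other = Graph("other")
with other.as_default():
    g2 = variable(2)

print(g1 is get_default_graph())  # True
print(g2 is other)                # True
print(g2 is get_default_graph())  # False
```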

Lifecycle of a Node Value

When you evaluate a node, TensorFlow automatically determines the set of nodes it depends on and evaluates those first. For example:

w = tf.constant(3)
x = w + 2
y = x + 5
z = x * 3

with tf.Session() as sess:
    print(y.eval())  # 10
    print(z.eval())  # 15

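When y.eval() runs, TensorFlow detects that y depends on x, which in turn depends on w, so it evaluates w first, then x, then y. The same happens for z. By default, however, results are not reused between the two calls, so this code evaluates w and x twice.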

All node values are dropped between graph runs, except variable values, which are maintained by the session across graph runs (queues and readers also maintain some state, as we will see in Chapter 12). A variable starts its life when its initializer is run, and it ends when the session is closed.

To evaluate x and w only once in the code above, ask TensorFlow to evaluate y and z in a single graph run:

with tf.Session() as sess:
    y_val, z_val = sess.run([y, z])
    print(y_val)  # 10
    print(z_val)  # 15

In single-process TensorFlow, multiple sessions do not share any state, even if they reuse the same graph (each session would have its own copy of every variable). In distributed TensorFlow (see Chapter 12), variable state is stored on the servers, not in the sessions, so multiple sessions can share the same variables.
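The evaluation counts in the y and z example above can be checked with a small pure-Python sketch. It assumes a toy graph stored in a dictionary (`GRAPH`) and a hypothetical `run` helper that caches node values only for the duration of one run, which mirrors the idea that node values are dropped between graph runs.

```python
# Illustrative sketch (not TensorFlow code): count how often each node is computed.
from collections import Counter

# Each node maps to (function, dependencies), mirroring w, x, y, z above.
GRAPH = {
    "w": (lambda: 3, ()),
    "x": (lambda w: w + 2, ("w",)),
    "y": (lambda x: x + 5, ("x",)),
    "z": (lambda x: x * 3, ("x",)),
}

def run(fetches, counts):
    """Evaluate all fetches in one run, caching node values for that run only."""
    cache = {}

    def evaluate(name):
        if name not in cache:
            fn, deps = GRAPH[name]
            cache[name] = fn(*(evaluate(d) for d in deps))
            counts[name] += 1
        return cache[name]

    return [evaluate(name) for name in fetches]

# Two separate runs: w and x are computed twice.
separate = Counter()
y_val = run(["y"], separate)[0]
z_val = run(["z"], separate)[0]

# One run fetching both: w and x are computed once.
together = Counter()
y_val2, z_val2 = run(["y", "z"], together)

print(y_val, z_val, separate["x"], separate["w"])    # 10 15 2 2
print(y_val2, z_val2, together["x"], together["w"])  # 10 15 1 1
```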

Linear Regression with TensorFlow

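TensorFlow operations (ops for short) can take any number of inputs and produce any number of outputs. For example, the addition and multiplication ops each take two inputs and produce one output. Constants and variables take no input and output a single value. The inputs and outputs are multidimensional arrays, called tensors.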

The code below implements linear regression with TensorFlow:

import numpy as np
from sklearn.datasets import fetch_california_housing

reset_graph()

housing = fetch_california_housing()
m, n = housing.data.shape
housing_data_plus_bias = np.c_[np.ones((m, 1)), housing.data]

X = tf.constant(housing_data_plus_bias, dtype=tf.float32, name='X')
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name='y')

XT = tf.transpose(X)

theta = tf.matmul(tf.matmul(tf.matrix_inverse(tf.matmul(XT, X)), XT), y)

with tf.Session() as sess:
    theta_value = theta.eval()

"""
array([[-3.68962631e+01],
       [ 4.36777472e-01],
       [ 9.44449380e-03],
       [-1.07348785e-01],
       [ 6.44962370e-01],
       [-3.94082872e-06],
       [-3.78797273e-03],
       [-4.20847952e-01],
       [-4.34020907e-01]], dtype=float32)
"""

The same mathematical expression written directly in NumPy:

X = housing_data_plus_bias
y = housing.target.reshape(-1, 1)
theta_numpy = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)

print(theta_numpy)

"""
[[-3.69419202e+01]
 [ 4.36693293e-01]
 [ 9.43577803e-03]
 [-1.07322041e-01]
 [ 6.45065694e-01]
 [-3.97638942e-06]
 [-3.78654265e-03]
 [-4.21314378e-01]
 [-4.34513755e-01]]
"""

Using Scikit-Learn:

from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(housing.data, housing.target.reshape(-1, 1))

print(np.r_[lin_reg.intercept_.reshape(-1, 1), lin_reg.coef_.T])

"""
[[-3.69419202e+01]
 [ 4.36693293e-01]
 [ 9.43577803e-03]
 [-1.07322041e-01]
 [ 6.45065694e-01]
 [-3.97638942e-06]
 [-3.78654265e-03]
 [-4.21314378e-01]
 [-4.34513755e-01]]
"""

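As you can see, the results agree; the TensorFlow values differ slightly in the low-order digits, most likely because that computation uses float32. The main benefit of the TensorFlow version is that it automatically runs the computation on your GPU if you have one.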
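For reference, all three snippets above evaluate the same closed-form solution, the Normal Equation. Written with $\mathbf{X}$ for the data matrix including the bias column and $\mathbf{y}$ for the target column vector (the same roles that X and y play in the code):

```latex
\hat{\boldsymbol{\theta}} = \left(\mathbf{X}^{T}\mathbf{X}\right)^{-1}\mathbf{X}^{T}\mathbf{y}
```

In the TensorFlow version, tf.matmul(XT, X) corresponds to $\mathbf{X}^{T}\mathbf{X}$, tf.matrix_inverse() to the inverse, and the two outer tf.matmul() calls to the remaining products.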

Implementing Gradient Descent

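The linear regression from the previous section can also be solved with gradient descent. Here we use Batch Gradient Descent. Before using this algorithm, be sure to normalize (scale) the input features first.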

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaled_housing_data = scaler.fit_transform(housing.data)
scaled_housing_data_plus_bias = np.c_[np.ones((m, 1)), scaled_housing_data]

Manually Computing the Gradients

We first compute the gradients manually. The principle is straightforward, so only the code is shown here:

reset_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")
gradients = 2/m * tf.matmul(tf.transpose(X), error)
training_op = tf.assign(theta, theta - learning_rate * gradients)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())
        sess.run(training_op)

    best_theta = theta.eval()

"""

Epoch 0 MSE = 9.161542
Epoch 100 MSE = 0.7145004
Epoch 200 MSE = 0.56670487
Epoch 300 MSE = 0.5555718
Epoch 400 MSE = 0.5488112
Epoch 500 MSE = 0.5436363
Epoch 600 MSE = 0.5396291
Epoch 700 MSE = 0.5365092
Epoch 800 MSE = 0.53406775
Epoch 900 MSE = 0.5321473
"""

best_theta

"""
array([[ 2.0685523 ],
       [ 0.8874027 ],
       [ 0.14401656],
       [-0.34770882],
       [ 0.36178368],
       [ 0.00393811],
       [-0.04269556],
       [-0.6614529 ],
       [-0.6375279 ]], dtype=float32)
"""

It is clear that the MSE steadily converges as the iterations proceed.
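For readers who want the formula behind the gradients line in the code above, the gradient of the MSE cost for a linear model, and the update applied by training_op, can be written as follows (with $m$ the number of instances and $\eta$ the learning rate):

```latex
\nabla_{\boldsymbol{\theta}}\,\mathrm{MSE}(\boldsymbol{\theta})
  = \frac{2}{m}\,\mathbf{X}^{T}\left(\mathbf{X}\boldsymbol{\theta} - \mathbf{y}\right),
\qquad
\boldsymbol{\theta} \leftarrow \boldsymbol{\theta} - \eta\,\nabla_{\boldsymbol{\theta}}\,\mathrm{MSE}(\boldsymbol{\theta})
```

Here $\mathbf{X}\boldsymbol{\theta} - \mathbf{y}$ is the error tensor in the code, and 2/m * tf.matmul(tf.transpose(X), error) is the gradient.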

Using autodiff

With TensorFlow's tf.gradients() function the gradients are computed automatically, and the code becomes more concise:

reset_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")

gradients = tf.gradients(mse, [theta])[0]

training_op = tf.assign(theta, theta - learning_rate * gradients)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())
        sess.run(training_op)

    best_theta = theta.eval()

print("Best theta:")
print(best_theta)

"""
Epoch 0 MSE = 9.161542
Epoch 100 MSE = 0.71450037
Epoch 200 MSE = 0.56670487
Epoch 300 MSE = 0.5555718
Epoch 400 MSE = 0.54881126
Epoch 500 MSE = 0.5436363
Epoch 600 MSE = 0.53962916
Epoch 700 MSE = 0.5365092
Epoch 800 MSE = 0.53406775
Epoch 900 MSE = 0.5321473
Best theta:
[[ 2.0685523 ]
 [ 0.8874027 ]
 [ 0.14401656]
 [-0.3477088 ]
 [ 0.36178365]
 [ 0.00393811]
 [-0.04269556]
 [-0.66145283]
 [-0.6375278 ]]
"""

TensorFlow uses reverse-mode autodiff, which is well suited to cases with many inputs and few outputs. The figure below shows the other approaches:
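As a rough illustration of the reverse-mode idea, the sketch below differentiates $f(x,y)=x^2y+y+2$ in pure Python. The `Var` class and `backward` function are hypothetical names for this example only; this is a teaching sketch, not how TensorFlow implements autodiff.

```python
# Minimal reverse-mode autodiff sketch (illustrative; not TensorFlow's implementation).

class Var:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # pairs of (parent Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def backward(output):
    """One backward pass: visit nodes in reverse topological order."""
    order, seen = [], set()

    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += local * node.grad   # chain rule, accumulated per path


x, y = Var(3.0), Var(4.0)
f = x * x * y + y + 2      # forward pass computes the value
backward(f)                # a single reverse pass computes both partials

print(f.value, x.grad, y.grad)  # 42.0 24.0 10.0
```

One forward pass and one reverse pass yield the partial derivatives with respect to every input at once ($\partial f/\partial x = 2xy = 24$ and $\partial f/\partial y = x^2 + 1 = 10$ at the point (3, 4)), which is why this mode suits functions with many inputs and few outputs.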

Using an Optimizer

TensorFlow also provides ready-made optimizers, which are even more convenient. The Gradient Descent optimizer is one of them, and using it requires only a small code change.

Simply replace these lines from the code above:

gradients = tf.gradients(mse, [theta])[0]
training_op = tf.assign(theta, theta - learning_rate * gradients)

with the following:

optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

To use a different optimizer, you only need to change one line:

optimizer = tf.train.MomentumOptimizer(learning_rate=learning_rate,
                                       momentum=0.9)
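To give a feel for what a momentum term does, here is a small pure-Python comparison on the one-dimensional cost $J(\theta)=\theta^2$. The update rule used in `momentum_gd` is one common formulation of momentum and is an assumption made for this illustration; consult the TensorFlow documentation for the exact rule MomentumOptimizer applies.

```python
# Illustrative comparison on J(theta) = theta**2 (gradient 2*theta).
# The momentum rule below is one common formulation, assumed for illustration.

def gradient(theta):
    return 2.0 * theta

def plain_gd(theta, learning_rate, n_steps):
    for _ in range(n_steps):
        theta -= learning_rate * gradient(theta)
    return theta

def momentum_gd(theta, learning_rate, momentum, n_steps):
    velocity = 0.0
    for _ in range(n_steps):
        velocity = momentum * velocity + gradient(theta)   # accumulate past gradients
        theta -= learning_rate * velocity
    return theta

theta_plain = plain_gd(5.0, learning_rate=0.01, n_steps=50)
theta_momentum = momentum_gd(5.0, learning_rate=0.01, momentum=0.9, n_steps=50)
print(theta_plain, theta_momentum)
```

With the same learning rate and number of steps, the momentum run ends closer to the minimum at 0 on this toy problem, because the accumulated velocity lets it take larger effective steps.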

Feeding Data to the Training Algorithm

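To implement Mini-batch Gradient Descent, X and y must be replaced with a new mini-batch at every iteration. TensorFlow provides placeholder() nodes for this: a placeholder performs no computation and simply holds a place for data that you supply at run time through the feed_dict argument.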

A simple example:

reset_graph()

A = tf.placeholder(tf.float32, shape=(None, 3))
B = A + 5
with tf.Session() as sess:
    B_val_1 = B.eval(feed_dict={A: [[1, 2, 3]]})
    B_val_2 = B.eval(feed_dict={A: [[4, 5, 6], [7, 8, 9]]})

print(B_val_1)
"""
[[6. 7. 8.]]
"""

print(B_val_2)
"""
[[ 9. 10. 11.]
 [12. 13. 14.]]
"""

Implementing Mini-batch Gradient Descent is also straightforward: repeatedly fetch a batch of data and run a training step on it:

learning_rate = 0.01

reset_graph()

X = tf.placeholder(tf.float32, shape=(None, n + 1), name="X")
y = tf.placeholder(tf.float32, shape=(None, 1), name="y")

theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

init = tf.global_variables_initializer()

n_epochs = 10

batch_size = 100
n_batches = int(np.ceil(m / batch_size))

def fetch_batch(epoch, batch_index, batch_size):
    np.random.seed(epoch * n_batches + batch_index)  # not shown in the book
    indices = np.random.randint(m, size=batch_size)  # not shown
    X_batch = scaled_housing_data_plus_bias[indices] # not shown
    y_batch = housing.target.reshape(-1, 1)[indices] # not shown
    return X_batch, y_batch

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        for batch_index in range(n_batches):
            X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})

    best_theta = theta.eval()

"""
array([[ 2.0703337 ],
       [ 0.8637145 ],
       [ 0.12255152],
       [-0.31211877],
       [ 0.38510376],
       [ 0.00434168],
       [-0.0123295 ],
       [-0.83376896],
       [-0.8030471 ]], dtype=float32)
"""

Saving and Restoring Models

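Sometimes you need to save a trained model to disk, or to resume training after an interruption. Both cases require saving and restoring the model, which TensorFlow handles with a Saver: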

  • Create a Saver
  • Call saver.save() to save
  • Call saver.restore() to restore

Saving:

saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())                                # not shown
            save_path = saver.save(sess, "/tmp/my_model.ckpt")
        sess.run(training_op)

    best_theta = theta.eval()
    save_path = saver.save(sess, "/tmp/my_model_final.ckpt")

Restoring:

with tf.Session() as sess:
    saver.restore(sess, "/tmp/my_model_final.ckpt")
    best_theta_restored = theta.eval() # not shown in the book

By default the saver also saves the graph structure itself in a second file with the extension .meta. You can use the function tf.train.import_meta_graph() to restore the graph structure. This function loads the graph into the default graph and returns a Saver that can then be used to restore the graph state (i.e., the variable values):

reset_graph()
# notice that we start with an empty graph.

saver = tf.train.import_meta_graph("/tmp/my_model_final.ckpt.meta")  # this loads the graph structure
theta = tf.get_default_graph().get_tensor_by_name("theta:0") # not shown in the book

with tf.Session() as sess:
    saver.restore(sess, "/tmp/my_model_final.ckpt")  # this restores the graph's state
    best_theta_restored = theta.eval() # not shown in the book

Visualizing the Graph and Training Curves Using TensorBoard

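TensorBoard is a powerful web-based tool. It works by plotting from log data saved on local disk, and it can display both the graph structure and training statistics.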

Using TensorBoard takes four steps:

  1. Define the directory where the logs will be written

    from datetime import datetime
    
    now = datetime.utcnow().strftime("%Y%m%d%H%M%S")
    root_logdir = "tf_logs"
    logdir = "{}/run-{}/".format(root_logdir, now)
    
  2. After the construction phase, add the following code
    mse_summary = tf.summary.scalar('MSE', mse)
    file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())
    
  3. Write the summary data wherever it needs to be logged
    with tf.Session() as sess:                                                        # not shown in the book
        sess.run(init)                                                                # not shown
    
        for epoch in range(n_epochs):                                                 # not shown
            for batch_index in range(n_batches):
                X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
                if batch_index % 10 == 0:
                    summary_str = mse_summary.eval(feed_dict={X: X_batch, y: y_batch})
                    step = epoch * n_batches + batch_index
                    file_writer.add_summary(summary_str, step)
                sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
    
        best_theta = theta.eval()
    
  4. Start TensorBoard with the following command
    python3 -m tensorboard.main --logdir=tf_logs
    

The result after running it:

Name Scopes

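When dealing with complex models such as neural networks, the graph contains a large number of nodes and quickly becomes cluttered. To address this, you can use TensorFlow name scopes to group related nodes.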

with tf.name_scope("loss") as scope:
    error = y_pred - y
    mse = tf.reduce_mean(tf.square(error), name="mse")

Modularity

Suppose you want to create a graph that adds the output of two rectified linear units (ReLU). A ReLU computes a linear function of the inputs, and outputs the result if it is positive, and 0 otherwise.

For two ReLUs, we could use the code below:

reset_graph()

n_features = 3
X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")

w1 = tf.Variable(tf.random_normal((n_features, 1)), name="weights1")
w2 = tf.Variable(tf.random_normal((n_features, 1)), name="weights2")
b1 = tf.Variable(0.0, name="bias1")
b2 = tf.Variable(0.0, name="bias2")

z1 = tf.add(tf.matmul(X, w1), b1, name="z1")
z2 = tf.add(tf.matmul(X, w2), b2, name="z2")

relu1 = tf.maximum(z1, 0., name="relu1")
relu2 = tf.maximum(z1, 0., name="relu2")  # Oops, cut&paste error! Did you spot it?

output = tf.add(relu1, relu2, name="output")

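This code is poor: it is highly repetitive, so what would you do if you wanted many more ReLUs? A better approach is to factor the ReLU into a function and combine the results with TensorFlow's add_n() function, which sums a list of tensors: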

reset_graph()

def relu(X):
    w_shape = (int(X.get_shape()[1]), 1)
    w = tf.Variable(tf.random_normal(w_shape), name="weights")
    b = tf.Variable(0.0, name="bias")
    z = tf.add(tf.matmul(X, w), b, name="z")
    return tf.maximum(z, 0., name="relu")

n_features = 3
X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
relus = [relu(X) for i in range(5)]
output = tf.add_n(relus, name="output")

When TensorFlow creates a node, it gives the node a unique name. It is therefore best to use name scopes inside the function, which makes the graph structure clearer.

def relu(X):
    with tf.name_scope("relu"):
        w_shape = (int(X.get_shape()[1]), 1)                          # not shown in the book
        w = tf.Variable(tf.random_normal(w_shape), name="weights")    # not shown
        b = tf.Variable(0.0, name="bias")                             # not shown
        z = tf.add(tf.matmul(X, w), b, name="z")                      # not shown
        return tf.maximum(z, 0., name="max")                          # not shown

Sharing Variables

Continuing with the ReLU example, how can a variable be shared among several components? There are a few options:

You can add a threshold parameter to the relu function, so that the value is passed to every component when the function is called:

reset_graph()

def relu(X, threshold):
    with tf.name_scope("relu"):
        w_shape = (int(X.get_shape()[1]), 1)                        # not shown in the book
        w = tf.Variable(tf.random_normal(w_shape), name="weights")  # not shown
        b = tf.Variable(0.0, name="bias")                           # not shown
        z = tf.add(tf.matmul(X, w), b, name="z")                    # not shown
        return tf.maximum(z, threshold, name="max")

threshold = tf.Variable(0.0, name="threshold")
X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
relus = [relu(X, threshold) for i in range(5)]
output = tf.add_n(relus, name="output")

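The drawback of this approach is that it becomes unwieldy when there are many shared parameters, since the relu function ends up with many arguments. One way around this is to pass the function a dictionary or an object that holds all the shared variables.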

Another option is to store the variable on the function object itself, as an attribute of the function:

reset_graph()

def relu(X):
    with tf.name_scope("relu"):
        if not hasattr(relu, "threshold"):
            relu.threshold = tf.Variable(0.0, name="threshold")
        w_shape = int(X.get_shape()[1]), 1                          # not shown in the book
        w = tf.Variable(tf.random_normal(w_shape), name="weights")  # not shown
        b = tf.Variable(0.0, name="bias")                           # not shown
        z = tf.add(tf.matmul(X, w), b, name="z")                    # not shown
        return tf.maximum(z, relu.threshold, name="max")

TensorFlow provides the get_variable() function for creating or retrieving shared variables. It works together with variable_scope() (variable scopes):

reset_graph()

with tf.variable_scope("relu"):
    threshold = tf.get_variable("threshold", shape=(),
                                initializer=tf.constant_initializer(0.0))

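The code above creates a variable scope named relu, so the threshold variable defined in it is named relu/threshold. Note that if the threshold variable has already been created by an earlier call to get_variable(), this code raises an exception.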

To reuse a variable that already exists, use the following code instead:

with tf.variable_scope("relu", reuse=True):
    threshold = tf.get_variable("threshold")

Or:

with tf.variable_scope("relu") as scope:
    scope.reuse_variables()
    threshold = tf.get_variable("threshold")

Once reuse is set to True, it cannot be set back to False within the block. Moreover, if you define other variable scopes inside this one, they will automatically inherit reuse=True. Lastly, only variables created by get_variable() can be reused this way.

reset_graph()

def relu(X):
    with tf.variable_scope("relu", reuse=True):
        threshold = tf.get_variable("threshold")
        w_shape = int(X.get_shape()[1]), 1                          # not shown
        w = tf.Variable(tf.random_normal(w_shape), name="weights")  # not shown
        b = tf.Variable(0.0, name="bias")                           # not shown
        z = tf.add(tf.matmul(X, w), b, name="z")                    # not shown
        return tf.maximum(z, threshold, name="max")

X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
with tf.variable_scope("relu"):
    threshold = tf.get_variable("threshold", shape=(),
                                initializer=tf.constant_initializer(0.0))
relus = [relu(X) for relu_index in range(5)]
output = tf.add_n(relus, name="output")

Variables created using get_variable() are always named using the name of their variable_scope as a prefix (e.g., "relu/threshold"), but for all other nodes (including variables created with tf.Variable()) the variable scope acts like a new name scope. In particular, if a name scope with an identical name was already created, then a suffix is added to make the name unique. For example, all nodes created in the preceding code (except the threshold variable) have a name prefixed with "relu_1/" to "relu_5/".

Extra material

reset_graph()

with tf.variable_scope("my_scope"):
    x0 = tf.get_variable("x", shape=(), initializer=tf.constant_initializer(0.))
    x1 = tf.Variable(0., name="x")
    x2 = tf.Variable(0., name="x")

with tf.variable_scope("my_scope", reuse=True):
    x3 = tf.get_variable("x")
    x4 = tf.Variable(0., name="x")

with tf.variable_scope("", default_name="", reuse=True):
    x5 = tf.get_variable("my_scope/x")

print("x0:", x0.op.name)
print("x1:", x1.op.name)
print("x2:", x2.op.name)
print("x3:", x3.op.name)
print("x4:", x4.op.name)
print("x5:", x5.op.name)
print(x0 is x3 and x3 is x5)

"""

x0: my_scope/x
x1: my_scope/x_1
x2: my_scope/x_2
x3: my_scope/x
x4: my_scope_1/x
x5: my_scope/x
True
"""

Exercises

  1. What are the main benefits of creating a computation graph rather than directly executing the computations? What are the main drawbacks?

    Main benefits and drawbacks of creating a computation graph rather than directly executing the computations:

    • Main benefits:
      • TensorFlow can automatically compute the gradients for you (using reverse-mode autodiff).
      • TensorFlow can take care of running the operations in parallel in different threads.
      • It makes it easier to run the same model across different devices.
      • It simplifies introspection, for example, to view the model in TensorBoard.
    • Main drawbacks:
      • It makes the learning curve steeper.
      • It makes step-by-step debugging harder.
  2. Is the statement a_val = a.eval(session=sess) equivalent to a_val = sess.run(a)?

    Yes, the statement a_val = a.eval(session=sess) is indeed equivalent to a_val = sess.run(a).

  3. Is the statement a_val, b_val = a.eval(session=sess), b.eval(session=sess) equivalent to a_val, b_val = sess.run([a, b])?

  4. Can you run two graphs in the same session?

  5. If you create a graph g containing a variable w, then start two threads and open a session in each thread, both using the same graph g, will each session have its own copy of the variable w or will it be shared?

  6. When is a variable initialized? When is it destroyed?

  7. What is the difference between a placeholder and a variable?

  8. What happens when you run the graph to evaluate an operation that depends on a placeholder but you don’t feed its value? What happens if the operation does not depend on the placeholder?

    If you run the graph to evaluate an operation that depends on a placeholder but you don’t feed its value, you get an exception. If the operation does not depend on the placeholder, then no exception is raised.

  9. When you run a graph, can you feed the output value of any operation, or just the value of placeholders?
    When you run a graph, you can feed the output value of any operation, not just the value of placeholders. In practice, however, this is rather rare (it can be useful, for example, when you are caching the output of frozen layers).
    
  10. How can you set a variable to any value you want (during the execution phase)?
    You can specify a variable’s initial value when constructing the graph, and it will be initialized later when you run the variable’s initializer during the execution phase. If you want to change that variable’s value to anything you want during the execution phase, then the simplest option is to create an assignment node (during the graph construction phase) using the tf.assign() function, passing the variable and a placeholder as parameters. During the execution phase, you can run the assignment operation and feed the variable’s new value using the placeholder.
    
    import tensorflow as tf
    x = tf.Variable(tf.random_uniform(shape=(), minval=0.0, maxval=1.0))
    x_new_val = tf.placeholder(shape=(), dtype=tf.float32)
    x_assign = tf.assign(x, x_new_val)
    with tf.Session():
      x.initializer.run() # random number is sampled *now*
      print(x.eval()) # 0.646157 (some random number)
      x_assign.eval(feed_dict={x_new_val: 5.0})
      print(x.eval()) # 5.0
    
  11. How many times does reverse-mode autodiff need to traverse the graph in order to compute the gradients of the cost function with regards to 10 variables? What about forward-mode autodiff? And symbolic differentiation?
    Reverse-mode autodiff (implemented by TensorFlow) needs to traverse the graph only twice in order to compute the gradients of the cost function with regards to any number of variables. On the other hand, forward-mode autodiff would need to run once for each variable (so 10 times if we want the gradients with regards to 10 different variables). As for symbolic differentiation, it would build a different graph to compute the gradients, so it would not traverse the original graph at all (except when building the new gradients graph). A highly optimized symbolic differentiation system could potentially run the new gradients graph only once to compute the gradients with regards to all variables, but that new graph may be horribly complex and inefficient compared to the original graph.
    
  12. Implement Logistic Regression with Mini-batch Gradient Descent using TensorFlow. Train it and evaluate it on the moons dataset (introduced in Chapter 5). Try adding all the bells and whistles:
    • Define the graph within a logistic_regression() function that can be reused easily.
    • Save checkpoints using a Saver at regular intervals during training, and save the final model at the end of training.
    • Restore the last checkpoint upon startup if training was interrupted.
    • Define the graph using nice scopes so the graph looks good in TensorBoard.
    • Add summaries to visualize the learning curves in TensorBoard.
    • Try tweaking some hyperparameters such as the learning rate or the mini-batch size and look at the shape of the learning curve.

Let's work through this last exercise in detail.

First, generate the data:

from sklearn.datasets import make_moons

m = 1000
X_moons, y_moons = make_moons(m, noise=0.1, random_state=42)

Plot the data to see what it looks like:

import matplotlib.pyplot as plt

plt.plot(X_moons[y_moons == 1, 0], X_moons[y_moons == 1, 1], 'go', label="Positive")
plt.plot(X_moons[y_moons == 0, 0], X_moons[y_moons == 0, 1], 'r^', label="Negative")
plt.legend()
plt.show()

Remember to add a bias term to the feature matrix X:

X_moons_with_bias = np.c_[np.ones((m, 1)), X_moons]

Reshape y into a column vector:

y_moons_column_vector = y_moons.reshape(-1, 1)

Split the data into a training set and a test set:

test_ratio = 0.2
test_size = int(m * test_ratio)
X_train = X_moons_with_bias[:-test_size]
X_test = X_moons_with_bias[-test_size:]
y_train = y_moons_column_vector[:-test_size]
y_test = y_moons_column_vector[-test_size:]

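Write a function that generates batches. Each call picks a random sample of the training data, so the same instance may appear more than once: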

def random_batch(X_train, y_train, batch_size):
    rnd_indices = np.random.randint(0, len(X_train), batch_size)
    X_batch = X_train[rnd_indices]
    y_batch = y_train[rnd_indices]
    return X_batch, y_batch

Build the model:

reset_graph()
n_inputs = 2
X = tf.placeholder(tf.float32, shape=(None, n_inputs + 1), name="X")
y = tf.placeholder(tf.float32, shape=(None, 1), name="y")
theta = tf.Variable(tf.random_uniform([n_inputs + 1, 1], -1.0, 1.0, seed=42), name="theta")
logits = tf.matmul(X, theta, name="logits")
y_proba = 1 / (1 + tf.exp(-logits))

TensorFlow actually provides a tf.sigmoid() function for this:

y_proba = tf.sigmoid(logits)

The cost function is the log loss: $J(\boldsymbol{\theta}) = -\dfrac{1}{m} \sum\limits_{i=1}^{m}{\left[ y^{(i)} \log\left(\hat{p}^{(i)}\right) + (1 - y^{(i)}) \log\left(1 - \hat{p}^{(i)}\right)\right]}$

epsilon = 1e-7  # to avoid an overflow when computing the log
loss = -tf.reduce_mean(y * tf.log(y_proba + epsilon) + (1 - y) * tf.log(1 - y_proba + epsilon))
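The same log loss can be checked by hand in pure Python. The sketch below mirrors the formula and the epsilon term used in the TensorFlow line above; the function name `log_loss` and the sample probabilities are made up for this illustration.

```python
# Pure-Python version of the log loss above, for checking values by hand.
import math

def log_loss(y_true, y_proba, epsilon=1e-7):
    """Mean binary cross-entropy; epsilon keeps log() away from zero."""
    total = 0.0
    for y, p in zip(y_true, y_proba):
        total += y * math.log(p + epsilon) + (1 - y) * math.log(1 - p + epsilon)
    return -total / len(y_true)

confident_right = log_loss([1, 0], [0.9, 0.1])   # good predictions: small loss
confident_wrong = log_loss([1, 0], [0.1, 0.9])   # bad predictions: large loss
extreme = log_loss([1], [0.0])                   # finite only thanks to epsilon
print(confident_right, confident_wrong, extreme)
```

Without the epsilon term, the last call would try to evaluate log(0), which is why the TensorFlow code adds it before taking the logarithm.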

Alternatively, use tf.losses.log_loss():

loss = tf.losses.log_loss(y, y_proba)  # uses epsilon = 1e-7 by default
learning_rate = 0.01
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(loss)

init = tf.global_variables_initializer()

n_epochs = 1000
batch_size = 50
n_batches = int(np.ceil(m / batch_size))

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        for batch_index in range(n_batches):
            X_batch, y_batch = random_batch(X_train, y_train, batch_size)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        loss_val = loss.eval({X: X_test, y: y_test})
        if epoch % 100 == 0:
            print("Epoch:", epoch, "\tLoss:", loss_val)

    y_proba_val = y_proba.eval(feed_dict={X: X_test, y: y_test})

Output:

Epoch: 0    Loss: 0.792602
Epoch: 100  Loss: 0.343463
Epoch: 200  Loss: 0.30754
Epoch: 300  Loss: 0.292889
Epoch: 400  Loss: 0.285336
Epoch: 500  Loss: 0.280478
Epoch: 600  Loss: 0.278083
Epoch: 700  Loss: 0.276154
Epoch: 800  Loss: 0.27552
Epoch: 900  Loss: 0.274912

How well does the model perform?

y_pred = (y_proba_val >= 0.5)
from sklearn.metrics import precision_score, recall_score

precision_score(y_test, y_pred)

"""
0.86274509803921573
"""

recall_score(y_test, y_pred)
"""
0.88888888888888884
"""

Plot the predictions to see how they look:

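As you can see, the fit is not very good. To improve it, the first idea is to add new dimensions to the training data (here the squares and cubes of the two features), mapping it into a higher-dimensional space: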

X_train_enhanced = np.c_[X_train,
                         np.square(X_train[:, 1]),
                         np.square(X_train[:, 2]),
                         X_train[:, 1] ** 3,
                         X_train[:, 2] ** 3]
X_test_enhanced = np.c_[X_test,
                        np.square(X_test[:, 1]),
                        np.square(X_test[:, 2]),
                        X_test[:, 1] ** 3,
                        X_test[:, 2] ** 3]
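The column layout produced by the np.c_ calls above can be easy to lose track of, so here is the same transformation on a single row in pure Python. `add_polynomial_features` is a hypothetical helper for illustration only.

```python
# Illustrative single-row version of the feature expansion above.

def add_polynomial_features(rows):
    """rows: [bias, x1, x2]; returns [bias, x1, x2, x1^2, x2^2, x1^3, x2^3]."""
    return [[b, x1, x2, x1 ** 2, x2 ** 2, x1 ** 3, x2 ** 3] for b, x1, x2 in rows]

sample = [[1.0, 2.0, -3.0]]
enhanced = add_polynomial_features(sample)
print(enhanced)  # [[1.0, 2.0, -3.0, 4.0, 9.0, 8.0, -27.0]]
```

Each instance goes from 3 columns (bias plus 2 features) to 7 columns, which is why the model built later uses n_inputs = 2 + 4.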

For convenience, define a function that builds the graph:

def logistic_regression(X, y, initializer=None, seed=42, learning_rate=0.01):
    n_inputs_including_bias = int(X.get_shape()[1])
    with tf.name_scope("logistic_regression"):
        with tf.name_scope("model"):
            if initializer is None:
                initializer = tf.random_uniform([n_inputs_including_bias, 1], -1.0, 1.0, seed=seed)
            theta = tf.Variable(initializer, name="theta")
            logits = tf.matmul(X, theta, name="logits")
            y_proba = tf.sigmoid(logits)
        with tf.name_scope("train"):
            loss = tf.losses.log_loss(y, y_proba, scope="loss")
            optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
            training_op = optimizer.minimize(loss)
            loss_summary = tf.summary.scalar('log_loss', loss)
        with tf.name_scope("init"):
            init = tf.global_variables_initializer()
        with tf.name_scope("save"):
            saver = tf.train.Saver()
    return y_proba, loss, training_op, loss_summary, init, saver

from datetime import datetime

def log_dir(prefix=""):
    now = datetime.utcnow().strftime("%Y%m%d%H%M%S")
    root_logdir = "tf_logs"
    if prefix:
        prefix += "-"
    name = prefix + "run-" + now
    return "{}/{}/".format(root_logdir, name)

Now build the graph using these functions:

n_inputs = 2 + 4
logdir = log_dir("logreg")

X = tf.placeholder(tf.float32, shape=(None, n_inputs + 1), name="X")
y = tf.placeholder(tf.float32, shape=(None, 1), name="y")

y_proba, loss, training_op, loss_summary, init, saver = logistic_regression(X, y)

file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())

Training:

import os

n_epochs = 10001
batch_size = 50
n_batches = int(np.ceil(m / batch_size))

checkpoint_path = "/tmp/my_logreg_model.ckpt"
checkpoint_epoch_path = checkpoint_path + ".epoch"
final_model_path = "./my_logreg_model"

with tf.Session() as sess:
    if os.path.isfile(checkpoint_epoch_path):
        # if the checkpoint file exists, restore the model and load the epoch number
        with open(checkpoint_epoch_path, "rb") as f:
            start_epoch = int(f.read())
        print("Training was interrupted. Continuing at epoch", start_epoch)
        saver.restore(sess, checkpoint_path)
    else:
        start_epoch = 0
        sess.run(init)

    for epoch in range(start_epoch, n_epochs):
        for batch_index in range(n_batches):
            X_batch, y_batch = random_batch(X_train_enhanced, y_train, batch_size)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        loss_val, summary_str = sess.run([loss, loss_summary], feed_dict={X: X_test_enhanced, y: y_test})
        file_writer.add_summary(summary_str, epoch)
        if epoch % 500 == 0:
            print("Epoch:", epoch, "\tLoss:", loss_val)
            saver.save(sess, checkpoint_path)
            with open(checkpoint_epoch_path, "wb") as f:
                f.write(b"%d" % (epoch + 1))

    saver.save(sess, final_model_path)
    y_proba_val = y_proba.eval(feed_dict={X: X_test_enhanced, y: y_test})
    os.remove(checkpoint_epoch_path)
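The resume logic in the training loop above (write the next epoch number to a side file at each checkpoint, read it back on startup, delete it when training completes) can be tried out without TensorFlow. The sketch below isolates just that pattern; the `train` function, the `interrupt_at` parameter, and the temporary path are all hypothetical and exist only to simulate an interruption.

```python
# Illustrative sketch of the resume logic only (no TensorFlow); paths are temporary.
import os
import tempfile

def train(n_epochs, epoch_path, interrupt_at=None):
    """Pretend to train; returns the list of epochs actually executed."""
    if os.path.isfile(epoch_path):
        with open(epoch_path, "rb") as f:
            start_epoch = int(f.read())       # resume where the last checkpoint stopped
    else:
        start_epoch = 0

    executed = []
    for epoch in range(start_epoch, n_epochs):
        if epoch == interrupt_at:
            return executed                   # simulate a crash: marker file stays
        executed.append(epoch)
        if epoch % 2 == 0:                    # checkpoint every 2 epochs
            with open(epoch_path, "wb") as f:
                f.write(b"%d" % (epoch + 1))

    if os.path.isfile(epoch_path):
        os.remove(epoch_path)                 # finished: the next run starts fresh
    return executed

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.ckpt.epoch")
    first_run = train(6, path, interrupt_at=4)
    second_run = train(6, path)
    marker_left = os.path.isfile(path)

print(first_run, second_run, marker_left)  # [0, 1, 2, 3] [3, 4, 5] False
```

Note that epoch 3 runs twice: the last checkpoint was written after epoch 2, so work done after that point is repeated on resume. The same applies to the TensorFlow loop above, where up to 499 epochs may be redone.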

Check the results now:

y_pred = (y_proba_val >= 0.5)
precision_score(y_test, y_pred)
"""
0.97979797979797978
"""

recall_score(y_test, y_pred)
"""
0.97979797979797978
"""

Plot of the result:

The same run viewed in TensorBoard:

The hyperparameters in the code above can still be tuned, using either grid search or randomized search. Below we demonstrate randomized search; the hyperparameters to tune are batch_size and learning_rate.

from scipy.stats import reciprocal

n_search_iterations = 10

for search_iteration in range(n_search_iterations):
    batch_size = np.random.randint(1, 100)
    learning_rate = reciprocal(0.0001, 0.1).rvs(random_state=search_iteration)

    n_inputs = 2 + 4
    logdir = log_dir("logreg")

    print("Iteration", search_iteration)
    print("  logdir:", logdir)
    print("  batch size:", batch_size)
    print("  learning_rate:", learning_rate)
    print("  training: ", end="")

    reset_graph()

    X = tf.placeholder(tf.float32, shape=(None, n_inputs + 1), name="X")
    y = tf.placeholder(tf.float32, shape=(None, 1), name="y")

    y_proba, loss, training_op, loss_summary, init, saver = logistic_regression(
        X, y, learning_rate=learning_rate)

    file_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())

    n_epochs = 10001
    n_batches = int(np.ceil(m / batch_size))

    final_model_path = "./my_logreg_model_%d" % search_iteration

    with tf.Session() as sess:
        sess.run(init)

        for epoch in range(n_epochs):
            for batch_index in range(n_batches):
                X_batch, y_batch = random_batch(X_train_enhanced, y_train, batch_size)
                sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
            loss_val, summary_str = sess.run([loss, loss_summary], feed_dict={X: X_test_enhanced, y: y_test})
            file_writer.add_summary(summary_str, epoch)
            if epoch % 500 == 0:
                print(".", end="")

        saver.save(sess, final_model_path)

        print()
        y_proba_val = y_proba.eval(feed_dict={X: X_test_enhanced, y: y_test})
        y_pred = (y_proba_val >= 0.5)

        print("  precision:", precision_score(y_test, y_pred))
        print("  recall:", recall_score(y_test, y_pred))

The output is as follows:

Iteration 0
  logdir: tf_logs/logreg-run-20170606195328/
  batch size: 19
  learning_rate: 0.00443037524522
  training: .....................
  precision: 0.979797979798
  recall: 0.979797979798
Iteration 1
  logdir: tf_logs/logreg-run-20170606195605/
  batch size: 80
  learning_rate: 0.00178264971514
  training: .....................
  precision: 0.969696969697
  recall: 0.969696969697
Iteration 2
  logdir: tf_logs/logreg-run-20170606195646/
  batch size: 73
  learning_rate: 0.00203228544324
  training: .....................
  precision: 0.969696969697
  recall: 0.969696969697
Iteration 3
  logdir: tf_logs/logreg-run-20170606195730/
  batch size: 6
  learning_rate: 0.00449152382514
  training: .....................
  precision: 0.980198019802
  recall: 1.0
Iteration 4
  logdir: tf_logs/logreg-run-20170606200523/
  batch size: 24
  learning_rate: 0.0796323472178
  training: .....................
  precision: 0.980198019802
  recall: 1.0
Iteration 5
  logdir: tf_logs/logreg-run-20170606200726/
  batch size: 75
  learning_rate: 0.000463425058329
  training: .....................
  precision: 0.912621359223
  recall: 0.949494949495
Iteration 6
  logdir: tf_logs/logreg-run-20170606200810/
  batch size: 86
  learning_rate: 0.0477068184194
  training: .....................
  precision: 0.98
  recall: 0.989898989899
Iteration 7
  logdir: tf_logs/logreg-run-20170606200851/
  batch size: 87
  learning_rate: 0.000169404470952
  training: .....................
  precision: 0.888888888889
  recall: 0.808080808081
Iteration 8
  logdir: tf_logs/logreg-run-20170606200932/
  batch size: 61
  learning_rate: 0.0417146119941
  training: .....................
  precision: 0.980198019802
  recall: 1.0
Iteration 9
  logdir: tf_logs/logreg-run-20170606201026/
  batch size: 92
  learning_rate: 0.000107429229684
  training: .....................
  precision: 0.882352941176
  recall: 0.757575757576

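The best parameters so far are easy to spot at a glance. Let's take a look in TensorBoard:

As you can see, different parameters produce different learning curves.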

Original article: https://www.cnblogs.com/machao/p/11473794.html

Time: 2024-10-07 21:18:29
