Write Your Own Prisma

Link to Sirajology's video

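Have you all played with Prisma, the app that was hugely popular a while ago? After reading this article, you'll be able to write a mini version of Prisma yourself.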

The idea originally came from the Google Research Blog.

Here’s the initial Google DeepDream blog post:

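They trained a deep neural network on a large set of images so that it could recognize the objects in a picture. They then fed it a new image and asked it not only to recognize what was there, but also to modify the image toward the things the network had learned.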
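In DeepDream, "modifying the image toward what the network learned" is gradient ascent on the input pixels: the weights stay fixed and the image is nudged to increase chosen activations. A toy NumPy sketch, assuming a single ReLU unit with a made-up filter `w` in place of a real CNN layer (this is not the DeepDream code itself):

```python
import numpy as np

def unit_activation(x, w):
    """Toy stand-in for one CNN unit: ReLU(w . x)."""
    return max(0.0, float(np.dot(w, x)))

def dream_step(x, w, step_size=0.1):
    """One gradient-ASCENT step on the input image x (the filter w stays fixed)."""
    # gradient of ReLU(w . x) with respect to x: w where the unit is active, else 0
    grad = w if np.dot(w, x) > 0 else np.zeros_like(w)
    # scale by the mean absolute gradient so the step size stays stable
    grad = grad / (np.abs(grad).mean() + 1e-8)
    return x + step_size * grad

rng = np.random.default_rng(0)
w = rng.normal(size=16)                         # hypothetical learned filter
x = 0.1 * w + rng.normal(scale=0.01, size=16)   # "image" that weakly matches it
before = unit_activation(x, w)
for _ in range(10):
    x = dream_step(x, w)
after = unit_activation(x, w)
print(before, after)  # the activation grows: the image is pushed toward the feature
```

A real implementation does the same thing with the gradient of a whole layer's activations, obtained by backpropagation through the network.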

Then another team published a similar paper.

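They took a network already trained on everyday images, fed it a famous painting and an everyday photo, and, by adjusting the photo until certain features match the painting's, turned it into an image that looks more like the painting's style.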

The core idea: a Convolutional Neural Network extracts the style of one image, and another image is then transformed so that it takes on that style.

The tools used are Python and the Keras package; the link to the author's source code is at the end of the article.

Import the required packages

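# Note: imread, imresize and imsave were removed from scipy.misc in newer SciPy
# releases, so this import needs an old SciPy (or replacements from imageio/Pillow)
from scipy.misc import imread, imresize, imsave
from scipy.optimize import fmin_l_bfgs_b
from sklearn.preprocessing import normalize
import numpy as np
import time
import os
import argparse
import h5py

# Written against the Keras 1.x API (e.g. Convolution2D(64, 3, 3), Layer.set_input)
from keras.models import Sequential
from keras.layers.convolutional import Convolution2D, ZeroPadding2D, AveragePooling2D
from keras import backend as K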

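Define the three image-path variables (base image, style image, and result prefix)

# Define base image, style image, and result image paths
# (parser is the argparse.ArgumentParser set up earlier in the author's full script)
args = parser.parse_args()
base_image_path = args.base_image_path
style_reference_image_path = args.style_reference_image_path
result_prefix = args.result_prefix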

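Load the pretrained VGG16 weights

The weights were trained in advance and can already recognize everyday images, so they serve as the model's starting point.

# Get the weights file
weights_path = r"vgg16_weights.h5"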

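Define booleans that decide whether to resize the image

# Init bools to decide whether or not to resize
# (strToBool, from the author's script, presumably converts "True"/"False" strings to booleans)
rescale_image = strToBool(args.rescale_image)
maintain_aspect_ratio = strToBool(args.maintain_aspect_ratio)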

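Then initialize the style and content weights

What are style and content weights?

As a neural network learns, different layers learn different things. When recognizing a dog, for example, one layer learns edges, the next learns shapes, the one after that learns more complex shapes, and the last learns the whole dog.

In the network used for artistic style, style information such as texture, color, and structure can be read off at many layers, starting from the low ones, while content, such as concrete objects like the sun, is taken from a higher layer. Because the CNN representation lets content and style be separated, each gets its own weight, and different weightings produce different effects.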
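In the Neural Style paper's formulation, content is compared through the feature maps themselves, while style is compared through Gram matrices (correlations between channels), which discard where in the image a texture appears. A NumPy sketch of both losses for a single layer, assuming channels-first feature maps; the function names are illustrative, not the author's helpers:

```python
import numpy as np

def gram_matrix(features):
    """Style representation of a (C, H, W) feature map: a (C, C) matrix of
    channel-to-channel correlations, summed over all spatial positions."""
    c = features.shape[0]
    f = features.reshape(c, -1)   # (C, H*W)
    return f @ f.T

def content_loss(base_feat, comb_feat):
    # content: compare the feature maps position by position
    return float(np.sum((comb_feat - base_feat) ** 2))

def style_loss(style_feat, comb_feat):
    # style: compare Gram matrices, normalized by 4 * C^2 * (H*W)^2 as in Gatys et al.
    c = style_feat.shape[0]
    n = style_feat.shape[1] * style_feat.shape[2]
    diff = gram_matrix(style_feat) - gram_matrix(comb_feat)
    return float(np.sum(diff ** 2) / (4.0 * c ** 2 * n ** 2))

rng = np.random.default_rng(1)
feat = rng.normal(size=(4, 5, 5))
flipped = feat[:, ::-1, ::-1]        # same "textures", different positions
print(style_loss(feat, flipped))     # near zero: the Gram matrix ignores layout
print(content_loss(feat, flipped))   # clearly positive: the content moved
```

Flipping the feature map only permutes the spatial positions, so the Gram matrix is unchanged while a position-by-position comparison sees a large difference. That is why the two terms can be weighted separately.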

# Init variables for style and content weights.
total_variation_weight = args.tv_weight
style_weight = args.style_weight * args.style_scale
content_weight = args.content_weight

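Next, set the image dimensions and define tensors for the three images: base image, style image, and output image.

# Init dimensions of the generated picture.
img_width = img_height = args.img_size
assert img_height == img_width, 'Due to the use of the Gram matrix, width and height must match.'
# placeholders for the base image's original size and aspect ratio, used when rescaling the output
img_WIDTH = img_HEIGHT = 0
aspect_ratio = 0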

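# get tensor representations of our images
# (preprocess_image is defined in the author's script; the True flag presumably
# records the base image's original size and aspect ratio for later rescaling)
base_image = K.variable(preprocess_image(base_image_path, True))
style_reference_image = K.variable(preprocess_image(style_reference_image_path))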

# this will contain our generated image
combination_image = K.placeholder((1, 3, img_width, img_height))
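`preprocess_image` and `deprocess_image` come from the author's full script and are not shown here. A hypothetical array-based pair, assuming the common VGG16 convention (RGB to BGR, subtract the ImageNet channel means, channels-first layout); the author's versions may differ, for example by also loading, resizing, and recording image sizes:

```python
import numpy as np

# ImageNet per-channel means in BGR order, commonly used with the original
# Caffe-trained VGG16 weights (an assumption about the author's helper)
VGG_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def preprocess_array(rgb):
    """Hypothetical stand-in for preprocess_image, taking an array instead of a
    path: (H, W, 3) RGB uint8 -> (1, 3, H, W) float32."""
    x = rgb.astype(np.float32)[:, :, ::-1]   # RGB -> BGR
    x = x - VGG_MEAN_BGR                     # zero-center each channel
    x = x.transpose((2, 0, 1))               # HWC -> CHW (channels first)
    return x[np.newaxis, ...]                # add the batch axis

def deprocess_array(x):
    """Inverse of preprocess_array: (3, H, W) float -> (H, W, 3) RGB uint8."""
    x = x.transpose((1, 2, 0)) + VGG_MEAN_BGR  # CHW -> HWC, undo centering
    x = x[:, :, ::-1]                          # BGR -> RGB
    return np.clip(np.rint(x), 0, 255).astype(np.uint8)

img = np.random.default_rng(2).integers(0, 256, size=(8, 8, 3)).astype(np.uint8)
batch = preprocess_array(img)
print(batch.shape)                     # (1, 3, 8, 8)
restored = deprocess_array(batch[0])
print(np.array_equal(img, restored))   # True
```

Rounding before the cast matters: float32 round-off can leave a value like 254.99998, which a plain cast to uint8 would truncate to 254.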

Then combine them into a single tensor

# combine the 3 images into a single Keras tensor
input_tensor = K.concatenate([base_image,
                              style_reference_image,
                              combination_image], axis=0)

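Stacking the three images in a single tensor along the batch axis (axis=0) lets one forward pass through the network compute the features of all three at once, which keeps the computation for these high-dimensional images manageable.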
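A NumPy illustration of the stacking, assuming tiny 4×4 images and a multiplication standing in for a network layer. After the forward pass, each image's features can be pulled back out by batch index (0 = base, 1 = style, 2 = combination), which is presumably what the author's loss helpers do:

```python
import numpy as np

h = w = 4                                    # tiny hypothetical image size
base = np.zeros((1, 3, h, w))                # content photo
style = np.ones((1, 3, h, w))                # style painting
combination = np.full((1, 3, h, w), 0.5)     # image being optimized

# stack along axis 0 (the batch axis), like K.concatenate(..., axis=0)
batch = np.concatenate([base, style, combination], axis=0)
print(batch.shape)  # (3, 3, 4, 4)

# one "forward pass" for all three images, then split by batch index
features = batch * 2.0                       # stand-in for a network layer
base_f, style_f, comb_f = features[0], features[1], features[2]
```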

Build the 31-layer neural network (VGG16)

# build the VGG16 network with our 3 images as input
first_layer = ZeroPadding2D((1, 1))
first_layer.set_input(input_tensor, shape=(3, 3, img_width, img_height))

model = Sequential()
model.add(first_layer)
model.add(Convolution2D(64, 3, 3, activation='relu', name='conv1_1'))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(64, 3, 3, activation='relu'))
model.add(AveragePooling2D((2, 2), strides=(2, 2)))

...

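There are three kinds of layers:

Convolution2D layer: has learnable filters. Each neuron connects only to a small local region of its input, its receptive field, rather than to every neuron in the previous layer.

ZeroPadding layer: pads the border with zeros to control the output size; here a 1-pixel border keeps each 3×3 convolution from shrinking the feature map.

Pooling layer: summarizes small windows of its input (here 2×2 averages), shrinking the feature maps so later layers need less computation; this also helps avoid overfitting.

The activation function is ReLU, which is cheaper to compute than sigmoid and speeds up training.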
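A single-channel NumPy sketch of the three layer types plus ReLU, assuming a made-up 3×3 edge filter; real Convolution2D layers apply many filters across all input channels at once:

```python
import numpy as np

def zero_pad(x, p=1):
    """Like ZeroPadding2D((1, 1)): add a border of zeros around an (H, W) map."""
    return np.pad(x, p, mode="constant")

def conv2d_3x3(x, k):
    """Valid 3x3 convolution (cross-correlation, as CNN libraries compute it).
    Each output pixel sees only a 3x3 neighbourhood: its receptive field."""
    h, w = x.shape[0] - 2, x.shape[1] - 2
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(x[i:i + 3, j:j + 3] * k)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def avg_pool_2x2(x):
    """Like AveragePooling2D((2, 2), strides=(2, 2)): average each 2x2 block."""
    h, w = x.shape[0] // 2, x.shape[1] // 2
    return x[:2 * h, :2 * w].reshape(h, 2, w, 2).mean(axis=(1, 3))

img = np.arange(16, dtype=float).reshape(4, 4)
edge = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], dtype=float)  # hypothetical filter

y = relu(conv2d_3x3(zero_pad(img), edge))  # padding keeps the 4x4 size
print(y.shape)                  # (4, 4)
print(avg_pool_2x2(y).shape)    # (2, 2)
```

Without the zero padding, the 3×3 convolution would shrink the 4×4 map to 2×2; with it, only the pooling layer reduces the size.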


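After defining the model, load the VGG16 weights

# load the weights of the VGG16 networks
# (load_weights is a helper from the author's script that presumably copies the .h5 weights into the model)
load_weights(weights_path, model)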

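Define the loss function: it measures how far the generated image is from the base image in content and from the style image in style

# get the symbolic outputs of each "key" layer (we gave them unique names).
outputs_dict = dict([(layer.name, layer.output) for layer in model.layers])

# get the loss (we combine style, content, and total variation loss into a single scalar)
# (get_total_loss is a helper defined in the author's full script)
loss = get_total_loss(outputs_dict)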
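The article does not show `get_total_loss`. A minimal NumPy sketch of what such a function combines, assuming the stacked-batch layout (index 0 = base, 1 = style, 2 = combination), caller-chosen content and style layer names (the Neural Style paper used `conv4_2` for content and `conv1_1` to `conv5_1` for style), and a squared-difference total variation term; the author's exact layers, normalization, and exponents may differ:

```python
import numpy as np

def gram(f):
    """Channel correlations of a (C, H, W) feature map."""
    m = f.reshape(f.shape[0], -1)
    return m @ m.T

def content_loss(base, comb):
    return np.sum((comb - base) ** 2)

def style_loss(style, comb):
    c, n = style.shape[0], style.shape[1] * style.shape[2]
    return np.sum((gram(style) - gram(comb)) ** 2) / (4.0 * c ** 2 * n ** 2)

def total_variation_loss(img):
    """Squared differences between neighbouring pixels of a (C, H, W) image;
    penalizing them keeps the generated image smooth."""
    dh = img[:, 1:, :] - img[:, :-1, :]
    dw = img[:, :, 1:] - img[:, :, :-1]
    return np.sum(dh ** 2) + np.sum(dw ** 2)

def total_loss(outputs, comb_img, content_layer, style_layers,
               content_weight, style_weight, tv_weight):
    """outputs maps layer name -> (3, C, H, W) features of the stacked batch."""
    feats = outputs[content_layer]
    loss = content_weight * content_loss(feats[0], feats[2])
    for name in style_layers:
        feats = outputs[name]
        loss += (style_weight / len(style_layers)) * style_loss(feats[1], feats[2])
    loss += tv_weight * total_variation_loss(comb_img)
    return float(loss)

rng = np.random.default_rng(3)
f = rng.normal(size=(2, 3, 3))
same = np.stack([f, f, f])   # base == style == combination features
outputs = {"conv4_2": same, "conv1_1": same}
print(total_loss(outputs, np.ones((3, 4, 4)), "conv4_2", ["conv1_1"], 1.0, 1.0, 1.0))  # 0.0
```

The three weights play the roles of `content_weight`, `style_weight`, and `total_variation_weight` from the script: raising one makes the optimizer care more about that term.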

Get the gradients

# get the gradients of the generated image wrt the loss
grads = K.gradients(loss, combination_image)
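The gradient is taken with respect to `combination_image`, not the network weights, which stay frozen. A toy NumPy check, assuming a single linear "layer" `W` in place of VGG16: the chain-rule gradient of the loss with respect to the image matches a finite-difference estimate.

```python
import numpy as np

rng = np.random.default_rng(4)
W = rng.normal(size=(5, 8))   # frozen "network" weights (never updated)
p = rng.normal(size=5)        # target features (e.g. of the base image)
x = rng.normal(size=8)        # the generated image, flattened

def loss(x):
    return float(np.sum((W @ x - p) ** 2))

def grad_wrt_image(x):
    # chain rule through the fixed layer: dL/dx = 2 W^T (W x - p)
    return 2.0 * W.T @ (W @ x - p)

# central finite differences, one pixel at a time
eps = 1e-6
numeric = np.array([(loss(x + eps * e) - loss(x - eps * e)) / (2 * eps)
                    for e in np.eye(8)])
print(np.allclose(numeric, grad_wrt_image(x), atol=1e-4))  # True
```

`K.gradients` does the same chain-rule computation automatically, through all the layers of the network.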

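Finally, optimize the generated image. Backpropagation gives the gradient of the loss with respect to the image's pixels (the network weights stay fixed), and limited-memory BFGS (L-BFGS) uses it to minimize the loss while using relatively little memory.

# combine loss and gradient
# (combine_loss_and_gradient comes from the author's script; presumably it builds a
# backend function that returns the loss and the gradients in one call)
f_outputs = combine_loss_and_gradient(loss, grads)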
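The `evaluator` object used in the optimization loop is also defined in the author's script. A minimal sketch of the usual pattern, assuming a toy quadratic loss in place of the style loss: an `Evaluator` computes loss and gradient together and hands them to SciPy's `fmin_l_bfgs_b` as two separate callables, with the same `maxfun=20` restart loop as the article.

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

TARGET = np.linspace(0.0, 1.0, 10)   # hypothetical "ideal" pixel values

def loss_and_grads(x):
    """Stand-in for f_outputs: one call returns both loss and gradient."""
    diff = x - TARGET
    return float(np.sum(diff ** 2)), 2.0 * diff

class Evaluator:
    """fmin_l_bfgs_b takes loss and gradient as two callables; computing both
    in one pass and caching the gradient avoids evaluating the network twice."""

    def __init__(self):
        self._x = None
        self._grad = None

    def loss(self, x):
        value, self._grad = loss_and_grads(x)
        self._x = x.copy()
        return value

    def grads(self, x):
        # normally a cache hit (scipy asks for the gradient at the same x);
        # recompute on a miss instead of returning a stale gradient
        if self._grad is None or not np.array_equal(x, self._x):
            self.loss(x)
        return self._grad

evaluator = Evaluator()
x = np.zeros(10)
losses = []
for i in range(3):
    x, min_val, info = fmin_l_bfgs_b(evaluator.loss, x.flatten(),
                                     fprime=evaluator.grads, maxfun=20)
    losses.append(min_val)
    print('Iteration', i + 1, 'loss', min_val)
```

Restarting L-BFGS every 20 function evaluations, as the article's loop does, gives a natural point to save an intermediate image after each round.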

# Run scipy-based optimization (L-BFGS) over the pixels of the generated image to minimize the neural style loss
# 5-step process
# (prepare_image and evaluator come from the author's script; evaluator presumably
# wraps f_outputs and exposes .loss and .grads as separate callables for scipy)
x, num_iter = prepare_image()
for i in range(num_iter):

    # Step 1 - Record iterations
    print('Start of iteration', (i+1))
    start_time = time.time()

    # Step 2 - Perform l_bfgs optimization function using loss and gradient
    x, min_val, info = fmin_l_bfgs_b(evaluator.loss, x.flatten(),
                                     fprime=evaluator.grads, maxfun=20)
    print('Current loss value:', min_val)

    # Step 3 - Get the generated image
    img = deprocess_image(x.reshape((3, img_width, img_height)))

    # Step 4 - Maintain aspect ratio / rescale to the original size
    if maintain_aspect_ratio and not rescale_image:
        img_ht = int(img_width * aspect_ratio)
        print("Rescaling Image to (%d, %d)" % (img_width, img_ht))
        img = imresize(img, (img_width, img_ht), interp=args.rescale_method)
    if rescale_image:
        print("Rescaling Image to (%d, %d)" % (img_WIDTH, img_HEIGHT))
        img = imresize(img, (img_WIDTH, img_HEIGHT), interp=args.rescale_method)

    # Step 5 - Finally, save the generated image (rescaling was done in Step 4)
    fname = result_prefix + '_at_iteration_%d.png' % (i+1)
    imsave(fname, img)
    end_time = time.time()
    print('Image saved as', fname)
    print('Iteration %d completed in %ds' % (i+1, end_time - start_time))

The same algorithm can also be applied to video.

I also found another article, "How I Made My Own Prisma with TensorFlow" (《我是如何用TensorFlow 做出属于自己的Prisma的？》).

If you're interested, give it a try yourself.

The code for this video is here:

Here’s the initial Google DeepDream blog post:

A Deepdream web app:

The Neural Style Paper:

Date: 2025-01-10 06:27:43
