Mastering the game of Go with deep neural networks and tree search

Silver, David, et al. "Mastering the game of Go with deep neural networks and tree search." Nature 529.7587 (2016): 484-489.

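This is the AlphaGo paper. It relies mainly on reinforcement learning (RL) techniques, combined with Monte Carlo tree search; I do not know whether RL had been applied to Go before.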

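The paper proposes two networks, a policy network and a value network. Both rely on self-play: the policy network is refined by playing against itself, and the value network is trained on positions from self-play games.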

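Policy network:

Given the current board and its history, the policy network outputs a probability for each possible next move. Earlier work apparently trained such networks with supervised learning on human players' games. Here that supervised network is further trained with RL, and the RL version appears to play better than the supervised one (in the paper it wins most head-to-head games against it). The RL policy network's parameters are initialized from the supervised network's parameters.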
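To make the idea concrete, here is a minimal sketch of a policy as a probability distribution over board points, plus one REINFORCE-style update driven by a game outcome. It is only an illustration under strong assumptions: the real policy network is a deep convolutional network over a 19x19 board, whereas this toy treats one score per point as the trainable parameters. The names `move_probabilities` and `reinforce_step`, the 3x3 board, and the learning rate are all hypothetical.

```python
import math

def move_probabilities(logits, legal):
    """Softmax over board points, with illegal points forced to probability 0."""
    # Subtract the largest legal logit for numerical stability.
    top = max(x for x, ok in zip(logits, legal) if ok)
    exps = [math.exp(x - top) if ok else 0.0 for x, ok in zip(logits, legal)]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_step(logits, legal, move, outcome, lr=0.1):
    """One policy-gradient step: outcome is +1 for a win, -1 for a loss."""
    probs = move_probabilities(logits, legal)
    new_logits = []
    for i, (x, p, ok) in enumerate(zip(logits, probs, legal)):
        # For a softmax policy, d log p(move) / d logit_i = 1[i == move] - p_i.
        grad = ((1.0 if i == move else 0.0) - p) if ok else 0.0
        new_logits.append(x + lr * outcome * grad)
    return new_logits

# Toy 3x3 board flattened to 9 points; the scores are made up.
logits = [0.1, 2.0, -1.0, 0.5, 0.0, 1.5, -0.5, 0.3, 0.2]
legal = [True, True, False, True, True, True, False, True, True]

before = move_probabilities(logits, legal)
after_win = move_probabilities(reinforce_step(logits, legal, 3, +1), legal)
after_loss = move_probabilities(reinforce_step(logits, legal, 3, -1), legal)
print(before[3], after_win[3], after_loss[3])
```

A move played in a won game becomes more likely and a move played in a lost game becomes less likely, which is the basic mechanism behind improving the policy through self-play.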

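Value network:

Given the current board and its history, the value network outputs the probability that the current player wins. It was originally meant to replace the random rollout estimates of Monte Carlo search, but the authors found that combining the value network with rollout estimates works better than either alone. Personally, I think that if the value network were trained well enough, the rollouts might not be needed at all. The rollouts here are not fully random either: they apparently use a fast, shallow policy trained with supervised learning to play the game out.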
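The combination of the two estimates can be sketched as a weighted average. In the paper the leaf evaluation mixes the value network output and the rollout outcome with a mixing parameter λ, and λ = 0.5 is reported to work best. The function and argument names below are illustrative, not taken from the paper.

```python
def leaf_value(value_net_estimate, rollout_outcome, lam=0.5):
    """Mix the value network's estimate with the result of one rollout.

    lam = 0 uses only the value network; lam = 1 uses only the rollout.
    """
    return (1.0 - lam) * value_net_estimate + lam * rollout_outcome

# Value network is mildly optimistic (0.2), the rollout ended in a win (+1).
mixed = leaf_value(0.2, 1.0)
print(mixed)
```

Setting `lam=0.0` corresponds to the scenario speculated about above, where a sufficiently good value network would make rollouts unnecessary.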

The policy network reduces the breadth of the Monte Carlo search tree, and the value network reduces its depth.
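The breadth reduction can be illustrated with the move-selection rule used during search: each candidate move gets its current value estimate plus a bonus that is proportional to the policy network's prior and shrinks as the move is visited more. This is a simplified sketch of that rule; the constant `c` and all names and numbers are assumptions for illustration.

```python
def select_move(q, prior, visits, c=1.0):
    """Pick the move maximizing Q(s, a) + c * P(s, a) / (1 + N(s, a))."""
    scores = [q_a + c * p_a / (1 + n_a)
              for q_a, p_a, n_a in zip(q, prior, visits)]
    return max(range(len(scores)), key=scores.__getitem__)

prior = [0.7, 0.2, 0.1]  # made-up policy network output for three moves

# Nothing explored yet: the search starts with the move the policy prefers.
first = select_move(q=[0.0, 0.0, 0.0], prior=prior, visits=[0, 0, 0])

# After many visits to move 0 with a mediocre value, the bonus decays
# and the search widens to the next most plausible move.
later = select_move(q=[0.05, 0.0, 0.0], prior=prior, visits=[10, 0, 0])
print(first, later)
```

Moves with a tiny prior receive a tiny bonus, so the search spends almost no effort on them. The depth reduction comes from evaluating a leaf position directly instead of always playing the game to the end.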

The program described in this paper was the first to defeat a human professional player (Fan Hui, a 2-dan professional).

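The method also comes in a distributed version and a single-machine version. The official assessment is that the single-machine version plays at about Fan Hui's level, while the distributed version reaches professional 5 dan or above. The distributed version used 40 search threads, 1,202 CPUs, and 176 GPUs; the single-machine version used 40 search threads, 48 CPUs, and 8 GPUs. Given these configurations, within about 10 years a single laptop should be able to run a Go program stronger than professional 3 dan, which is good news for people learning Go.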

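AlphaGo made RL popular, made Go popular, and made Ke Jie famous, so its impact has been enormous. Go is relatively easy to formalize and its rules are fairly simple; the only difficulty is that the search space is large. Many real-world problems, however, have complex rules, incomplete information, large state spaces, large decision spaces, the need for joint decision making, and so on. AlphaGo is still evolving, and there should be follow-up papers.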

Time: 2024-07-29 11:53:26
