Is self-play a theoretical bottleneck for AlphaGo's improvement? My view is that it is not. The real problem for AlphaGo (and for any other AI, and for humans) is that the state space of Go is vastly larger than what its neural network can represent, so no matter how it is trained, it suffers from underfitting. This means its value network and policy network always have blind spots: when some cases are trained very well, other problematic cases pop up.
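To make the capacity gap concrete, here is a back-of-the-envelope sketch in Python. The parameter count is an illustrative guess for a 2016-era convolutional policy/value network, not a figure from DeepMind; the point is only the difference in orders of magnitude.

```python
import math

# Upper bound on 19x19 board configurations: each of 361 points is empty, black, or white
# (this over-counts, since many colorings are illegal, but the magnitude is what matters).
go_positions_upper_bound = 3 ** 361
# Assumed: a network with on the order of ten million weights (illustrative only).
network_parameters = 10 ** 7

print(f"board configurations: ~10^{int(math.log10(go_positions_upper_bound))}")  # ~10^172
print(f"network parameters:   ~10^{int(math.log10(network_parameters))}")        # 10^7
```

However the weights are set, a function this small cannot be accurate over a space that large, which is the underfitting argument above.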
As for supervised learning vs. unsupervised learning (i.e., self-play), they differ only in the training set: supervised learning biases AlphaGo's neural network toward a certain style and makes it handle certain cases very well. Unsupervised learning can provide all the information that supervised learning can; imagine the board is 9*9, and unsupervised learning is absolutely enough to provide a good training set, as sketched below. So unsupervised learning is not really a bottleneck in theory, but in practice supervised learning biases AlphaGo toward a certain style and gives it a better chance of beating a certain opponent. As its neural network grows large enough to accommodate more states, the value of supervised learning also decreases.
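A runnable toy sketch of that point, not AlphaGo's code: tic-tac-toe stands in for a small board, and self-play alone yields a labeled training set with no human games at all. Every position visited in a game is labeled with that game's final outcome, exactly the kind of (state, result) pairs a value network would be trained on.

```python
import random

LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != '.' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def self_play_game():
    board, player, visited = ['.'] * 9, 'X', []
    while winner(board) is None and '.' in board:
        visited.append(''.join(board))
        # Random policy as a stand-in for the current network's policy.
        move = random.choice([i for i, v in enumerate(board) if v == '.'])
        board[move] = player
        player = 'O' if player == 'X' else 'X'
    w = winner(board)
    outcome = 0 if w is None else (1 if w == 'X' else -1)  # from X's point of view
    return [(state, outcome) for state in visited]

dataset = [pair for _ in range(1000) for pair in self_play_game()]
print(len(dataset), "labeled positions generated from self-play only")
```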
Because of this underfitting, the value network may be wrong about who is winning in some positions that look simple to a human. This is why AlphaGo uses MCTS to roll out many moves for validation: only if, after playing several moves ahead, the game is still in AlphaGo's favor is the original position considered truly good. So AlphaGo is really a mixture of "intuition + logic", which is very similar to how humans play.
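A minimal sketch of that "intuition + logic" mix, following the leaf-evaluation rule reported in the 2016 AlphaGo paper: a leaf position is scored partly by the value network (intuition) and partly by the result of a fast rollout played to the end of the game (verification). The functions passed in are hypothetical stand-ins for the real value network and rollout policy.

```python
def evaluate_leaf(position, value_net, rollout, mixing=0.5):
    v = value_net(position)  # value network's instant judgement of the position, in [-1, 1]
    z = rollout(position)    # outcome of a fast rollout played to the end: +1 win, -1 loss
    # Blend the two; the paper reports an equal mix (mixing = 0.5).
    return (1 - mixing) * v + mixing * z
```

During search, a position the value network likes is only trusted if the rollouts that actually play the game out tend to agree with it.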
This design makes it very hard to catch AlphaGo's weakness, but the weakness does exist. Based on the analysis above, it is now clear to me: its value network can go wrong not just on a single position but on many of the moves that follow it. Although the probability of this is very low, it did happen in Game 4. Brilliant Lee Sedol!