AlphaGo's AI upgrade gets round the need for human input

Thursday, 19 Oct, 2017

Go was considered a hard game for computers to master because, besides being complex, it has around 10 to the power of 170 possible board configurations, more than the number of atoms in the known Universe. And unlike games of chance such as poker, Go is constrained to a fixed and strict environment: there's no randomness or luck affecting the outcome. Now DeepMind, the Google-owned AI developer, has applied a self-taught approach to the game, at which AIs only recently became capable of consistently beating humans.

The program comes up with chains of moves ranked by strength; the next move from the best chain is then played, and the computer players repeat these steps. In a head-to-head matchup, AlphaGo Zero defeated the original program by 100 games to none. Go's difficulty is part of why it was such a big deal when that original AlphaGo defeated Go champion Lee Sedol 4-1 last year. "AlphaGo Zero is now the strongest version of our program and shows how much progress we can make even with less computing power and zero use of human data".
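To picture the move-chain search described above, here is a toy sketch in Python: the program enumerates short chains of candidate moves, ranks them by an estimated strength score, plays the first move of the strongest chain, and repeats. It is only an illustration of that loop, not DeepMind's actual search; the ToyState class and its legal_moves, score_chain and play methods are invented placeholders.

```python
# Toy illustration of the loop described above: build candidate chains of
# moves, rank them by an estimated strength, play the first move of the best
# chain, then repeat. This is NOT DeepMind's search; the game interface below
# (legal_moves, score_chain, play) is a made-up stand-in.
import itertools
import random


def best_move(state, depth=3):
    """Return the first move of the strongest move chain found from `state`."""
    best_chain, best_score = None, float("-inf")
    for chain in itertools.product(state.legal_moves(), repeat=depth):
        score = state.score_chain(chain)      # estimated strength of this chain
        if score > best_score:
            best_chain, best_score = chain, score
    return best_chain[0]


class ToyState:
    """Stand-in game: each move is a number, larger chains score higher."""

    def legal_moves(self):
        return [1, 2, 3]

    def score_chain(self, chain):
        return sum(chain) + random.random()   # noisy strength estimate

    def play(self, move):
        return self                           # toy: the state never changes


state = ToyState()
for turn in range(5):                         # repeat the search-and-play steps
    move = best_move(state)
    state = state.play(move)
    print(f"turn {turn}: play {move}")
```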

DeepMind's researchers enthuse about an idea some may find rather scary: that in just a few days a machine has surpassed the knowledge of this game acquired by humanity over thousands of years.

It's official: artificial intelligence no longer needs to learn from humans.

"The previous version of AlphaGo was also an unbelievable achievement, but in some ways, this now feels complete", says Martin Mueller, a computer scientist at the University of Alberta, in Edmonton, Canada, who also studies Go programs. The only human input is to tell it the rules of the game. AlphaGo Zero, however, uses a single neural network. The AI learned without supervision-it simply played against itself, and soon was able to anticipate its own moves and how they would affect a game's outcome.

Roger Huyshe, president of the British Go Association, said: "AlphaGo wasn't only a big moment for the Go community; it was a big leap forward for AI". David Silver, a lead researcher on AlphaGo, said it's an effective technique because the opponent is always at the right level of difficulty.

"For us AlphaGo wasn't just about winning the game of Go, it was also a big step for us towards building general goal learning algorithms", he said.

Both AlphaGo and AlphaGo Zero use a machine-learning approach known as reinforcement learning (see "10 Breakthrough Technologies 2017: Reinforcement Learning") as well as deep neural networks. Using the results of the games it played against itself, the system then taught a separate game-prediction network to predict whether a given move would lead to a win.
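That game-prediction step can be pictured as ordinary supervised learning: board positions are labelled with the eventual result, and a network is fitted to predict it. The sketch below (again assuming PyTorch) uses random placeholder data purely to show the shape of such a training loop; it is not DeepMind's code, and the real labels would come from the self-play games.

```python
# Hedged sketch: fit a win-prediction network on positions labelled with the
# eventual result (win = 1, loss = 0). The data here is random placeholder
# data; in the real system the labels come from self-play games.
import torch
import torch.nn as nn

predictor = nn.Sequential(nn.Linear(19 * 19, 128), nn.ReLU(),
                          nn.Linear(128, 1), nn.Sigmoid())
optimizer = torch.optim.Adam(predictor.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()

positions = torch.randn(64, 19 * 19)             # placeholder board encodings
results = torch.randint(0, 2, (64, 1)).float()   # placeholder win/loss labels

for epoch in range(10):
    optimizer.zero_grad()
    loss = loss_fn(predictor(positions), results)  # how wrong are the predictions?
    loss.backward()
    optimizer.step()

print(f"final training loss: {loss.item():.3f}")
```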

The AI learns the moves that will maximize its chance of winning through trial and error, an approach called "reinforcement learning", and was trained exclusively by playing games against itself. Researchers didn't feed its neural network any data from past games played by humans. "So we end up with a new version of AlphaGo Zero that's even stronger than what came before and in turn as this process is repeated, it gives rise to ever better quality data which is used to train even better neural networks, and the process repeats". The software is a distillation of previous systems DeepMind has built: it's more powerful, but simpler, and doesn't require the computational power of a Google server to run, whereas its predecessors needed roughly ten times as much computing hardware. Still, it remains to be seen how well AlphaGo Zero's techniques can work in less structured domains. AI agents today can typically excel at one task (such as a game) but they'd struggle to do multiple tasks at the same time, especially if those tasks are in different domains.
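The self-improvement cycle described above (play against yourself, train a new network on the resulting data, keep it only if it is stronger, then repeat) can be summarized as a short skeleton. The helper functions below are deliberately trivial stubs so the loop runs end to end; they stand in for the real self-play, training and evaluation steps, and are not DeepMind's implementation.

```python
# Skeleton of the self-play improvement cycle: generate games by self-play,
# train a candidate network on the resulting data, keep it only if it beats
# the current network, then repeat. All helpers are placeholder stubs.
import random


def self_play(network, n_games=10):
    """Play the network against itself; return (position, outcome) pairs (stub)."""
    return [(random.random(), random.choice([1, -1])) for _ in range(n_games)]


def train(network, data):
    """Fit a new network on the self-play data (stub: nudge a single weight)."""
    return network + 0.01 * sum(outcome for _, outcome in data)


def evaluate(candidate, current):
    """Return True if the candidate network is at least as strong (stub)."""
    return candidate >= current


network = 0.0                                    # stand-in for network weights
for generation in range(5):
    data = self_play(network)                    # 1. generate games by self-play
    candidate = train(network, data)             # 2. train on that data
    if evaluate(candidate, network):             # 3. keep it only if stronger
        network = candidate
    print(f"generation {generation}: network = {network:.3f}")
```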

"Ultimately we want to harness algorithmic breakthroughs like this to help solve all sorts of pressing real world problems like protein folding or designing new materials".

"We're trying to build general goal learning algorithms here and this is just one step towards that, but quite an exciting step", said Hassabis during the press briefing.

Many members of the team have now moved on to new projects where they are trying to apply this technique to new areas.