It's time for the weekly Nature Podcast! Welcome to this week's science stories, hosted by Benjamin Thompson and Shamini Bundell. This podcast segment looks at how to teach an AI to play StarCraft II. Head over to iTunes or your favourite podcast platform to download the full episode and catch up on the week's research news anytime, anywhere.
Audio transcript:
Interviewer: Nick Howe
When you were young, how did you learn? School, of course, but also likely through playing games. Whether it was a friendly game of football or an unfriendly game of Monopoly, play allows humans to acquire skills that they’ll go on to use in later life. But it’s not just humans who need to learn life skills – artificial intelligences do as well. By getting them to learn by playing, researchers think that this could be a better way for AIs to understand the world around them. AIs have mastered board games like chess and Go, and even video games like Super Mario, but these games have relatively simple rules. For researchers to develop a more complex AI, they need a bit more of a challenge. Enter StarCraft II. In basic terms, the aim of StarCraft II is to build an army to overwhelm your opponent, who is trying to do the same thing. To actually accomplish that, you need to direct your units to gather resources to allow you to build buildings, which in turn will allow you to build more units for your army. Now, beating an opponent in StarCraft is particularly difficult for AIs because there are lots of choices that can be made at any one time. Do you want to gather more resources, build a unit, move some of your army to attack an enemy, develop a new technology? The list goes on. At any one time, there are around 10^26 different choices that could be made, and this is all happening in real-time. There are no turns to think about things, like in chess. To further complicate matters, in a game of StarCraft, you don’t know what your enemy is up to unless you have eyes on them with a scouting unit. Also, whilst playing in real-time, you need to be planning – something you do in the first few minutes of the game may have impacts near the end.
Interviewee: Oriol Vinyals
And so, it was kind of somewhat clear that StarCraft would be a great grand challenge for us to tackle using the techniques we love, namely neural networks, deep reinforcement learning and so on and so forth.
Interviewer: Nick Howe
This is Oriol Vinyals of Google’s DeepMind Technologies. This week in Nature, Oriol is showcasing an AI called AlphaStar, which has been learning how to play StarCraft and it’s gotten pretty good at it. How good? Well, Oriol and the team tested it against real human players online.
Interviewee: Oriol Vinyals
We actually calibrated at a very kind of high level. So, StarCraft II has a few leagues that players get ranked as, from Bronze, Silver, Gold, up to Diamond, Master and Grandmaster. Grandmaster is the very top 200 players of a region, and we played in Europe because that’s where DeepMind is based, and we actually ranked amongst the Grandmasters.
Interviewer: Nick Howe
AlphaStar swept aside many top players to reach the pinnacle of the ranking leagues. But what does it mean to be a Grandmaster? Here’s professional StarCraft II player, Dario Wünsch, better known as TLO, with an explanation.
Interviewee: Dario Wünsch
So, yeah, that’s basically the crème-de-la-crème. Probably the top 50 Grandmaster players are people that are competing at international tournaments, so AlphaStar is very close to the professional level, looking at that.
Interviewer: Nick Howe
So, how did AlphaStar reach these lofty heights? Well, the first step was to train the AI by letting it watch human players playing against each other. By being fed data from hundreds of human matches, AlphaStar got a sense of how to play the game. The next step was for the AI to play against itself, to determine which strategies were the most effective. This made the AI improve, but as Oriol explained, it wasn’t quite enough to get the AI to Grandmaster level.
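To make that two-stage recipe concrete, here is a minimal, hedged sketch of imitation learning followed by self-play. Everything here (the Agent class, the single skill number, the update sizes) is a toy assumption for illustration, not DeepMind's actual code.

```python
import copy
import random


class Agent:
    """Toy stand-in for a policy network; skill is a single number."""

    def __init__(self):
        self.skill = 0.0

    def imitate(self, replay):
        # Supervised step: nudge the policy toward the human actions in a replay.
        self.skill += 0.01 * replay["quality"]

    def play(self, opponent):
        # Returns True if self wins; the skill gap shifts the win probability.
        edge = max(min(self.skill - opponent.skill, 0.5), -0.5)
        return random.random() < 0.5 + edge

    def reinforce(self, won):
        # Reinforcement step: reward wins, lightly penalise losses.
        self.skill += 0.02 if won else -0.005


# Stage 1: imitation learning from human replays.
agent = Agent()
for replay in [{"quality": random.random()} for _ in range(500)]:
    agent.imitate(replay)

# Stage 2: self-play against frozen copies of itself.
for _ in range(1000):
    opponent = copy.deepcopy(agent)
    agent.reinforce(agent.play(opponent))
```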
Interviewee: Oriol Vinyals
So, if you just take this agent that’s created from human behaviour or imitation learning and make it play against itself, it improves a little bit, but it loses its diversity in strategies. There’s so many different things that can happen in this game and essentially, self-play focuses too much on a single strategy and then it’s not robust against all the sorts of different creative ways that you can play the game. So, what we did to improve self-play was to create this notion of exploiters that are just agents whose sole purpose is to beat the main agent to show what you are weak against.
Interviewer: Nick Howe
These exploiters played in bizarre and unexpected ways, which exposed the holes in AlphaStar’s strategy.
Interviewee: Oriol Vinyals
And so, that, plus then some notions of keeping a bit of this imitation behaviour and just this human, if you will, prior knowledge about the strategies that exist, those two key components – the imitation and the league – then created AlphaStar final, the agent that was essentially Grandmaster.
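Pulling those pieces together, here is a hedged sketch of the league idea, reusing the toy Agent class from the earlier sketch: exploiters train only against the main agent to expose its weaknesses, while the main agent must beat the whole league plus the exploiters. All details are illustrative assumptions, not the paper's actual training setup.

```python
import copy
import random

main_agent = Agent()       # toy Agent class from the sketch above
main_agent.skill = 1.0     # assume it starts from the imitation-learned policy

league = [copy.deepcopy(main_agent)]      # frozen snapshots of past selves
exploiters = [Agent() for _ in range(3)]  # trained solely to beat the main agent

for step in range(2000):
    # Exploiters attack the current main agent, surfacing its weaknesses.
    for exploiter in exploiters:
        exploiter.reinforce(exploiter.play(main_agent))

    # The main agent trains against the whole league plus the exploiters,
    # so it stays robust to many styles rather than collapsing onto one.
    opponent = random.choice(league + exploiters)
    main_agent.reinforce(main_agent.play(opponent))

    # Periodically freeze the current main agent into the league.
    if step % 200 == 0:
        league.append(copy.deepcopy(main_agent))
```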
Interviewer: Nick Howe
Now, you may be thinking okay, the AI was a Grandmaster, but isn’t a computer always going to be better at a computer game? And you have a point. While humans playing the game are physically limited in how fast they can click the mouse or press the keyboard, the AI doesn’t have fingers that can only click so fast. To help make sure that AlphaStar played fair, Dario, who you heard from earlier, advised the DeepMind team.
Interviewee: Dario Wünsch
They basically put some limitations on the agents so they could act at just about the same speed as a human, with some reaction delay and so on. Then I was in the offices, assessing whether, now that they’d put these restrictions in, the agent was actually competing on a fair level with a human.
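For illustration, here is one way such limits could be wired up: a cap on actions per minute plus a reaction delay before new observations become visible. The class name and all numbers are assumptions, not the actual restrictions DeepMind used.

```python
from collections import deque


class ThrottledAgent:
    """Illustrative wrapper enforcing an APM cap and a reaction delay.
    The specific numbers are assumptions, not DeepMind's settings."""

    def __init__(self, max_apm=300, reaction_delay_frames=5):
        self.max_apm = max_apm
        self.delay = reaction_delay_frames
        self.pending = deque()   # (frame at which it becomes visible, observation)
        self.recent = deque()    # in-game seconds of recent actions

    def observe(self, frame, observation):
        # New information only becomes visible after a human-like delay.
        self.pending.append((frame + self.delay, observation))

    def act(self, frame, seconds):
        # Forget actions older than a minute, then enforce the APM cap.
        while self.recent and seconds - self.recent[0] > 60:
            self.recent.popleft()
        if len(self.recent) >= self.max_apm:
            return None  # over the cap: the agent must wait, like a human hand
        if not self.pending or self.pending[0][0] > frame:
            return None  # nothing visible to react to yet
        _, observation = self.pending.popleft()
        self.recent.append(seconds)
        return f"action in response to {observation}"
```

The point of a wrapper like this is that the policy itself never gets to exceed human physical limits, so any remaining advantage has to come from better decisions, not faster hands.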
Interviewer: Nick Howe
After adding the limitations to the AI, Dario was happy with how the AI played. In the paper, he said: “Overall, it feels very fair”. This was important to the DeepMind team, as it meant that AlphaStar couldn’t brute force a victory by performing a superhuman number of actions simultaneously. Instead, it had to improve its skill and strategy, much like a human would have to do. But whilst hammering humans at video games is all well and good, how does it help develop AI technologies and move us towards more broad-ranging intelligences and even human-like Artificial General Intelligence or AGI? Well, Oriol hopes that understanding how AlphaStar got better at StarCraft could help train future AIs. But whilst Oriol is optimistic about how his team’s approach will help build AGI, Michael Rovatsos, an AI researcher not associated with this work, has concerns that we may miss something by designing AIs in this way. He thinks Oriol’s approach is great for solving problems that have an optimal solution, like being awesome at StarCraft II, but for more general-purpose applications it falls short.
Interviewee: Michael Rovatsos
So, it seems to me that, to some extent, in the current landscape, we’re very much kind of looking at very hard puzzle-like problems rather than what humans do, which they’re mostly rubbish at most things but they kind of do okay across a very broad range of tasks.
Interviewer: Nick Howe
To move past this narrow specialisation and reach that human-like level of general intelligence, Michael thinks that the next thing AIs need to do is to learn how to listen. AlphaStar basically tried millions of different strategies until it found something that worked, which isn’t really what humans do. If an architect had to build a million houses until they built one that didn’t fall down, well, they wouldn’t get a lot of work. Instead humans learn from each other, from books and schools, and that’s what Michael thinks AIs should do more of.
Interviewee: Michael Rovatsos
My personal view is that it has to be more about communication. The products of human intelligence, we accumulate them culturally over time as a society and through collective intelligence. No single person can get enough experience to solve the really big problems by themselves, and right now, even on the simplest level, you couldn’t, let’s say, take this system and easily tell it this whole area of strategies you’re considering is not useful – ignore it. There is no way for this agent or algorithm to actually consider what information it might be given.
Host: Benjamin Thompson
That was Michael Rovatsos from the University of Edinburgh in the UK. You also heard from Oriol Vinyals from DeepMind Technologies, also here in the UK, and Dario Wünsch from Team Liquid in the Netherlands. You can find Oriol’s paper over at nature.com, and we’ve also got a video showing the AI StarCraft army being put through its paces. You’ll find that over at youtube.com/NatureVideoChannel.
The Nature Podcast brings you global news stories from the world of science every week, covering a wide range of research fields and highlighting exciting research stories from the journal Nature. We hand the microphone to the scientists behind the research and present in-depth analysis from Nature's reporters and editors. In 2017, listens and downloads from China exceeded 500,000, ranking second worldwide.