Author: Bryan Walter | [email protected] | Last modified: 2023-05-21 22:24
Unity Technologies has announced a competition for machine-learning algorithms that play computer games. The algorithms will compete in a game environment consisting of 100 levels, on each of which an agent must solve tasks in order to advance. The total prize pool exceeds one hundred thousand dollars, and the competition is scheduled to begin on February 11, according to the Unity blog. A description of the game environment and the results of the first experiments were published in a paper presented at the AAAI-2019 conference.
For a human, learning to play a computer game reasonably well is a simple task that takes relatively little time. For computer algorithms, however, even simple 2D platformers are challenging. In recent years, researchers have addressed this with machine-learning methods, in which the agent learns the game rather than merely executing scripts embedded in it. But training such algorithms requires adapting the game and creating an API through which the agent can interact with it. Moreover, the mechanics of games designed for humans may be poorly suited to a bot.
Unity Technologies, developer of the popular Unity game engine, has created the Obstacle Tower environment specifically for training algorithms. It is a game world consisting of a tower with one hundred floors. Each floor comprises several rooms: the agent starts in one of them, another contains the exit to the next floor, and the number of rooms grows as the agent climbs the tower. To reach the next floor, the algorithm must solve tasks on the current one, such as puzzles or defeating enemies. The time allotted for a floor is limited, but the agent can extend it by collecting time capsules and clearing floors. A distinctive feature of the game is that the levels are procedurally generated, which lets the environment test how well the skills the algorithm has learned generalize.
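The procedural layout described above can be illustrated with a minimal sketch. The function below is hypothetical (the real Obstacle Tower generator is not public in this form): it seeds a random generator, grows the room count with the floor number, and picks distinct start and exit rooms.

```python
import random

def generate_floor(floor_number, seed=None):
    """Toy sketch of procedural floor generation: the room count grows
    with the floor number, and distinct start and exit rooms are chosen
    at random. The growth rule and task types are illustrative only."""
    rng = random.Random(seed)
    num_rooms = 2 + floor_number  # hypothetical growth rule
    rooms = [{"id": i, "task": rng.choice(["none", "puzzle", "enemy"])}
             for i in range(num_rooms)]
    start, exit_ = rng.sample(range(num_rooms), 2)  # guaranteed distinct
    rooms[start]["role"] = "start"
    rooms[exit_]["role"] = "exit"
    return rooms
```

Because the layout is a function of the seed, an agent trained on one set of seeds can be evaluated on unseen seeds, which is exactly the generalization check the developers describe.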
During the game, the agent receives two kinds of data: a color image with a resolution of 168 by 168 pixels, and an auxiliary vector built from the number of keys the agent has collected on the current floor and the time remaining to complete it. The agent can move in four directions, turn in two directions, and jump. Since the developers position Obstacle Tower as a framework for reinforcement-learning algorithms, they implemented two reward functions. In one mode the agent is rewarded only for reaching the final door leading to the next floor; in the other it also receives rewards for intermediate events, such as collecting keys and opening doors between rooms.
Level maps of varying difficulty
The developers tested three types of algorithms, each trained with a different degree of variation in the environment. All agents played at a level far below humans, but the most interesting result lay elsewhere: the best performance came from an algorithm trained on a single fixed version of the environment. The developers attribute this to the stability of that environment, which allowed the algorithm to make more progress in that version of the game world and learn more behaviors potentially applicable to other versions.
The developers have published the first version of the environment, consisting of the first 25 floors, on GitHub, and plan to publish the full version as well. The first round of the Obstacle Tower Challenge starts February 11 and runs through March 31; the second round, with the full 100-floor version, runs from April 15 to June 14.
In 2017, Blizzard announced an open API for StarCraft II multiplayer, allowing third-party developers to train their algorithms on the game. At the end of January 2019, a competition was held between the AlphaStar neural network developed by DeepMind and StarCraft II players ranked among the top 100 in the world. Two versions of the neural network defeated two players, each winning five matches out of five.