So much is possible in robotics when individual agents, whether robots or drones, are able to work together to complete a task. If they are not equipped with the right hardware, however, or if signals are blocked, communication may be impossible. This communication-free setting is where researchers at the University of Illinois Urbana-Champaign began as they developed a method to train multiple agents to work together using a type of artificial intelligence known as multi-agent reinforcement learning, or MARL.
Coordination may be simpler when agents can talk to one another, but this decentralized setting posed an interesting challenge, as it may not even be obvious what role each individual agent should play. The challenge becomes one of learning to accomplish a task together over time. Huy Tran, an aerospace engineer at Illinois, and his colleagues addressed this by creating a utility function that tells each agent when an action it has taken has been useful to, or made progress for, the team.
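The idea of a per-agent utility derived from a shared team outcome can be illustrated with a difference-reward-style sketch. This is a minimal toy example, not the researchers' actual method: the task, function names, and reward shape below are all assumptions made for illustration. Each agent's credit is the team reward it helped produce minus the reward the team would have received had that agent done nothing.

```python
# Toy team task: agents contribute effort toward a shared target.
# All names and the reward definition are illustrative assumptions,
# not the method described in the article.
TARGET = 10.0

def team_reward(actions):
    """Shared team reward: higher when the summed effort is closer to TARGET."""
    return -abs(TARGET - sum(actions))

def credit(actions, i, default=0.0):
    """Difference-reward-style utility for agent i: the team reward with
    agent i's actual action, minus the reward in a counterfactual where
    agent i took a no-op (default) action instead."""
    counterfactual = list(actions)
    counterfactual[i] = default
    return team_reward(actions) - team_reward(counterfactual)

actions = [3.0, 4.0, 2.0]          # one action per agent
print(team_reward(actions))        # -1.0 (sum is 9, one short of the target)
for i in range(len(actions)):
    print(i, credit(actions, i))   # larger contributions earn larger credit
```

A signal like this tells each agent how much its own action mattered to the team outcome, which is the kind of feedback the utility function provides during training.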
With team goals, as in sports, it can be difficult to determine who contributed to the win: who scored, and who assisted. Delayed effects like assists can be difficult to attribute, as can actions that don't contribute to the end goal. The proposed algorithm handles this credit-assignment problem using successor features, or SFs, which disentangle individual contributions and place values on their impact. To test the algorithm, the researchers simulated games like StarCraft and Capture the Flag, the latter of which is demonstrated in a video posted to the aerospace engineering department's YouTube channel.
This deep reinforcement learning, over time, helps the individual agents evaluate how their next actions will contribute to the team goal. Testing results reported by the researchers show improved performance and training time relative to existing methods, making multi-agent reinforcement learning a promising approach to coordination. Though testing was done by simulating games, the algorithm is applicable to real-life situations, including military surveillance, warehouse automation, traffic control, UAVs coordinating deliveries, and intelligent control of electric power grids.
Overall, the testing shows that SFs can be used to disentangle the impact of individual agents on a global value function, resulting in individual value functions that are highly learnable. Suggestions for future work building on these findings include exploring learned SF disentanglements that account for specific agent types. These initial results suggest that modeling heterogeneity may not be necessary, but further developments may be able to leverage agent-specific actions or specializations.
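The disentanglement described above can be sketched numerically. In the successor-feature framework, a value function is the dot product of expected feature occupancies with a shared reward-weight vector; if the team's feature expectations decompose into per-agent successor features, each agent inherits its own value function from the same weights. The shapes, variable names, and random data below are assumptions for illustration, not the paper's notation or results.

```python
import numpy as np

# Illustrative SF decomposition: V_global = psi_global . w, where
# psi_global is the sum of per-agent successor features. Everything
# here (dimensions, names, random values) is an assumed toy setup.
rng = np.random.default_rng(0)

n_agents, n_features = 3, 4
w = rng.normal(size=n_features)                       # shared reward weights
psi_agents = rng.normal(size=(n_agents, n_features))  # per-agent successor features

psi_global = psi_agents.sum(axis=0)   # team-level feature expectations
v_global = psi_global @ w             # global value function
v_agents = psi_agents @ w             # disentangled per-agent values

# By linearity, the individual values sum back to the global value.
assert np.isclose(v_global, v_agents.sum())
```

The point of the sketch is the linearity: because value is linear in the features, splitting the features across agents splits the value, giving each agent a smaller, more learnable target.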