Federated learning (FL) is a machine learning technique that trains models across any number of decentralized devices. Training data sets remain local to each device and are never shared with other devices; only certain parameters of the trained model are exchanged. These properties allow FL to build robust models from vast training sets without the privacy concerns that come with centralizing the data. For example, FL can train models on medical data held by multiple institutions without raising concerns about patient privacy.
Several existing software tools support FL; however, each lacks features, which limits its usefulness in FL research. Most existing tools do not support diverse FL computing paradigms, restricting the scenarios in which models can be deployed. Moreover, FL tools tend to be inflexible in their support for varied network topologies, in the types of information exchanged between devices, and in training procedures.
These are among the problems that a group of researchers set out to correct with a new FL library called FedML. FedML supports three distinct computing paradigms: 1) on-device training for edge devices, 2) distributed computing in the cloud, and 3) single-machine simulation. The library also supports a worker/client-oriented programming interface for interacting with diverse network topologies, as well as flexible data interchange and training procedures.
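To make the worker/client-oriented idea concrete, the following is a minimal sketch in plain Python, not FedML's actual API: each worker defines its own local training step and the message it publishes, while the topology (here, a simple gossip exchange among neighbors) is handled separately. All class and function names here are hypothetical, chosen only for illustration.

```python
# Illustrative sketch of a worker/client-oriented FL interface.
# NOT FedML's real API: Worker and gossip_round are hypothetical names.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Worker:
    worker_id: int
    model: List[float]                      # toy model: a flat parameter vector
    neighbors: List[int] = field(default_factory=list)

    def local_train(self, grad: List[float], lr: float = 0.1) -> None:
        # One SGD step on the device's private data (gradient passed in here
        # for simplicity; in real FL it is computed from local data).
        self.model = [w - lr * g for w, g in zip(self.model, grad)]

    def message(self) -> List[float]:
        # Only model parameters leave the device, never the raw training data.
        return list(self.model)

def gossip_round(workers: Dict[int, Worker]) -> None:
    # Decentralized topology: every worker averages its parameters with
    # those of its neighbors, using snapshots taken before any update.
    snapshots = {i: w.message() for i, w in workers.items()}
    for w in workers.values():
        group = [snapshots[w.worker_id]] + [snapshots[n] for n in w.neighbors]
        w.model = [sum(vals) / len(group) for vals in zip(*group)]
```

Because the training step and the message format live on the worker while the exchange pattern lives in the round function, swapping the gossip topology for a centralized server, or exchanging gradients instead of parameters, would not require touching the worker's training code, which is the kind of flexibility the library's interface aims for.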
FedML is partitioned into two primary components, FedML-API and FedML-core. FedML-API is the high-level API through which a user can implement new algorithms with a client-oriented programming approach. FedML-core is the low-level API that handles communication among different workers/clients via MPI (Message Passing Interface) and MQTT.
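As one example of the kind of algorithm a user might implement on top of such an API, here is a sketch of FedAvg-style server aggregation, the canonical FL algorithm, in plain Python. This is not FedML's implementation; the function name and signature are assumptions for illustration. The server averages the clients' model parameters, weighting each client by its local dataset size.

```python
# Illustrative FedAvg aggregation (hypothetical helper, not FedML code):
# the server receives (num_local_samples, model_parameters) from each
# client and returns the sample-weighted average of the parameters.
from typing import List, Tuple

def fed_avg(client_updates: List[Tuple[int, List[float]]]) -> List[float]:
    """Aggregate client models, weighted by local dataset size."""
    total = sum(n for n, _ in client_updates)
    dim = len(client_updates[0][1])
    aggregated = [0.0] * dim
    for n, params in client_updates:
        for i, p in enumerate(params):
            # Each client's contribution is proportional to its data share.
            aggregated[i] += (n / total) * p
    return aggregated
```

For example, a client holding three times as much data as its peer pulls the global model three quarters of the way toward its own parameters, which is why weighting by sample count matters on the non-uniform data splits that FL research typically studies.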
Two additional modules, FedML-Mobile and FedML-IoT, add FL support for real-world hardware, including Android smartphones, the Raspberry Pi 4, and the NVIDIA Jetson Nano. Support for these platforms enables the evaluation of realistic system performance, including training time and computation cost.