Data collection is pivotal to creating an accurate machine learning model. Without samples for a model to learn from, there will not be much learning going on. But the effort, expense, and time required to build a sufficiently large and diverse dataset create bottlenecks that prevent the use of machine learning for some applications that could otherwise benefit greatly from it. These bottlenecks are especially pronounced when the training dataset consists of private information that cannot be shared outside of a single organization. Consider the case of a model that is being trained to recognize tumors in medical images. Access to this sort of data is highly regulated, so in general, hospital systems would be limited to using their own data, which may not be sufficient to train an accurate, well-generalized model.
One approach that has been used to circumvent privacy issues is federated learning. Using this technique, a number of entities each train their own model, using only their own data. Then all of the trained models are transferred to a central location where they can be combined and distributed back to each entity. In this way, all participating parties can reap the benefits of a much larger set of data, but none of them actually need to share any of their private information. Federated learning has had some successes, but three features of the approach have prevented it from being adopted more widely. First, transferring large, trained models to a central server (perhaps hundreds of times!) incurs high communication costs. Second, each participant's dataset is collected under different standards, which can hinder the combined model's performance. And third, the final model is produced by averaging parameter values, so it is not personalized for any individual entity.
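The combining step described above can be sketched as simple parameter averaging across clients. This is an illustrative minimal version, not the paper's implementation; the model shapes and client count are made up for the example:

```python
import numpy as np

def federated_average(client_weights):
    """Combine locally trained models by averaging each layer's
    parameters across all clients."""
    averaged = []
    for layer_idx in range(len(client_weights[0])):
        # Stack the same layer from every client, then average elementwise.
        layer_stack = np.stack([w[layer_idx] for w in client_weights])
        averaged.append(layer_stack.mean(axis=0))
    return averaged

# Three hypothetical clients, each holding a two-layer model
# trained only on its own private data.
rng = np.random.default_rng(0)
clients = [[rng.normal(size=(4, 4)), rng.normal(size=4)] for _ in range(3)]

# The server averages the models and sends the result back to everyone.
global_model = federated_average(clients)
```

Note that only model parameters ever leave each client; the raw training data stays local, which is the whole point of the scheme.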
Researchers from MIT and a startup called DynamoFL have recently reported on their work, which seeks to solve all three of the aforementioned problems with federated learning in a single stroke. The method, dubbed FedLTN, takes its inspiration from the lottery ticket hypothesis for slashing the size of neural networks to improve transmission speeds. This hypothesis states that within any sufficiently large neural network, there are smaller subnetworks that perform just as well as the full network. Of course, the trick to shrinking a model in this way is to find these subnetworks so that the unnecessary portions can be pruned away.
They accomplished this by iteratively trimming the network, then checking the new network's accuracy. If the accuracy remained above a predetermined threshold, the trimmed nodes and connections were removed permanently, leaving a leaner network behind. By training the model before pruning, and by removing some steps typically found in the pruning process, the team was able to both speed up pruning and improve the accuracy of the final model.
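The trim-and-check loop can be sketched as follows. Magnitude pruning is used here as a stand-in for whatever pruning criterion the team actually applied, and `evaluate` is a hypothetical callback that returns validation accuracy for a candidate network; neither is taken from the paper:

```python
import numpy as np

def prune_smallest(weights, fraction):
    """Zero out the given fraction of the smallest-magnitude remaining
    weights -- a simple stand-in for removing nodes and connections."""
    nonzero = np.abs(weights[weights != 0])
    k = int(len(nonzero) * fraction)
    if k == 0:
        return weights.copy()
    threshold = np.sort(nonzero)[k]
    pruned = weights.copy()
    pruned[np.abs(pruned) < threshold] = 0.0
    return pruned

def iterative_prune(weights, evaluate, min_accuracy, step=0.2, rounds=5):
    """Repeatedly trim the network, keeping each trim only if the
    evaluated accuracy stays above the predetermined threshold."""
    current = weights
    for _ in range(rounds):
        candidate = prune_smallest(current, step)
        if evaluate(candidate) >= min_accuracy:
            current = candidate  # accuracy held up; make the trim permanent
        else:
            break  # accuracy dropped below the threshold; stop pruning
    return current
```

Each accepted round removes another slice of the smallest remaining weights, so the network shrinks geometrically until accuracy can no longer be maintained.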
The researchers also took care not to trim away any layers of the network containing important statistical information specific to an individual contributor. This step helped ensure that the model would be more personalized for each individual user, even though it was produced with data from many users.
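The article does not name which layers carry this contributor-specific statistical information, but batch normalization statistics are a common example in personalized federated learning. A minimal sketch under that assumption, where each client keeps a designated set of "local" parameters out of the averaging step (parameter names here are hypothetical):

```python
import numpy as np

def personalized_average(client_params, local_keys):
    """Average shared parameters across clients, while each client
    keeps its own copy of the parameters named in local_keys
    (hypothetically, batch-norm statistics)."""
    merged_per_client = []
    for client in client_params:
        merged = {}
        for name in client:
            if name in local_keys:
                merged[name] = client[name]  # preserve client-specific stats
            else:
                # Shared parameters are averaged across every client.
                merged[name] = np.mean([c[name] for c in client_params], axis=0)
        merged_per_client.append(merged)
    return merged_per_client

# Two hypothetical clients sharing a weight vector "w" but each
# keeping its own running batch-norm mean "bn_mean".
clients = [
    {"w": np.array([1.0, 2.0]), "bn_mean": np.array([0.1])},
    {"w": np.array([3.0, 4.0]), "bn_mean": np.array([0.9])},
]
personalized = personalized_average(clients, local_keys={"bn_mean"})
```

Every client ends up with the same shared weights but its own normalization statistics, which is one way a single round of aggregation can still yield per-client personalization.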
When the researchers validated their methods in simulations, they found that model size was reduced by nearly an order of magnitude. In one case, a 45 megabyte model was reduced to just 5 megabytes. Despite this reduction in size, model accuracies actually improved, routinely by more than 10%. As a next step, the team would like to extend their supervised learning work to unsupervised algorithms. By taking a holistic approach that improved multiple measures at once, they have given themselves a good chance of producing a technique with real-world applicability.