Microsoft's Highly Scalable MMLSpark Machine Learning Library Gets a Relaunch as SynapseML

Rebranded library is now ready for production, the company claims, and comes with a wealth of features — including full ONNX support.

Microsoft has launched a major update to its machine learning library MMLSpark, now known as SynapseML, which it claims simplifies building ML pipelines that can scale to thousands of worker machines. The company is also providing a selection of pre-built models to get users started.

"Today, we’re excited to announce the release of SynapseML (previously MMLSpark), an open source library that simplifies the creation of massively scalable machine learning (ML) pipelines," says software engineer Mark Hamilton. "Building production-ready distributed ML pipelines can be difficult, even for the most seasoned developer.

"Composing tools from different ecosystems often requires considerable 'glue' code, and many frameworks aren’t designed with thousand-machine elastic clusters in mind. SynapseML resolves this challenge by unifying several existing ML frameworks and new Microsoft algorithms in a single, scalable API that’s usable across Python, R, Scala, and Java."

Having been polished for production over the past five years, SynapseML is built with parallel processing and scalability in mind — with Microsoft, naturally, hoping that users will pick its Azure cloud platform to act as an elastic compute cluster for their projects.

"Developers who use Azure Synapse Analytics will be pleased to learn that SynapseML is now generally available on this service with enterprise support," Hamilton explains. "They can now build large-scale ML pipelines using Azure Cognitive Services, LightGBM, ONNX, and other selected SynapseML features."

At the same time, Microsoft has confirmed that SynapseML lets developers embed more than 45 "state-of-the-art ML services" into their systems, including conversation transcription, translation, and form recognition, using pre-built models to avoid the need for a large labelled training dataset. The library can also handle models from other machine learning ecosystems, provided they're compatible with the Open Neural Network Exchange (ONNX) format, and comes with tools designed to explain the behavior of AI systems.

"We hope developers and others who build production-ready scalable ML systems find that SynapseML simplifies the process," says Hamilton. "SynapseML standardizes a variety of ML frameworks, such as those mentioned in this blog post, to enable new classes of ML systems that compose pieces from different ML ecosystems.

"Our goal is to free developers from the hassle of worrying about the distributed implementation details and enable them to deploy them into a variety of databases, clusters, and languages without needing to change their code."

More information is available on the SynapseML website, while the source code has been published to GitHub under the permissive MIT License.

Gareth Halfacree
Freelance journalist, technical author, hacker, tinkerer, erstwhile sysadmin. For hire: