The Consortium for Python Data API Standards Aims to Fix Fragmentation, Make Python Data Work Easier
Led by Quansight, the Consortium aims to create common API standards for arrays (tensors) and dataframes in Python data work.
Quansight Labs, in partnership with Intel, Microsoft, Google Research, and other community contributors, has launched an initiative to standardize Python data application programming interfaces (APIs): the Consortium for Python Data API Standards.
"Over the past few years, Python has exploded in popularity for data science, machine learning, deep learning and numerical computing. New frameworks pushing forward the state of the art in these fields are appearing every year," Quansight's Ralf Gommers explains. "One unintended consequence of all this activity and creativity has been fragmentation in the fundamental building blocks — multidimensional array (tensor) and dataframe libraries — that underpin the whole Python data ecosystem."
"Today, we are announcing the Consortium for Python Data API Standards, which aims to tackle this fragmentation by developing API standards for arrays (a.k.a. tensors) and dataframes. We aim to grow this Consortium into an organization where cross-project and cross-ecosystem alignment on APIs, data exchange mechanisms and other such topics happens. These topics require coordination and communication to a much larger extent than they require technical innovation. We aim to facilitate the former, while leaving the innovating to current and future individual libraries."
The Consortium was founded with a view to involving the community in its decision making. It has already been assembled, formed a working group, and drafted an API standard. That draft is now being put out for community input and approval, after which it will be released as a formal Request for Comments (RFC) with a view to producing a finished v1.0 standard.
"Such a gradual RFC process is a bit of an experiment," Gommers admits. "Community projects like NumPy and Pandas aren’t used to this; however, it’s similar to successful models in other communities (e.g. the Open Geospatial Consortium, or C++ standardization) and we think the breadth of projects involved and complexity of the challenge makes this the most promising and likely to succeed approach. The approach will certainly evolve over time though, based on experience and feedback from the many stakeholders."
More information on the Consortium and its goals, along with details of how to get involved, can be found in the announcement post. A GitHub repository has also been set up as a central point for discussion and for raising issues, alongside the Python Record API module, which is used for data collection.
Freelance journalist, technical author, hacker, tinkerer, erstwhile sysadmin. For hire: freelance@halfacree.co.uk.