Codon Compiles Your Python Programs — and Can Make Them a Hundred Times Faster
Covering a healthy subset of Python proper, Codon aims to boost its use in performance-critical scientific computing and machine learning.
Researchers from the Massachusetts Institute of Technology and the University of Victoria have released a compiler dubbed Codon that, they say, can make a Python program run at the speed of a C program, or even faster with domain-specific optimizations.
"We realized that people don’t necessarily want to learn a new language, or a new tool, especially those who are non-technical. So we thought, let’s take Python syntax, semantics, and libraries and incorporate them into a new system built from the ground up," explains lead author Ariya Shajii of the project. "The user simply writes Python like they’re used to, without having to worry about data types or performance, which we handle automatically — and the result is that their code runs 10 to 100 times faster than regular Python."
Traditionally, Python is an interpreted language: its programs are written as text and saved as a script, which is then executed line by line by a Python interpreter. Compiled languages like C and C++, by contrast, feed the textual source code through a compiler to produce a binary executable. Codon takes the Python script and compiles it, as a C compiler would, into native machine code, which the team says runs 10 to 100 times faster than the interpreted original.
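As a rough illustration of the kind of program that stands to gain from ahead-of-time compilation, here is a minimal sketch of a loop-heavy script written in plain Python. The file name `primes.py`, the function `count_primes`, and the limit are illustrative choices rather than anything from the Codon paper, and the command in the comment is an assumption based on the project's README that should be checked against the current documentation.

```python
# primes.py: a small, CPU-bound script with no third-party dependencies.
# Assumption: with Codon installed, the project's README suggests that a file
# like this can be run with something like `codon run -release primes.py`.
# Under standard Python it runs as usual with `python primes.py`.

def count_primes(limit: int) -> int:
    """Count the primes below `limit` using naive trial division."""
    count = 0
    for n in range(2, limit):
        is_prime = True
        d = 2
        # Only divisors up to sqrt(n) need to be checked.
        while d * d <= n:
            if n % d == 0:
                is_prime = False
                break
            d += 1
        if is_prime:
            count += 1
    return count


if __name__ == "__main__":
    print(count_primes(100_000))
```

Tight integer loops like this one spend most of their time in interpreter overhead under standard Python, which is why they are a common target for compilers of this kind. Any actual speedup depends on the program and the machine.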
The team behind Codon is positioning it more as a spin-off of Python, and admits that it's not completely compatible with vanilla Python — though it does cover a sizeable subset of the language. "We took more of a bottom-up approach, where we implemented everything from the ground up, which came with limitations, but a lot more flexibility," Shajii explains.
"So, for example, we can’t support certain dynamic features, but we can play with optimizations and other static compilation techniques that you couldn’t do starting with the standard Python implementation. That was the key difference — not much effort had been put into a bottom-up approach, where large parts of the Python infrastructure are built from scratch."
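The article does not say which dynamic features are out of reach. As an assumption for illustration only, the sketch below shows two common forms of runtime dynamism that standard Python accepts and that a static, ahead-of-time compiler typically has trouble with. The class `Sample` and the other names are hypothetical, and Codon's own documentation is the place to check what it actually rejects.

```python
# Standard Python accepts everything below because types and class layouts
# are resolved while the program runs. A static compiler that fixes types
# ahead of time typically has to reject or restrict code like this.

class Sample:
    def __init__(self, value):
        self.value = value


def describe(self):
    return f"Sample({self.value})"


# 1. Attaching a method to a class at runtime (monkey-patching).
Sample.describe = describe

# 2. A single list holding values of unrelated types.
mixed = [1, "two", 3.0]

if __name__ == "__main__":
    print(Sample(7).describe())
    print([type(v).__name__ for v in mixed])
```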
"Python is the language of choice for domain experts that are not programming experts. If they write a program that gets popular, and many people start using it and run larger and larger datasets, then the lack of performance of Python becomes a critical barrier to success," says Saman Amarasinghe, MIT professor of electrical engineering and computer science, of Codon's potential.
"Instead of needing to rewrite the program using a C-implemented library like NumPy or totally rewrite in a language like C, Codon can use the same Python implementation and give the same performance you'll get by rewriting in C. Thus, I believe Codon is the easiest path forward for successful Python applications that have hit a limit due to lack of performance."
In addition to compilation, Codon has a few other tricks up its sleeve, including native multi-threading and natural parallelization, which allow compiled programs to exploit modern multi-core processors or general-purpose GPU (GPGPU) offload for even larger speed boosts. "Codon is already being used commercially," Shajii says, "in fields like quantitative finance, bioinformatics, and deep learning."
The paper introducing Codon has been published under open-access terms in the Proceedings of the 32nd ACM SIGPLAN International Conference on Compiler Construction, with the source code and binary releases published on GitHub under the Business Source License 1.1 by Exaloop, a startup founded by Codon's authors.