Cracking the Code

Natural language embedded programs improve LLMs by generating Python code to provide precise answers where they may otherwise struggle.

Nick Bild
Machine Learning & AI

Computers have traditionally been explicitly programmed to do each task that is asked of them. A human programmer devises the set of logical rules that are needed to accomplish a goal, then they encode them in a programming language that instructs the computer to carry them out. This is done in painstaking detail, and as a result (in an ideal case, anyway) the operation of the computer is perfectly well defined, and we know exactly what we can expect of the software.

We do not want undefined or unexpected behavior from our computer systems, so this paradigm has served us well. But as time went by, we began asking our computers to do ever more complicated things. Building a spreadsheet or video game is one thing, but how could we explicitly program software that recognizes a specific object in an image, for example? Can you imagine the nightmare of an if-then block that would entail?

For cases such as these, we have turned to machine learning. Broadly speaking, these techniques enable computers to program themselves. We provide examples, then they generate the algorithms. Machine learning has been hugely successful in recent years, but we have also seen that it lacks the precision of explicitly programmed software. Sure, we have excellent object detectors, natural language processors, and image generators, but along with them we get hallucinations and other inaccuracies.

It would seem that we can hardly do without either machine learning or explicit programming, so a team led by researchers at MIT decided to develop a system that leverages the best of both worlds. Their approach, called natural language embedded programs (NLEPs), uses modern generative artificial intelligence models to do what they do best, and also to write their own computer code to produce precise answers to questions they would otherwise be uncertain about.

Specifically, the team is using large language models (LLMs), such as Meta's Llama, to parse a user's prompts and answer their questions. When a question falls into a gray area where the model is likely to be inaccurate, the LLM is instructed to write a Python program to answer it. The output of the program is then fed back into the LLM so that the result can be reported to the user in natural language. From the user's perspective, the generation and execution of code happens entirely behind the scenes.
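To make that workflow concrete, here is a minimal sketch of how such a pipeline might be wired up. The query_llm function is a hypothetical stand-in for any LLM API call, and the prompt wording and execution details are illustrative assumptions rather than the researchers' actual implementation.

```python
# Minimal sketch of an NLEP-style workflow: the model writes a Python
# program, the program is run, and its output is handed back to the
# model to phrase a natural-language answer. Details are illustrative.
import io
import contextlib


def query_llm(prompt: str) -> str:
    """Hypothetical placeholder for a call to an LLM such as Llama."""
    raise NotImplementedError("wire this up to your model of choice")


def answer_with_nlep(question: str) -> str:
    # 1. Ask the model to solve the question by writing a program
    #    that prints its final answer.
    code = query_llm(
        "Write a self-contained Python program that computes and "
        f"prints the answer to: {question}"
    )

    # 2. Execute the generated program and capture whatever it prints.
    #    (In practice, untrusted generated code should run in a sandbox.)
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})
    program_output = buffer.getvalue().strip()

    # 3. Feed the program's output back to the model so the result can
    #    be reported to the user in plain language.
    return query_llm(
        f"The question was: {question}\n"
        f"A program computed this result: {program_output}\n"
        "State the answer in plain language."
    )
```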

Using this novel approach, an LLM can provide more accurate answers in areas where these models typically struggle, such as math, data analysis, and symbolic reasoning. The generated source code also gives developers insight into the model's reasoning, which can help them understand, improve, and fine-tune the system.
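As a hypothetical illustration of why this helps with arithmetic, consider a question like "What is the sum of all prime numbers below 100?" A plain LLM may estimate and slip, while a generated program of a few lines computes the answer exactly. This example is ours, not one taken from the paper.

```python
# The kind of short program an NLEP might generate for an exact answer
# to "What is the sum of all prime numbers below 100?"
def is_prime(n: int) -> bool:
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))


# Exact arithmetic replaces the model's guesswork.
print(sum(n for n in range(100) if is_prime(n)))  # prints 1060
```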

Across a variety of symbolic reasoning, instruction-following, and text classification tasks, the researchers found that NLEPs achieved accuracy of better than 90 percent. This was a significant boost in performance over standard LLMs without NLEP capabilities. The technique also helps avoid the need to retrain a model for a specific task, which can be very costly.

This is a relatively simple method to implement, so it could prove useful in a wide range of applications. The researchers noted, however, that NLEPs do not work well with smaller models trained on more limited datasets, so the technique currently requires large models, which can be expensive to operate. The team hopes to address this limitation and bring NLEPs to smaller models in the near future.

Nick Bild
R&D, creativity, and building the next big thing you never knew you wanted are my specialties.