GPT-4 Sees the Big Picture
The next big thing in LLMs has arrived today with OpenAI's GPT-4, which accepts both text and image inputs and offers improved performance.
Unless you have been living under a rock for the past few months, you are at the very least aware of ChatGPT and the shockwaves it has sent through the world of machine learning and beyond. This language model has been widely praised for its ability to answer questions posed in plain language with responses that read as if they were written by a human. Since the model's initial release, many have even begun to question what the future of long-established technologies, like internet search engines, will look like in the years to come.
Currently, ChatGPT is powered by a large language model in the GPT-3 family, GPT-3.5 to be specific. Today, however, OpenAI announced GPT-4, the next major version of its GPT models, which promises to improve on the successes of GPT-3.5 in many ways. OpenAI claims that the new model solves difficult problems with better accuracy and draws on a broader knowledge base. Moreover, GPT-4 understands more than just text: it can also “see” by examining images that are provided by the user along with their text prompt.
Sounds good, but how does GPT-4 measure up against GPT-3.5? Testing showed that this latest iteration of the model is capable of passing a simulated bar exam with a score in the top 10% of all test takers. In contrast, GPT-3.5 scored in the bottom 10% on the same exam. It is in complex reasoning skills, like those required in taking the bar exam, that GPT-4 really shines. For more casual conversations, the differences between the models will be more subtle, and perhaps even difficult to detect.
The vision capabilities are probably the most eye-catching feature of this new release. Users can now include images inline with text prompts and the model will assess the content of both before responding with text. It has been shown to be capable of, for example, describing in detail what is present in an image. However, the image input capability is still only available as a research preview, and the exact use cases that this function will excel at are still being discovered.
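In practice, an image-and-text prompt amounts to a single chat message whose content mixes a text part with an image reference. The sketch below shows one way such a payload could be structured; because image input is still a research preview, the nested field names (`type`, `image_url`, and the helper `build_image_prompt`) are assumptions for illustration, not a documented interface:

```python
# Sketch of a chat message pairing a text question with an image reference.
# The nested content format is an assumption for illustration; the production
# image-input API may differ while the feature is in research preview.

def build_image_prompt(question: str, image_url: str) -> dict:
    """Bundle a text question and an image reference into one user message."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

message = build_image_prompt(
    "What is unusual about this image?",
    "https://example.com/photo.jpg",
)
print(message["content"][0]["text"])  # → What is unusual about this image?
```

Keeping the text and image as separate parts of one message lets the model weigh both together before producing its text response, which is exactly the behavior described above.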
In addition to the normal user prompts, the new interface also allows for the specification of system messages. These messages let the user steer the direction of a chat session and specify things like the style in which the chatbot should respond, allowing significant customization of the user’s experience.
While many improvements have been made, the model is still certainly not perfect. It is not fully reliable, and can still confidently invent facts that are untrue. Notably, GPT-4 invents such falsehoods less often than GPT-3.5, so progress has been made in the right direction.
One of the more controversial features of ChatGPT is that OpenAI has intentionally built certain bounds into the model that control what it can and cannot say. OpenAI maintains that this is to make the chatbot safer and less biased. Opponents of this move hold that this only serves to inject the biases of OpenAI into their products and introduce cumbersome limits on free speech. GPT-4 is no different in this regard, with these same types of behavior modifications having been put in place.
GPT-4 is being rolled out via the same ChatGPT interface that we have come to know. Subscribers to ChatGPT Plus already have limited access to the new model. Those without a subscription will have to join the waiting list for access. The image input capability is also restricted at this time, with only a single partner helping to test it so far.