Accelerate AI: Introducing OpenVINO™ 2023.0
Learn what to expect from OpenVINO™ 2023.0 in this article breaking down new features you can use in your designs.
This year, we are celebrating OpenVINO™’s 5-year anniversary with a brand-new release of the AI inferencing toolkit: OpenVINO 2023.0. This latest release comes with a range of new features and capabilities designed to empower developers to achieve more by making it easier than ever to deploy and accelerate AI workloads.
But before we get into everything you need to know about this release, we'd like to take a moment to express our deepest appreciation for you, our developer community. It has been an incredible journey thus far, and we could not have done it without you. Thanks to your continued loyalty and support over the past five years, OpenVINO has reached more than 1 million downloads.
Thank you for being part of this community. We look forward to seeing all the amazing things you do with this new release.
Now, what to expect in 2023.0!
The Top Features
The 2023.0 version of OpenVINO was designed to improve the developer journey through minimizing offline conversions, broadening model support, and advancing hardware optimizations. Below are the top highlights AI developers need to know about this release, but you can find the full release notes here.
Model Selection: These new features minimize code changes, allowing AI developers to adopt, maintain, and align code better with deep learning frameworks.
- New TensorFlow integration: To simplify the workflow from training to deployment of TensorFlow models.
- Now on Conda Forge: To provide easier OpenVINO Runtime access for C++ developers who prefer Conda.
- Broader processor support: OpenVINO CPU inferencing is now supported on ARM processors, with dynamic shapes, full processor performance, and broad sample code/notebook tutorial coverage.
- Extended Python support: Added support for Python 3.11 for more potential performance improvements.
Optimize: With this next set of additions, AI developers can optimize and deploy more models with ease, including NLP models, and access more AI acceleration through new hardware feature capabilities.
- Broader model support: Support for generative AI models, text processing models, transformer models, etc. is now available.
- Dynamic shapes support on GPU: This means there is no need to reshape models to static shapes when leveraging GPU, providing more flexibility in coding, especially for NLP models (see the sketch after this list).
- NNCF as the quantization tool of choice: We've merged post-training quantization (POT) into the Neural Network Compression Framework (NNCF) to make it easier to unlock tremendous performance improvements through model compression.
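As a quick illustration of the dynamic shapes support mentioned above, here is a minimal sketch; the IR path and the choice of a dynamic second input dimension are hypothetical placeholders.

import openvino.runtime as ov

core = ov.Core()
model = core.read_model("text_model.xml")  # hypothetical IR with a [batch, sequence] input
# Keep the sequence dimension dynamic instead of reshaping to a fixed length
model.reshape({0: ov.PartialShape([1, -1])})
compiled_model = core.compile_model(model, "GPU")  # dynamic shapes now compile and run on GPU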
Deploy: These capabilities are designed to give solutions an immediate performance boost with automatic device discovery, load balancing and dynamic inference parallelism across CPU, GPU and more.
- Thread scheduling in CPU plugin: AI developers can now optimize for performance or power saving by running inference on E-cores, P-cores, or both for 12th Gen Intel® Core™ CPU and up.
- Default inference precision: Devices now default to their highest-performance precision (FP16 on GPU, BF16 on CPU where available) to deliver optimal performance without manual conversion.
- Extended model caching: To reduce first-inference latency for both GPU and CPU (see the sketch after this list).
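For the model caching feature mentioned above, enabling the cache is a one-line property. Here is a minimal sketch, assuming a local ./model_cache directory and a hypothetical model.xml.

import openvino.runtime as ov

core = ov.Core()
core.set_property({"CACHE_DIR": "./model_cache"})  # compiled blobs are stored and reused from here
model = core.read_model("model.xml")               # hypothetical IR file
compiled_model = core.compile_model(model, "GPU")  # first run compiles and caches; later runs load from the cache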
Explore OpenVINO 2023.0's latest capabilities
Let’s take a deeper look into some of these new features introduced above and what exactly they mean for AI developers:
New TensorFlow Experience
OpenVINO 2023.0 enables TensorFlow developers to move from training to deployment more easily.
With this feature, there is no longer a need to convert TensorFlow format model files to OpenVINO IR format offline — it happens automatically at runtime.
Now, developers can experiment with the --use_new_frontend option passed to Model Optimizer or the Model Conversion API to enjoy improved conversion time for a limited scope of models, or load a standard TensorFlow model directly in OpenVINO Runtime or OpenVINO Model Server for deployment. (Currently, the SavedModel format and the binary frozen format .pb are supported.) We recommend leveraging OpenVINO Runtime for even more performance benefits with model compression, but developers now have options based on their needs.
The following diagram shows a simple example:
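Alongside the diagram, here is a minimal code sketch of loading a TensorFlow model directly at runtime; the SavedModel path shown is a hypothetical placeholder.

import openvino.runtime as ov

core = ov.Core()
# Read a TensorFlow SavedModel (or frozen .pb) directly; no offline conversion to IR is needed
model = core.read_model("my_saved_model")  # hypothetical path to a SavedModel directory
compiled_model = core.compile_model(model, "CPU")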
Broader Model Support
With new model support additions, AI developers now have extended support for generative AI models such as Stable Diffusion 2.0 (Figure 2) and Stable Diffusion with ControlNet (Figure 3); text processing models; transformer models such as CLIP, BLIP, S-BERT, and GPT-J; as well as Detectron2, PaddleSlim, Segment Anything Model (SAM) (Figure 4), YOLOv8, RNN-T, and more.
Default inference precision
The latest update also brings a significant improvement in inference performance: devices now operate in a high-performance mode by default. For GPU devices this means FP16 inference, while CPU devices use BF16 inference when available (Figure 5). Previously, users had to convert the IR to FP16 themselves to enable FP16 execution on GPU. Now, every device selects its default inference precision automatically, and this selection is decoupled from the IR precision. In the rare event that high-performance mode impacts accuracy, users can adjust the inference precision hint.
Additionally, developers can now control the IR precision separately. By default, we recommend setting it to FP16 to reduce the model size by 2x for floating-point models. It is important to note that IR precision does not affect how devices execute the model but serves to compress the model by reducing the weight precision.
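If the default high-performance precision does affect accuracy, the inference precision hint can be overridden per device. Below is a minimal sketch of doing so from Python; the model.xml path is a placeholder, and the string property name and value format ("INFERENCE_PRECISION_HINT", "f32") are assumptions worth verifying against the documentation.

import openvino.runtime as ov

core = ov.Core()
model = core.read_model("model.xml")  # hypothetical IR file
# Override the GPU default (FP16) and force full-precision execution
compiled_model = core.compile_model(model, "GPU", {"INFERENCE_PRECISION_HINT": "f32"})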
Neural Network Compression Framework (NNCF) is the quantization tool of choice
Previously, OpenVINO had separate tools for post-training optimization (POT) and quantization-aware training. We've combined both methods into NNCF, which helps reduce model size, memory footprint, and latency, as well as improve computational efficiency.
NNCF provides a suite of advanced algorithms for Neural Networks inference optimization in OpenVINO with minimal accuracy drop. It is designed to work with models from PyTorch, TensorFlow, ONNX and OpenVINO (Figure 6).
The post-training quantization algorithm takes samples from the representative dataset, inputs them into the network, and calibrates the network based on the resulting weights and activation values. Once calibration is complete, values in the network are converted to 8-bit integer format. The basic POT quantization flow in NNCF is the simplest way to apply 8-bit quantization to the model:
- Set up an environment and install dependencies.
pip install nncf
- Prepare the calibration dataset.
import nncf
import torch

calibration_loader = torch.utils.data.DataLoader(...)

def transform_fn(data_item):
    images, _ = data_item
    return images

calibration_dataset = nncf.Dataset(calibration_loader, transform_fn)
- Run nncf.quantize() to get a quantized model.
model = ...  # OpenVINO / ONNX / PyTorch / TensorFlow model object
quantized_model = nncf.quantize(model, calibration_dataset)
Tutorials on how to use NNCF for model quantization and compression can be found here. We have validated post-training quantization on a YOLOv5 model with little accuracy drop (Figure 7).
Thread scheduling on Intel 12th Gen Core and up
OpenVINO 2023.0 improves multi-thread scheduling on Intel® platforms.
With the new ov::hint::scheduling_core_type property, developers can configure for performance or power saving on 12th Gen Intel® Core™ hybrid platforms and newer by choosing where inference runs: ov::hint::SchedulingCoreType::ANY_CORE, ov::hint::SchedulingCoreType::PCORE_ONLY, or ov::hint::SchedulingCoreType::ECORE_ONLY.
By setting the ov::hint::enable_hyper_threading property to true, both physical and logical cores can be enabled on the P-cores of Intel® platforms.
Another new property is ov::hint::enable_cpu_pinning. By default it is set to true, which means the threads running inference requests of multiple deep learning models are scheduled by OpenVINO Runtime (TBB). In this mode, inference of one deep learning model is treated as an overall graph, and each of its threads is pinned to a CPU core, avoiding cache misses and additional overhead. However, when inference runs simultaneously for two neural networks, threads from different inference requests can be scheduled on the same CPU cores, leading to competition for the same processor resources (as shown in Figure 9).
To avoid this competition, processor binding can be disabled by setting ov::hint::enable_cpu_pinning to false, letting the operating system schedule processor resources for each thread of the network. In this mode, inference on different layers of the same deep learning model may be switched across processors, resulting in cache misses and additional overhead (as shown in Figures 10 and 11). Developers can decide whether to enable CPU pinning based on their own validation results.
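As an illustration, here is a minimal sketch of setting these hints from the Python API. The model.xml path is a placeholder, and the string property names and values used here (SCHEDULING_CORE_TYPE, ENABLE_HYPER_THREADING, ENABLE_CPU_PINNING, PCORE_ONLY) are assumptions that should be checked against the documentation; the C++ property names above are the authoritative ones.

import openvino.runtime as ov

core = ov.Core()
model = core.read_model("model.xml")  # placeholder IR file
compiled_model = core.compile_model(
    model,
    "CPU",
    {
        "SCHEDULING_CORE_TYPE": "PCORE_ONLY",  # assumed string form of ov::hint::scheduling_core_type
        "ENABLE_HYPER_THREADING": False,       # assumed string form of ov::hint::enable_hyper_threading
        "ENABLE_CPU_PINNING": True,            # assumed string form of ov::hint::enable_cpu_pinning
    },
)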
Upgrade to OpenVINO™ 2023.0
With these latest features, OpenVINO aims to get the most out of your AI application from start to finish. With your continued support, we can produce valuable upgrades for AI developers everywhere. And with its smart and comprehensive capabilities, it can be like having your very own performance engineer by your side.
But enough about what OpenVINO can do for you. Try it out for yourself and upgrade using the following command:
pip install --upgrade openvino-dev
Be sure to check all your dependencies, because the upgrade may update other packages beyond OpenVINO. If you wish to install the C/C++ API, pull a pre-built Docker image, or download from another repository, visit the download page to find a package that suits your needs. If you are looking for model serving instructions, check out the new documentation.
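Once the upgrade completes, a quick sanity check from Python confirms which runtime version is installed; this is a minimal snippet assuming the Python package is available.

from openvino.runtime import get_version

print(get_version())  # should report a 2023.0 build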
Additional Resources
Provide Feedback & Report Issues
Notices & Disclaimers
Intel technologies may require enabled hardware, software or service activation.
No product or component can be absolutely secure.
Your costs and results may vary.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
A special thanks to everyone who participated in this blog:
Zhuo Wu, Ethan Yang, Adrian Boguszewski, Anisha Udayakumar, Yiwei Lee, Stephanie Maluso, Raymond Lo, Ryan Loney, Ansley Dunn, Wanglei Shen