One of the biggest problems of an autonomous-driving Donkey Car is the limited computational power of its hardware. Donkey Car uses a Raspberry Pi 3 as its core platform, which delivers roughly 6.2 GFLOPS on linear algebra workloads. That is barely enough for simple machine learning applications. Our Donkey Car project reaches about 9 frames per second with the default autopilot model, which contains 2 CNN layers and 3 dense layers. If we apply a customized model with 2 3D-CNN layers and 3 dense layers, the system takes about 3 seconds to process a single image.
Since this is a common problem for many AI applications, a number of companies are researching ways to improve the capability of edge devices, such as Intel with its OpenVINO platform and Nvidia with its Jetson platform. This post gives a brief introduction to the idea of "AI at the edge" and uses Intel's VPU (Vision Processing Unit) to apply this technology in our autonomous driving project.
AI at the Edge
Cloud computing has been widely used across industries since around 2010 because it is efficient and scalable. However, it is not always applicable to AI applications, especially in the automotive and robotics industries. Imagine you are driving a self-driving car downtown: the car constantly feeds information to the cloud and, at the same time, receives instructions from the cloud server. If the vehicle suddenly loses its internet connection, that is not an acceptable situation for the people sitting in the car. On the other hand, implementing AI at the edge is a very challenging task, because most edge devices are not powerful enough to run complex machine learning algorithms.
We can improve edge-device performance from both the software side and the hardware side. On the software side, an inference engine can speed up the model considerably, and GPU APIs can be used to shorten processing time. On the hardware side, there are many hardware accelerators, such as GPUs, VPUs, and dedicated deep learning accelerators.
If you are interested in AI at the edge, Nvidia's Jetson AGX Xavier is one of the most mature solutions at the time of writing and is widely used in the robotics and automotive industries.
Intel's Vision Processing Unit
The Intel Movidius Neural Compute Stick is a low-power device specially designed for processing vision-related input. It provides about 1 trillion FLOPS for image processing tasks, which is enough capability to run medium-sized machine learning models on an edge device. It is integrated as part of Intel's OpenVINO platform and supports almost all common machine learning architectures, such as CNN and LSTM.
Using the Intel Movidius Neural Compute Stick is easy; it involves the following 3 steps:
1. Model Optimization
2. Model Transformation
3. Model Inference
In the next section, I will use the Donkey Car project as an example of how to use the Intel Movidius Neural Compute Stick to implement a complex 3D-CNN model.
Applying the Neural Compute Stick to the Donkey Car Project
1. Model Preparation and Optimization
In order to use Intel's Compute Stick, we need to convert our model to a format that the inference engine can recognize. The OpenVINO inference engine supports conversion from multiple model formats, such as TensorFlow and Caffe. Unfortunately, Donkey Car's machine learning models are based on Keras, which is currently not supported directly by the OpenVINO inference engine, so the first step is to convert the Keras model to the TensorFlow model format.
Import dependencies:
from tensorflow.python.keras.models import load_model
import tensorflow as tf
from tensorflow.python.keras import backend as K
Load your Keras Model:
model = load_model('0507_a')
Transform the Keras model to a TensorFlow frozen graph:
def freeze_session(session, keep_var_names=None, output_names=None, clear_devices=True):
    """
    Freezes the state of a session into a pruned computation graph.

    Creates a new computation graph where variable nodes are replaced by
    constants taking their current value in the session. The new graph will be
    pruned so subgraphs that are not necessary to compute the requested
    outputs are removed.
    @param session The TensorFlow session to be frozen.
    @param keep_var_names A list of variable names that should not be frozen,
                          or None to freeze all the variables in the graph.
    @param output_names Names of the relevant graph outputs.
    @param clear_devices Remove the device directives from the graph for better portability.
    @return The frozen graph definition.
    """
    from tensorflow.python.framework.graph_util import convert_variables_to_constants
    graph = session.graph
    with graph.as_default():
        freeze_var_names = list(set(v.op.name for v in tf.global_variables()).difference(keep_var_names or []))
        output_names = output_names or []
        output_names += [v.op.name for v in tf.global_variables()]
        # Graph -> GraphDef ProtoBuf
        input_graph_def = graph.as_graph_def()
        if clear_devices:
            for node in input_graph_def.node:
                node.device = ""
        frozen_graph = convert_variables_to_constants(session, input_graph_def,
                                                      output_names, freeze_var_names)
        return frozen_graph
frozen_graph = freeze_session(K.get_session(),
                              output_names=[out.op.name for out in model.outputs])
Save the frozen graph:
tf.train.write_graph(frozen_graph, "model", "tf_model.pb", as_text=False)
2. Model Transformation
Model transformation converts our model to the format that the Compute Stick can recognize. At the same time, the transformation improves the model's time efficiency through the following 5 actions:
• Change Weight Precision
• Optimize Layer Structure
• Auto Tuning Kernels
• Auto GPU Execution Planning
• Memory Optimization
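The first of these actions, changing the weight precision, can be illustrated with a small NumPy sketch (the weight matrix here is a hypothetical stand-in, not the actual Donkey Car model): casting FP32 weights to FP16 halves their memory footprint at the cost of a small rounding error.

```python
import numpy as np

# Hypothetical FP32 weight matrix standing in for one layer of a real model.
weights_fp32 = np.random.rand(256, 256).astype(np.float32)

# Lower the precision, as the model optimizer does when asked for FP16.
weights_fp16 = weights_fp32.astype(np.float16)

print(weights_fp32.nbytes)  # 262144 bytes
print(weights_fp16.nbytes)  # 131072 bytes -- half the memory
# The rounding error introduced is tiny compared to the weights themselves.
print(np.max(np.abs(weights_fp32 - weights_fp16.astype(np.float32))))
```

This is exactly the trade-off behind the --data_type flag in the next step: smaller, faster models in exchange for a little precision.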
For model transformation, we need to do the following:
1. Install Openvino
You can find all installation-related documents here.
2. Model Transformation
Go to the <INSTALL_DIR>/deployment_tools/model_optimizer directory
Run the mo_tf.py script with the path to the frozen graph file to convert the model:
python3 mo_tf.py \
    --input_model xxx.pb \
    --output_dir path/to/your/output \
    --data_type FP32 \
    --device CPU/GPU/MYRIAD \
    --batch 1    # omit to use the default input size
For our case, it looks as follows:
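A sketch of the invocation for our project, assuming the frozen graph from step 1 was saved as model/tf_model.pb and the IR files should land next to the Donkey Car code (both paths are illustrative); the Compute Stick (MYRIAD) runs FP16 models, hence --data_type FP16:

```shell
cd <INSTALL_DIR>/deployment_tools/model_optimizer

# Paths below are illustrative; adjust them to your own setup.
python3 mo_tf.py \
    --input_model /home/pi/mycar/model/tf_model.pb \
    --output_dir /home/pi/mycar/ncs/3dcnn \
    --data_type FP16 \
    --batch 1
```

This produces tf_model.xml (network topology) and tf_model.bin (weights), the two IR files loaded in the next step.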
3. Model Inference
Next, we need to deploy our model on the edge device using the Intel VPU. The first step is to comment out the default model in Donkey Car's manage.py:
#kl = KerasLinear()
#if model_path:
#    kl.load(model_path)
#
#V.add(kl,
#      inputs=['cam/image_array'],
#      outputs=['pilot/angle', 'pilot/throttle'],
#      run_condition='run_pilot')
Then we need to create the NCS class, which has two parts: __init__ and run. We can use either the OpenVINO API or the OpenCV DNN API to create the NCS class:
# Using the OpenVINO API
from armv7l.openvino.inference_engine import IENetwork, IEPlugin

class NCS:
    def __init__(self):
        # Hardware: target the Movidius VPU
        self.plugin = IEPlugin(device="MYRIAD")
        # Model: load the IR files produced by the model optimizer
        self.net = IENetwork.from_ir(model="/home/pi/mycar/ncs/3dcnn/tf_model.xml",
                                     weights="/home/pi/mycar/ncs/3dcnn/tf_model.bin")
        # Name of the network's input layer, needed to feed images in run()
        self.input_blob = next(iter(self.net.inputs))
        self.exec_net = self.plugin.load(network=self.net)

    def run(self, image):
        res = self.exec_net.infer(inputs={self.input_blob: image})
        print(res)
        return res
# Using the OpenCV DNN API
import cv2

class NCS:
    def __init__(self):
        self.net = cv2.dnn.readNet("/home/pi/mycar/ncs/2dcnn/tf_model.xml",
                                   "/home/pi/mycar/ncs/2dcnn/tf_model.bin")
        self.net.setPreferableTarget(cv2.dnn.DNN_TARGET_MYRIAD)
        layer_names = self.net.getLayerNames()
        self.output_layer = [layer_names[i[0] - 1] for i in self.net.getUnconnectedOutLayers()]

    def run(self, image):
        image = cv2.resize(image, (320, 240))
        blob = cv2.dnn.blobFromImage(image, size=(320, 240), ddepth=cv2.CV_32F)
        self.net.setInput(blob)
        res = self.net.forward(self.output_layer)
        return res[0], res[1]
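The blobFromImage call in run() converts the camera frame from OpenCV's height x width x channels layout into the batch x channels x height x width blob that the inference engine expects. A minimal NumPy re-implementation of that conversion (for illustration only; the real code should keep using cv2.dnn.blobFromImage) looks like this:

```python
import numpy as np

def to_blob(image):
    """Convert an HxWxC image to a 1xCxHxW float32 blob, mimicking
    cv2.dnn.blobFromImage with default scale and no mean subtraction."""
    blob = image.astype(np.float32)       # ddepth=cv2.CV_32F
    blob = np.transpose(blob, (2, 0, 1))  # HWC -> CHW
    return blob[np.newaxis, ...]          # add batch dimension -> NCHW

frame = np.zeros((240, 320, 3), dtype=np.uint8)  # fake 320x240 camera frame
print(to_blob(frame).shape)  # (1, 3, 240, 320)
```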
Both the OpenVINO API and the OpenCV DNN API worked well in my experiments. After creating the NCS class, we need to create an NCS object and add it to the main pipeline:
ncs = NCS()
V.add(ncs, inputs=['cam/image_array'],
outputs=['pilot/angle', 'pilot/throttle'])
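Donkey Car treats any object with a run() method as a pipeline part: on every loop iteration the vehicle reads the named input channels, passes them to run(), and stores the returned values under the output names. The contract can be sketched with a stripped-down stand-in for the real Vehicle class (TinyVehicle and FakeNCS below are hypothetical, not Donkey's actual implementation):

```python
class TinyVehicle:
    """Minimal stand-in for Donkey Car's Vehicle: stores named channels
    in a dict and calls each part's run() once per loop iteration."""
    def __init__(self):
        self.mem = {}
        self.parts = []

    def add(self, part, inputs=(), outputs=()):
        self.parts.append((part, inputs, outputs))

    def step(self):
        for part, inputs, outputs in self.parts:
            args = [self.mem.get(k) for k in inputs]
            results = part.run(*args)
            if len(outputs) == 1:
                results = (results,)
            self.mem.update(zip(outputs, results))

class FakeNCS:
    """Stand-in for the NCS part: returns a fixed angle and throttle."""
    def run(self, image):
        return 0.1, 0.5

V = TinyVehicle()
V.mem['cam/image_array'] = 'frame'   # normally written by the camera part
V.add(FakeNCS(), inputs=['cam/image_array'],
      outputs=['pilot/angle', 'pilot/throttle'])
V.step()
print(V.mem['pilot/angle'], V.mem['pilot/throttle'])  # 0.1 0.5
```

This is why the NCS class only needs a run(image) method returning (angle, throttle) to slot into the existing pipeline.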
4. Code:
After finishing the above process, we have finally deployed our model to the NCS stick! I have attached the code to the code section, and you can also find it on GitHub.