In this post, I would like to share my experience applying transfer learning to the Donkey car project and explain why transfer learning is a good machine learning technique, even though it turned out not to be suitable for our self-driving Donkey car.
How transfer learning (TL) works for our problem
Transfer learning is a good way to let our model gain knowledge from other models that solve similar problems. In our case, we can view our self-driving model as one that identifies driving-related objects (lanes, roads, pedestrians, etc.) and predicts how a driver would behave while driving. There are many pre-trained image recognition models available. Those models are trained on much larger datasets and have been well tuned compared to our small model, which means they may perform better and have more potential to generalize to complicated problems.
The approach for applying transfer learning to our problem is to use those pre-trained image recognition models to identify driving-related features/objects. Our own model is then only responsible for predicting the throttle value and the steering angle based on the output of those pre-trained models.
Many people share their models and weights for different problems; you can find them on GitHub and other open-source project websites. If you are using Keras or TensorFlow, several pre-trained models are bundled with the framework, so there is no need to load them from other places. You can get more information from the Keras documentation.
Implementation
Transfer learning sounds like a very advanced topic, but the implementation is simple. We only need to take the first N layers from a pre-trained model and connect them to our real model. This way, instead of using raw images as input, our model lets the pre-trained layers extract related features for us, based on the knowledge in the pre-trained model. The value N is a parameter that needs to be tuned for each problem context.
Implementing transfer learning takes the following steps:
- Load pre-trained model
- Choose the layers
- Set each layer as trainable or not-trainable
- Concatenate pre-trained layer and real model together
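The four steps above can be sketched end to end. This is a minimal, hypothetical example: instead of VGG-16 it builds a tiny stand-in "pre-trained" model locally (so nothing is downloaded), but the choosing, freezing, and concatenation steps are the same:

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Conv2D, Dense, Flatten, MaxPooling2D

# Step 1: "load" a pre-trained model. Here we build a tiny stand-in
# locally so the sketch is self-contained; in the real project this
# would be applications.VGG16(weights="imagenet", include_top=False, ...).
inp = Input(shape=(32, 32, 3))
x = Conv2D(8, 3, activation='relu', name='feat_conv1')(inp)
x = MaxPooling2D()(x)
x = Conv2D(16, 3, activation='relu', name='feat_conv2')(x)
pretrained = Model(inp, x, name='pretrained_stub')

# Steps 2 and 3: choose the layers (here: all of them) and mark them
# as not trainable so their weights are kept as-is.
for layer in pretrained.layers:
    layer.trainable = False

# Step 4: concatenate the frozen base with our own small head.
h = Flatten()(pretrained.output)
h = Dense(48, activation='relu')(h)
out = Dense(2, activation='linear')(h)
model = Model(pretrained.input, out)
model.compile(optimizer='adam', loss='mse')
```

Only the two dense layers of the head are updated during training; the stand-in base just acts as a fixed feature extractor.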
Load pre-trained model:
In this example, we use VGG-16 as pre-trained model:
model = applications.VGG16(weights="imagenet", include_top=False,
                           input_shape=(img_width, img_height, 3))
print(len(model.layers))
print(model.summary())
Choose the layers:
To get the first N layers, slice the layer list:
layers = model.layers[:10]
In Keras, specifying include_top=False gets rid of all the dense layers, so in the Donkey project I didn't need to delete any layers manually.
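Slicing only gives you a list of layer objects; to actually use the first N layers as a feature extractor, a common pattern is to build a new Model that ends at layer N-1's output. A small sketch (weights=None builds the architecture with random weights, so this runs without downloading the ImageNet weights; the real project would use weights="imagenet"):

```python
from tensorflow.keras import Model
from tensorflow.keras.applications import VGG16

# weights=None: architecture only, no download (illustration).
vgg = VGG16(weights=None, include_top=False, input_shape=(120, 160, 3))

# "Take the first 10 layers" as a usable sub-model: end a new Model
# at the output of the layer with index 9 (block3_conv3 for VGG-16).
truncated = Model(inputs=vgg.input, outputs=vgg.layers[9].output)
print(truncated.output_shape)
```

With a 120x160 input, the truncated model outputs a 30x40x256 feature map, which our own head can then consume instead of raw pixels.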
Set each layer as trainable or not trainable
We only need to go through each chosen layer and set it as not trainable. The following sets the first 10 layers as not trainable:
for layer in model.layers[:10]:
    layer.trainable = False
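It is worth verifying that the loop actually froze the layers, for example by counting the trainable weight tensors that remain. A self-contained sketch (again using weights=None only so the example runs without a download):

```python
from tensorflow.keras.applications import VGG16

# weights=None: random weights, no download (illustration only).
model = VGG16(weights=None, include_top=False, input_shape=(120, 160, 3))

# Freeze the first 10 layers, as in the snippet above.
for layer in model.layers[:10]:
    layer.trainable = False

# Sanity check: only the unfrozen conv layers still contribute
# trainable weight tensors (kernel + bias per conv layer).
print(len(model.trainable_weights), "trainable weight tensors remain")
```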
Concatenate the pre-trained layers and the real model together
The following code shows how to add a simple dense head on top of a pre-trained model:
x = model.output
x = Flatten()(x)
x = Dense(48, activation='relu')(x)
predictions = Dense(10, activation="softmax")(x)
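These lines only define tensors; to train anything, they need to be wrapped in a Model and compiled. A runnable sketch of that wiring, with a dummy training step (weights=None and the small 48x48 input are my illustrative choices so the example runs offline; the class count of 10 matches the snippet above):

```python
import numpy as np
from tensorflow.keras import Model
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten

# weights=None: architecture only, no download (illustration).
base = VGG16(weights=None, include_top=False, input_shape=(48, 48, 3))
for layer in base.layers:
    layer.trainable = False  # freeze the whole base

# The dense head from the snippet above, attached to the base.
x = Flatten()(base.output)
x = Dense(48, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)

model = Model(inputs=base.input, outputs=predictions)
model.compile(optimizer='adam', loss='categorical_crossentropy')

# One tiny gradient step on dummy data, just to show the wiring works.
X = np.zeros((2, 48, 48, 3), dtype='float32')
y = np.zeros((2, 10), dtype='float32'); y[:, 0] = 1.0
model.fit(X, y, epochs=1, verbose=0)
```

Only the head's weights change during fit; the frozen base passes features through unchanged.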
Core code for the Donkey car:
def default_linear(freeze_layer=16):
    img_in = Input(shape=(120, 160, 3), name='img_in')
    vgg = VGG16(weights="imagenet", include_top=False, input_shape=(120, 160, 3))
    x = vgg.output
    print(len(vgg.layers))
    x = Flatten(name='flattened')(x)
    # x = Dense(units=1024, activation='linear')(x)
    # x = Dropout(rate=.1)(x)
    x = Dense(units=256, activation='linear')(x)
    x = Dropout(rate=.1)(x)
    x = Dense(units=48, activation='linear')(x)
    x = Dropout(rate=.1)(x)
    # continuous output of the steering angle
    angle_out = Dense(units=1, activation='linear', name='angle_out')(x)
    # continuous output of the throttle
    throttle_out = Dense(units=1, activation='linear', name='throttle_out')(x)
    model = Model(inputs=[vgg.input], outputs=[angle_out, throttle_out])
    for layer in model.layers[:freeze_layer]:
        layer.trainable = False
    print(model.summary())
    model.compile(optimizer='adam',
                  loss={'angle_out': 'mean_squared_error',
                        'throttle_out': 'mean_squared_error'},
                  loss_weights={'angle_out': 0.5, 'throttle_out': .5})
    return model
More explanation: understand more about each neuron
A better way to understand why transfer learning works is to understand what each layer is doing inside the deep neural network. I highly recommend [1] as a reference; in this section I will give more explanation based on [1] and [2].
In a very general way, each layer is responsible for abstracting features from its input (the output of the previous layer). For example, the first layer of a face recognition model might detect very basic lines or curves. In the next layers, the model might recognize some higher-level features. At the last layer, the features can be eyes, ears, or other high-level parts. By combining all these features, the model can predict that there is a face in the image.
Instead of visualizing the deep network layer by layer, we would like to interpret the function computed by each individual neuron. If we find that a neuron has been activated, we can run deconvolution (a form of backprop) to find out what activates it. Based on this idea, we can build an image map to see visually what makes each neuron fire. The following example is from Krizhevsky et al., 2012 [2]; it shows very clearly how each neuron works.
As you can see, in the lower layers a neuron recognizes (that is, is activated by some pixels in the original image) low-level features such as lines or basic colors. Going up to the higher layers, the model recognizes higher-level features. At the end, it can recognize features as complex as human beings.
By understanding what exactly happened in the hidden layers, we can get the real reason why transfer learning works and how it works.
The benefits of TL and its limitations for the Donkey car project
As mentioned before, the best part of transfer learning is knowledge sharing. With little effort, we can get high-level features from other models. It not only reduces the time needed to design a complicated model, it can also improve the model's performance and its ability to generalize to untrained scenarios.
Unfortunately, it didn't work well in our Donkey car project. A transfer-learning model is usually very large (about 200 MB for the model above) with millions, even billions, of parameters, so it requires relatively big computing power if you want results quickly. Our Donkey car uses a Raspberry Pi as its computing unit, and it takes about 2 s to make a decision for one frame, which is nowhere near real-time prediction. So even though the model looks extremely good in the test environment, it does not work in our real project. For a real self-driving car, of course, the computation time would probably not be a problem.
Some lessons we can learn from this experiment: before implementing anything in a real project, we need to think about the capabilities of the hardware and software environment as well. We should not treat every problem as purely a math or machine learning issue.
References
[1] Understanding Neural Networks Through Deep Visualization
[2] ImageNet Classification with Deep Convolutional Neural Networks