We find mentors for ourselves so that we don’t need to learn things the hard way. They help make the learning process smoother and faster.
Transfer learning plays the same role for machine learning models. It transfers learned parameters or weights (we could say knowledge) from one model to another, where they can be used as inputs or as engineered features.
Beyond the mentorship metaphor, we humans use transfer learning in our day-to-day tasks.
Let’s say you know how to ride a bike and are planning to learn to drive a car, or vice versa. Would you learn every aspect of driving a different vehicle from scratch? The answer is no.
You already know about driving: gear shifting, the accelerator, the brake, and many other things. You reuse that knowledge when learning to drive a different vehicle.
The same goes for cooking: once you have learned a few recipes, you build on that knowledge to pick up new ones. You can find endless examples like these that capture the idea behind transfer learning.
Now that we understand what inspired transfer learning, let’s go through its definition:
Transfer learning and domain adaptation refer to the situation where what has been learned in one setting … is exploited to improve generalization in another setting.
This definition is taken from the deep learning textbook by Ian Goodfellow, Yoshua Bengio, and Aaron Courville.
In layman’s terms, it is a technique in which a model developed for one task is reused as the starting point, or input, for another model that carries out a related task.
Transfer learning is a very efficient technique for solving real-world problems, and it addresses several common business constraints:
- Deep learning generally works better with larger volumes of training data, but with transfer learning you can get good results from a limited dataset. This is because deep learning algorithms have to learn both low-level and high-level features from the data, whereas with transfer learning the model has already learned those features from another dataset.
- Supervised learning normally needs large amounts of labelled data, but with transfer learning a model can perform well even with limited labelled data.
- Transfer learning speeds up the learning process and reduces training time, which in turn shortens PoC and project delivery timelines.
Let’s consider an example. If you want to build an object detection system, what would you do?
Traditionally, you would start with data collection and labelling, but for your model to work properly you would need to collect a lot of data, depending on the number of objects to detect and the variety of scenarios. With transfer learning, you need comparatively little data to build a good model.
For an object detection task, choosing transfer learning over training a deep network from scratch makes life a lot easier. Here, you can choose AlexNet as your base model. AlexNet contains 8 layers: the first 5 are convolutional layers, some of which are followed by max-pooling operations, and the last 3 are fully connected layers. The initial layers detect low-level features such as edges, while the last fully connected layers combine low-level features to detect object-level features.
You can remove those three fully connected layers from the pre-trained AlexNet model and add three or more fresh layers based on your requirements. Also, do not forget to freeze the weights of the first five convolutional layers. Now train the model with the limited data you have; the results will be far better than with a traditional ML approach.
Here is a snapshot taken from the MathWorks documentation.
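To make this concrete, here is a minimal sketch of the same surgery in PyTorch, using torchvision’s ImageNet-pre-trained AlexNet (the number of target classes is a placeholder for your own dataset):

```python
import torch.nn as nn
from torchvision import models

num_classes = 5  # placeholder: number of object classes in your own dataset

# Load AlexNet with ImageNet pre-trained weights
model = models.alexnet(pretrained=True)

# Freeze the five convolutional (feature-extraction) layers
for param in model.features.parameters():
    param.requires_grad = False

# Replace the three fully connected layers with a fresh classifier head
model.classifier = nn.Sequential(
    nn.Dropout(),
    nn.Linear(256 * 6 * 6, 4096),
    nn.ReLU(inplace=True),
    nn.Dropout(),
    nn.Linear(4096, 4096),
    nn.ReLU(inplace=True),
    nn.Linear(4096, num_classes),
)
# Only the new classifier is trained on your limited dataset
```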
What is weight freezing?
Weight freezing is a method in which certain weights are not allowed to be updated during backpropagation. Frozen weights stay fixed throughout training, while the trainable weights are updated after each batch.
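In PyTorch, for example, freezing boils down to turning off gradient tracking for the chosen parameters and handing the optimizer only the trainable ones; a minimal sketch, again using torchvision’s AlexNet:

```python
import torch
from torchvision import models

model = models.alexnet(pretrained=True)

# Freeze: gradients are never computed for these weights, so
# backpropagation leaves them unchanged for the whole training run
for param in model.features.parameters():
    param.requires_grad = False

# Give the optimizer only the trainable (unfrozen) parameters
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad),
    lr=1e-3, momentum=0.9,
)
```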
There are three ways to use transfer learning to speed up the development of machine learning applications for businesses. Below is a brief description of each of them.
1. Using pre-trained models
Complex neural network architectures involve hundreds of millions of parameters to optimize and can take weeks to train, even on multi-GPU systems. Fortunately, the deep learning community is very generous about releasing its pre-trained models as open source.
A pre-trained model is built for a particular task, and if you want to carry out the same task, you can use it directly without any modification.
For example, AlexNet was trained on over a million images and can classify 1,000 different object categories. If your project involves classifying a few of those classes, you can use AlexNet directly as your classification model.
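As an illustration, here is a rough sketch of using the pre-trained AlexNet from torchvision directly for inference (the image path is a placeholder):

```python
import torch
from torchvision import models, transforms
from PIL import Image

model = models.alexnet(pretrained=True)
model.eval()

# Standard ImageNet preprocessing
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

image = Image.open("cat.jpg")           # placeholder image path
batch = preprocess(image).unsqueeze(0)  # add a batch dimension

with torch.no_grad():
    probs = model(batch).softmax(dim=1)
print(probs.argmax(dim=1))  # index of the predicted ImageNet class
```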
2. Using the pre-trained model as a feature extractor
In this approach, you take a pre-trained model and remove the last few layers based on your task objective; the remaining layers then work as a feature extractor for your task.
Deep learning models learn different features at each layer, and the idea here is to retain those features for further learning on a similar or different task. The AlexNet example discussed earlier, with its last 3 layers removed, falls under this method.
Here, after freezing the initial layers and adding more layers as needed, you can train the model on your own dataset for the task, as sketched below.
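A short sketch of this pattern in PyTorch: strip off AlexNet’s fully connected layers and use the remaining, frozen layers to turn images into fixed feature vectors that any lightweight classifier can then be trained on:

```python
import torch
import torch.nn as nn
from torchvision import models

# AlexNet with the fully connected layers removed acts as a fixed feature extractor
backbone = models.alexnet(pretrained=True)
extractor = nn.Sequential(backbone.features, backbone.avgpool, nn.Flatten())
extractor.eval()

@torch.no_grad()
def extract_features(images):
    """Map a batch of (N, 3, 224, 224) images to 9216-dimensional feature vectors."""
    return extractor(images)

# The extracted features can feed any lightweight classifier,
# e.g. a single linear layer or an SVM trained on your small dataset.
```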
A very good example is the neural style transfer technique. The idea is to take the style from one image and the content from another, and build a new image that mixes both. The algorithm uses feature representations from a pre-trained VGG-16 model to combine the style and content of two different images.
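For illustration, here is a rough sketch of the feature-extraction step in PyTorch, pulling content and style activations from torchvision’s pre-trained VGG-16 (the layer choices are illustrative, and the optimization loop that actually generates the stylized image is omitted):

```python
import torch
from torchvision import models

vgg = models.vgg16(pretrained=True).features.eval()
for param in vgg.parameters():
    param.requires_grad = False

# Illustrative layer choices: shallow layers for style, a deeper one for content
style_layers = {1, 6, 11, 18, 25}
content_layers = {20}

def get_features(image):
    """Collect activations from selected VGG-16 layers for one image tensor."""
    style, content = [], []
    x = image
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in style_layers:
            style.append(x)
        if i in content_layers:
            content.append(x)
    return style, content

# Style is usually compared via Gram matrices of these activations and
# content via a direct difference; both losses drive the generated image.
```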
3. Fine tuning a pre-trained model
When building a machine learning model from scratch, you initialize the weights randomly. With transfer learning, instead of random initialization, you use weights from a pre-trained model as the initializer and then train the model on your dataset. This approach adapts the weights from previous learning and makes training faster.
For this approach, you may or may not remove the last few layers, depending on your requirements. Also note that you do not freeze the pre-trained weights; you fine-tune them.
A slight variation of this approach is to freeze the weights of the initial layers while the weights of the remaining layers stay trainable, as sketched below.
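A minimal sketch of this variant in PyTorch, again with AlexNet: the pre-trained weights act as the initializer, only the earliest convolutional layers are frozen, and everything else is fine-tuned with a small learning rate (the cut-off layer, class count, and learning rate are placeholders):

```python
import torch
import torch.nn as nn
from torchvision import models

# Pre-trained weights act as the initializer instead of random weights
model = models.alexnet(pretrained=True)
model.classifier[6] = nn.Linear(4096, 2)  # placeholder: 2 classes in the new task

# Freeze only the first convolutional block; everything else stays trainable
for name, param in model.features.named_parameters():
    layer_index = int(name.split(".")[0])
    if layer_index < 3:  # placeholder cut-off
        param.requires_grad = False

# A small learning rate keeps the fine-tuned weights close to the pre-trained ones
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```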
Let’s say you are building a face recognition system that decides whether a person should be given door access. Here, you can take a pre-trained Siamese network as an initializer and then fine-tune it on your dataset. Siamese networks are very good at face recognition, and they can adapt their weights to a new face dataset.
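A rough sketch of such a system, with the caveat that torchvision does not ship a face-trained Siamese network, so an ImageNet-pre-trained ResNet-18 stands in here as the shared embedding backbone; a real system would start from weights trained on faces:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class SiameseNet(nn.Module):
    """Two inputs share one embedding network; similar faces map close together."""
    def __init__(self, embedding_dim=128):
        super().__init__()
        backbone = models.resnet18(pretrained=True)  # stand-in pre-trained initializer
        backbone.fc = nn.Linear(backbone.fc.in_features, embedding_dim)
        self.embed = backbone

    def forward(self, face_a, face_b):
        emb_a = F.normalize(self.embed(face_a), dim=1)
        emb_b = F.normalize(self.embed(face_b), dim=1)
        return F.cosine_similarity(emb_a, emb_b)  # high similarity -> same person

# Grant door access when the similarity to an enrolled face exceeds a tuned threshold
```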
Whether and how to use transfer learning is very problem-specific and depends on many factors, such as the availability of pre-trained models, the problem domain, and the type of dataset. But it certainly helps when data is limited, training time is short, or model effectiveness matters.
Integrating transfer learning into your project can help you deliver it faster. On the other hand, it brings its own challenges, such as which pre-trained model to use and how much data is enough.
eInfochips’ machine learning experts have implemented transfer learning on image, text, and video data and have expertise in addressing all transfer learning related challenges. If you are trying to figure out how transfer learning can be integrated into your project, contact our machine learning and AI team.