Neural networks have made significant leaps in image and natural language processing (NLP) in recent years. They've not only learned to recognise, localise and segment images; they can now translate natural language effectively and answer complex questions. One of the precursors to this progress was the introduction of Seq2Seq and neural attention models, which enable neural networks to be more selective about the data they work with at any given time.
The core idea of the neural attention mechanism is to learn where to find the important information. Here's an example from neural machine translation:
- The words from the input sentence are fed into the encoder, which distils the sentence meaning into the so-called 'thought vector'.
- Based on this vector, the decoder produces words one by one to create the output sentence.
- Throughout this process, the attention mechanism helps the decoder focus on different fragments of the input sentence (a minimal sketch of this follows the list).
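To make that last point concrete, here is a minimal NumPy sketch of additive (Bahdanau-style) attention: the current decoder state is scored against every encoder state, the scores are normalised into weights, and the weighted sum of encoder states becomes the context vector the decoder uses when producing the next word. All shapes, weight matrices and variable names below are illustrative assumptions, not taken from any particular paper or library.

```python
# Minimal sketch of additive (Bahdanau-style) attention in NumPy.
# Shapes and weights are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)

src_len, enc_dim, dec_dim, attn_dim = 5, 8, 8, 16

# Encoder outputs: one hidden state per source word, shape (src_len, enc_dim).
encoder_states = rng.normal(size=(src_len, enc_dim))
# Current decoder hidden state, shape (dec_dim,).
decoder_state = rng.normal(size=(dec_dim,))

# Learned projections in a real model; random here, purely for illustration.
W_enc = rng.normal(size=(enc_dim, attn_dim))
W_dec = rng.normal(size=(dec_dim, attn_dim))
v = rng.normal(size=(attn_dim,))

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

# Additive attention score for each source position:
# score_i = v . tanh(W_enc @ h_i + W_dec @ s)
scores = np.tanh(encoder_states @ W_enc + decoder_state @ W_dec) @ v

# Attention weights say how strongly the decoder "looks at" each source word.
weights = softmax(scores)            # shape (src_len,), sums to 1

# Context vector: weighted sum of encoder states, combined with the decoder
# state when predicting the next output word.
context = weights @ encoder_states   # shape (enc_dim,)

print("attention weights:", np.round(weights, 3))
print("context vector shape:", context.shape)
```

In a trained model the projection matrices would be learned jointly with the encoder and decoder, so the weights would concentrate on the relevant source words; with random weights, as here, the output only demonstrates the mechanics.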
This progress rested on a few key milestones:
- Sepp Hochreiter and Jürgen Schmidhuber's 1997 creation of the LSTM (long short-term memory) cell, which made it possible to work with relatively long sequences within a machine learning paradigm.
- The realisation of sequence-to-sequence learning (Sutskever et al., 2014; Cho et al., 2014), built on LSTMs. The concept is to "eat" one sequence and "return" another.
- The attention mechanism itself, first introduced by Bahdanau et al., 2015.