14.18 LSTM
- long short-term memory (LSTM) is an RNN architecture that is widely used in Deep Learning.
- it is a variation of the RNN that solves issues like vanishing and exploding gradients
- LSTM excels at capturing long-term dependencies, making it ideal for sequence prediction tasks
- LSTM uses feedback connections, allowing it to process entire sequences of data, not just individual data points
- it is effective for understanding and predicting patterns in sequential data like time series, text, and speech
- LSTM recurrent units try to remember all the past knowledge that the network has seen so far and to forget irrelevant data
Purpose
- it is literally made to be a better RNN:
- RNNs cannot be parallelized
- context is only computed from history
- there is no distinction between short-term and long-term memory; memory is just memory
- training is tricky
- they suffer from vanishing and exploding gradients (see the sketch after this list)
- they cannot process very long sequences effectively
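To see why plain RNN gradients vanish: during backpropagation through time, the gradient is multiplied by the recurrent Jacobian once per timestep, so it shrinks (or explodes) geometrically with sequence length. A minimal NumPy sketch with made-up weight values:

```python
import numpy as np

# Recurrent weight whose largest singular value is < 1: gradients shrink
# geometrically as they are multiplied once per timestep during BPTT.
W = np.array([[0.5, 0.1],
              [0.0, 0.4]])

grad = np.ones(2)            # gradient arriving at the last timestep
for t in range(50):          # backpropagate through 50 timesteps
    grad = W.T @ grad        # one Jacobian multiplication per step

print(np.linalg.norm(grad))  # ~1e-15: effectively zero (vanished)
```

With a weight whose largest singular value is above 1, the same loop blows up instead, which is the exploding-gradient case.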
- LSTM was developed to handle long-range dependencies better than RNNs and to solve the gradient issues. LSTMs do this by introducing activation function layers called "gates", each serving a different purpose
LSTM Architecture:
- at a high level, an LSTM works like an RNN cell, but with its internal functioning divided into three parts:
- Choosing whether information from the previous timestamp is to be remembered or forgotten
- Learning new information from the current input
- Passing the updated information to the next timestamp
- the LSTM architecture has a chain structure that contains four neural network layers and different memory blocks called cells
- the information retained by these cells is manipulated using gates
- there are three gates (sketched in code after this list):
- forget gate: determines what information is to be removed
- input gate: adds useful information to the current cell state
- output gate: controls what information is output from the memory cell and extracts useful information from the current cell state
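The gate updates fit in a few lines of code. Below is a minimal NumPy sketch of a single LSTM timestep; the parameter layout (weights W and biases b keyed by gate name) is an illustrative assumption, not a reference implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM timestep. x: input; h_prev, c_prev: previous hidden/cell state."""
    z = np.concatenate([h_prev, x])   # combine previous hidden state and input
    f = sigmoid(W["f"] @ z + b["f"])  # forget gate: what to drop from c_prev
    i = sigmoid(W["i"] @ z + b["i"])  # input gate: what new info to admit
    g = np.tanh(W["c"] @ z + b["c"])  # candidate values for the cell state
    o = sigmoid(W["o"] @ z + b["o"])  # output gate: what to expose as h
    c = f * c_prev + i * g            # updated long-term (cell) state
    h = o * np.tanh(c)                # updated short-term (hidden) state
    return h, c

# Tiny demo with random parameters: hidden size 3, input size 2.
rng = np.random.default_rng(0)
H, X = 3, 2
W = {k: rng.standard_normal((H, H + X)) for k in "fico"}
b = {k: np.zeros(H) for k in "fico"}
h, c = np.zeros(H), np.zeros(H)
for x in rng.standard_normal((5, X)):  # run 5 timesteps
    h, c = lstm_step(x, h, c, W, b)
print(h, c)
```

Note how the cell state update `c = f * c_prev + i * g` is additive rather than a repeated matrix multiplication, which is what lets gradients flow across many timesteps without vanishing.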
- Bidirectional LSTM: a variation of the standard LSTM that processes sequential data in both forward and backward directions
- it is actually made of 2 LSTMs, one that runs forward over the sequence and one that runs backward, with their outputs concatenated
- they achieve state-of-the-art performance in tasks like machine translation, speech recognition, and text summarization
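In frameworks that provide an LSTM layer this is usually a one-flag change. A minimal sketch assuming PyTorch is available (sizes are arbitrary, chosen only to show the doubled output width):

```python
import torch
import torch.nn as nn

# Bidirectional LSTM: two LSTMs run over the sequence, one forward and
# one backward, and their per-timestep outputs are concatenated.
lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True,
               bidirectional=True)

x = torch.randn(4, 10, 8)  # (batch, seq_len, features)
out, (h_n, c_n) = lstm(x)
print(out.shape)           # torch.Size([4, 10, 32]) -- 2 * hidden_size
print(h_n.shape)           # torch.Size([2, 4, 16]) -- one final state per direction
```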
Applications of LSTM:
- Language modeling: learns dependencies between words to generate coherent, grammatically correct sentences
- Speech recognition: used for transcribing speech to text and interpreting spoken commands
- Time series forecasting: used to predict stock prices, weather, and energy consumption (see the toy sketch after this list)
- Anomaly detection: used for detecting fraud or network intrusions
- Recommender systems: used in tasks like suggesting movies, music, and books by learning user behavior patterns
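As a toy illustration of the time series use case, here is a sketch that fits an LSTM to predict the next value of a sine wave, assuming PyTorch is available; the sine wave, window size, and model sizes are arbitrary stand-ins for real data such as prices or load measurements:

```python
import torch
import torch.nn as nn

# Toy forecasting task: predict the next value of a sine wave from the
# previous 20 values.
t = torch.linspace(0, 60, 1200)
series = torch.sin(t)
window = 20
X = torch.stack([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]

class Forecaster(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        out, _ = self.lstm(x)         # out: (batch, seq_len, hidden)
        return self.head(out[:, -1])  # predict from the final timestep

model = Forecaster()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

X = X.unsqueeze(-1)  # (N, window, 1): one feature per timestep
y = y.unsqueeze(-1)  # (N, 1)
for epoch in range(200):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
print(loss.item())   # MSE should fall well below the signal variance (~0.5)
```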