Recurrent Neural Networks
- An RNN is a type of neural network that processes sequential data by maintaining a memory of previous inputs
- Like other neural networks, RNNs learn from training data, but they are distinguished by their memory: they take information from prior inputs to influence the current input and output
- Regular deep learning networks assume that inputs and outputs are independent of each other, whereas the output of an RNN depends on prior elements within the sequence
- RNNs remember past information by feeding the output from one step into the next step, helping the network understand the context of what has already happened and make better predictions
- RNNs can be described as a black box with an internal state that is updated as the sequence is processed
Key Features:
- Internal Memory: a key feature that allows them to remember past inputs and use them as context when processing new information
- Best suited to sequential data where the order of elements matters; they are the model of choice for variable-length inputs, which makes them a good fit for NLP
- RNNs can analyze the current input in relation to what they've seen before
- They continuously update their internal memory as they process new data
- The fundamental unit of an RNN is the Recurrent Unit, which holds a hidden state that maintains information about previous inputs; these units remember information by feeding their hidden state back into themselves
- Hidden state (h): a hidden state is calculated for every input to retain sequential dependencies. The current state h_t depends on the previous state h_{t-1} and the current input x_t, e.g. h_t = f(W_hh · h_{t-1} + W_xh · x_t + b), where W_hh and W_xh (sometimes written W and U) are weight matrices, b is a bias, and f is a non-linear activation function such as tanh or ReLU (a minimal sketch follows this list)
- Output (y): the output y_t is calculated from the current hidden state using an activation function
- Unrolling: RNN unfolding is the process of expanding the recurrent structure over time, representing each time step as a layer in a series; this makes it easier to see how the hidden state propagates. Despite the unrolled structure, the same function and the same set of parameters are used at every time step
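
A minimal NumPy sketch of this recurrence (the sizes, random weights, and tanh activation are illustrative assumptions, not a fixed implementation). The same weight matrices are reused at every time step; only the hidden state changes:

```python
import numpy as np

input_size, hidden_size, seq_len = 4, 8, 5

rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden (the feedback loop)
b_h = np.zeros(hidden_size)

xs = rng.normal(size=(seq_len, input_size))  # a toy input sequence
h = np.zeros(hidden_size)                    # initial hidden state

for t in range(seq_len):
    # Same parameters applied at every step; h carries context forward.
    h = np.tanh(W_hh @ h + W_xh @ xs[t] + b_h)
    print(f"step {t}: hidden state norm = {np.linalg.norm(h):.3f}")
```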
Training RNNs
- RNNs are trained using Backpropagation Through Time (BPTT), which extends the standard backpropagation algorithm by unrolling the network over time and computing gradients for each time step
- The loss of an RNN is often computed at each step and aggregated over the entire sequence to guide parameter updates (see the sketch below)
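
A hedged PyTorch sketch of this setup (the model sizes, the per-step classification head, and the toy data are assumptions): the loss is computed at every time step, aggregated over the whole sequence, and backward() pushes gradients through every step, i.e. BPTT:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=4, hidden_size=8, batch_first=True)
head = nn.Linear(8, 3)                       # per-step output layer (assumed)
optimizer = torch.optim.SGD(list(rnn.parameters()) + list(head.parameters()), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(2, 5, 4)                     # (batch, time, features) toy sequences
targets = torch.randint(0, 3, (2, 5))        # a target class at every time step

hidden_seq, _ = rnn(x)                       # unrolled over all 5 time steps
logits = head(hidden_seq)                    # (batch, time, classes)

# Per-step losses are aggregated over the whole sequence; backward()
# then propagates gradients through every time step (BPTT).
optimizer.zero_grad()
loss = loss_fn(logits.reshape(-1, 3), targets.reshape(-1))
loss.backward()
optimizer.step()
```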
Issues:
- RNNs suffer from the vanishing and exploding gradient problems because gradients are repeatedly multiplied through many time steps during backpropagation (a toy illustration follows this list)
- Vanishing gradients: gradients become exceedingly small as they propagate backward when the multiplied values are less than 1, hindering the network's ability to capture long-term dependencies and causing slow learning
- Exploding gradients: gradients grow exponentially when the multiplied values are greater than 1, leading to very large weight updates
- RNNs cannot process very long sequences effectively.
- The sequential nature of RNNs prevents parallelization across time steps
- Context is computed only from the past
- there is no distinction between short and long range dependencies
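
A toy numerical illustration of both gradient problems (the per-step factors 0.9 and 1.1 are made-up stand-ins for the magnitude of the recurrent multiplication, not real gradients):

```python
# Repeated multiplication over many time steps, as in BPTT,
# makes a quantity either shrink toward zero or blow up.
steps = 100
grad_small, grad_large = 1.0, 1.0
for _ in range(steps):
    grad_small *= 0.9   # per-step factor < 1 -> vanishing gradient
    grad_large *= 1.1   # per-step factor > 1 -> exploding gradient

print(f"after {steps} steps: {grad_small:.2e} (vanished), {grad_large:.2e} (exploded)")
```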
Types of RNNs:
- One to One: single input, single output; used for standard classification tasks such as image classification
- One to Many: single input, sequence output; useful when one input triggers a sequence of predictions, such as music generation or image captioning
- Many to One: sequence input, single output; used when the overall context of an input sequence is needed for one prediction, such as sentiment analysis
- Many to Many: sequence input, sequence output; both inputs and outputs are sequences, which can be aligned step for step or delayed (see the sketch after this list)
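
A short PyTorch sketch contrasting the many-to-one and many-to-many patterns (the sizes and the two linear output heads are assumptions for illustration): many-to-one keeps only the final hidden state, while many-to-many produces an output at every step:

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=4, hidden_size=8, batch_first=True)
sentiment_head = nn.Linear(8, 2)    # many-to-one head (e.g. sentiment classes)
tagging_head = nn.Linear(8, 10)     # many-to-many head (e.g. a tag per step)

x = torch.randn(2, 5, 4)            # batch of 2 sequences, 5 steps, 4 features
outputs, h_n = rnn(x)               # outputs: (2, 5, 8), h_n: (1, 2, 8)

# Many-to-one (e.g. sentiment analysis): use only the final hidden state.
sentiment_logits = sentiment_head(h_n[-1])   # shape (2, 2)

# Many-to-many (e.g. labeling each step): use the output at every time step.
per_step_logits = tagging_head(outputs)      # shape (2, 5, 10)

print(sentiment_logits.shape, per_step_logits.shape)
```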
Comparison to Feedforward Neural Networks (FNNs):
- FNNs process data in one direction without retaining information from past inputs
- This makes them well suited to tasks with independent inputs, such as image classification
- FNNs handle sequential data poorly because they lack memory
- RNNs solve this by incorporating feedback loops that allow them to remember prior inputs, which is ideal for tasks where context matters