Recurrent Neural Networks
- An RNN is a type of neural network that processes sequential data by maintaining a memory of previous inputs
- Like other neural networks, RNNs learn from training data, but they are distinguished by their memory: they take information from prior inputs to influence the current input and output
- Regular deep learning networks assume that inputs and outputs are independent of each other, whereas the output of an RNN depends on prior elements within the sequence
- RNNs remember past information by feeding the output from one step into the next step, helping the network understand the context of what has already happened and make better predictions
- RNNs can be described as a black box with an internal state that is updated as the sequence is processed
Key Features:
- Internal Memory: a key feature that allows them to remember past inputs and use them as context when processing new information
- Best suited to sequential data where the order of elements matters; they are the model of choice for variable-length inputs, which makes them a good fit for NLP
- RNNs can analyze the current input in relation to what they've seen before
- They continuously update their internal memory as they process new data
- The fundamental unit of an RNN is the Recurrent Unit, which holds a hidden state that maintains information about previous inputs; these units remember information by feeding their hidden state back into themselves
- Hidden state (h): a hidden state is calculated for every input to retain sequential dependencies. The current state h_t depends on the previous state h_{t-1} and the current input x_t, e.g. h_t = f(W_hh · h_{t-1} + W_xh · x_t + b), where W_hh and W_xh (sometimes written W and U) are weight matrices, b is a bias, and f is a non-linear activation function such as tanh or ReLU (a minimal sketch follows this list)
- Output (y): the output y_t is calculated from the current hidden state using an activation function
- Unrolling: RNN unfolding is the process of expanding the recurrent structure over time, representing each time step as a layer in a series; this makes it easier to see how the hidden state propagates. Despite the unrolled structure, the same function and the same set of parameters are used at every time step
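
A minimal NumPy sketch of this recurrence (the sizes, random weights, and tanh activation are illustrative assumptions, not a fixed implementation). The same weight matrices are reused at every time step; only the hidden state changes:

```python
import numpy as np

input_size, hidden_size, seq_len = 4, 8, 5

rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden (the feedback loop)
b_h = np.zeros(hidden_size)

xs = rng.normal(size=(seq_len, input_size))  # a toy input sequence
h = np.zeros(hidden_size)                    # initial hidden state

for t in range(seq_len):
    # Same parameters applied at every step; h carries context forward.
    h = np.tanh(W_hh @ h + W_xh @ xs[t] + b_h)
    print(f"step {t}: hidden state norm = {np.linalg.norm(h):.3f}")
```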
Training RNNs
- RNNs are trained using Backpropagation Through Time (BPTT), which extends the standard backpropagation algorithm by unrolling the network over time and computing gradients for each time step
- The loss of an RNN is often computed at each step and aggregated over the entire sequence to guide parameter updates (see the sketch below)
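
A hedged PyTorch sketch of this setup (the model sizes, the per-step classification head, and the toy data are assumptions): the loss is computed at every time step, aggregated over the whole sequence, and backward() pushes gradients through every step, i.e. BPTT:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=4, hidden_size=8, batch_first=True)
head = nn.Linear(8, 3)                       # per-step output layer (assumed)
optimizer = torch.optim.SGD(list(rnn.parameters()) + list(head.parameters()), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(2, 5, 4)                     # (batch, time, features) toy sequences
targets = torch.randint(0, 3, (2, 5))        # a target class at every time step

hidden_seq, _ = rnn(x)                       # unrolled over all 5 time steps
logits = head(hidden_seq)                    # (batch, time, classes)

# Per-step losses are aggregated over the whole sequence; backward()
# then propagates gradients through every time step (BPTT).
optimizer.zero_grad()
loss = loss_fn(logits.reshape(-1, 3), targets.reshape(-1))
loss.backward()
optimizer.step()
```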
Issues:
- RNNs suffer from the vanishing and exploding gradient problems because gradients are repeatedly multiplied through many time steps during backpropagation (a toy illustration follows this list)
- Vanishing gradients: gradients become exceedingly small as they propagate backward when the multiplied values are less than 1, hindering the network's ability to capture long-term dependencies and causing slow learning
- Exploding gradients: gradients grow exponentially when the multiplied values are greater than 1, leading to very large weight updates
- RNNs cannot process very long sequences effectively.
- The sequential nature of RNNs prevents parallelization across time steps
- Context is computed only from the past
- there is no distinction between short and long range dependencies
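
A toy numerical illustration of both gradient problems (the per-step factors 0.9 and 1.1 are made-up stand-ins for the magnitude of the recurrent multiplication, not real gradients):

```python
# Repeated multiplication over many time steps, as in BPTT,
# makes a quantity either shrink toward zero or blow up.
steps = 100
grad_small, grad_large = 1.0, 1.0
for _ in range(steps):
    grad_small *= 0.9   # per-step factor < 1 -> vanishing gradient
    grad_large *= 1.1   # per-step factor > 1 -> exploding gradient

print(f"after {steps} steps: {grad_small:.2e} (vanished), {grad_large:.2e} (exploded)")
```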
Types of RNNs:
- One to One: single input, single output; used for standard classification tasks such as image classification
- One to Many: single input, sequence output; useful when one input triggers a sequence of predictions, such as music generation or image captioning
- Many to One: sequence input, single output; used when the overall context of an input sequence is needed for one prediction, such as sentiment analysis
- Many to Many: sequence input, sequence output; both inputs and outputs are sequences, which can be aligned step for step or delayed (see the sketch after this list)
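
A short PyTorch sketch contrasting the many-to-one and many-to-many patterns (the sizes and the two linear output heads are assumptions for illustration): many-to-one keeps only the final hidden state, while many-to-many produces an output at every step:

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=4, hidden_size=8, batch_first=True)
sentiment_head = nn.Linear(8, 2)    # many-to-one head (e.g. sentiment classes)
tagging_head = nn.Linear(8, 10)     # many-to-many head (e.g. a tag per step)

x = torch.randn(2, 5, 4)            # batch of 2 sequences, 5 steps, 4 features
outputs, h_n = rnn(x)               # outputs: (2, 5, 8), h_n: (1, 2, 8)

# Many-to-one (e.g. sentiment analysis): use only the final hidden state.
sentiment_logits = sentiment_head(h_n[-1])   # shape (2, 2)

# Many-to-many (e.g. labeling each step): use the output at every time step.
per_step_logits = tagging_head(outputs)      # shape (2, 5, 10)

print(sentiment_logits.shape, per_step_logits.shape)
```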
Comparison to Feedforward Neural Networks (FNNs):
- FNNs process data in one direction without retaining information from past inputs
- This makes them well suited to tasks with independent inputs, such as image classification
- FNNs handle sequential data poorly because they lack memory
- RNNs solve this by incorporating feedback loops that allow them to remember prior inputs, which is ideal for tasks where context matters