The attention mechanism allows us to merge a variable-length sequence of vectors into a fixed-size context vector. What if we could use this mechanism to replace recurrence entirely for sequence modeling? This blog post covers the Transformer architecture, which takes exactly that approach.
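To make the "variable-length in, fixed-size out" idea concrete, here is a minimal sketch of dot-product attention pooling in NumPy. The function name `attention_pool` and the toy shapes are illustrative assumptions, not code from the post: given a query vector and a sequence of value vectors, it returns a single context vector whose size does not depend on the sequence length.

```python
import numpy as np

def attention_pool(query, values):
    """Collapse a variable-length sequence of vectors into one
    fixed-size context vector via scaled dot-product attention.

    query:  (d,) vector used to score each element of the sequence
    values: (seq_len, d) sequence of vectors to pool
    """
    d = query.shape[-1]
    scores = values @ query / np.sqrt(d)     # (seq_len,) similarity scores
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ values                  # (d,) attention-weighted average

# The context vector has the same shape regardless of sequence length.
rng = np.random.default_rng(0)
query = rng.normal(size=4)
print(attention_pool(query, rng.normal(size=(3, 4))).shape)   # (4,)
print(attention_pool(query, rng.normal(size=(10, 4))).shape)  # (4,)
```

Because the output is just a weighted average of the value vectors, sequences of length 3 and length 10 both collapse to the same fixed-size summary, which is what lets attention stand in for recurrence.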
In this post, I'll discuss a third type of neural network, the recurrent neural network, for learning from sequential data. For some classes of data, the order in which we receive observations matters. As an example, consider the following two sentences: