In this blog post, we'll discuss a key innovation in sequence-to-sequence model architectures: the attention mechanism. This innovation dramatically improved model performance on sequence-to-sequence tasks such as machine translation and text summarization. Moreover, its success led directly to the seminal paper "Attention Is All You Need."
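To make the core idea concrete before we dive in, here is a minimal sketch of (Luong-style) dot-product attention in NumPy. This is an illustration of the general mechanism, not the exact formulation from any particular paper; the function name `dot_product_attention` and the toy dimensions are my own choices for the example.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

def dot_product_attention(decoder_state, encoder_states):
    """Score each encoder state against the current decoder state,
    normalize the scores, and return the weighted sum of encoder
    states (the "context vector") along with the attention weights."""
    scores = encoder_states @ decoder_state   # (seq_len,) similarity scores
    weights = softmax(scores)                 # attention distribution over inputs
    context = weights @ encoder_states        # (hidden_dim,) context vector
    return context, weights

# Toy example: 5 encoder timesteps, hidden size 8.
rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(5, 8))
decoder_state = rng.normal(size=(8,))
context, weights = dot_product_attention(decoder_state, encoder_states)
print(weights.round(3), context.shape)
```

The key point the sketch shows: at each decoding step, the model computes a fresh weighting over all encoder states rather than relying on a single fixed-length summary of the input.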