Attention Is All You Need
Abstract
The Transformer, originally proposed for sequence transduction tasks such as machine translation, has since become the dominant architecture for many NLP tasks. Its key innovation is the self-attention mechanism, which lets the model weigh every position of the input sequence when computing the representation of each position, dispensing with the recurrence and convolutions used by earlier sequence models. The architecture is built from self-attention layers combined with position-wise feed-forward networks, and the paper demonstrates its effectiveness on machine translation and other NLP tasks.
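As a brief illustration of the mechanism described above, the scaled dot-product attention at the core of the model maps queries $Q$, keys $K$, and values $V$ (with key dimension $d_k$) to outputs as

$$
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V .
$$

In self-attention, $Q$, $K$, and $V$ are all projections of the same input sequence, so each position's output is a weighted sum of the values at every position, with weights given by the softmax over scaled query-key dot products.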