
Attention Is All You Need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin | June 12, 2017

Abstract

The Transformer, originally proposed for sequence transduction tasks, has since become the dominant architecture for many NLP tasks. Its key innovation is the self-attention mechanism, which lets the model attend to different parts of the input sequence when processing each position. The architecture is built from self-attention and position-wise feedforward networks, and the paper demonstrates its effectiveness on a variety of NLP tasks.
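As a rough illustration of the self-attention idea described above, here is a minimal NumPy sketch of scaled dot-product attention, the core operation of the Transformer. The function name and toy dimensions are placeholders for illustration, not the paper's reference implementation.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Sketch of scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # pairwise similarity between positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the key positions
    return weights @ V                                # each position mixes values from all positions

# Toy example: 4 positions, model dimension 8. In self-attention,
# queries, keys, and values all come from the same input sequence.
x = np.random.randn(4, 8)
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)
```

In the full model, Q, K, and V are learned linear projections of the input, and several such attention "heads" run in parallel before a position-wise feedforward network.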

📄 Paper link