Papers
- ImageNet Classification with Deep Convolutional Neural Networks
This paper introduced AlexNet, a deep convolutional neural network that won the ImageNet 2012 challenge (ILSVRC-2012) by a wide margin. The model achieved a top-5 error rate of 15.3%, compared to 26.2% for the runner-up, marking the beginning of the deep learning revolution in computer vision.
- Attention Is All You Need
This paper introduced the Transformer, an architecture for sequence transduction built entirely on self-attention and position-wise feed-forward networks, with no recurrence or convolution. Self-attention lets the model weigh every position of the input sequence when encoding each position, and the architecture has since become the dominant choice for a wide range of NLP tasks.
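A minimal sketch of the scaled dot-product self-attention at the core of the Transformer, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V; the shapes, weight matrices, and function name below are illustrative, not taken from the paper's reference implementation:

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model); w_q, w_k, w_v: (d_model, d_k) projection matrices."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v              # project inputs to queries, keys, values
    scores = q @ k.T / np.sqrt(q.shape[-1])          # similarity of each position to every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over key positions
    return weights @ v                               # weighted sum of value vectors

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                          # 4 tokens, model width 8
w_q, w_k, w_v = (rng.normal(size=(8, 4)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)        # (4, 4)
```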
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
This paper introduced BERT (Bidirectional Encoder Representations from Transformers), a pre-trained bidirectional transformer model that achieved state-of-the-art results on eleven NLP tasks. BERT's bidirectional training and fine-tuning approach revolutionized natural language processing.
- Learning representations by back-propagating errors
This paper popularized the backpropagation algorithm for training multi-layer neural networks. By computing gradients efficiently through the chain rule, backpropagation addresses the credit assignment problem for hidden units, enabling the training of deep networks and helping spark the neural network renaissance of the 1980s.
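As a concrete illustration of the chain-rule bookkeeping (toy sizes and a squared-error loss chosen for brevity; the notation is mine, not the paper's), here is one forward and backward pass through a two-layer network:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 3))               # one training example
y = np.array([[1.0]])                     # its target
W1, W2 = rng.normal(size=(3, 4)), rng.normal(size=(4, 1))

# Forward pass
h = np.tanh(x @ W1)                       # hidden activations
y_hat = h @ W2                            # network output
loss = 0.5 * np.sum((y_hat - y) ** 2)

# Backward pass: apply the chain rule layer by layer, from the loss back to the weights
d_yhat = y_hat - y                        # dL/dy_hat
dW2 = h.T @ d_yhat                        # dL/dW2
d_h = d_yhat @ W2.T                       # dL/dh
d_hpre = d_h * (1 - h ** 2)               # through tanh: d tanh(z)/dz = 1 - tanh(z)^2
dW1 = x.T @ d_hpre                        # dL/dW1

# One gradient-descent step on each weight matrix
W1 -= 0.1 * dW1
W2 -= 0.1 * dW2
print(loss)
```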
- Language Models are Few-Shot Learners
This paper introduced GPT-3, an autoregressive language model with 175 billion parameters, roughly 10x more than any previous non-sparse language model, that demonstrated remarkable few-shot learning across a wide range of tasks without task-specific fine-tuning, sometimes approaching prior state-of-the-art fine-tuned systems. GPT-3 showed that scaling up language models can yield emergent, general-purpose capabilities.
- Perceptrons: An Introduction to Computational Geometry
This influential book provided a rigorous mathematical analysis of perceptrons, demonstrating that single-layer perceptrons cannot solve problems that are not linearly separable, such as the XOR function. While initially seen as a critique that slowed neural network research, it laid important theoretical foundations and highlighted the need for multi-layer networks.
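A small illustration of the XOR point (the weights below are hand-picked for this note, not taken from the book): no single threshold unit can compute XOR, but one hidden layer suffices:

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
step = lambda z: (z > 0).astype(int)      # threshold unit

# Hidden layer: one unit computes OR, the other NAND; the output unit ANDs them.
hidden = step(X @ np.array([[1, -1], [1, -1]]) + np.array([-0.5, 1.5]))
output = step(hidden @ np.array([1, 1]) - 1.5)
print(output)                             # [0 1 1 0], i.e. XOR of the two inputs
```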
- Improving Language Understanding by Generative Pre-Training
This paper introduced the first Generative Pre-trained Transformer (GPT-1), demonstrating that unsupervised pre-training on a large corpus followed by supervised fine-tuning could achieve strong performance on various NLP tasks. This work established the foundation for the GPT series.
- Support-Vector Networks
This paper introduced Support Vector Machines (SVMs), a supervised learning algorithm for binary classification that was later extended to regression. SVMs find the hyperplane that separates the classes with maximum margin, and they handle non-linear problems through the kernel trick, which implicitly maps inputs into a high-dimensional feature space.
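A short usage sketch with scikit-learn's SVC (an external library, not the paper's original implementation): an RBF-kernel SVM separates data that no straight line can, illustrating the kernel trick mentioned above:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (np.linalg.norm(X, axis=1) > 1.0).astype(int)   # label points by distance from the origin

clf = SVC(kernel="rbf", C=1.0)            # maximum-margin classifier with a Gaussian kernel
clf.fit(X, y)
print(clf.score(X, y))                    # training accuracy, typically close to 1.0
```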
- Deep Residual Learning for Image Recognition
This paper introduced Residual Networks (ResNets), which address the degradation problem in very deep networks through skip connections. ResNets enabled training of networks with 152 layers and achieved 3.57% top-5 error on the ImageNet test set, winning the ILSVRC 2015 classification task.
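A minimal residual-block sketch in PyTorch (channel counts and layer sizes are illustrative and do not reproduce the paper's exact architecture): the skip connection adds the block's input to its output, so the stacked layers only need to learn a residual:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)          # skip connection: add the input back before the final ReLU

block = ResidualBlock(64)
print(block(torch.randn(1, 64, 32, 32)).shape)   # torch.Size([1, 64, 32, 32])
```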
- Computing Machinery and Intelligence
This seminal paper introduced the 'imitation game', now known as the Turing Test, as a criterion for machine intelligence. Turing proposed that a machine could be considered intelligent if its conversational responses were indistinguishable from a human's. The work laid the philosophical foundation for artificial intelligence and remains one of the most influential papers in the field.