Papers
- ImageNet Classification with Deep Convolutional Neural Networks
This paper introduced AlexNet, a deep convolutional neural network that won the ImageNet 2012 challenge (ILSVRC-2012) by a wide margin. The model achieved a top-5 error rate of 15.3%, compared to 26.2% for the runner-up, marking the beginning of the deep learning revolution in computer vision.
- Attention Is All You Need
This paper introduced the Transformer, an architecture for sequence transduction built entirely on self-attention and position-wise feed-forward networks, with no recurrence or convolution. Self-attention lets the model weigh every position of the input sequence when encoding each position, and the architecture has since become the dominant choice for a wide range of NLP tasks.
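A minimal sketch of the scaled dot-product self-attention at the core of the Transformer, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V; the shapes, weight matrices, and function name below are illustrative, not taken from the paper's reference implementation:

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model); w_q, w_k, w_v: (d_model, d_k) projection matrices."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v              # project inputs to queries, keys, values
    scores = q @ k.T / np.sqrt(q.shape[-1])          # similarity of each position to every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over key positions
    return weights @ v                               # weighted sum of value vectors

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                          # 4 tokens, model width 8
w_q, w_k, w_v = (rng.normal(size=(8, 4)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)        # (4, 4)
```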
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
This paper introduced BERT (Bidirectional Encoder Representations from Transformers), a pre-trained bidirectional transformer model that achieved state-of-the-art results on eleven NLP tasks. BERT's bidirectional training and fine-tuning approach revolutionized natural language processing.
- Learning representations by back-propagating errors
This paper popularized the backpropagation algorithm for training multi-layer neural networks. By computing gradients efficiently through the chain rule, backpropagation addresses the credit assignment problem for hidden units, enabling the training of deep networks and helping spark the neural network renaissance of the 1980s.
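As a concrete illustration of the chain-rule bookkeeping (toy sizes and a squared-error loss chosen for brevity; the notation is mine, not the paper's), here is one forward and backward pass through a two-layer network:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 3))               # one training example
y = np.array([[1.0]])                     # its target
W1, W2 = rng.normal(size=(3, 4)), rng.normal(size=(4, 1))

# Forward pass
h = np.tanh(x @ W1)                       # hidden activations
y_hat = h @ W2                            # network output
loss = 0.5 * np.sum((y_hat - y) ** 2)

# Backward pass: apply the chain rule layer by layer, from the loss back to the weights
d_yhat = y_hat - y                        # dL/dy_hat
dW2 = h.T @ d_yhat                        # dL/dW2
d_h = d_yhat @ W2.T                       # dL/dh
d_hpre = d_h * (1 - h ** 2)               # through tanh: d tanh(z)/dz = 1 - tanh(z)^2
dW1 = x.T @ d_hpre                        # dL/dW1

# One gradient-descent step on each weight matrix
W1 -= 0.1 * dW1
W2 -= 0.1 * dW2
print(loss)
```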
- Language Models are Few-Shot Learners
This paper introduced GPT-3, an autoregressive language model with 175 billion parameters, roughly 10x more than any previous non-sparse language model, that demonstrated remarkable few-shot learning across a wide range of tasks without task-specific fine-tuning, sometimes approaching prior state-of-the-art fine-tuned systems. GPT-3 showed that scaling up language models can yield emergent, general-purpose capabilities.
- Perceptrons: An Introduction to Computational Geometry
This influential book provided a rigorous mathematical analysis of perceptrons, demonstrating that single-layer perceptrons cannot solve problems that are not linearly separable, such as the XOR function. While initially seen as a critique that slowed neural network research, it laid important theoretical foundations and highlighted the need for multi-layer networks.
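A small illustration of the XOR point (the weights below are hand-picked for this note, not taken from the book): no single threshold unit can compute XOR, but one hidden layer suffices:

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
step = lambda z: (z > 0).astype(int)      # threshold unit

# Hidden layer: one unit computes OR, the other NAND; the output unit ANDs them.
hidden = step(X @ np.array([[1, -1], [1, -1]]) + np.array([-0.5, 1.5]))
output = step(hidden @ np.array([1, 1]) - 1.5)
print(output)                             # [0 1 1 0], i.e. XOR of the two inputs
```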
- Improving Language Understanding by Generative Pre-Training
This paper introduced the first Generative Pre-trained Transformer (GPT-1), demonstrating that unsupervised pre-training on a large corpus followed by supervised fine-tuning could achieve strong performance on various NLP tasks. This work established the foundation for the GPT series.
- Support-Vector Networks
This paper introduced Support Vector Machines (SVMs), a supervised learning algorithm for binary classification that was later extended to regression. SVMs find the hyperplane that separates the classes with maximum margin, and they handle non-linear problems through the kernel trick, which implicitly maps inputs into a high-dimensional feature space.
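A short usage sketch with scikit-learn's SVC (an external library, not the paper's original implementation): an RBF-kernel SVM separates data that no straight line can, illustrating the kernel trick mentioned above:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (np.linalg.norm(X, axis=1) > 1.0).astype(int)   # label points by distance from the origin

clf = SVC(kernel="rbf", C=1.0)            # maximum-margin classifier with a Gaussian kernel
clf.fit(X, y)
print(clf.score(X, y))                    # training accuracy, typically close to 1.0
```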
- Deep Residual Learning for Image Recognition
This paper introduced Residual Networks (ResNets), which address the degradation problem in very deep networks through skip connections. ResNets enabled training of networks with 152 layers and achieved 3.57% top-5 error on the ImageNet test set, winning the ILSVRC 2015 classification task.
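A minimal residual-block sketch in PyTorch (channel counts and layer sizes are illustrative and do not reproduce the paper's exact architecture): the skip connection adds the block's input to its output, so the stacked layers only need to learn a residual:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)          # skip connection: add the input back before the final ReLU

block = ResidualBlock(64)
print(block(torch.randn(1, 64, 32, 32)).shape)   # torch.Size([1, 64, 32, 32])
```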
- Computing Machinery and Intelligence
This seminal paper introduced the 'imitation game', now known as the Turing Test, as a criterion for machine intelligence. Turing proposed that a machine could be considered intelligent if its conversational responses were indistinguishable from a human's. The work laid the philosophical foundation for artificial intelligence and remains one of the most influential papers in the field.