Interactive visualizations of AI architectures.
See how machines learn, one layer at a time.
Gradient-Based Learning Applied to Document Recognition - The CNN "Hello World" of deep learning, deployed in banks for reading handwritten checks.
Deep Residual Learning for Image Recognition - The breakthrough architecture enabling 100+ layer networks through skip connections.
Scalable Diffusion Models with Transformers - Replacing U-Nets with transformers for state-of-the-art image generation.
Bidirectional Encoder Representations from Transformers - Deep bidirectional pre-training that revolutionized NLP.
Attention Is All You Need - The groundbreaking architecture that replaced RNNs with self-attention mechanisms.
Generative Adversarial Networks - The revolutionary framework where two neural networks compete to generate realistic data.
Auto-Encoding Variational Bayes - The probabilistic approach to deep generative modeling that learns structured latent representations.
An Image is Worth 16x16 Words - Applying transformers to image recognition with patch-based sequences.
Learning Transferable Visual Models From Natural Language Supervision - Zero-shot image classification using natural language descriptions.
High-Resolution Image Synthesis with Latent Diffusion Models - Democratizing AI image generation on consumer hardware.
Deep Learning - The seminal Nature review by LeCun, Bengio & Hinton that established deep learning as a revolutionary field in AI - 50,000+ citations.
Open and Efficient Foundation Language Models - The breakthrough open-source LLM that democratized access to powerful language models.
Very Deep Convolutional Networks for Large-Scale Image Recognition - Small 3x3 filters, deep architectures, and the simplicity that outperformed complex designs.
You Only Look Once: Unified, Real-Time Object Detection - The single-pass paradigm shift that made real-time object detection practical.
Going Deeper with Convolutions - Multi-scale feature extraction with inception modules achieving state-of-the-art with fewer parameters.
Outrageously Large Neural Networks - Adaptive computation with sparsely-gated mixture-of-experts layers. The foundation for scaling large language models efficiently.
Dynamic Routing Between Capsules - Hinton's revolutionary architecture using vector outputs and dynamic routing to address CNNs' loss of spatial relationships.
Language Models are Few-Shot Learners - 175 billion parameters enabling powerful few-shot learning without fine-tuning.
GPT-4 Technical Report - A large multimodal model with human-level performance on professional exams, including a bar exam score in the top 10% of test takers.
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications - Depthwise separable convolutions enabling real-time mobile AI.
ImageNet Classification with Deep Convolutional Neural Networks - The paper that ignited the deep learning revolution with 8 layers and 60M parameters.
Densely Connected Convolutional Networks - Feature reuse through dense connections, maximum information flow with fewer parameters.
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks - Compound scaling that uniformly scales depth, width, and resolution.
Improving Language Understanding by Generative Pre-Training - The original GPT paper that launched the series.
Language Models are Unsupervised Multitask Learners - 1.5B parameters and zero-shot task transfer without fine-tuning.
Long Short-Term Memory - The groundbreaking architecture that solved the vanishing gradient problem for sequential data.
Linear-Time Sequence Modeling with Selective State Spaces - A new paradigm challenging Transformers with linear complexity.
More classic papers will be added hourly by automated updates. Stay tuned for detailed explanations with visualizations.
How computers "see" images. Learn about convolution kernels, feature maps, pooling, and how hierarchical patterns emerge.
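The convolution and pooling ideas above can be sketched in a few lines of numpy. This is a minimal illustration, not the site's visualization code: the image, the Sobel-style edge kernel, and the window sizes are all toy assumptions.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D cross-correlation: slide the kernel over the image."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling: keep the strongest response per window."""
    h, w = fmap.shape
    return fmap[:h - h % size, :w - w % size].reshape(
        h // size, size, w // size, size).max(axis=(1, 3))

# Toy image: bright left half, dark right half -> one vertical edge.
image = np.zeros((6, 6))
image[:, :3] = 1.0
# Sobel-like kernel that responds to vertical edges.
kernel = np.array([[1., 0., -1.],
                   [2., 0., -2.],
                   [1., 0., -1.]])
fmap = conv2d(image, kernel)   # 4x4 feature map, strong response at the edge
pooled = max_pool(fmap)        # 2x2 summary keeping the strongest activations
```

The feature map lights up only where the edge sits, and pooling shrinks it while keeping that signal — the same "detect, then summarize" pattern a CNN stacks layer after layer.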
Learning from connected data. Discover message passing, neighborhood aggregation, and how graphs become embeddings.
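One round of the message passing described above can be sketched with numpy. The 4-node graph, the features, and the identity-style weight matrices are toy assumptions (real GNNs learn the weights):

```python
import numpy as np

# Adjacency matrix of a toy 4-node graph (symmetric, no self-loops).
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
# One 2-dimensional feature vector per node.
features = np.array([[1., 0.],
                     [0., 1.],
                     [1., 1.],
                     [0., 0.]])

# Neighborhood aggregation: each node averages its neighbors' features.
deg = adj.sum(axis=1, keepdims=True)
neighbor_mean = (adj @ features) / deg

# Combine self and neighbor messages through (toy) weight matrices + ReLU.
W_self = np.eye(2)          # learned in a real GNN
W_neigh = 0.5 * np.eye(2)   # learned in a real GNN
embeddings = np.maximum(0., features @ W_self + neighbor_mean @ W_neigh)
```

Stacking this step lets information flow further: after k rounds, each embedding reflects its k-hop neighborhood.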
Where attention meets graphs. Learn how GAT assigns different importance to neighbors, creating adaptive message passing.
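The "different importance to neighbors" idea can be sketched for a single node: score each neighbor with a shared attention vector over the concatenated [self, neighbor] features, apply LeakyReLU, then softmax so the weights sum to 1 (the GAT recipe; the feature values and attention vector here are toy assumptions, learned in practice).

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

h_self = np.array([1.0, 0.0])               # the node we update
neighbors = np.array([[0.0, 1.0],           # its three neighbors' features
                      [1.0, 1.0],
                      [0.5, 0.5]])
a = np.array([0.2, -0.1, 0.4, 0.3])         # toy attention vector (learned in GAT)

# Score each neighbor: LeakyReLU(a . [h_self || h_neighbor]).
scores = np.array([leaky_relu(a @ np.concatenate([h_self, n]))
                   for n in neighbors])
alpha = softmax(scores)                     # attention weights, sum to 1
h_new = alpha @ neighbors                   # attention-weighted aggregation
```

Unlike the uniform averaging of a plain GNN layer, each neighbor now contributes in proportion to its learned relevance — that is the "adaptive message passing" the visualization walks through.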