🧠 Neural Networks

Interactive visualizations of AI architectures.
See how machines learn, one layer at a time.

📚 Classic Papers

Proceedings of the IEEE 1998

LeNet-5

Gradient-Based Learning Applied to Document Recognition - The pioneering CNN, deep learning's "Hello World", deployed by banks for reading handwritten checks.

[Diagram: 32×32 input → C1 → S2 → C3 → FC → 10 classes]
CVPR 2016 Best Paper

ResNet

Deep Residual Learning for Image Recognition - The breakthrough architecture enabling 100+ layer networks through skip connections.

[Diagram: Input → Conv → Conv → Skip (+) → Output · 152 layers]
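The skip connection at the heart of ResNet can be sketched in a few lines of NumPy — a toy one-layer residual function (the weight matrix and shapes here are illustrative, not from the paper's architecture):

```python
import numpy as np

def residual_block(x, weight):
    """Toy residual block: output = F(x) + x, with F = ReLU(x @ weight)."""
    fx = np.maximum(0.0, x @ weight)  # the learned transformation F(x)
    return fx + x                     # skip connection: add the input back

# If F contributes nothing (all-zero weights), the block reduces to the
# identity mapping, which is why very deep residual stacks stay trainable.
x = np.ones((1, 4))
y = residual_block(x, np.zeros((4, 4)))
```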
ICCV 2023

DiT · Diffusion Transformers

Scalable Diffusion Models with Transformers - Replacing U-Nets with transformers for state-of-the-art image generation.

NAACL 2019

BERT

Bidirectional Encoder Representations from Transformers - Deep bidirectional pre-training that revolutionized NLP.

[Diagram: "[CLS] The [MASK] cat" → predicts "cute"]
NeurIPS 2017

Transformer

Attention Is All You Need - The groundbreaking architecture that replaced RNNs with self-attention mechanisms.

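Self-attention boils down to a few matrix operations; here is a minimal NumPy sketch of scaled dot-product attention (a single head, no masking or learned projections):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ V                            # weighted average of values
```

With all-zero queries and keys the weights are uniform, so the output is simply the mean of the value vectors — a handy sanity check.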
NeurIPS 2014

GAN

Generative Adversarial Networks - The revolutionary framework where two neural networks compete to generate realistic data.

[Diagram: Generator vs. Discriminator, trained on Real Data]
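The two-player game can be written down directly. A sketch of the original minimax losses, assuming the discriminator outputs probabilities in (0, 1):

```python
import numpy as np

def gan_losses(d_real, d_fake):
    """Minimax GAN losses.
    d_real: discriminator scores on real samples (want -> 1)
    d_fake: discriminator scores on generated samples (want -> 0)."""
    d_loss = -np.mean(np.log(d_real) + np.log(1.0 - d_fake))
    g_loss = -np.mean(np.log(d_fake))  # non-saturating generator loss
    return d_loss, g_loss
```

A confident discriminator drives `d_loss` toward zero while the generator's loss grows, and vice versa once the generator starts fooling it — that tension is the training signal.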
ICLR 2014

VAE

Auto-Encoding Variational Bayes - The probabilistic approach to deep generative modeling that learns structured latent representations.

[Diagram: Encoder → Latent z → Decoder → Reconstruction]
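The trick that makes the stochastic encoder trainable is reparameterization: sample z = μ + σ·ε with ε ~ N(0, I), so gradients flow through μ and σ. A minimal sketch:

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I); differentiable in mu, sigma."""
    eps = rng.standard_normal(mu.shape)   # the only source of randomness
    sigma = np.exp(0.5 * log_var)         # encoder predicts log-variance
    return mu + sigma * eps
```

Driving `log_var` to a very negative value collapses σ toward zero, so the sample lands on μ — another quick sanity check.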
ICLR 2021

Vision Transformer

An Image is Worth 16x16 Words - Applying transformers to image recognition with patch-based sequences.

[Diagram: Image → Patches → Transformer Encoder → Class]
ICML 2021

CLIP

Learning Transferable Visual Models From Natural Language Supervision - Zero-shot image classification using natural language descriptions.

[Diagram: Image + Text → Joint Space → Zero-shot]
CVPR 2022

Stable Diffusion · LDM

High-Resolution Image Synthesis with Latent Diffusion Models - Democratizing AI image generation on consumer hardware.

[Diagram: Latent Space → U-Net Denoiser (Cross-Attn) → Image]
Nature 2015 · Turing Award

Deep Learning

The seminal review by LeCun, Bengio & Hinton that established deep learning as a revolutionary field in AI - 50,000+ citations.

Foundation LLM 2023

LLaMA

Open and Efficient Foundation Language Models - The breakthrough open-source LLM that democratized access to powerful language models.

[Diagram: Token → Transformer (RMSNorm, RoPE) → Next Token]
ICLR 2015

VGG

Very Deep Convolutional Networks for Large-Scale Image Recognition - Small 3x3 filters, deep architectures, and the simplicity that outperformed complex designs.

CVPR 2016

YOLO

You Only Look Once: Unified, Real-Time Object Detection - The paradigm shift that made real-time object detection practical.

[Diagram: Image → S×S Grid → Boxes + Classes · 45 FPS]
CVPR 2015

GoogLeNet · Inception

Going Deeper with Convolutions - Multi-scale feature extraction with inception modules achieving state-of-the-art with fewer parameters.

ICLR 2017

MoE · Mixture of Experts

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer - Adaptive computation with sparsely gated expert networks, the foundation for scaling large language models efficiently.

[Diagram: Gate → Expert 1 / Expert 2 → Output]
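Sparse gating is the core idea: only the top-k experts run for each input. A toy single-vector sketch (the gate weights and expert functions below are purely illustrative):

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=1):
    """Sparsely-gated MoE: run only the top-k experts, mix by softmax weight."""
    logits = x @ gate_w                     # one gating score per expert
    top = np.argsort(logits)[::-1][:k]      # indices of the k highest-scoring experts
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                            # softmax over the chosen experts only
    return sum(wi * experts[i](x) for wi, i in zip(w, top))
```

Because the unchosen experts never execute, compute grows with k rather than with the total expert count — the property that makes sparse scaling cheap.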
NIPS 2017

CapsNet · Capsule Networks

Dynamic Routing Between Capsules - Hinton's revolutionary architecture using vector outputs and dynamic routing to address CNNs' loss of spatial pose information.

NeurIPS 2020

GPT-3

Language Models are Few-Shot Learners - 175 billion parameters enabling powerful few-shot learning without fine-tuning.

[Diagram: Text + Examples → GPT-3 (175B params, 96 layers) → Few-Shot Output]
arXiv 2023

GPT-4

Large Multimodal Model - Human-level performance on professional exams, including scoring in the top 10% on a simulated bar exam.

[Diagram: Text + Image Input → GPT-4 → Multimodal, Human-Level Reasoning]
arXiv 2017

MobileNet · Efficient CNNs

Efficient Convolutional Neural Networks for Mobile Vision Applications - Depthwise separable convolutions enabling real-time AI on mobile devices.

[Diagram: 224² input → 13× Depthwise-Separable Blocks (DW + PW) · 4.2M params]
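The parameter savings are easy to verify by counting: a standard k×k convolution costs k²·c_in·c_out weights, while a depthwise-separable pair costs k²·c_in + c_in·c_out. A quick check (biases omitted; the channel counts are illustrative, not a specific MobileNet layer):

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k×k convolution (no biases)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Depthwise k×k conv (one filter per channel) + 1×1 pointwise conv."""
    return k * k * c_in + c_in * c_out

standard = conv_params(3, 64, 128)                  # 73,728 weights
separable = depthwise_separable_params(3, 64, 128)  # 576 + 8,192 = 8,768
ratio = standard / separable                        # ~8.4× fewer parameters
```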
NeurIPS 2012

AlexNet

ImageNet Classification with Deep Convolutional Neural Networks - The paper that ignited the deep learning revolution with 8 layers and 60M parameters.

CVPR 2017 Best Paper

DenseNet

Densely Connected Convolutional Networks - Feature reuse through dense connections, maximum information flow with fewer parameters.

[Diagram: Dense Block (L=12) · Feature Reuse]
ICML 2019

EfficientNet

Rethinking Model Scaling for Convolutional Neural Networks - Compound scaling method that uniformly scales depth, width, and resolution.

[Diagram: B0 → ↑depth, ↑width, ↑resolution → B1…B7]
OpenAI 2018

GPT-1

Improving Language Understanding by Generative Pre-Training - The original generative pre-training approach that launched the GPT series.

[Diagram: Unsupervised Pre-train → Supervised Fine-tune · Transformer Decoder, 117M params]
OpenAI 2019

GPT-2

Language Models are Unsupervised Multitask Learners - 1.5B parameters and zero-shot task transfer without fine-tuning.

[Diagram: Prompt → GPT-2 (1.5B params, 48 layers) → Summarize / Translate / QA]
Neural Computation 1997

LSTM

Long Short-Term Memory - The groundbreaking architecture that solved the vanishing gradient problem for sequential data.

[Diagram: LSTM cell with forget (f), input (i), output (o) gates and cell state]
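One step of the cell can be written out directly. A minimal NumPy sketch with the three gates and the additive cell-state update (a single stacked weight matrix; biases omitted):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x, h, c, W):
    """One LSTM step. W stacks the forget/input/output/candidate weights."""
    z = W @ np.concatenate([x, h])
    f, i, o, g = np.split(z, 4)                       # four gate pre-activations
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # additive update: gradients survive
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new
```

The additive (rather than purely multiplicative) cell-state update is what lets gradients flow across many time steps — the fix for vanishing gradients the blurb refers to.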
COLM 2024

Mamba · SSM

Linear-Time Sequence Modeling with Selective State Spaces - A new paradigm challenging Transformers with linear complexity.

[Diagram: Mamba Selective SSM · O(L) complexity]
Coming Soon

NeRF · StyleGAN · T5 · More

More classic papers are added through hourly automated updates. Stay tuned for detailed explanations with visualizations.

🕐 Auto-updating...

🧠 Network Architectures

Computer Vision

CNN · Convolutional Neural Network

How computers "see" images. Learn about convolution kernels, feature maps, pooling, and how hierarchical patterns emerge.
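Convolution itself is just a sliding dot product. A naive NumPy sketch of a "valid"-mode 2-D convolution (strictly cross-correlation, as in most deep learning frameworks):

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image and take dot products ('valid' mode)."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))  # output shrinks by kernel size - 1
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out
```

Each output value is one entry of a feature map; stacking many kernels and interleaving pooling gives the hierarchical patterns described above.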

Graph Learning

GNN · Graph Neural Network

Learning from connected data. Discover message passing, neighborhood aggregation, and how graphs become embeddings.
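Neighborhood aggregation in its simplest form: each node replaces its features with the mean of its neighbors'. A toy, weight-free round of message passing (self-loops assumed so every node has at least one neighbor):

```python
import numpy as np

def message_pass(adj, X):
    """One mean-aggregation round: node i averages features over its neighbors."""
    deg = adj.sum(axis=1, keepdims=True)  # neighbor count (incl. self-loop)
    return (adj @ X) / deg                # mean of neighboring feature vectors
```

Real GNNs add a learned transform and nonlinearity per round, but the aggregate-then-update pattern is the same.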

Attention Mechanism

GAT · Graph Attention Network

Where attention meets graphs. Learn how GAT assigns different importance to neighbors, creating adaptive message passing.
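The difference from plain message passing is a learned softmax over each node's neighborhood. A sketch of masked attention coefficients (the raw pairwise scores are assumed precomputed; self-loops assumed so every row has at least one neighbor):

```python
import numpy as np

def gat_coefficients(scores, adj):
    """Softmax attention weights restricted to each node's neighbors."""
    masked = np.where(adj > 0, scores, -np.inf)  # non-neighbors get zero weight
    e = np.exp(masked - masked.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)      # each row sums to 1
```

Multiplying these coefficients into the aggregation step turns uniform averaging into the adaptive message passing the blurb describes.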