Progressive Growing of GANs
An Insanely Useful Method for Training Generative Models
The Core Insight
What if instead of throwing your GAN at the hardest version of the problem immediately, you started with the simplest version and gradually increased complexity? That's progressive growing: train at 4x4 pixels first, then 8x8, then 16x16, doubling resolution until you reach your target. The network learns large-scale structure before sweating the fine details.
Why Training Big GANs Is Hard
Try training a GAN to generate 1024x1024 images from scratch. You'll quickly discover why most people don't:
- Unstable gradients: The generator must simultaneously learn coarse structure and fine detail across a massive parameter space. The optimization landscape is brutal.
- Discriminator dominance: At high resolution, even tiny imperfections are easy to spot. The discriminator overwhelms the generator before it learns anything meaningful.
- Computational waste: Early in training, the network is only learning basic blobs, yet a from-scratch high-resolution GAN spends full-resolution compute on details it cannot yet use.
- Mode collapse: The simultaneous learning of all scales increases the likelihood the generator collapses to producing narrow, repetitive outputs.
Progressive growing sidesteps all of this by decomposing the problem into a series of simpler sub-problems.
How Progressive Growing Works
The Original Paper
*Progressive Growing of GANs for Improved Quality, Stability, and Variation* by Karras, Aila, Laine, and Lehtinen (NVIDIA, 2017). This paper produced the first photorealistic 1024x1024 synthetic faces and launched the lineage that led to StyleGAN, StyleGAN2, and much of modern image synthesis.
The training progression follows a resolution ladder. Both the generator and discriminator grow in lockstep:
At each phase, new layers are added to both networks. The generator grows at its output end (producing higher-resolution images), while the discriminator grows at its input end (accepting higher-resolution images). Critically, existing layers keep their learned weights.
| Phase | Resolution | What the Network Learns |
|---|---|---|
| 1 | 4x4 | Overall color, basic blob structure |
| 2-3 | 8x8 - 16x16 | Rough spatial layout, coarse shapes |
| 4-5 | 32x32 - 64x64 | Recognizable features, basic textures |
| 6-7 | 128x128 - 256x256 | Fine details, complex textures |
| 8-9 | 512x512 - 1024x1024 | Highest-fidelity micro-details |
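The doubling schedule itself is simple to write down. A small sketch, with phase numbering matching the table above:

```python
def resolution_schedule(start=4, target=1024):
    """Yield (phase, resolution) pairs, doubling resolution until the target."""
    schedule = []
    res, phase = start, 1
    while res <= target:
        schedule.append((phase, res))
        res *= 2
        phase += 1
    return schedule

print(resolution_schedule())  # [(1, 4), (2, 8), ..., (9, 1024)]
```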
The Fade-In: Smooth Transitions
You can't just bolt on a new layer and expect things to work. The new layer's random weights would destroy what the network already learned. The solution: fade in new layers smoothly using a blending parameter α that transitions from 0 to 1.
When α = 0, the output comes entirely from the upsampled previous resolution. As training progresses, α linearly increases to 1, at which point the new layer has fully taken over. The old shortcut path is then removed. Each resolution phase consists of a transition period (fade-in) followed by a stabilization period (training at the new resolution with α fixed at 1.0).
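The blend itself is a one-liner. A minimal NumPy sketch, assuming nearest-neighbour upsampling for the old path (as in the original paper):

```python
import numpy as np

def nearest_upsample(x):
    """2x nearest-neighbour upsampling of an (H, W, C) feature map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def faded_output(old_lowres, new_highres, alpha):
    """Blend the upsampled old path with the new layer's output.
    alpha=0 -> pure upsampled old path; alpha=1 -> pure new layer."""
    return (1.0 - alpha) * nearest_upsample(old_lowres) + alpha * new_highres
```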
Supporting Techniques
Progressive growing doesn't work in isolation. Several auxiliary techniques keep training stable:
Equalized Learning Rate
Instead of careful weight initialization, all weights start from N(0,1) and are scaled at runtime by the He constant. This ensures all parameters have the same dynamic range regardless of layer depth, preventing some layers from learning faster than others.
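For a dense layer, the runtime scaling can be sketched like this (weights are stored as raw N(0, 1) draws; the He constant is applied on every forward pass):

```python
import numpy as np

def equalized_forward(x, weight, gain=np.sqrt(2.0)):
    """Equalized learning rate: scale stored N(0, 1) weights at runtime
    by the He constant gain / sqrt(fan_in) instead of baking the scale
    into the initialization."""
    fan_in = weight.shape[0]          # number of inputs feeding each unit
    scale = gain / np.sqrt(fan_in)
    return x @ (weight * scale)
```

Because the scale is applied at runtime, optimizers like Adam see parameters with a uniform dynamic range across all layers.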
Pixelwise Feature Normalization
Applied after each generator convolution, this normalizes the feature vector at each pixel to unit length, preventing signal magnitudes from spiraling out of control during the adversarial feedback loop.
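The operation is a per-pixel division by the root-mean-square over channels, as defined in the ProGAN paper:

```python
import numpy as np

def pixel_norm(x, eps=1e-8):
    """Pixelwise feature normalization: divide each pixel's feature vector
    by the root-mean-square of its activations over the channel axis.
    x: (..., H, W, C)."""
    return x / np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps)
```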
Minibatch Standard Deviation
A layer near the end of the discriminator that computes the statistical variation across the minibatch and concatenates it as an extra feature channel. If the generator produces low-diversity outputs, this statistical signature gives it away.
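A minimal NumPy version of this layer, following the paper's recipe of summarizing the per-feature standard deviations into a single scalar and broadcasting it as an extra channel:

```python
import numpy as np

def minibatch_stddev(feats):
    """Minibatch standard deviation layer. feats: (N, H, W, C).
    Returns (N, H, W, C + 1) with the extra channel holding the average
    standard deviation computed across the minibatch."""
    std = feats.std(axis=0)        # (H, W, C): per-feature variation over batch
    mean_std = std.mean()          # single scalar summary
    extra = np.full(feats.shape[:3] + (1,), mean_std)
    return np.concatenate([feats, extra], axis=-1)
```

A collapsed generator produces near-identical samples, so this channel goes to zero and the discriminator can penalize the lack of diversity.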
WGAN-GP Loss
Wasserstein distance with gradient penalty provides stable training and a meaningful loss metric. The gradient penalty enforces a Lipschitz constraint by penalizing gradient norms that deviate from 1.
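The penalty term is λ · E[(‖∇‖ − 1)²], evaluated at points interpolated between real and fake samples (λ = 10 in the paper). A sketch of those two pieces, with the gradient norms assumed to come from an autodiff framework:

```python
import numpy as np

def interpolate(real, fake, eps):
    """Random points on the line between real and fake batches;
    the critic's gradient is evaluated at these points."""
    return eps * real + (1.0 - eps) * fake

def gradient_penalty(grad_norms, weight=10.0):
    """WGAN-GP penalty: push the critic's gradient norm toward 1
    to enforce (approximately) the Lipschitz constraint."""
    return weight * np.mean((grad_norms - 1.0) ** 2)
```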
Case Study: The Abominable SnowGAN
snowGAN
A progressive GAN built to generate synthetic images of magnified snowpack. Trained on real winter snowpack photography collected in the Colorado Rocky Mountains, it produces realistic 1024x1024 RGB images of snow crystal formations.
The snowGAN project applies progressive growing to an unusual domain: snow crystal imagery. Here's how the technique maps to a real implementation:
Architecture
Generator
- Input: 100-dim latent vector
- Initial projection to 16x16 feature map
- Transposed convolution stack with configurable filters
- Fade-in blocks for smooth resolution transitions
- Final activation: tanh (output range [-1, 1])
- Output: 1024x1024 RGB images
Discriminator
- Input: 1024x1024 RGB images
- Conv2D blocks with LeakyReLU
- Wasserstein output (no activation)
- Gradient penalty for stability
- Optional dual-head inference mode
The Fade-In Implementation
snowGAN's fade-in implementation splits the generator output into two paths during resolution transitions:
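The snowGAN source isn't reproduced here, so the following is a hypothetical sketch of such a two-path split; the helper names (`to_rgb_old`, `to_rgb_new`, `new_block`) are illustrative, not the project's actual API:

```python
import numpy as np

def transition_output(x_lowres, to_rgb_old, to_rgb_new, upsample, new_block, alpha):
    """Two-path generator output during a resolution transition:
      old path: previous-resolution RGB, upsampled 2x
      new path: RGB from the freshly added higher-resolution block
    The two paths are blended with the fade-in alpha."""
    old_rgb = upsample(to_rgb_old(x_lowres))
    new_rgb = to_rgb_new(new_block(x_lowres))
    return (1.0 - alpha) * old_rgb + alpha * new_rgb
```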
The blending alpha is computed from the current training step:
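A plausible schedule, assuming a linear ramp over a fixed number of fade-in steps followed by the stabilization period:

```python
def fade_alpha(step, phase_start, fade_steps):
    """Linear fade-in: alpha ramps 0 -> 1 over fade_steps training steps,
    then stays clamped at 1 for the stabilization period."""
    return min(max((step - phase_start) / fade_steps, 0.0), 1.0)
```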
Training Data
snowGAN trains on the Rocky Mountain Snowpack Dataset (~2,341 samples from the 2024-2025 Colorado winter season). Each sample includes magnified crystal images, snowpack profiles, and core segments, along with metadata like GPS coordinates, temperatures, slope angles, and avalanche observations.
With only ~2,341 samples, this is a challenging dataset for GANs. Progressive growing helps enormously here because the early low-resolution phases can learn meaningful structure from small datasets, where a full-resolution GAN would struggle with such limited data.
Beyond Generation: Dual-Head Inference
An interesting twist: snowGAN repurposes its trained discriminator as a feature extractor. The discriminator's output layer is stripped and replaced with two classification heads:
Avalanche Count
21 classes predicting the number of avalanches observed at the sampling site. Leverages the discriminator's learned snow crystal representations.
Wind Loading
4 classes (high/medium/low/none) predicting wind loading conditions. The discriminator already learned to distinguish crystal structures; those features transfer to safety assessment.
This is a clever application of transfer learning: the adversarial training process forces the discriminator to develop rich feature representations of snow crystals, which can then be repurposed for downstream tasks without training a new model from scratch.
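As a rough sketch of the idea (the head shapes follow the class counts above; the linear-head readout is an assumption, not snowGAN's actual code):

```python
import numpy as np

def dual_head_inference(features, w_avalanche, w_wind):
    """Hypothetical dual-head readout: the discriminator's penultimate
    features feed two independent linear classification heads."""
    avalanche_logits = features @ w_avalanche   # (N, 21) avalanche-count classes
    wind_logits = features @ w_wind             # (N, 4) wind-loading classes
    return avalanche_logits.argmax(-1), wind_logits.argmax(-1)
```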
Why Progressive Growing Matters
2-6x Faster
Early phases on small images are cheap. Computational effort is concentrated where it matters most.
Higher Quality
The first method to produce photorealistic 1024x1024 faces. Structure before detail is a winning strategy.
More Stable
The adversarial game stays balanced at every stage. No resolution shock, no mode collapse spiral.
The broader lesson extends well beyond GANs: curriculum learning works. Start simple, increase complexity gradually, and preserve what you've already learned. Progressive growing proved this for image generation, and the principle has influenced training strategies across generative modeling, including some diffusion model approaches.
The Legacy: ProGAN to StyleGAN
Progressive growing didn't just produce great images. It spawned a lineage:
- ProGAN (2017): The original progressive growing paper. First photorealistic 1024x1024 synthesis.
- StyleGAN (2019): Built directly on ProGAN's architecture. Added style-based generation with AdaIN. Disentangled latent spaces.
- StyleGAN2 (2020): Refined further, actually removing progressive growing in favor of other improvements, but the core insights remained foundational.
- StyleGAN3 (2021): Addressed aliasing artifacts. The lineage continued.
This ProGAN → StyleGAN lineage essentially defined the trajectory of photorealistic image synthesis and shaped the broader generative-image field, even as later commercial tools like Adobe Firefly and Midjourney moved to diffusion-based approaches.