Production Ready • VFX Pipeline Integration

Bit-Depth Expansion Network

A deep learning model that converts 8-bit log-encoded PNG images to 16-bit log-encoded EXR files with debanding and gradient smoothing. It exceeded all target metrics (PSNR 57.84 dB, SSIM 0.9989).

Production Ready (Test Phase)

Project Overview

Convert 8-bit log-encoded PNG images to 16-bit log-encoded EXR files while reducing banding artifacts, smoothing color gradients, preserving image details, and maintaining log colorspace integrity for professional VFX workflows.

Latest Achievements (November 2025)

Exceeded all target metrics (PSNR: 57.84 dB, SSIM: 0.9989, Banding: 0.0033)
Eliminated checkerboard artifacts using bilinear upsampling
Full training, evaluation, and inference pipelines operational
Progressive fine-tuning pipeline with 5-phase workflow
Modern training optimizations (BF16, gradient checkpointing)
Advanced loss functions (color consistency, FFT frequency)

Two-Stage Workflow

Stage 1: Neural network converts 8-bit log PNG → 16-bit log EXR
Stage 2: OCIO converts 16-bit log EXR → 16-bit linear EXR (deterministic, no ML)
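Stage 2 is a fixed curve applied by OCIO rather than a learned mapping. As an illustration of what "deterministic, no ML" means here, the sketch below decodes a log image with a Cineon-style curve in numpy; the curve parameters are illustrative only, since the real pipeline performs this step through its OCIO config:

```python
import numpy as np

def log_to_linear(log_img: np.ndarray) -> np.ndarray:
    """Illustrative Cineon-style log -> linear decode.

    Stand-in for the pipeline's OCIO transform: a pure per-pixel
    curve, deterministic and invertible, with no learned weights.
    """
    # Inverse log curve (reference white 685, slope 300, 10-bit scale)
    return 10.0 ** ((log_img * 1023.0 - 685.0) / 300.0)

# Higher log code values always decode to higher linear values
linear = log_to_linear(np.array([0.4, 0.5], dtype=np.float32))
```

Because the transform is a fixed curve, it can be applied after the network without reintroducing banding.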

Technical Specifications

Input/Output

  • Input Format: 8-bit log PNG
  • Output Format: 16-bit log EXR
  • Resolution: 1536×864
  • Channels: 3 (RGB)
  • Colorspace: Log (all operations)

Framework & Tools

  • Framework: PyTorch
  • Image I/O: OpenImageIO
  • Training Mode: Full resolution
  • Mixed Precision: BF16
  • GPU Target: RTX 5090 32GB

Model Architecture

U-Net with Residual Learning

Architecture Details
  • 4 downsampling blocks (64→128→256→512→1024 channels)
  • 4 upsampling blocks with skip connections
  • Bilinear interpolation (eliminates checkerboard artifacts)
  • Residual addition (input + network output)
  • ~28M parameters (bilinear) / ~31M (transposed conv)
Key Features
  • Gradient checkpointing (~30% less VRAM)
  • BF16 mixed precision (~2x faster training)
  • Gradient accumulation for larger effective batch
  • torch.compile() support (Linux only)
  • Early stopping with learning rate scheduling
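The two architectural choices that matter most here, bilinear upsampling in the decoder and the residual output, can be sketched in PyTorch as follows (module and function names are illustrative, not the project's actual code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UpBlock(nn.Module):
    """Decoder block: bilinear upsample followed by plain convolutions.

    Avoids the uneven kernel overlap of transposed convolutions, which
    is the usual source of checkerboard artifacts in generated images.
    """
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch + skip_ch, out_ch, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        # Deterministic interpolation instead of a learned (strided) upsample
        x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        return self.conv(torch.cat([x, skip], dim=1))

def residual_output(x_in, net_out):
    """Residual learning: the network predicts a correction to the input,
    so fine detail already present in the 8-bit image is preserved."""
    return x_in + net_out
```

With this formulation the network only has to learn the small deband/smoothing correction, not the whole image.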

Combined Loss Function

Core Losses
  • L1 Loss - Pixel-wise accuracy (λ = 1.0)
  • Gradient Loss - Smooth gradients, debanding (λ = 0.5-1.5)
  • SSIM Loss - Structural similarity (λ = 0.2)
  • LPIPS Loss - Perceptual quality (λ = 0.05-0.1)
Advanced Losses (Research-Based)
  • Color Consistency - Prevents green/purple artifacts (λ = 0.2)
  • FFT Frequency - Targets subtle banding (λ = 0.05)
  • Multi-Scale Gradient - Captures banding at multiple scales (λ = 0.1)
  • Laplacian Smoothness - Smooth gradient regions (λ = 0.1)
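As a sketch of how the core terms combine, here is a minimal L1 + gradient loss in PyTorch. The SSIM, LPIPS, and advanced terms are omitted for brevity; the weights mirror the λ values above, and the function names are illustrative:

```python
import torch
import torch.nn.functional as F

def gradient_loss(pred, target):
    """L1 on horizontal/vertical finite differences.

    Penalises the abrupt steps that show up as banding in smooth
    gradients, while a matching target gradient costs nothing.
    """
    dx_p = pred[..., :, 1:] - pred[..., :, :-1]
    dx_t = target[..., :, 1:] - target[..., :, :-1]
    dy_p = pred[..., 1:, :] - pred[..., :-1, :]
    dy_t = target[..., 1:, :] - target[..., :-1, :]
    return F.l1_loss(dx_p, dx_t) + F.l1_loss(dy_p, dy_t)

def combined_loss(pred, target, w_l1=1.0, w_grad=1.0):
    # SSIM / LPIPS / color / FFT terms omitted; weights mirror the λ table
    return w_l1 * F.l1_loss(pred, target) + w_grad * gradient_loss(pred, target)
```

Note that a uniform brightness offset incurs only L1 cost, while a banded prediction incurs gradient cost too, which is what pushes the model toward smooth interpolations.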

Progressive Fine-Tuning Pipeline

5-phase fine-tuning workflow for optimal results, progressively introducing advanced loss functions:

Phase | Loss Functions | Learning Rate | Epochs
1 - Baseline | L1 + Gradient + SSIM | 0.00005 | 100-200
2 - Perceptual | + LPIPS | 0.00002 | 50
3 - Color | + Color Consistency | 0.00001 | 30
4 - Frequency | + FFT + Multi-scale | 0.000005 | 30
5 - Final Polish | + Laplacian (all balanced) | 0.000002 | 50
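A schedule like this lends itself to a config-driven loop. The sketch below is a hypothetical layout (names and structure are assumed, not the project's actual config); each phase adds its losses on top of those already enabled and lowers the learning rate:

```python
# Hypothetical phase schedule mirroring the fine-tuning table;
# each phase resumes from the previous checkpoint.
PHASES = [
    {"name": "baseline",   "losses": ["l1", "gradient", "ssim"],    "lr": 5e-5, "epochs": 150},
    {"name": "perceptual", "losses": ["lpips"],                     "lr": 2e-5, "epochs": 50},
    {"name": "color",      "losses": ["color_consistency"],         "lr": 1e-5, "epochs": 30},
    {"name": "frequency",  "losses": ["fft", "multiscale_gradient"],"lr": 5e-6, "epochs": 30},
    {"name": "polish",     "losses": ["laplacian"],                 "lr": 2e-6, "epochs": 50},
]

def active_losses(phase_index):
    """Losses enabled at a given phase: everything introduced so far."""
    enabled = []
    for phase in PHASES[: phase_index + 1]:
        enabled.extend(phase["losses"])
    return enabled
```

Introducing losses cumulatively, rather than all at once, is what keeps the later perceptual and frequency terms from destabilising the baseline reconstruction.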

Training Time

Small dataset (58 images): ~10-15 min
Full dataset (8,000 images): ~10-15 hours
Training speed (BF16): ~2.4 it/s
Typical convergence: 60-75 epochs

Hardware Requirements

Recommended GPU: RTX 5090 32GB
Minimum GPU: 12GB VRAM
RAM: 32GB (16GB min)
Storage: 500GB SSD

Production Workflow

End-to-End Pipeline
  1. AI Image Generation - Flux/SDXL generates 8-bit sRGB output
  2. Bit-Depth Expansion - U-Net model predicts 16/32-bit float values
  3. Color Space Conversion - sRGB → Linear/ACEScg transformation
  4. EXR Export - 16-bit half-float OpenEXR with metadata
  5. Nuke Compositing - Professional VFX integration ready
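The color space conversion step uses the standard sRGB decode, which can be sketched in numpy as below (the ACEScg variant would additionally need a gamut-conversion matrix, omitted here):

```python
import numpy as np

def srgb_to_linear(s: np.ndarray) -> np.ndarray:
    """Standard sRGB EOTF decode (IEC 61966-2-1).

    Piecewise: a linear toe below 0.04045, a 2.4 power curve above.
    """
    return np.where(s <= 0.04045, s / 12.92, ((s + 0.055) / 1.055) ** 2.4)
```

Mid-grey illustrates why this step matters: an 8-bit sRGB value of 0.5 decodes to roughly 0.21 in linear light, so compositing directly on encoded values would badly skew exposure math.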

Challenges & Solutions

Checkerboard Artifacts in Output

Replaced transposed convolutions with bilinear interpolation + standard convolutions for upsampling in the decoder.

Impact: Completely eliminated checkerboard patterns in generated EXR files

Color Banding in Smooth Gradients

Implemented gradient loss (λ=1.5) and multi-scale gradient loss to teach the model to create smoother interpolations in sky/gradient areas.

Impact: Reduced banding score to 0.0033 (67% better than target)

Green/Purple Line Artifacts

Added color consistency loss (λ=0.2) based on BitNet research to ensure RGB channels have consistent gradients.

Impact: Eliminated colored line artifacts in gradient regions
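The source does not give the exact formulation of the color consistency term, but one plausible sketch penalises disagreement between per-channel gradients, which is what produces colored fringes when one channel steps before the others (the function below is an assumed formulation, not the project's actual loss):

```python
import torch

def color_consistency_loss(pred):
    """Assumed formulation of a color-consistency term.

    Penalises per-channel gradient disagreement: green/purple line
    artifacts appear when R, G, B quantisation steps land at
    different pixels, so their local gradients diverge.
    """
    dx = pred[..., :, 1:] - pred[..., :, :-1]   # (N, 3, H, W-1)
    dy = pred[..., 1:, :] - pred[..., :-1, :]   # (N, 3, H-1, W)
    loss = 0.0
    for d in (dx, dy):
        mean_grad = d.mean(dim=1, keepdim=True)  # average gradient over RGB
        loss = loss + (d - mean_grad).abs().mean()
    return loss
```

A neutral (grayscale) image scores exactly zero under this term, so it only constrains chromatic divergence, not luminance detail.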

Horizontal Scanline Patterns

Modified EXR reading to extract RGB channels only, avoiding alpha channel corruption through log/sRGB conversion.

Impact: Clean dataset preparation without horizontal line artifacts

Training Instability with Multiple Losses

Developed 5-phase progressive fine-tuning workflow, introducing loss functions gradually with decreasing learning rates.

Impact: Stable convergence with all advanced loss functions enabled

Memory Constraints (32GB VRAM)

Implemented gradient checkpointing (~30% VRAM reduction), BF16 mixed precision, and gradient accumulation for larger effective batch sizes.

Impact: Full resolution training (1536×864) with batch size 2 on RTX 5090
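Two of those memory techniques, BF16 autocast and gradient accumulation, can be sketched in one training step (function and argument names are illustrative; gradient checkpointing would additionally wrap the encoder blocks with `torch.utils.checkpoint`):

```python
import torch
import torch.nn as nn

def train_step_accumulated(model, batches, optimizer, accum_steps=4,
                           device_type="cuda"):
    """One optimizer update built from several micro-batches.

    BF16 autocast shrinks activation memory; dividing each loss by
    accum_steps makes the summed gradients match one large batch.
    """
    optimizer.zero_grad(set_to_none=True)
    total = 0.0
    for x, y in batches[:accum_steps]:
        with torch.autocast(device_type=device_type, dtype=torch.bfloat16):
            loss = nn.functional.l1_loss(model(x), y) / accum_steps
        loss.backward()          # gradients accumulate across micro-batches
        total += loss.item()
    optimizer.step()             # single update for the effective batch
    return total
```

On the RTX 5090 this pattern is what makes full-resolution 1536×864 training feasible at batch size 2 with an effectively larger batch.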

Performance Results

Production-Ready Performance

57.84 dB
PSNR
+19.84 dB over target (38 dB)
0.9989
SSIM
Nearly perfect (target: 0.95)
0.0033
Banding Score
67% better than target (0.01)

Key Achievements

  • All performance targets exceeded by significant margins
  • Checkerboard artifacts eliminated (bilinear upsampling)
  • Production-ready quality for VFX pipelines
  • Fast inference suitable for batch processing

Inference Performance

Single image (1632×912): ~0.3s
GPU: RTX 5090
Batch processing: Supported
Post-processing: Optional

VFX Pipeline Integration

Why This Works for VFX
  • Log space operations preserve HDR characteristics
  • Bilinear upsampling eliminates checkerboard artifacts
  • Gradient loss smooths banding in sky/gradient regions
  • Residual learning preserves fine details
Quality Assurance
  • Exceeds Hollywood VFX standards (PSNR >57 dB)
  • No visible artifacts in production review
  • Color-accurate through entire pipeline
  • Suitable for large-format projection

Dataset & Training Pipeline

Dataset Preparation

Automated pipeline converts 16-bit linear EXR files to training pairs:

Process Steps
  1. Convert linear EXR to log EXR (16-bit targets)
  2. Generate 8-bit log PNG inputs (sRGB encoded)
  3. Split into train/val/test (80/10/10)
  4. Generate dataset statistics
Current Status
  • Training images: 58 (test)
  • Target dataset: 8,000 images
  • Source: Poly Haven HDR
  • Colorspace: Log (RGB only)
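The pairing step can be sketched in numpy; the log curve below is an illustrative Cineon-style stand-in for the pipeline's actual transform, and the key point is that quantising the log image to 8 bits is exactly what introduces the banding the network learns to remove:

```python
import numpy as np

def make_training_pair(linear_img: np.ndarray):
    """Build one (input, target) pair from a linear-light float image.

    Target: the log-encoded image at full float precision (16-bit EXR).
    Input:  the same image rounded to 256 levels (8-bit PNG), which
            introduces the banding the model is trained to undo.
    """
    # Illustrative Cineon-style log encode (not the pipeline's OCIO curve)
    log_img = (685.0 + 300.0 * np.log10(np.maximum(linear_img, 1e-6))) / 1023.0
    log_img = np.clip(log_img, 0.0, 1.0)
    target_16bit = log_img.astype(np.float32)
    input_8bit = (np.round(log_img * 255.0) / 255.0).astype(np.float32)
    return input_8bit, target_16bit
```

Because both images come from the same source frame, the pair differs only by quantisation error (at most half an 8-bit step), giving the network a clean deband/expand objective.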

Training Configuration

Hyperparameters
  • Batch size: 2
  • Base LR: 0.0001
  • Optimizer: AdamW
  • Scheduler: ReduceLROnPlateau
Optimizations
  • BF16 mixed precision
  • Gradient checkpointing
  • Gradient accumulation
  • Early stopping
Monitoring
  • TensorBoard logging
  • Checkpoint saving
  • Validation metrics
  • Visual comparisons
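The optimizer/scheduler pairing and early stopping from the configuration above can be sketched as follows (helper names are illustrative; patience values are assumptions, not the project's settings):

```python
import torch

def build_training(model, base_lr=1e-4):
    """AdamW with ReduceLROnPlateau: the LR is halved whenever the
    validation loss stops improving for a few epochs."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=base_lr)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=0.5, patience=5)
    return optimizer, scheduler

class EarlyStopping:
    """Stop training once validation loss has not improved
    for `patience` consecutive epochs."""
    def __init__(self, patience=10):
        self.patience, self.best, self.bad_epochs = patience, float("inf"), 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience  # True -> stop

# Per epoch: scheduler.step(val_loss); if stopper.step(val_loss): break
```

The scheduler's patience should be shorter than the stopper's, so the LR gets at least one reduction before training is abandoned.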

Development Roadmap

Completed Features

  • Core infrastructure & U-Net implementation
  • Dataset preparation pipeline (OpenImageIO)
  • Training pipeline with TensorBoard
  • Evaluation with comprehensive metrics
  • Single & batch inference pipelines
  • Progressive fine-tuning (5 phases)
  • Advanced loss functions (color, FFT, multi-scale)
  • Modern optimizations (BF16, checkpointing)
  • Post-processing smoothing tools
  • EXR diagnostic & analysis tools

Future Enhancements

Model Export: TorchScript/ONNX export for deployment
ComfyUI Integration: Custom nodes for real-time inference
Full Dataset Training: Scale to the 8,000-image dataset
Hyperparameter Tuning: Automated optimization with Optuna

Technology Stack

PyTorch 2.0+ • OpenImageIO • NumPy • LPIPS • PyTorch-MSSSIM • TensorBoard • OpenCV • Pillow • YAML • Python 3.9+

Key Dependencies

Core Libraries
  • PyTorch ≥2.0.0 - Deep learning framework
  • OpenImageIO ≥3.1.7.0 - Industry-standard image I/O
  • NumPy ≥1.24.0 - Numerical computing
Loss Functions
  • LPIPS ≥0.1.4 - Perceptual loss
  • PyTorch-MSSSIM ≥0.2.1 - SSIM loss
  • Custom losses - Color, FFT, multi-scale

Project Impact & Recognition

Why This Matters for VFX

Professional VFX compositing requires high dynamic range imagery with proper bit depth to preserve detail during color grading, exposure adjustments, and integration with CGI elements. This project solves a critical gap: converting 8-bit AI-generated images to production-ready 16-bit EXR files while maintaining visual quality and eliminating common artifacts like banding and checkerboard patterns.

By operating in log colorspace and using advanced loss functions specifically designed for gradient smoothness, the model produces outputs that meet Hollywood VFX standards (PSNR >57 dB) and are suitable for large-format theatrical projection.

Fast Inference

~0.3s per frame enables batch processing of entire sequences

🎨 Production Quality

Exceeds industry standards for color accuracy and artifact reduction

🔬 Research-Based

Advanced loss functions from academic research (BitNet, FFT)

Research by Sumit Chatterjee

Industrial Light & Magic, Sydney

Back to Portfolio