This project benchmarks CNN inference on MNIST across CPU (PyTorch/JIT) and GPU (CUDA), including NCHW vs NHWC layout analysis.
It also provides an overview of the GPU and CPU architectures and reports the benchmark results.
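
For readers new to the NCHW vs NHWC distinction, here is a minimal sketch of how the two layouts map a logical index `(n, c, h, w)` to a flat memory offset. It is illustrative only and is not taken from this project's code. The function names and the 8-channel, 28x28 feature-map shape are assumptions chosen for the example. Note that the raw MNIST input has a single channel, so the two layouts coincide there; they differ only for multi-channel intermediate feature maps.

```python
# Illustrative sketch: flat offsets for the two tensor layouts.
# Function names and the example shape are hypothetical, not from this project.

def nchw_offset(n, c, h, w, C, H, W):
    """Flat offset when width varies fastest (N, C, H, W order)."""
    return ((n * C + c) * H + h) * W + w


def nhwc_offset(n, c, h, w, C, H, W):
    """Flat offset when channels vary fastest (N, H, W, C order)."""
    return ((n * H + h) * W + w) * C + c


# Assumed shape: an 8-channel 28x28 feature map (e.g. after a first conv layer).
C, H, W = 8, 28, 28

# Distance in elements between adjacent channels at the same pixel.
nchw_channel_stride = nchw_offset(0, 1, 0, 0, C, H, W) - nchw_offset(0, 0, 0, 0, C, H, W)
nhwc_channel_stride = nhwc_offset(0, 1, 0, 0, C, H, W) - nhwc_offset(0, 0, 0, 0, C, H, W)

# Distance in elements between adjacent pixels in the same channel.
nchw_width_stride = nchw_offset(0, 0, 0, 1, C, H, W) - nchw_offset(0, 0, 0, 0, C, H, W)
nhwc_width_stride = nhwc_offset(0, 0, 0, 1, C, H, W) - nhwc_offset(0, 0, 0, 0, C, H, W)

print("NCHW channel stride:", nchw_channel_stride)  # 784 (= H * W)
print("NHWC channel stride:", nhwc_channel_stride)  # 1
print("NCHW width stride:", nchw_width_stride)      # 1
print("NHWC width stride:", nhwc_width_stride)      # 8 (= C)
```

In NCHW, all values of one channel are contiguous, while in NHWC, all channels of one pixel are contiguous. Which access pattern is faster depends on the kernel and the hardware, which is the kind of question a layout benchmark can answer.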