This paper demonstrates that CNNs can be trained from scratch at true 4-bit precision
on commodity CPUs, with no specialized hardware and no post-training tricks. It reports
92.34% accuracy on CIFAR-10 (just 0.16% below the full-precision baseline), 70.94% on CIFAR-100,
and 83.16% on a consumer Android device in only 6 epochs, with 8x memory compression over FP32.
The key contribution is a novel quantization technique: tanh-based soft weight clipping.
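To illustrate the idea, here is a minimal sketch of what tanh-based soft clipping combined with 4-bit quantization could look like. The exact formulation is not given in this summary, so the formula below (tanh to bound weights smoothly, then symmetric uniform rounding to 15 levels) is an assumption, not the paper's actual method:

```python
import numpy as np

def quantize_4bit_soft_clip(weights):
    """Sketch of tanh-based soft weight clipping + 4-bit quantization.

    Assumed form (may differ from the paper): tanh smoothly squashes
    weights into (-1, 1), keeping gradients nonzero everywhere, unlike
    a hard clip. The result is then rounded to 15 symmetric levels
    (-7/7 .. 7/7), representable in a signed 4-bit integer.
    """
    w = np.tanh(weights)           # soft clip into (-1, 1)
    return np.round(w * 7) / 7     # symmetric 4-bit grid

w = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(quantize_4bit_soft_clip(w))
```

A smooth clip like this is often preferred in quantization-aware training because the straight-through gradient stays informative for large-magnitude weights instead of vanishing at a hard boundary.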
Read the paper on arXiv