Inference for Convolutional Neural Networks (CNNs) is notoriously compute-intensive. This makes CNNs an ideal candidate for hardware acceleration, which is faster and more power-efficient than running software on general-purpose CPUs. Training and inference are typically done using floating-point representations of the features, weights, and biases. Using a fixed-point representation reduces the size and power of the operators in the accelerator. With a purpose-built accelerator, the fixed-point operators can have any bit width; they are not limited to 8 or 16 bits. QKeras, or quantized Keras, is a library built on TensorFlow that allows developers to specify quantized fixed-point operations for each layer. It enables training and inference with reduced-precision representations. This webinar will describe how to use QKeras and High-Level Synthesis to produce a bespoke quantized CNN accelerator, and will compare the accuracy, power, performance, and area of different quantizations.
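To make "any bit width" concrete, here is a minimal sketch of rounding a floating-point value to a signed fixed-point format with an arbitrary number of bits. The helper `quantize_fixed` and its `(bits, integer)` parameters are hypothetical and chosen only for illustration; this is not QKeras's own quantizer implementation. It assumes one sign bit, `integer` integer bits, and the remaining bits for the fraction, with saturation at the representable range.

```python
def quantize_fixed(x: float, bits: int, integer: int) -> float:
    """Hypothetical helper: round x to the nearest value representable as a
    signed fixed-point number with `bits` total bits, of which one is the
    sign bit and `integer` are integer bits; the rest are fractional bits."""
    frac = bits - integer - 1            # fractional bits after sign and integer bits
    scale = 2 ** frac                    # one LSB equals 1 / scale
    qmin = -(2 ** (bits - 1))            # most negative integer code
    qmax = 2 ** (bits - 1) - 1           # most positive integer code
    q = round(x * scale)                 # nearest code (Python rounds ties to even)
    q = max(qmin, min(qmax, q))          # saturate instead of wrapping around
    return q / scale                     # value the fixed-point code represents


if __name__ == "__main__":
    # The same weight stored at several widths, including non-power-of-two ones.
    for bits in (4, 5, 6, 8):
        print(bits, quantize_fixed(0.3, bits, 0))
```

Each extra bit halves the rounding step, so a hardware designer can trade accuracy against operator size one bit at a time rather than jumping between 8 and 16 bits.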

