Speeding Up Deep Learning Inference on Edge Devices

Once a deep learning model is trained and ready to go into production, we may need to address latency constraints, depending on the application. For real-time applications, where latency requirements are on the order of milliseconds, we need a strategy to speed up inference.
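
Before optimizing, it helps to establish a baseline latency measurement. The sketch below is a minimal example of timing inference in PyTorch; the model (a torchvision ResNet-18) and the input shape are placeholders standing in for whatever trained model you are deploying.

```python
import time

import torch
import torchvision.models as models

# Placeholder model: substitute your own trained model here.
model = models.resnet18(weights=None)
model.eval()

# Dummy input matching the model's expected shape (assumed 224x224 RGB).
dummy_input = torch.randn(1, 3, 224, 224)

# Warm-up runs so one-time setup costs don't skew the measurement.
with torch.no_grad():
    for _ in range(10):
        model(dummy_input)

# Time repeated runs and report the average per-inference latency.
n_runs = 100
start = time.perf_counter()
with torch.no_grad():
    for _ in range(n_runs):
        model(dummy_input)
elapsed = time.perf_counter() - start
print(f"Average latency: {elapsed / n_runs * 1000:.2f} ms")
```

Averaging over many runs after a warm-up phase gives a more stable number than timing a single forward pass, and it serves as the reference point against which any speedup technique can be compared.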