Since our last post about medical image classification on the Tesla K20, we have made more progress and moved to the Tesla K40M.
Our first major hurdle was controlling the temperature of the passively cooled GPU. The Tesla K40M depends on fans in the server to keep it within a safe temperature range.
Thanks to Ilya Goldberg of the National Institutes of Health, co-founder of Open Microscopy, who shared his knowledge of how computer-aided programs are applied in the medical field. Refer to https://www.openmicroscopy.org for more details.
While porting the WindCharm application to the GPU, we spent a huge amount of time understanding the Kepler architecture so that we could identify the right techniques for optimizing code on the GPU.
Understanding how the NVCC compiler allocates registers, and how that allocation affects occupancy, helps you balance register usage against the occupancy you want. For more info on registers, click here.
Occupancy is a key parameter in determining how well the GPU is being used. However, since occupancy can be deceptive, we studied various technical documents and webinars and ran in-house experiments to check how much occupancy actually matters for delivering the best performance. The whitepaper linked here explains more on occupancy.
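The link between registers per thread and occupancy described above can be sketched with a small estimator. This is a simplified model, not NVIDIA's occupancy calculator: it ignores shared memory and warp allocation granularity, and the per-SM limits are assumed values for a Kepler GK110 part (compute capability 3.5) that you should check against your own device. The function name `estimate_occupancy` is ours.

```python
import math

# Assumed per-SM limits for Kepler GK110 (compute capability 3.5).
# Verify these against the documentation for your device.
REGS_PER_SM = 65536        # 32-bit registers per multiprocessor
MAX_WARPS_PER_SM = 64      # resident warps per multiprocessor
MAX_BLOCKS_PER_SM = 16     # resident blocks per multiprocessor
WARP_SIZE = 32
REG_ALLOC_UNIT = 256       # registers are handed out per warp in chunks of this size


def estimate_occupancy(regs_per_thread, threads_per_block):
    """Estimate occupancy (0.0 to 1.0) from register, warp and block limits only."""
    if not 1 <= regs_per_thread <= 255:
        raise ValueError("regs_per_thread must be between 1 and 255")
    if not 1 <= threads_per_block <= 1024:
        raise ValueError("threads_per_block must be between 1 and 1024")

    warps_per_block = math.ceil(threads_per_block / WARP_SIZE)
    # Round each warp's register need up to the allocation unit.
    regs_per_warp = math.ceil(regs_per_thread * WARP_SIZE / REG_ALLOC_UNIT) * REG_ALLOC_UNIT

    blocks_by_regs = (REGS_PER_SM // regs_per_warp) // warps_per_block
    blocks_by_warps = MAX_WARPS_PER_SM // warps_per_block
    blocks = min(blocks_by_regs, blocks_by_warps, MAX_BLOCKS_PER_SM)

    # A result of 0.0 means not even one block fits under this model.
    return blocks * warps_per_block / MAX_WARPS_PER_SM


for regs in (32, 40, 64, 128):
    print(regs, "registers/thread ->", estimate_occupancy(regs, 256))
```

Under these assumptions, a 256-thread block reaches full occupancy at 32 registers per thread and half of that at 64. As noted above, a higher figure does not by itself guarantee a faster kernel, so treat the estimate as a starting point for experiments.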
Functional units are another area that one needs to keep a close watch on. The trade-off between accuracy and speed is important here, since the choice between single- and double-precision calculations can make or break performance. Visit here to know more about functional units.
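To make the accuracy side of that trade-off concrete, here is a minimal sketch that accumulates the same value in single and in double precision on the CPU. It is not taken from our application: the helper names `to_float32` and `accumulate` are ours, and single precision is emulated by rounding every intermediate result to 32 bits with the standard `struct` module.

```python
import struct


def to_float32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(value, count, single_precision):
    """Add `value` to a running total `count` times in the chosen precision."""
    step = to_float32(value) if single_precision else value
    total = 0.0
    for _ in range(count):
        total += step
        if single_precision:
            # Emulate a 32-bit accumulator by rounding after every addition.
            total = to_float32(total)
    return total


count = 100000
expected = 10000.0  # 0.1 * 100000 in exact arithmetic

error_single = abs(accumulate(0.1, count, True) - expected)
error_double = abs(accumulate(0.1, count, False) - expected)

print("single-precision error:", error_single)
print("double-precision error:", error_double)
```

The single-precision total drifts much further from the exact answer than the double-precision one. Whether that drift is acceptable depends on the computation, which is why the choice has to be made per kernel rather than globally.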
Now coming back to the Tesla K40, we have also started using texture memory. As a result, performance has improved significantly: the application now runs in under 7 minutes, compared with 11 minutes on a Tesla K20.
The benchmarking numbers now look like this:
CPU
Dataset Used | Number of Consoles | Time |
RNAi Images | 1 | 4 Hrs 38 Mins |
RNAi Images | 4 | 2 Hrs 15 Mins |
GPU
Dataset Used | Number of Consoles | Time |
RNAi Images | 1 | 23 Mins (12 X) |
RNAi Images | 4 | 6 Mins 44 Sec (20 X) |
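The speedup factors in the tables follow directly from the run times. The short sketch below redoes that arithmetic; the helper names are ours, and the 11-minute Tesla K20 figure is the one quoted earlier in this post.

```python
def to_seconds(hours=0, minutes=0, seconds=0):
    """Convert a run time to seconds."""
    return hours * 3600 + minutes * 60 + seconds


def speedup(baseline, improved):
    """How many times faster `improved` is than `baseline`."""
    return baseline / improved


# Run times from the tables above, keyed by number of consoles.
cpu = {1: to_seconds(hours=4, minutes=38), 4: to_seconds(hours=2, minutes=15)}
gpu = {1: to_seconds(minutes=23), 4: to_seconds(minutes=6, seconds=44)}

for consoles in (1, 4):
    print(consoles, "console(s):", round(speedup(cpu[consoles], gpu[consoles]), 2), "X")

# Tesla K20 (11 minutes) versus Tesla K40 (6 minutes 44 seconds).
k20_vs_k40 = speedup(to_seconds(minutes=11), gpu[4])
print("K20 -> K40:", round(k20_vs_k40, 2), "X")
```

Rounded to whole numbers, the CPU-to-GPU ratios match the 12 X and 20 X shown in the table.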
I recently presented a webinar on this topic with NVIDIA. I will be sharing the slides here soon. Watch this space.