Since our last post about medical image classification on the Tesla K20, we have made further progress and moved to the Tesla K40M.
Our first major hurdle was to control the temperature of the passively cooled GPU. The Tesla K40M required fans on the server to ensure optimal temperature control.
We thank Ilya Goldberg of the National Institutes of Health, a co-founder of Open Microscopy, who shared his knowledge of applying computer-aided programs in the medical field. Refer to https://www.openmicroscopy.org for more details.
While porting the WindCharm application to the GPU, we spent a huge amount of time understanding the Kepler architecture to identify the right code optimization techniques for the GPU.
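Understanding how the NVCC compiler allocates registers, and how that allocation affects occupancy, helps you balance register usage against the occupancy you want: the more registers each thread uses, the fewer threads can be resident on a multiprocessor at once. For more info on registers, click here.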
Occupancy is a key parameter in determining how well the GPU is used. However, because occupancy can be deceptive, we studied various technical documents and webinars and ran in-house experiments to validate how much occupancy matters for delivering the best performance. The whitepaper linked here explains more on occupancy.
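To make the link between register usage and occupancy concrete, here is a rough sketch of the arithmetic. It is not the calculation we used in the project. It assumes the per-multiprocessor limits commonly quoted for Kepler GK110 (65,536 registers, 64 resident warps, 16 resident blocks) and a register allocation unit of 256 registers per warp, and it ignores shared memory and other limits. The function name estimate_occupancy is made up for this illustration.

```python
# Simplified occupancy estimate for a Kepler GK110 GPU (Tesla K20/K40).
# The limits below are assumptions for compute capability 3.5; check them
# against NVIDIA's documentation before relying on them.
REGISTERS_PER_SM = 65536
MAX_WARPS_PER_SM = 64
MAX_BLOCKS_PER_SM = 16
WARP_SIZE = 32
REG_ALLOC_UNIT = 256  # assumed: registers are handed out per warp in units of 256


def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def estimate_occupancy(regs_per_thread, threads_per_block):
    """Return the fraction of warp slots on one multiprocessor that can be filled.

    Only the register, warp, and block limits are modelled here.
    """
    warps_per_block = ceil_div(threads_per_block, WARP_SIZE)
    regs_per_warp = ceil_div(regs_per_thread * WARP_SIZE, REG_ALLOC_UNIT) * REG_ALLOC_UNIT
    blocks_by_regs = (REGISTERS_PER_SM // regs_per_warp) // warps_per_block
    blocks_by_warps = MAX_WARPS_PER_SM // warps_per_block
    blocks = min(blocks_by_regs, blocks_by_warps, MAX_BLOCKS_PER_SM)
    return blocks * warps_per_block / MAX_WARPS_PER_SM


for regs in (32, 40, 64, 128):
    print(regs, estimate_occupancy(regs, threads_per_block=256))
```

Under these assumptions, with 256-thread blocks the estimate falls from full occupancy at 32 registers per thread to one quarter at 128. To see the real register count of a kernel, compile with nvcc -Xptxas -v. The --maxrregcount option and the __launch_bounds__ qualifier are the usual ways to cap it.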
Function units are another area that one needs to watch closely. The trade-off between accuracy and speed matters here: the choice between single- and double-precision calculations can make or break the game. Visit here to know more about function units.
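The accuracy side of that trade-off is easy to see even on the CPU. The sketch below is only an illustration and is not taken from our application. It emulates a single-precision accumulator by rounding through the struct module after every addition, then compares the result against a double-precision accumulator. The helper names to_float32 and accumulate are made up for this example, and the sketch says nothing about speed on the GPU.

```python
import struct


def to_float32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(step, count, single_precision):
    """Add `step` to a running total `count` times in the chosen precision."""
    total = 0.0
    if single_precision:
        step = to_float32(step)
    for _ in range(count):
        total += step
        if single_precision:
            total = to_float32(total)  # emulate a float accumulator
    return total


COUNT = 1_000_000
exact = COUNT / 10  # the mathematically exact sum of 0.1 added COUNT times

error_single = abs(accumulate(0.1, COUNT, single_precision=True) - exact)
error_double = abs(accumulate(0.1, COUNT, single_precision=False) - exact)

print("single-precision error:", error_single)
print("double-precision error:", error_double)
```

The single-precision total drifts far from the exact value, while the double-precision total stays very close to it. Whether that error is acceptable depends on the algorithm, which is why the choice of precision deserves a deliberate decision rather than a default.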
Now coming back to Tesla K40, we have also started using texture memory. As a result, the performance has improved significantly and we are now able to run the application in under 7 minutes, compared to the 11 minutes on a Tesla K20.
The benchmarking numbers now look like this:
Number of Consoles | 4 Hrs 38 Mins | 2 Hrs 15 Mins
Number of Consoles | 23 Mins (12 X) | 6 Mins 44 Sec (20 X)
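The speedup factors in parentheses can be checked with a little arithmetic. The sketch below assumes that each "X" figure is the ratio between the longer and the shorter time of a pair, that is 4 Hrs 38 Mins against 23 Mins, and 2 Hrs 15 Mins against 6 Mins 44 Sec. That pairing is our reading of the numbers above, and the helper name to_seconds is made up for this example.

```python
def to_seconds(hours=0, minutes=0, seconds=0):
    """Convert a duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds


# Longer runs, paired by position with the shorter runs below (assumed pairing).
slow_runs = [to_seconds(hours=4, minutes=38), to_seconds(hours=2, minutes=15)]
fast_runs = [to_seconds(minutes=23), to_seconds(minutes=6, seconds=44)]

speedups = [slow / fast for slow, fast in zip(slow_runs, fast_runs)]

for slow, fast, factor in zip(slow_runs, fast_runs, speedups):
    print(f"{slow} s -> {fast} s: about {factor:.1f}x")
```

Rounded to whole numbers, the two ratios come out at 12 and 20, which matches the 12 X and 20 X quoted with the timings.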
I recently presented a webinar on this topic with NVIDIA. I will be sharing the slides here soon. Watch this space.
Lalit Chandivade works as a Technical Manager at eInfochips. He has been leading a team at eInfochips on building automated NVMe test suites and enhancing the NVMe test suites on Linux & Windows OS. Lalit has also successfully executed projects in the Linux device drivers & applications in the Storage Area Network domain.