Boosting medical imaging application performance through CUDA optimization | My journey with NVIDIA’s latest GPUs

April 18, 2017
May 23, 2025
Lalit Chandivade
Blog
Medical Devices

eInfochips an Arrow Company

Optical Coherence Tomography (OCT) is used primarily in the medical imaging field for three-dimensional imaging of biological tissue, giving clinicians a non-invasive way to diagnose patients. Scan speed and scan depth remain the key parameters by which OCT technology advances. To improve them without affecting the cost structure, it is important to select the right hardware and software. That is where NVIDIA desktop GPUs (single PCIe slot) come into the picture.

In one such use case, I was tasked with improving the scan rate of CUDA code that had already been optimized. The code implemented the OCT algorithm. One use case of OCT in medical imaging is capturing high-resolution images of the retina. It is also successfully employed to aid angiography and eye surgery.

The existing system used a Quadro M4000 GPU, which is based on the Maxwell architecture.

I started with the NVIDIA Visual Profiler to find the hotspots. Interestingly, the profiler showed usage of the double-precision function unit, which raised an alarm because the code was written using floats. It turned out that the unsuffixed constants in the code were of type “double”, so the float computations around them were being promoted to double. The fix was to qualify the constants as float using the “f” suffix (for example, 2.0f instead of 2.0).
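
To make the promotion concrete, here is a minimal sketch rather than the actual OCT code. The kernel names, the constants 0.5 and 2.0, and the host helper runScale are all hypothetical and chosen only for illustration. The first kernel uses unsuffixed literals and therefore computes in double precision; the second differs only in the “f” suffix and stays in FP32.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Unsuffixed literals (0.5, 2.0) have type double, so in[i] is converted to
// double, the multiply and add run in FP64, and the result is converted back.
__global__ void scaleWithDoubleLiterals(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 0.5 + 2.0;
}

// With the f suffix the literals are floats and the arithmetic stays in FP32.
__global__ void scaleWithFloatLiterals(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 0.5f + 2.0f;
}

// Hypothetical host helper: runs the FP32 kernel and returns its output.
// Error checking is omitted to keep the sketch short.
std::vector<float> runScale(const std::vector<float>& host) {
    const int n = static_cast<int>(host.size());
    const size_t bytes = n * sizeof(float);
    float* dIn = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, host.data(), bytes, cudaMemcpyHostToDevice);
    scaleWithFloatLiterals<<<(n + 255) / 256, 256>>>(dIn, dOut, n);
    std::vector<float> result(n);
    cudaMemcpy(result.data(), dOut, bytes, cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(dIn);
    cudaFree(dOut);
    return result;
}
```

Both kernels produce the same values for simple inputs; the difference shows up in the profiler as double-precision instructions in the first kernel and none in the second.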

Apart from that, I also tried various techniques, listed below, to improve the performance.

Use of fast math
Use of L1 cache
Use of texture memory
Use of warp shuffle instructions for reductions
Reducing register counts inside the kernel
Use of streams
Fusing kernels that work on the same data set
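
As an illustration of the shuffle-based reduction in the list above, here is a minimal sketch. It is not the code from the project: the kernel, the launch configuration, and the helper sumOnDevice are hypothetical. It also assumes CUDA 9 or later, where the intrinsic is __shfl_down_sync; older toolkits exposed it as __shfl_down without the mask argument.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Sums the values held by the 32 lanes of a warp with register-to-register
// shuffles, so no shared memory is needed. Lane 0 ends up with the warp total.
__inline__ __device__ float warpReduceSum(float v) {
    for (int offset = 16; offset > 0; offset >>= 1)
        v += __shfl_down_sync(0xffffffffu, v, offset);
    return v;
}

// Each thread accumulates a partial sum in a grid-stride loop, each warp
// reduces its partial sums, and lane 0 of every warp adds to the global total.
// Assumes blockDim.x is a multiple of 32 so every warp is full.
__global__ void sumKernel(const float* in, float* out, int n) {
    float v = 0.0f;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        v += in[i];
    v = warpReduceSum(v);
    if ((threadIdx.x & 31) == 0) atomicAdd(out, v);
}

// Hypothetical host helper; error checking omitted for brevity.
float sumOnDevice(const std::vector<float>& host) {
    const int n = static_cast<int>(host.size());
    float* dIn = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, sizeof(float));
    cudaMemcpy(dIn, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(dOut, 0, sizeof(float));  // all-zero bytes represent 0.0f
    sumKernel<<<4, 256>>>(dIn, dOut, n);
    float total = 0.0f;
    cudaMemcpy(&total, dOut, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return total;
}
```

Whether this beats a shared-memory reduction depends on the kernel; as noted below, not every technique on the list paid off.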

A few of these methods gave a boost, while others degraded performance. While making changes to the code, I ran into CUDA out-of-bounds address errors. Luckily, running the application under cuda-memcheck helped me resolve them.

I also tried half-precision data types. Unfortunately, for this application they did not help, and I had to revert to FP32.

Further, using the guided performance analysis in the Visual Profiler, I improved the performance from 100 kHz to around 200 kHz. This would have done the job; however, it needed to be improved further to account for the additional cycles needed to transfer the data to the application for display.
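
Streams are one common way to hide transfer cost behind computation. The sketch below shows the general pattern only; it is not the project's code. The kernel scaleKernel, the helper processInChunks, and the choice of two streams are hypothetical. The data is split into chunks, and each chunk's copy-in, kernel, and copy-out are queued on its own stream so that the copies of one chunk can overlap with the kernel of another. Pinned host memory is used because asynchronous copies need it to overlap.

```cuda
#include <algorithm>
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Placeholder for real processing: multiplies each element by a gain.
__global__ void scaleKernel(float* data, int n, float gain) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= gain;
}

// Hypothetical helper; assumes a non-empty input. Error checking omitted.
std::vector<float> processInChunks(const std::vector<float>& input, float gain) {
    const int kStreams = 2;
    const int n = static_cast<int>(input.size());
    const int chunk = (n + kStreams - 1) / kStreams;

    float* pinned = nullptr;  // page-locked host buffer for async copies
    float* dev = nullptr;
    cudaMallocHost(&pinned, n * sizeof(float));
    cudaMalloc(&dev, n * sizeof(float));
    std::copy(input.begin(), input.end(), pinned);

    cudaStream_t streams[kStreams];
    for (int s = 0; s < kStreams; ++s) cudaStreamCreate(&streams[s]);

    for (int s = 0; s < kStreams; ++s) {
        const int offset = s * chunk;
        const int count = std::min(chunk, n - offset);
        if (count <= 0) break;
        const size_t bytes = count * sizeof(float);
        // Work queued on one stream runs in order; different streams may overlap.
        cudaMemcpyAsync(dev + offset, pinned + offset, bytes,
                        cudaMemcpyHostToDevice, streams[s]);
        scaleKernel<<<(count + 255) / 256, 256, 0, streams[s]>>>(dev + offset, count, gain);
        cudaMemcpyAsync(pinned + offset, dev + offset, bytes,
                        cudaMemcpyDeviceToHost, streams[s]);
    }
    cudaDeviceSynchronize();  // wait for every stream to finish

    std::vector<float> result(pinned, pinned + n);
    for (int s = 0; s < kStreams; ++s) cudaStreamDestroy(streams[s]);
    cudaFreeHost(pinned);
    cudaFree(dev);
    return result;
}
```

How much overlap is actually achieved depends on the GPU's copy engines and on the chunk sizes, so it is worth confirming on the profiler timeline.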

In a short brainstorming session, we found that the Pascal-based Quadro P4000 GPU was available on the market. A basic comparison showed that the P4000 delivers double the FP32 teraflops of the M4000.

Specification	Quadro M4000	Quadro P4000
CUDA Cores	1664	1792
FP32 TFLOPS	2.66	5.3
Max Power Consumption	120 W	105 W
Architecture	Maxwell 2	Pascal

And with the new GPU, the performance went from 200 kHz to 300 kHz! The P4000 made a really big difference.
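
As a rough sanity check, the two ratios can be put side by side. This is a back-of-the-envelope comparison that uses only the figures quoted in this post, not an additional measurement:

```latex
\[
\frac{\text{peak FP32 (P4000)}}{\text{peak FP32 (M4000)}} = \frac{5.3}{2.66} \approx 1.99,
\qquad
\frac{\text{scan rate (P4000)}}{\text{scan rate (M4000)}} = \frac{300\ \text{kHz}}{200\ \text{kHz}} = 1.5
\]
```

The measured gain is smaller than the ratio of peak throughput, which is what one would expect if the pipeline is not limited purely by FP32 arithmetic. Which resource was the actual bottleneck was not analysed here.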

If you are looking to accelerate medical image classification with NVIDIA Tesla, you can check my previous blog: Tesla K20 & Tesla K40. eInfochips offers CUDA consulting, migration, and system design services for companies looking to use NVIDIA GPUs in their products.

Visit our Medical Devices web page to learn more, or contact us at marketing@einfochips.com with your queries.

Authors

Lalit Chandivade

Lalit Chandivade works as a Technical Manager at eInfochips. He has been leading a team at eInfochips on building automated NVMe test suites and enhancing the NVMe test suites on Linux & Windows OS. Lalit has also successfully executed projects in the Linux device drivers & applications in the Storage Area Network domain.
