einfochips logo
 


 

Industries

Industrial Automation
Semiconductors
Medical Imaging
Automotive
Networking
Security/Surveillance
Storage
Video Communications
Optimized porting of PC algorithm on TI DSP

By implementing algorithm optimization techniques,
eInfochips achieved 50 times performance improvement

Most of the research is done on PC platform initially, as the algorithm research emphasis is more on achieving a particular functionality. Today’s PC based systems have ample processing power and memory, so the algorithm developers don’t have to worry about these two aspects. When the same algorithm is used for an embedded system, the resources become major bottleneck to achieve real time performance. Hence, the algorithms need to be optimized both for faster execution and lower memory consumption.

This case study discusses a few techniques adopted for improving the performance while porting the image-processing algorithm developed for PC platform to Texas Instrument DSP TMS320C6205. The algorithm identifies specific features of human face and display the modified face image on the LCD of C6205 based hand-held device.

The solution discussed in this case study was ported to DSP platform for our client IDEO, Inc.— a California based engineering company known for designing innovative products. “eInfochips’ accomplishment was impressive. They overcame a series of significant engineering challenges to deliver on time and under budget. Their attention to detail, creative thinking and professional attitude was evident throughout the project.” said Ed Kirk, Director of Software Development at IDEO, on successful completion of the project.

The challenge

Everyone assumes that algorithms written in ANSI C are easily portable across platforms. In this case also, the first attempt to port the algorithm was to recompile the code in Code Composer Studio and use it. But, when we tried to run it on the evaluation board, it was taking 5 seconds to execute against the target execution time of 100 mS. A large performance gap of 50 times, which was a deciding factor for the success of the product.

As usual, the first response was “it is impossible”! But, a more detailed analysis revealed that the seemingly impossible task is doable and the same team finally achieved it.

Porting And Optimization Methodology

We achieved this performance by using only ‘C’, no DSP assembly coding was used!

The porting and optimization was divided in four stages:

C++ To C conversion (2 times improvement) We converted C++ classes to their equivalent C structures. The methods within the C++ classes were converted to global functions. The C++ templates were degrading the performance and hence they were removed. Parameter passing in functions was minimized, resulting in stack size reduction from 48 KB to 3 KB. The program memory was also reduced from 139 KB to 70 KB.

Efficient use of DMA and memory (2 times improvement) Pixel-by-pixel processing of the images stored in external memory was taking too much of time. The processing was done line by line instead of pixel by pixel and each line was moved to internal memory for processing using DMA.

Logic optimizations (4 times improvement) The algorithm was written for PC platform, so more emphasis was on flexibility. We broke the algorithm in smaller units based on their execution time and rewrote the time consuming functions in an optimized way. This was an iterative process, but helped in achieving considerable optimization.

Floating-point to fixed-point operations (3 times improvement) The floating-point operations are considerably slower than fixed-point ones and compiler cannot optimize “for/while” loops that contain floating-point operations. We replaced some of the division operations by shift operations. While converting from fixed-point to floating-point, we selected the appropriate fixed-point data types. The algorithm handled fixed-point decimal values to the power of 9.

Conclusion

By careful optimization, the algorithms developed for PC processors running at GHz speed, can be ported to lower speed DSP platforms and still achieve real-time performance.


  Related Links
 
 
 




  Feedback/Comment regarding website please write to WebMaster
©eInfochips 2008 | Privacy Policy | Sitemap