Our customer is a California-based engineering company known for designing innovative products.
Our customer had a teeth whitening algorithm running on a PC-based system and wanted to build an embedded product around it. Because they had no in-house expertise in DSP optimization, they engaged eInfochips to port the algorithm to a DSP-based platform. Their optimization target was a whopping 50x speed increase in image processing.
The customer’s teeth whitening algorithm took multiple images as input and located the teeth in them. Based on the whitening level selected by the user, it whitened the teeth and generated an output image. eInfochips achieved the 50x goal using only ‘C’ (no DSP assembly coding was used), which helped reduce the project cost by well over 50%.
The customer is well known for its advanced solutions.
Their new product idea was a teeth whitening photo frame. The device was designed to take a photo of a user, and in almost-real-time show an updated photo in which the user’s teeth were whitened.
Team eInfochips executed the project using only ‘C’

As per the project requirements, the porting and optimization work was divided into four stages:
1. C++ to C conversion (2 times improvement):
We converted C++ classes to their equivalent C structures. The methods within the C++ classes were converted to global functions. The C++ templates were degrading the performance and hence they were removed. Parameter passing in functions was minimized, resulting in stack size reduction from 48 KB to 3 KB. The program memory was also reduced from 139 KB to 70 KB.
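To illustrate the kind of conversion described in this stage, here is a minimal sketch. The `Whitener` type, its `level` field, the function names and the whitening formula are all hypothetical and are not taken from the customer's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C++ original, shown only for comparison:
 *
 *   class Whitener {
 *   public:
 *       explicit Whitener(int level) : level_(level) {}
 *       uint8_t apply(uint8_t v) const;
 *   private:
 *       int level_;
 *   };
 */

/* C equivalent: the data members of the class become a plain struct. */
typedef struct {
    int level; /* whitening level, assumed to be in the range 0..100 */
} Whitener;

/* The constructor becomes an init function on caller-owned storage. */
void whitener_init(Whitener *w, int level)
{
    assert(w != NULL);
    w->level = level;
}

/* The method becomes a global function. The implicit `this` pointer is now
 * an explicit argument, so a single pointer is passed instead of a list of
 * individual parameters. */
uint8_t whitener_apply(const Whitener *w, uint8_t v)
{
    /* Illustrative formula: move the value toward 255 in proportion to level. */
    int out = v + ((255 - v) * w->level) / 100;
    return (uint8_t)out;
}
```

A caller would declare a `Whitener`, call `whitener_init` once, and then pass its address to `whitener_apply` for each channel value. Passing one pointer instead of many arguments is one common way to keep per-call stack usage small.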
2. Efficient use of DMA and memory (2 times improvement):
Processing the images pixel by pixel while they were stored in external memory was time consuming. Instead, the processing was done line by line, and each line was moved into internal memory by DMA before it was processed.
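The line-by-line approach can be sketched as follows. This is only an illustration under stated assumptions: `dma_copy` is a hypothetical stand-in implemented with `memcpy` (a real DSP would use its own DMA controller API), `line_buf` represents a buffer placed in internal memory, and `brighten` is a placeholder for the real per-pixel work.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_LINE_BYTES 1024 /* assumed upper bound on the size of one line */

/* Stand-in for a buffer located in fast internal (on-chip) memory. */
static uint8_t line_buf[MAX_LINE_BYTES];

/* Hypothetical stand-in for a DMA transfer. memcpy keeps the sketch
 * portable; it is not a real DMA API. */
static void dma_copy(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}

/* Placeholder per-pixel operation: add 20 and saturate at 255. */
static uint8_t brighten(uint8_t v)
{
    return (uint8_t)(v > 235 ? 255 : v + 20);
}

/* Processes an image held in (slow) external memory one line at a time.
 * Returns 0 on success, or -1 if a line does not fit in the line buffer. */
int process_image(uint8_t *image, size_t width, size_t height)
{
    if (width > MAX_LINE_BYTES)
        return -1;

    for (size_t y = 0; y < height; ++y) {
        uint8_t *line = image + y * width;

        dma_copy(line_buf, line, width);     /* external -> internal */
        for (size_t x = 0; x < width; ++x)   /* all work hits fast memory */
            line_buf[x] = brighten(line_buf[x]);
        dma_copy(line, line_buf, width);     /* internal -> external */
    }
    return 0;
}
```

With a real DMA controller, two line buffers are often used so that the transfer of the next line overlaps with processing of the current one. Whether the project did this is not stated.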
3. Logic optimizations (4 times improvement):
The algorithm was written for a PC platform, so the emphasis was on flexibility. We broke the algorithm into smaller units based on their execution time and rewrote the time-consuming functions in an optimized way. This was an iterative process, but it helped achieve considerable optimization.
4. Floating-point to fixed-point operations (3 times improvement):
Floating-point operations are considerably slower than fixed-point ones, and the compiler cannot optimize “for/while” loops that contain floating-point operations. We replaced some of the division operations with shift operations. While converting from floating-point to fixed-point, we selected the appropriate fixed-point data types. The algorithm handled fixed-point decimal values to the power of 9.
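As a rough illustration of this stage, the sketch below scales a pixel value by a fractional gain, first in floating point and then in fixed point. The Q8 format (8 fractional bits), the function names and the gain example are assumptions chosen for brevity; the formats actually used in the project are not described here.

```c
#include <stdint.h>

#define Q_BITS 8              /* assumed format: 8 fractional bits (Q8) */
#define Q_ONE  (1 << Q_BITS)  /* 1.0 in Q8 is 256 */

/* Converts a gain given in percent (150 means 1.5x) to Q8. This runs once
 * at setup time, so its division is not in the per-pixel path. */
int32_t gain_to_q8(int percent)
{
    return (percent * Q_ONE) / 100;
}

/* Floating-point reference version. */
uint8_t scale_float(uint8_t v, float gain)
{
    float r = v * gain;
    return (uint8_t)(r > 255.0f ? 255.0f : r);
}

/* Fixed-point version: an integer multiply followed by a right shift,
 * which takes the place of dividing by 256. Assumes gain_q8 >= 0. */
uint8_t scale_fixed(uint8_t v, int32_t gain_q8)
{
    int32_t r = ((int32_t)v * gain_q8) >> Q_BITS;
    return (uint8_t)(r > 255 ? 255 : r);
}

/* Division by a power of two written as a shift: average of four pixels. */
uint8_t average4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return (uint8_t)(((unsigned)a + b + c + d) >> 2);
}
```

The choice of fractional bits is a trade-off: more bits give finer precision but leave less headroom before the intermediate product overflows its integer type.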
“eInfochips’ accomplishment was impressive. They overcame a series of significant engineering challenges to deliver on time and under budget. Their attention to detail, creative thinking and professional attitude was evident throughout the project.” said the customer’s Director of Software Development on successful completion of the project.