In today’s data-driven age, an increasing number of computationally complex algorithms turn raw data into actionable insights in real time. Yet many promising algorithms have not made the jump to real-time throughput and latency.
These algorithms run the gamut from neural networks to statistical models, and they can be time- and resource-intensive. Many exist only as research projects: they have been optimized to perform their function well, while speed and resource requirements remain secondary concerns.
The people developing these algorithms may not have the engineering expertise needed to bring these processes to their full potential, and their time may be better spent on further algorithmic advancements. A team of engineers can optimize the software implementation of these algorithms and port them to GPUs or FPGAs, improving throughput and latency so they can be used in real-time applications.
DornerWorks has experience accelerating algorithms for real-time processing and has developed a process to turn research algorithms into real-time applications ready for deployment.
DornerWorks will work with the algorithm’s developers to gain an understanding of the algorithm, then gather baseline performance data, evaluating its current throughput, latency, and resource utilization. Test data is used to verify the functionality of the algorithm after any change. Throughput and latency requirements for a real-time implementation are established up front, and the specifications of the target hardware are needed to set resource utilization requirements.
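A baseline measurement like the one described above can be gathered with a simple timing harness. The sketch below is illustrative, not DornerWorks’ actual tooling: `process_frame` is a hypothetical stand-in for the algorithm under test, and the frame counts are arbitrary.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for the algorithm under test; a real
// baseline would call the algorithm's actual entry point.
std::vector<double> process_frame(const std::vector<double>& input) {
    std::vector<double> out(input.size());
    for (std::size_t i = 0; i < input.size(); ++i)
        out[i] = input[i] * 2.0;  // placeholder work
    return out;
}

struct Baseline {
    double avg_latency_ms;  // mean time to process one frame
    double throughput_fps;  // frames processed per second
};

// Run n_frames of synthetic data through the algorithm and report
// the two numbers a real-time requirement is written against.
Baseline measure_baseline(std::size_t n_frames, std::size_t frame_size) {
    std::vector<double> frame(frame_size, 1.0);
    double checksum = 0.0;  // consume results so work isn't optimized away
    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n_frames; ++i)
        checksum += process_frame(frame)[0];
    auto end = std::chrono::steady_clock::now();
    double total_ms =
        std::chrono::duration<double, std::milli>(end - start).count();
    std::printf("checksum=%f\n", checksum);
    return {total_ms / n_frames, n_frames / (total_ms / 1000.0)};
}
```

Recording these numbers before any change is made gives every later optimization iteration something concrete to be compared against.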
This is an iterative process, where DornerWorks will optimize the algorithm as much as possible in software running on a CPU before attempting to port it to a different hardware platform. It is likely that only parts of the algorithm will need hardware acceleration, so it is imperative that the rest of the algorithm is as performant as possible in software.
If the algorithm is not already written in C or C++, it is translated into C++. This is not only because C/C++ is faster than languages such as Python or MATLAB, but because it represents what is happening in memory more transparently, making acceleration easier. C/C++ code is also easier to port to high-level synthesis (HLS) tools, which reduces FPGA development time.
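The kind of memory-transparent C++ that ports well to HLS tends to use fixed sizes, contiguous storage, and flat loops with no dynamic allocation. The sketch below is a hedged illustration of that style, not production code; the 3-tap moving average, the name `moving_average3`, and the size `N` are all invented for the example.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t N = 8;  // fixed size, known at compile time

// A 3-tap moving average with clamped edges, written HLS-friendly:
// contiguous std::array storage, a single bounded loop, no dynamic
// allocation, and no hidden copies. Every byte touched is visible.
std::array<float, N> moving_average3(const std::array<float, N>& in) {
    std::array<float, N> out{};
    for (std::size_t i = 0; i < N; ++i) {
        float prev = (i > 0) ? in[i - 1] : in[i];
        float next = (i + 1 < N) ? in[i + 1] : in[i];
        out[i] = (prev + in[i] + next) / 3.0f;
    }
    return out;
}
```

The same filter in MATLAB or Python would hide allocations and copies behind library calls; here the data layout and access pattern are explicit, which is exactly what both a CPU optimizer and an HLS tool need to see.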
The first step in this iterative process is to identify potential performance improvements that can be made to the algorithm. The team then redesigns the algorithm’s architecture to take advantage of those improvements. Once a plan has been made, the algorithm is reimplemented according to the new architecture. The final stage is to verify the performance improvement and confirm the algorithm’s functionality with the algorithm developers.
This process repeats until the team determines further algorithm acceleration in software is not feasible.
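One software iteration of the kind described above might look like the following sketch: a hypothetical research implementation recomputes a loop-invariant normalization factor for every element, and the redesigned version hoists it out, turning O(n²) work into O(n) while producing the same results the developers can verify against.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Before: recomputes the vector norm for every output element, as a
// research prototype plausibly might (O(n^2) total work).
std::vector<double> normalize_naive(const std::vector<double>& v) {
    std::vector<double> out(v.size());
    for (std::size_t i = 0; i < v.size(); ++i) {
        double norm = 0.0;
        for (double x : v) norm += x * x;  // loop-invariant, recomputed
        out[i] = v[i] / std::sqrt(norm);
    }
    return out;
}

// After: the norm is computed once and reused (O(n) total work).
// Functionality is unchanged, so test data should still pass.
std::vector<double> normalize_fast(const std::vector<double>& v) {
    double norm = 0.0;
    for (double x : v) norm += x * x;
    double inv = 1.0 / std::sqrt(norm);
    std::vector<double> out(v.size());
    for (std::size_t i = 0; i < v.size(); ++i) out[i] = v[i] * inv;
    return out;
}
```

The verification stage then consists of confirming both that the outputs still match the reference implementation and that the measured latency actually dropped.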
The hardware acceleration process is also iterative. The first step is to profile the algorithm to find functions which perform poorly on the CPU. The team examines the underperforming code and determines which hardware platform is best suited to accelerating it further. After porting the relevant sections of code to the chosen platform, the functionality of the algorithm is again verified with the original algorithm developers, and DornerWorks evaluates the algorithm against the throughput and latency requirements. If the algorithm is not yet running in real time, the process repeats until it is fast enough.
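In practice the profiling step would use tools like gprof, perf, or a vendor profiler; as a minimal self-contained illustration of the idea, the sketch below accumulates wall-clock time per labeled stage so the dominant stages stand out as candidates for hardware offload. The `ScopedTimer` name and design are invented for this example.

```cpp
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Accumulated milliseconds per labeled region. After a run, the
// largest entries mark the hotspots worth offloading to a GPU/FPGA.
std::map<std::string, double>& profile_totals() {
    static std::map<std::string, double> totals;
    return totals;
}

// RAII timer: construct at the top of a stage, and its destructor
// adds the elapsed time to that stage's running total.
struct ScopedTimer {
    std::string label;
    std::chrono::steady_clock::time_point start;
    explicit ScopedTimer(std::string l)
        : label(std::move(l)), start(std::chrono::steady_clock::now()) {}
    ~ScopedTimer() {
        auto end = std::chrono::steady_clock::now();
        profile_totals()[label] +=
            std::chrono::duration<double, std::milli>(end - start).count();
    }
};
```

Wrapping each pipeline stage (e.g. `{ ScopedTimer t("filter"); run_filter(); }`) yields a per-stage time breakdown after a representative run, which is the evidence used to decide what gets ported to hardware.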
Successfully accelerating complex algorithms requires collaboration between the engineers implementing the algorithm on a real-time system and the experts who designed it. DornerWorks will work with your existing team to accelerate your applications into real-time systems that deliver value to your customers.
Schedule a meeting with the DornerWorks team to start accelerating your applications.