Shall we explore a hypothetical situation? Sounds like fun, you say? Very good.
For the sake of this example, let us say you are an engineer developing a novel technology, and your system requirements include complex processing demands best satisfied by a powerful and flexible system-on-chip (SoC). Some requirements can be satisfied through software executing on a microprocessor. Additionally, there are hardware requirements for a unique, perhaps unusually fast, external interface that require a custom logic implementation.
Subsequently, you select an SoC that implements an embedded heterogeneous system architecture* with both a powerful processing system (PS) and flexible programmable logic (PL) to meet these requirements. You plan to use the PS predominantly for your application and algorithm development, and the PL for a custom interface used by your application to communicate with an external device within the product. You start the development lifecycle with this plan in place. Everything is going according to plan. Your design envelope for PS utilization is right on target for the final product. There is room in the resource utilization to enable future upgrades or reuse as a platform for additional variants. The PL implementation is providing you with the flexibility and power you needed for your custom interface and even has some extra resources available.
*If you are not familiar with what we mean by “heterogeneous system architecture”, please consider taking a few moments to read a previous post that provides a definition for our usage of the term.
Unexpectedly, you learn of a change in requirements that adds a new feature to the system. Due to the nature of this feature, it is implemented in software, which consumes more of the PS’s available processing capacity. In a severe case, the new feature will push the PS well beyond its limits, and it is too late in the product lifecycle to start over with new hardware without missing or renegotiating deadlines. What do you do if you want to make your current deadline and still include the future upgrade capability?
By now, you may have realized that this hypothetical situation can be all too real for those designing embedded systems, but there are paths that can help. One option is intensive optimization of the source code for the PS architecture to bring performance within spec. However, as is often the case with innovative designs that push the envelope of the hardware, software optimization alone may be insufficient to hit the anticipated performance targets.
Alternatively, because you selected a heterogeneous system (such as one of the MPSoC devices from Xilinx’s Zynq UltraScale+ product lineup), offloading computationally complex portions of your application to the PL is an option available to you. This reduces the processing load on the PS by having the PL do some of its work. Though offloading may not be an option for the new feature itself, there are likely opportunities waiting for you in the code for your existing features.
Not all code blocks make sense to move to the PL, because transferring data between the PS and the PL incurs overhead. Sections of code that require significant data transfer with relatively little processing will not benefit from the PL’s parallel processing, since the transfer overhead outweighs the gain. A good opportunity for offloading an algorithm from software to hardware is a block of code that executes frequently, involves above-average computation, does not have strict data movement requirements, and already uses parallel software processing. A block of code with complex, independent loops processing a relatively small amount of data is an ideal candidate for algorithm acceleration in the PL, because the added data transfer latency is minor compared to the gain in processing capability. Other blocks to consider are those with strict data latency requirements, which are best served by processing in hardware, where software scheduling mechanisms and interrupts cannot introduce unwanted latency spikes.
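To make the trade-off between transfer overhead and processing gain concrete, here is a minimal back-of-the-envelope sketch. It is not a measurement of any particular device: the function name, the simple "transfer time plus PL compute time" model, and every number in it are illustrative assumptions. Real figures would come from profiling your own design.

```python
def offload_speedup(sw_time_s, pl_compute_time_s, bytes_moved,
                    link_bytes_per_s, fixed_overhead_s=0.0):
    """Estimate the speedup of moving one code block from the PS to the PL.

    Hypothetical model: the offloaded block costs a fixed setup overhead,
    plus the time to move its data across the PS/PL boundary, plus the
    PL compute time. A result above 1.0 means the offload is faster.
    """
    transfer_time_s = fixed_overhead_s + bytes_moved / link_bytes_per_s
    return sw_time_s / (transfer_time_s + pl_compute_time_s)


# Assumed numbers: a compute-heavy block that touches little data.
compute_heavy = offload_speedup(
    sw_time_s=10e-3, pl_compute_time_s=1e-3,
    bytes_moved=4096, link_bytes_per_s=1e9)

# Assumed numbers: a block that moves a lot of data but does little work.
transfer_heavy = offload_speedup(
    sw_time_s=1e-3, pl_compute_time_s=0.5e-3,
    bytes_moved=4_000_000, link_bytes_per_s=1e9)

print(f"compute-heavy block speedup:  {compute_heavy:.2f}x")
print(f"transfer-heavy block speedup: {transfer_heavy:.2f}x")
```

Under these assumed numbers, the compute-heavy block comes out well ahead after offloading, while the transfer-heavy block ends up slower than it was in software, which is the pattern described above.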
If there are no ideal candidate code blocks or features for offloading, you can still leverage the PL. Select a block of code that is difficult for the processor to compute and let the PL do that work, freeing up the PS for the work it does best. You can see a benefit even if the PL is slower than the PS at executing the offloaded code, which might occur when the offloaded logic is not a natural fit for acceleration. The PS is now free to do other tasks that it could not take on while it was also running the offloaded block. The PL therefore gives you an opportunity to parallelize and balance the processing load across the PS and the PL, improving overall system performance.
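The load-balancing argument can be illustrated with a small utilization calculation. This is only a sketch under assumed numbers: the 10 ms processing period, the task times, and the 0.5 ms PS-side handoff cost are all hypothetical, and the model assumes the PL runs concurrently with the PS.

```python
def utilization(task_times_ms, period_ms):
    """Fraction of one processing period consumed by the given tasks."""
    return sum(task_times_ms) / period_ms


PERIOD_MS = 10.0  # assumed processing period

# Before offload: the PS runs an existing block (4 ms), other work (5 ms),
# and the newly required feature (3 ms). All times are assumed.
ps_before = utilization([4.0, 5.0, 3.0], PERIOD_MS)

# After offload: the existing block moves to the PL, where it takes 6 ms
# (slower than the 4 ms it took on the PS). The PS pays only an assumed
# 0.5 ms handoff cost and keeps the other work and the new feature.
ps_after = utilization([0.5, 5.0, 3.0], PERIOD_MS)
pl_after = utilization([6.0], PERIOD_MS)

print(f"PS utilization before offload: {ps_before:.0%}")
print(f"PS utilization after offload:  {ps_after:.0%}")
print(f"PL utilization after offload:  {pl_after:.0%}")
```

In this assumed scenario the PS starts out overcommitted, and after the offload both the PS and the PL fit within the period, even though the offloaded block itself runs slower on the PL than it did on the PS.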
You might even find that this approach becomes a standard procedure in future designs: a load-balancing mechanism that keeps baseline performance consistent as requirements change over the development lifecycle of your new technology. This benefit alone makes it worthwhile to consider a heterogeneous SoC for your next embedded system.