You’ve got an innovative, game-changing product idea. But amid all the wondrous things your design should do, you may not stop to think about protecting your amazing product from doing things it shouldn’t.
Here are some techniques for adding robustness to your FPGA design:
Get to know the built-in capabilities of the device you are using, and take advantage of them.
Sounds obvious, right?
Technology moves fast, and you may not be aware of the latest feature set available to you. Parity checking or error-correcting codes (ECC) on internal RAM provide data integrity. Configuration-memory integrity monitoring using ECC or CRC algorithms can run periodically during operation, reducing the window of exposure to configuration memory upsets. Clock-frequency, voltage, and temperature monitoring can help detect failures before your product becomes impaired.
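The periodic configuration-memory scrub described above can be sketched in software. This is an illustrative model only, assuming a configuration image readable as bytes and a "golden" CRC captured at load time; real devices expose this through vendor-specific primitives.

```python
import zlib

def compute_crc(config_image: bytes) -> int:
    """CRC-32 over the configuration image, computed once at load time."""
    return zlib.crc32(config_image)

def scrub_check(config_image: bytes, golden_crc: int) -> bool:
    """Periodic scrub: does the image still match its golden CRC?"""
    return compute_crc(config_image) == golden_crc

golden = bytes(range(256))
crc = compute_crc(golden)
assert scrub_check(golden, crc)           # intact image passes
corrupted = bytes([golden[0] ^ 0x01]) + golden[1:]
assert not scrub_check(corrupted, crc)    # a single bit flip is detected
```

The more often the scrub runs, the shorter the exposure window between an upset occurring and it being detected.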
Evaluate architectural mitigations to shield your design from unintended failures. Built-in self-test capabilities (memory BIST and logic BIST) and functional monitors are common methods. Watchdog timers keep essential operations from hanging, and thus being unavailable when needed.
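The watchdog idea can be shown with a small behavioral model, a sketch only: the supervised function must "kick" the timer within its timeout, or a reset request is raised. Timeout value and method names here are illustrative.

```python
class WatchdogTimer:
    """Behavioral model of a hardware watchdog timer."""

    def __init__(self, timeout: int):
        self.timeout = timeout          # ticks allowed between kicks
        self.counter = 0
        self.reset_requested = False

    def kick(self):
        """Healthy operation services the watchdog, restarting its count."""
        self.counter = 0

    def tick(self):
        """One timer tick; an unserviced watchdog eventually fires."""
        self.counter += 1
        if self.counter >= self.timeout:
            self.reset_requested = True

wdt = WatchdogTimer(timeout=3)
wdt.tick(); wdt.tick()
wdt.kick()                      # task is alive, count restarts
wdt.tick(); wdt.tick(); wdt.tick()   # task hangs and stops kicking...
assert wdt.reset_requested      # ...so the watchdog demands a reset
```

The key design choice is the timeout: long enough that normal worst-case latency never trips it, short enough that a hung function is recovered before it matters.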
ECC on external memories protects the integrity of data storage areas. And to prove these mitigations actually work, add an error-injection capability.
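A classic single-error-correcting Hamming(7,4) code illustrates both halves of this advice: the ECC itself, and the error injection that proves it. This is a textbook sketch, not any particular memory controller's scheme.

```python
def hamming74_encode(data: int) -> int:
    """Encode 4 data bits into a 7-bit single-error-correcting codeword."""
    d = [(data >> i) & 1 for i in range(4)]
    c = [0] * 8                     # bit positions 1..7; parity at 1, 2, 4
    c[3], c[5], c[6], c[7] = d[0], d[1], d[2], d[3]
    c[1] = c[3] ^ c[5] ^ c[7]
    c[2] = c[3] ^ c[6] ^ c[7]
    c[4] = c[5] ^ c[6] ^ c[7]
    return sum(c[i] << (i - 1) for i in range(1, 8))

def hamming74_decode(code: int):
    """Return (data, syndrome); a nonzero syndrome names the corrected bit."""
    c = [0] + [(code >> (i - 1)) & 1 for i in range(1, 8)]
    s = (c[1] ^ c[3] ^ c[5] ^ c[7]) \
        | (c[2] ^ c[3] ^ c[6] ^ c[7]) << 1 \
        | (c[4] ^ c[5] ^ c[6] ^ c[7]) << 2
    if s:
        c[s] ^= 1                   # correct the single flipped bit
    return c[3] | c[5] << 1 | c[6] << 2 | c[7] << 3, s

def inject_error(code: int, bit: int) -> int:
    """Error injection: flip one bit (1-based position) to prove the ECC."""
    return code ^ (1 << (bit - 1))

word = 0b1011
code = hamming74_encode(word)
for pos in range(1, 8):            # every single-bit upset is repaired
    data, fixed = hamming74_decode(inject_error(code, pos))
    assert data == word and fixed == pos
```

Production memory systems typically use wider SECDED codes (e.g. 8 check bits over 64 data bits), but the encode/inject/decode/verify loop is the same proof pattern.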
For critical processing functions, consider a redundant or triplicated implementation with built-in checking to catch errant events before they propagate and cause damage.
When a fault is encountered, you choose the response: reset the system, operate through, or shut down. Whichever you choose, logging the event gives you a future diagnostic tool. In addition, logical and/or physical isolation of critical functions from non-critical functions protects against crosstalk, manufacturing faults, and single-event upsets.
Partition your architecture during the conceptual design phase to enable these isolation techniques.
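The reset/operate-through/shutdown decision above can be captured as a policy table with logging attached. The fault names and responses here are purely illustrative assumptions:

```python
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("fault")

# Hypothetical per-fault policy; every entry is an example, not a standard.
RESPONSES = {
    "config_crc_mismatch": "reset",
    "sensor_parity":       "operate_through",
    "overtemperature":     "shutdown",
}

def handle_fault(fault: str) -> str:
    """Log every fault for later diagnosis, then apply the chosen response."""
    response = RESPONSES.get(fault, "reset")   # unknown fault: safe default
    log.warning("fault=%s response=%s", fault, response)
    return response

assert handle_fault("overtemperature") == "shutdown"
assert handle_fault("never_seen_before") == "reset"
```

Deciding these responses in a table, early, forces the architectural conversation about which faults are survivable, and the log gives you the evidence trail when a unit comes back from the field.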
Use a rigorous design methodology to protect against design faults. A development flow that includes multiple checkpoints throughout the lifecycle, using reviews and common design standards, has proven effective at eliminating failures before release.
Yes, it slows you down at first, but in the end you will have saved yourself from some nasty bugs. Think of it as cost, and more importantly failure, avoidance.
Have you ever found yourself using a product, gone through a user sequence that produces undesired results, and thought to yourself, “why didn’t the developers test that scenario?” In this case, the bad guy is the abnormal condition you haven’t thought to add to your test suite. When developing a verification suite, include those abnormal “what-if” situations to address potential design vulnerabilities.
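Those abnormal “what-ifs” are often a couple of lines next to the nominal checks. A sketch, using a hypothetical 8-bit speed command handler:

```python
def clamp_speed(raw: int) -> int:
    """Hypothetical speed-command handler: saturate to 0..255, never wrap."""
    return max(0, min(255, raw))

# Nominal stimulus any testbench would cover:
assert clamp_speed(100) == 100
assert clamp_speed(255) == 255

# Abnormal "what-if" stimulus a nominal-only suite would miss:
assert clamp_speed(-1) == 0      # underflow must saturate, not wrap to 255
assert clamp_speed(300) == 255   # overflow must saturate, not wrap to 44
```

The habit to build is asking, for every input, “what is the worst value this could ever carry?”, and pinning the answer down with a test before a user finds it for you.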
Like secure systems, the safest systems employ a layered approach built from multiple techniques. Implementing several of the practices discussed here is recommended.