Publication | Open Access
Cores that don't count
Citations: 175 | References: 15 | Year: 2021
Unknown Venue
Engineering, Verification, Computer Architecture, Software Engineering, Processor Architecture, Software Analysis, VLSI Era, Hardware Security, Systems Engineering, Parallel Computing, Manycore Processor, Reliability, Hardware Reliability, Computer Engineering, Microcode Updates, Computer Science, Design For Testing, Ephemeral Computational Errors, Silicon Debugging, Program Analysis, Software Testing, Many-core Architecture, Parallel Programming, Fault Injection
We are accustomed to thinking of computers as fail-stop, especially the cores that execute instructions, and most system software implicitly relies on that assumption. During most of the VLSI era, processors that passed manufacturing tests and were operated within specifications have insulated us from this fiction. As fabrication pushes towards smaller feature sizes and more elaborate computational structures, and as increasingly specialized instruction-silicon pairings are introduced to improve performance, we have observed ephemeral computational errors that were not detected during manufacturing tests. These defects cannot always be mitigated by techniques such as microcode updates, and may be correlated to specific components within the processor, allowing small code changes to effect large shifts in reliability. Worse, these failures are often "silent" - the only symptom is an erroneous computation.
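Because a silent failure shows up only as a wrong result, software can notice it only by checking the result against something. One common approach is to repeat a deterministic computation and compare the outputs. The following is a minimal sketch of that idea, not a method taken from the paper; the names `checked` and `SilentErrorDetected` are hypothetical and introduced only for illustration.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class SilentErrorDetected(Exception):
    """Raised when repeated runs of a deterministic computation disagree."""


def checked(compute: Callable[[], T], runs: int = 2) -> T:
    """Run a deterministic computation several times and compare the results.

    Hypothetical helper for illustration. Agreement does not prove
    correctness: a defective core may reproduce the same wrong answer,
    so in practice the runs would be placed on different cores.
    """
    if runs < 2:
        raise ValueError("need at least two runs to compare")
    results = [compute() for _ in range(runs)]
    first = results[0]
    # Any mismatch means at least one run produced an erroneous value.
    if any(r != first for r in results[1:]):
        raise SilentErrorDetected(f"results disagree: {results!r}")
    return first


if __name__ == "__main__":
    # A healthy, deterministic computation passes the check.
    print(checked(lambda: sum(range(1000))))
```

This sketch trades throughput for detection: every checked computation costs at least twice as much, and it assumes the computation is deterministic and its results are comparable with `!=`.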