MODE: automated neural network model debugging via state differential analysis and input selection

TLDR

Artificial intelligence models are integral to modern computing, yet like software they can contain bugs that degrade accuracy. Existing fixes add training inputs but are limited because they lack insight into model misbehaviors and so cannot select appropriate inputs. MODE is a model debugging technique inspired by software debugging: it first conducts model state differential analysis to identify the internal features responsible for bugs, then performs training input selection akin to program input selection in regression testing. Evaluated on 29 models across 6 applications, MODE raises average test accuracy from 75% to 93% on simple applications, where the state of the art reaches only 85% with 11 times more training time, and from 75% to over 91% on complex applications, where the state of the art fails to fix the bug or even degrades accuracy.

Abstract

Artificial intelligence models are becoming an integral part of modern computing systems. Just as software inevitably has bugs, models have bugs too, leading to poor classification/prediction accuracy. Unlike software bugs, model bugs cannot be easily fixed by directly modifying models. Existing solutions work by providing additional training inputs. However, they have limited effectiveness because they lack an understanding of model misbehaviors and therefore cannot select proper inputs. Inspired by software debugging, we propose a novel model debugging technique that works by first conducting model state differential analysis to identify the internal features of the model that are responsible for model bugs and then performing training input selection that is similar to program input selection in regression testing. Our evaluation results on 29 different models for 6 different applications show that our technique can fix model bugs effectively and efficiently without introducing new bugs. For simple applications (e.g., digit recognition), MODE improves the test accuracy from 75% to 93% on average, whereas the state of the art can only improve it to 85% with 11 times more training time. For complex applications and models (e.g., object recognition), MODE is able to improve the accuracy from 75% to over 91% in minutes to a few hours, whereas the state of the art fails to fix the bug or even degrades the test accuracy.
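
The abstract does not spell out how the two steps are computed, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each input's hidden-layer activations are available as a feature vector, that "state differential analysis" can be approximated by contrasting the mean features of correctly classified inputs with those of inputs wrongly assigned to the same class, and that input selection ranks candidate inputs by a dot product with that difference. The function names `differential_heat_map` and `select_inputs` and the toy data are hypothetical.

```python
import numpy as np


def differential_heat_map(features, labels, preds, target):
    """Assumed differential analysis for one class.

    Returns the mean feature vector of inputs correctly classified as
    `target` minus the mean feature vector of inputs misclassified as
    `target`. Large positive entries mark features typical of correct
    predictions; large negative entries mark features tied to the bug.
    """
    correct = features[(labels == target) & (preds == target)]
    wrong = features[(labels != target) & (preds == target)]
    if len(correct) == 0 or len(wrong) == 0:
        raise ValueError("need both correct and misclassified inputs")
    return correct.mean(axis=0) - wrong.mean(axis=0)


def select_inputs(candidates, heat_map, k):
    """Assumed selection rule: indices of the k candidates whose
    features align best (largest dot product) with the heat map."""
    scores = candidates @ heat_map
    return np.argsort(scores)[::-1][:k]


# Toy data: two features, class-1 inputs are all misclassified as class 0.
features = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = np.array([0, 0, 1, 1])
preds = np.array([0, 0, 0, 0])

heat = differential_heat_map(features, labels, preds, target=0)

# Hypothetical pool of extra training inputs, already mapped to features.
candidates = np.array([[0.2, 0.8], [0.8, 0.2], [0.5, 0.5]])
chosen = select_inputs(candidates, heat, k=1)
```

In this toy setting the heat map favors the first feature and penalizes the second, so the candidate that most resembles a correctly classified class-0 input is ranked first. A real pipeline would take the feature vectors from a trained network's hidden layer and retrain on the selected inputs.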
