Publication | Closed Access
A deep neural network language model with contexts for source code
Citations: 43
References: 61
Year: 2018
Unknown Venue
LLM Fine-tuning, Engineering, Software Engineering, Source Code Analysis, Large Language Model, Software Analysis, Corpus Linguistics, Natural Language Processing, Computational Linguistics, Language Studies, Language Models, Machine Translation, Source Code, Statistical Language Models, Code Generation, Computer Science, Code Representation, Deep Learning, Deep Neural Network, Program Analysis, Linguistics
Statistical language models (LMs) have been applied in several software engineering applications. However, they struggle with ambiguities in the names of program and API elements (classes and method calls). In this paper, inspired by the success of deep neural networks (DNNs) in natural language processing, we present Dnn4C, a DNN language model that complements the local context of lexical code elements with both syntactic and type contexts. We design a context-incorporating method that uses syntactic and type annotations of source code so that the model learns to distinguish lexical tokens appearing in different syntactic and type contexts. Our empirical evaluation on code completion for real-world projects shows that Dnn4C improves top-1 accuracy by 11.6%, 16.3%, 27.1%, and 44.7% (relative) over the state-of-the-art language models for source code used with the same features: RNN LM, DNN LM, SLAMC, and n-gram LM, respectively. In a second application, we show that Dnn4C improves accuracy over the n-gram LM when migrating source code from Java to C# with a machine translation model.
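The idea of complementing lexical tokens with syntactic and type contexts can be illustrated with a minimal Python sketch. This is not the Dnn4C implementation: the abstract does not specify the encoding, so the `AnnotatedToken` class, the label names (`"Var"`, `"MethodCall"`, `"FileReader"`, `"Socket"`), and the helper functions are all hypothetical. The sketch only shows how two lexically identical contexts become distinguishable once annotated, and how a relative (as opposed to absolute) accuracy improvement is computed, using made-up accuracies.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class AnnotatedToken:
    """A lexical token plus hypothetical syntactic and type labels."""
    lexeme: str   # the token as written in the source code
    syntax: str   # assumed syntactic label, e.g. "MethodCall"
    type_: str    # assumed type label, e.g. "FileReader"


Context = Tuple[Tuple[str, str, str], ...]


def context_windows(tokens: List[AnnotatedToken], n: int) -> List[Tuple[Context, str]]:
    """Pair each target lexeme with the n annotated tokens that precede it."""
    pairs = []
    for i in range(n, len(tokens)):
        context = tuple((t.lexeme, t.syntax, t.type_) for t in tokens[i - n:i])
        pairs.append((context, tokens[i].lexeme))
    return pairs


def relative_improvement(new_acc: float, base_acc: float) -> float:
    """Relative gain of new_acc over base_acc, as a fraction of base_acc."""
    return (new_acc - base_acc) / base_acc


# Two snippets of the form `in.close`, where `in` has a different type in each.
reader_call = [
    AnnotatedToken("in", "Var", "FileReader"),
    AnnotatedToken(".", "Dot", "none"),
    AnnotatedToken("close", "MethodCall", "FileReader"),
]
socket_call = [
    AnnotatedToken("in", "Var", "Socket"),
    AnnotatedToken(".", "Dot", "none"),
    AnnotatedToken("close", "MethodCall", "Socket"),
]

# Lexical-only contexts are identical, so a purely lexical LM cannot tell them apart.
lexical_reader = [t.lexeme for t in reader_call[:2]]
lexical_socket = [t.lexeme for t in socket_call[:2]]

# Annotated contexts differ because the type label of `in` differs.
annotated_reader = context_windows(reader_call, 2)[0][0]
annotated_socket = context_windows(socket_call, 2)[0][0]

# Hypothetical accuracies, chosen only to illustrate the arithmetic.
gain = relative_improvement(0.6, 0.5)
```

In a real model, each component of the annotated context would typically be mapped to an embedding before being fed to the network; that step is omitted here because the abstract gives no detail about it.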