Publication | Open Access

FalconCode: A Multiyear Dataset of Python Code Samples from an Introductory Computer Science Course

Citations: 11

References: 19

Year: 2023

Abstract

The lack of large and diverse datasets of student code samples limits some forms of computer science education research. To address this problem, we created FalconCode, a novel collection of over 1.5 million Python programs from over two thousand undergraduate students at the United States Air Force Academy. FalconCode captures over five semesters' worth of code samples from our introduction to computing course, which is taken by every student regardless of academic major. The dataset contains student code submissions for over 800 programming assignments, as well as additional metadata such as the prompt for each assignment, the test case(s) used to evaluate student submissions, and the specific skills needed to solve each problem. In this paper, we describe the methodology used to create FalconCode and the steps taken to anonymize the data. We then describe FalconCode's data schema and show how it can support a wide range of research, including work that uses machine learning (ML) and artificial intelligence (AI). FalconCode is provided free of charge and is available upon request for computer science education research.
