Publication | Closed Access
What Makes a Well-Documented Notebook? A Case Study of Data Scientists’ Documentation Practices in Kaggle
Year: 2021 | Citations: 27 | References: 20 | Venue: Unknown
Topics: Well-documented Notebook, Engineering, Well-documented Notebooks, Data Curation, Data Visualization, Data Publishing, Journalism, Text Mining, Natural Language Processing, Data Science, Document Engineering, Document Analysis, Data Resources, Collaborative Data Science, Computational Notebooks, Data-driven Science, Knowledge Discovery, Research Data Management, Information Management, Research Data Archiving, Machine Learning Competitions, Case Study, Data-driven Learning, Data Literacy
Many data scientists use computational notebooks to test and present their work, because a notebook can weave code and documentation together (a computational narrative) and supports rapid iteration on code experiments. However, writing good documentation in a data science notebook is not easy, partly because there is no corpus of well-documented notebooks that data scientists can follow as exemplars. To address this challenge, this work looks at Kaggle, a large online community where data scientists host and participate in machine learning competitions, and treats highly voted Kaggle notebooks as a proxy for well-documented notebooks. Through a qualitative analysis at both the notebook level and the markdown-cell level, we find that these notebooks are indeed well documented when measured against previous literature. Our analysis also reveals nine categories of content that data scientists write in their documentation cells, and these documentation cells often interplay with different stages of the data science lifecycle. We conclude the paper with design implications and future research directions.
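The markdown-cell level of analysis rests on how a Jupyter notebook file is structured: in the nbformat v4 JSON format, a notebook holds a list of `cells`, each with a `cell_type` (`"markdown"` or `"code"`, among others) and a `source` that may be a single string or a list of lines. The sketch below is a hypothetical helper, not the authors' procedure (their analysis was qualitative), that separates documentation cells from code cells and computes the share of markdown cells as one crude notebook-level signal. The function names `split_cells` and `markdown_ratio` and the tiny example notebook are illustrative assumptions.

```python
from __future__ import annotations

import json
from typing import Any


def split_cells(notebook: dict[str, Any]) -> tuple[list[str], list[str]]:
    """Return (markdown_sources, code_sources) from an nbformat v4 notebook dict.

    Hypothetical helper: other cell types (e.g. "raw") are ignored.
    """
    markdown: list[str] = []
    code: list[str] = []
    for cell in notebook.get("cells", []):
        src = cell.get("source", "")
        # nbformat allows `source` to be one string or a list of line strings
        text = "".join(src) if isinstance(src, list) else src
        if cell.get("cell_type") == "markdown":
            markdown.append(text)
        elif cell.get("cell_type") == "code":
            code.append(text)
    return markdown, code


def markdown_ratio(notebook: dict[str, Any]) -> float:
    """Fraction of markdown cells among markdown + code cells (0.0 if none)."""
    md, code = split_cells(notebook)
    total = len(md) + len(code)
    return len(md) / total if total else 0.0


# A tiny illustrative notebook; a real one would be read with
# json.load(open(path, encoding="utf-8")).
example = json.loads("""{
  "nbformat": 4, "nbformat_minor": 5, "metadata": {},
  "cells": [
    {"cell_type": "markdown", "metadata": {},
     "source": ["# Load data\\n", "We read the training set."]},
    {"cell_type": "code", "metadata": {}, "execution_count": null,
     "outputs": [], "source": "import pandas as pd"},
    {"cell_type": "markdown", "metadata": {}, "source": "## Feature engineering"},
    {"cell_type": "code", "metadata": {}, "execution_count": null,
     "outputs": [], "source": ["x = 1\\n", "y = 2"]}
  ]
}""")

print(markdown_ratio(example))  # 2 markdown cells out of 4 -> 0.5
```

A ratio like this only counts documentation cells; it says nothing about what those cells contain, which is why content-level coding, such as the nine categories the paper reports, requires reading the markdown text itself.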