Publication | Closed Access
Expandable group identification in spreadsheets
Citations: 23
References: 23
Year: 2018
Unknown Venue
Engineering, Spreadsheet Users, Software Engineering, Software Analysis, Large-scale Datasets, Combinatorial Data Analysis, Data Science, Data Mining, Management, Data Integration, Identification Method, Fuzzing, Data Management, Very Large Database, Knowledge Discovery, Computer Science, Automated Analysis, Database Technology, Static Program Analysis, Automated Reasoning, Program Analysis, Software Testing, Expandable Group Identification, VEnron Corpora, Data Modeling
Spreadsheets are widely used for a variety of business tasks. Spreadsheet users often store similar data and computations by repeating a block of cells (a unit) within a spreadsheet. We call a unit together with all of its repetitions an expandable group. All units in an expandable group share the same or similar formats and semantics. Because spreadsheets serve as data storage and management tools, expandable groups represent the fundamental structure of spreadsheets. However, existing spreadsheet systems do not recognize expandable groups. As a result, other spreadsheet analysis tools, e.g., for data integration and fault detection, cannot exploit this structure to perform precise analysis. In this paper, we propose ExpCheck to automatically extract expandable groups from spreadsheets. We observe that contiguous units that share similar formats and semantics are likely to form an expandable group. Based on this observation, we inspect the format of each cell and its corresponding semantics, and then classify cells into expandable groups according to their similarity. We evaluate ExpCheck on 120 spreadsheets randomly sampled from the EUSES and VEnron corpora. The experimental results show that ExpCheck is effective: it detects expandable groups with an F1-measure of 73.1%, significantly outperforming state-of-the-art techniques (F1-measure of 13.3%).
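The abstract does not spell out ExpCheck's algorithm, so the following is only an illustrative sketch of the underlying idea: consecutive rows whose cells have matching format signatures are grouped together as a candidate expandable group. The cell model (a `(kind, style)` pair), the names `row_similarity` and `find_groups`, and the 0.8 threshold are all assumptions made for this example and are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical cell signature: (kind, style), e.g. ("number", "plain").
# This model is an assumption for illustration, not ExpCheck's representation.
Cell = Tuple[str, str]
Row = List[Cell]


def row_similarity(a: Row, b: Row) -> float:
    """Fraction of column positions whose (kind, style) signatures match."""
    if not a and not b:
        return 1.0
    width = max(len(a), len(b))
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / width


def find_groups(rows: List[Row], threshold: float = 0.8) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index ranges of runs of at least two
    consecutive rows in which each row is similar to the one before it."""
    groups: List[Tuple[int, int]] = []
    start = 0
    for i in range(1, len(rows) + 1):
        # Close the current run at the end of the sheet or at a dissimilar row.
        if i == len(rows) or row_similarity(rows[i - 1], rows[i]) < threshold:
            if i - start >= 2:
                groups.append((start, i - 1))
            start = i
    return groups


if __name__ == "__main__":
    header = [("text", "bold"), ("text", "bold"), ("text", "bold")]
    data = [("text", "plain"), ("number", "plain"), ("formula", "plain")]
    total = [("text", "bold"), ("formula", "bold"), ("formula", "bold")]
    print(find_groups([header, data, data, data, total]))  # [(1, 3)]
```

In this toy sheet the three data rows form one candidate group, while the header and total rows are left out because their signatures differ from their neighbours. A realistic detector would also need to handle multi-row units, column-wise repetition, and formula semantics, none of which this sketch attempts.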