Publication | Closed Access
Dis-function: Learning distance functions interactively
Citations: 196 | References: 37 | Year: 2012 | Venue: Unknown
Artificial Intelligence, Geometric Learning, Intelligent Information Processing, Engineering, Machine Learning, Similarity Measure, Interactive Visualization, Interactive Machine Learning, Data Science, Data Mining, Pattern Recognition, Domain Expert, Data Size, Robot Learning, Visual Analytics, Computational Learning Theory, Knowledge Discovery, Visual Data Mining, Computer Science, Computer Vision, Learning Distance Functions, Data Points
The rapid growth and complexity of data corpora make it increasingly difficult for experts to interpret their data. Machine learning can discover patterns automatically, but it often requires algorithm-specific parameters, such as a suitable distance function, that experts cannot easily specify. We present a system that lets experts interact directly with a visual representation of the data to define an appropriate distance function, so they never have to manipulate opaque model parameters. The system starts with a uniformly weighted Euclidean distance and projects the data into a two-dimensional scatterplot; users then reposition points to reflect their sense of similarity, and the system optimizes a new distance function and re-projects the data, repeating as needed. Empirical results show that only a few iterations of interaction and optimization let users produce a scatterplot and distance function that capture their knowledge, and an evaluation of scalability in data size and dimension shows the system is computationally efficient, providing an interactive or near-interactive experience.
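The starting point described above, a uniformly weighted Euclidean distance projected into a 2-D scatterplot, can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's implementation: it assumes the distance function is a per-dimension weighted Euclidean distance, and it assumes classical multidimensional scaling (MDS) as the projection, which the abstract does not name. The function names `weighted_distance` and `classical_mds_2d` are hypothetical.

```python
import math
import random


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance: sqrt(sum_k w_k * (a_k - b_k)**2)."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def classical_mds_2d(points, weights, iterations=200, seed=0):
    """Project points to 2-D with classical MDS on weighted distances.

    Assumption: classical MDS is used as the projection. Power iteration
    with deflation stands in for a full eigensolver so the sketch needs
    only the standard library.
    """
    n = len(points)
    # Squared weighted distances between every pair of points.
    d2 = [[weighted_distance(p, q, weights) ** 2 for q in points] for p in points]
    # Double centering: B = -1/2 * J * D2 * J, where J is the centering matrix.
    row_mean = [sum(row) / n for row in d2]
    grand_mean = sum(row_mean) / n
    b = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0, 0.0] for _ in range(n)]
    for axis in range(2):
        # Random unit start vector for power iteration.
        v = [rng.random() - 0.5 for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iterations):
            bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in bv))
            if norm == 0.0:
                break  # B is (numerically) zero: no variance left on this axis
            v = [x / norm for x in bv]
        # Rayleigh quotient estimates the eigenvalue. With nonnegative weights
        # B is positive semidefinite, so clamp tiny negative round-off to 0.
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(0.0, sum(v[i] * bv[i] for i in range(n)))
        scale = math.sqrt(lam)
        for i in range(n):
            coords[i][axis] = scale * v[i]
        # Deflate B so the next pass converges to the second eigenvector.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


if __name__ == "__main__":
    data = [[0, 0, 1], [4, 0, 1], [0, 1, 1], [4, 1, 1]]
    uniform = [1.0] * 3  # the initial, uniformly weighted distance
    for x, y in classical_mds_2d(data, uniform):
        print(f"{x:+.3f} {y:+.3f}")
```

Classical MDS is a natural choice for a sketch because, when the weighted distances are exactly representable in two dimensions, the scatterplot reproduces them exactly; the real system may use a different projection.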
The world's corpora of data grow in size and complexity every day, making it increasingly difficult for experts to make sense of their data. Although machine learning offers algorithms for finding patterns in data automatically, these algorithms often require algorithm-specific parameters, such as an appropriate distance function, which are outside the purview of a domain expert. We present a system that allows an expert to interact directly with a visual representation of the data to define an appropriate distance function, thus avoiding direct manipulation of opaque model parameters. Adopting an iterative approach, our system first assumes a uniformly weighted Euclidean distance function and projects the data into a two-dimensional scatterplot view. The user can then move incorrectly positioned data points to locations that reflect his or her understanding of the similarity of those data points relative to the other data points. Based on this input, the system performs an optimization to learn a new distance function and then re-projects the data to redraw the scatterplot. We show empirically that with only a few iterations of interaction and optimization, a user can achieve a scatterplot view and a corresponding distance function that reflect the user's knowledge of the data. In addition, we evaluate the system's scalability in data size and data dimension, and show that it is computationally efficient and can provide an interactive or near-interactive user experience.
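The optimization step, learning a new distance function from the points the user moved, could look roughly like the sketch below. Its assumptions are not stated in the abstract: the distance function is taken to be a weighted Euclidean distance with one nonnegative weight per dimension, and the weights are fit by nonnegative least squares so that weighted squared distances in the original space match squared distances in the user-edited 2-D layout. The function name `learn_weights` is hypothetical, and the paper's actual objective may differ.

```python
def learn_weights(points, layout, sweeps=200):
    """Fit per-dimension weights w_k >= 0 from a user-edited 2-D layout.

    Hypothetical objective (nonnegative least squares on squared distances):
        min_w  sum_{i<j} (sum_k w_k (x_ik - x_jk)^2 - ||y_i - y_j||^2)^2
    solved by cyclic coordinate descent. Weights are rescaled to sum to 1
    because the overall scale of the 2-D layout is arbitrary.
    """
    dims = len(points[0])
    rows, targets = [], []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            # One row per pair: squared per-dimension differences in the data.
            rows.append([(points[i][k] - points[j][k]) ** 2 for k in range(dims)])
            # Target: squared distance between the pair in the user's layout.
            targets.append(sum((layout[i][a] - layout[j][a]) ** 2 for a in range(2)))

    col_sq = [sum(r[k] ** 2 for r in rows) for k in range(dims)]
    # Start from uniform weights; constant dimensions carry no information.
    w = [1.0 if c > 0 else 0.0 for c in col_sq]
    residual = [sum(wk * rk for wk, rk in zip(w, r)) - t for r, t in zip(rows, targets)]

    for _ in range(sweeps):
        for k in range(dims):
            if col_sq[k] == 0.0:
                continue
            # Exact minimizer along coordinate k, then projected onto w_k >= 0.
            grad = sum(res * r[k] for res, r in zip(residual, rows))
            new_wk = max(0.0, w[k] - grad / col_sq[k])
            delta = new_wk - w[k]
            if delta != 0.0:
                # Keep residuals in sync with the updated weight.
                residual = [res + delta * r[k] for res, r in zip(residual, rows)]
                w[k] = new_wk

    total = sum(w)
    return [wk / total for wk in w] if total > 0 else [1.0 / dims] * dims


if __name__ == "__main__":
    data = [[0, 0], [1, 3], [3, 1], [6, 2]]
    # Suppose the user drags points so that only the first feature matters.
    edited = [[p[0], 0.0] for p in data]
    print(learn_weights(data, edited))
```

In the loop the abstract describes, the learned weights would replace the uniform ones and the data would be re-projected, repeating until the scatterplot matches the user's understanding. Matching squared distances keeps the objective linear in the weights, which is why a simple coordinate-descent solver is enough for this sketch.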