Publication | Closed Access
Identifying the most influential data objects with reverse top-k queries
Year: 2010
Citations: 59
References: 15
Ranking Algorithm, Engineering, Business Intelligence, Individual User Preferences, Learning To Rank, Business Analytics, Text Mining, Optimization-based Data Mining, Information Retrieval, Data Science, Data Mining, Influential Objects, Preference Learning, Management, Data Integration, Data Management, Statistics, Very Large Database, Knowledge Discovery, Computer Science, Marketing, Influential Data Objects, Query Optimization, Top-K Queries, Influence Model, Big Data
Top-k queries are widely applied for retrieving a ranked set of the k most interesting objects based on individual user preferences. As an example, in online marketplaces, customers (users) typically seek a ranked set of products (objects) that satisfy their needs. Reversing top-k queries leads to a query type that instead returns the set of customers that find a product appealing (that is, the product belongs to the top-k result set of their preferences). In this paper, we address the challenging problem of processing queries that identify the top-m most influential products to customers, where influence is defined as the cardinality of the reverse top-k result set. This definition of influence is useful for market analysis, since it is directly related to the number of customers that value a particular product and, consequently, to its visibility and impact in the market. Existing techniques require processing a reverse top-k query for each object in the database, which is prohibitively expensive even for databases of moderate size. In contrast, we propose two algorithms, SB and BB, for identifying the most influential objects: SB restricts the candidate set of objects that need to be examined, while BB is a branch-and-bound algorithm that retrieves the result incrementally. Furthermore, we propose meaningful variations of the query for most influential objects that are supported by our algorithms. Our experiments demonstrate the efficiency of our algorithms for both synthetic and real-life datasets.
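To make the influence definition concrete, the following is a minimal sketch of the naive baseline that the abstract calls prohibitively expensive: one reverse top-k query per object. It is not the SB or BB algorithm. The sketch assumes linear scoring with one weight vector per customer, that lower scores are better, and that ties are broken by index. All function names and the sample data are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def score(weights: Vector, product: Vector) -> float:
    # Assumption: a customer's preference is a linear weighted sum.
    return sum(w * p for w, p in zip(weights, product))


def top_k(weights: Vector, products: List[Vector], k: int) -> List[int]:
    # Indices of the k best products for one customer.
    # Assumption: lower score is better; ties are broken by index.
    order = sorted(range(len(products)),
                   key=lambda i: (score(weights, products[i]), i))
    return order[:k]


def reverse_top_k(q: int, products: List[Vector],
                  customers: List[Vector], k: int) -> List[int]:
    # Customers whose top-k result contains product q.
    return [j for j, w in enumerate(customers)
            if q in top_k(w, products, k)]


def influence_scores(products: List[Vector],
                     customers: List[Vector], k: int) -> List[int]:
    # Influence of a product = cardinality of its reverse top-k result.
    return [len(reverse_top_k(i, products, customers, k))
            for i in range(len(products))]


def top_m_influential(products: List[Vector], customers: List[Vector],
                      k: int, m: int) -> List[Tuple[int, int]]:
    # Brute force: evaluates a reverse top-k query for every product.
    scores = influence_scores(products, customers, k)
    order = sorted(range(len(products)), key=lambda i: (-scores[i], i))
    return [(i, scores[i]) for i in order[:m]]


# Hypothetical data: four products with two attributes (lower is better)
# and four customers with different attribute weights.
products = [(1.0, 9.0), (5.0, 5.0), (9.0, 1.0), (8.0, 8.0)]
customers = [(0.9, 0.1), (0.4, 0.6), (0.1, 0.9), (0.8, 0.2)]

print(top_m_influential(products, customers, k=2, m=2))
# -> [(1, 4), (0, 2)]
```

In this toy data the balanced product at index 1 appears in every customer's top-2 list, so it has the highest influence score even though it is never any customer's first choice. The cost of the loop grows with the number of products times the number of customers, which is the overhead the abstract says SB and BB are designed to avoid.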