Publication | Closed Access
Complementary Relation Contrastive Distillation
Citations: 100
References: 51
Year: 2021
Unknown Venue
Natural Language Processing, Knowledge Representation, Engineering, Machine Learning, Data Science, Data Mining, Information Retrieval, Knowledge Distillation, Relationship Extraction, Knowledge Discovery, Individual Representation Distillation, Symbolic Technique, Computer Science, Transfer Learning, Mutual Information, Distillation, Statistical Relational Learning
Knowledge distillation aims to transfer representation ability from a teacher model to a student model. Previous approaches focus on either individual representation distillation or inter-sample similarity preservation. We argue, however, that the inter-sample relation conveys abundant information and needs to be distilled more effectively. In this paper, we propose a novel knowledge distillation method, Complementary Relation Contrastive Distillation (CRCD), to transfer structural knowledge from the teacher to the student. Specifically, we estimate the mutual relation in an anchor-based way and distill the anchor-student relation under the supervision of its corresponding anchor-teacher relation. To make the method more robust, mutual relations are modeled by two complementary elements: the feature and its gradient. Furthermore, the lower bound of the mutual information between the anchor-teacher relation distribution and the anchor-student relation distribution is maximized via a relation contrastive loss, which distills both the sample representation and the inter-sample relations. Experiments on different benchmarks demonstrate the effectiveness of the proposed CRCD.
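The anchor-based relation contrastive idea in the abstract can be illustrated with a small, simplified sketch. It is not the authors' implementation: the abstract does not specify how relations are estimated, so the sketch assumes a relation is the difference between a sample feature and the anchor feature, uses cosine similarity with a temperature `tau` as the critic, and applies a standard InfoNCE-style loss. It also omits the gradient element entirely. The function names (`relation_contrastive_loss`, `sub`, `cosine`) and the value of `tau` are hypothetical.

```python
import math

def sub(u, v):
    """Element-wise difference u - v (assumed stand-in for a relation estimator)."""
    return [a - b for a, b in zip(u, v)]

def cosine(u, v):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)

def relation_contrastive_loss(teacher, student, anchor, tau=0.1):
    """InfoNCE-style loss over anchor relations (illustrative assumption).

    teacher[i], student[i]: features of sample i from each model.
    anchor: feature of the anchor sample.
    The anchor-student relation of sample i is pulled toward the
    anchor-teacher relation of the same sample (positive) and pushed
    away from the anchor-teacher relations of other samples (negatives).
    """
    rel_t = [sub(t, anchor) for t in teacher]  # anchor-teacher relations
    rel_s = [sub(s, anchor) for s in student]  # anchor-student relations
    n = len(rel_t)
    total = 0.0
    for i in range(n):
        logits = [cosine(rel_s[i], rel_t[k]) / tau for k in range(n)]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]  # negative log-softmax of the positive
    return total / n

# Toy data: a student that matches the teacher vs. one with shuffled samples.
anchor = [0.0, 0.0]
teacher = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
aligned_student = [list(t) for t in teacher]
shuffled_student = [teacher[1], teacher[2], teacher[0]]

loss_aligned = relation_contrastive_loss(teacher, aligned_student, anchor)
loss_shuffled = relation_contrastive_loss(teacher, shuffled_student, anchor)
```

In this toy setup, a student whose relations to the anchor match the teacher's receives a much smaller loss than one whose samples are permuted. Minimizing a loss of this form is the usual way contrastive objectives tighten a lower bound on mutual information between the two relation distributions.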