Publication | Closed Access
Adversarial Text Generation for Google's Perspective API
Citations: 26
References: 22
Year: 2018
Venue: Unknown
Artificial Intelligence, Abuse Detection, Engineering, Machine Learning, Online Discussion Platforms, Corpus Linguistics, Journalism, Text Mining, Natural Language Processing, Spam Filtering, Data Science, Computational Linguistics, Many Queries, Adversarial Machine Learning, Content Analysis, Machine Translation, Computer Science, Social Media Platforms, Generative Adversarial Network, Text Generation, Perspective API, Social Medium Data, Arts, Language Generation
With harassment and abuse widespread online, social media and online discussion platforms seek to curb toxic comments. Google's Perspective aims to help platforms classify toxic comments. We have created a pipeline that modifies toxic comments so that they evade Perspective. The pipeline uses existing adversarial machine learning attacks to find the optimal perturbation that evades the model. Because these attacks typically target images rather than discrete text data, we include a process that generates text candidates from the perturbed features and selects candidates that retain syntactic similarity. We demonstrate that, using a surrogate model built with just 10,000 queries, changing three words in each comment evades Perspective 25% of the time. This suggests that building a surrogate model may not require many queries, and that a more robust approach is needed to improve the accuracy of the toxic comment classifier.
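The pipeline described in the abstract (perturb features against a surrogate model, map the perturbed features back to discrete words, keep only syntactically compatible candidates) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and not the authors' implementation: the 2-D embeddings, the part-of-speech tags, the linear surrogate with `WEIGHTS` and `BIAS`, the sign-of-gradient step, and the function names `surrogate_score`, `perturb`, `candidates`, and `attack` are all hypothetical. The abstract does not say which attacks, features, or similarity measure the paper uses, and this sketch never queries Perspective itself.

```python
import math

# Toy 2-D word embeddings and part-of-speech tags (hypothetical values).
EMBEDDINGS = {
    "you": (0.0, 0.1),
    "are": (0.1, 0.0),
    "stupid": (0.9, 0.8),
    "silly": (0.4, 0.7),
    "smart": (-0.8, 0.6),
    "idiot": (1.0, 0.9),
    "person": (0.0, 0.3),
}
POS = {
    "you": "PRON", "are": "VERB", "stupid": "ADJ", "silly": "ADJ",
    "smart": "ADJ", "idiot": "NOUN", "person": "NOUN",
}

# Hypothetical surrogate: logistic score over the mean word embedding.
WEIGHTS = (1.0, 0.2)
BIAS = -0.1


def surrogate_score(words):
    """Toxicity score in (0, 1) assigned by the toy surrogate model."""
    n = len(words)
    mean = [sum(EMBEDDINGS[w][d] for w in words) / n for d in range(2)]
    z = sum(wt * m for wt, m in zip(WEIGHTS, mean)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))


def perturb(vec, eps):
    """Step against the sign of the score gradient (an FGSM-style move).

    For this linear surrogate, the gradient with respect to any word
    embedding is a positive multiple of WEIGHTS, so its sign is sign(WEIGHTS).
    """
    return tuple(v - eps * math.copysign(1.0, wt) for v, wt in zip(vec, WEIGHTS))


def candidates(word, eps, top_n):
    """Map the perturbed (continuous) vector back to discrete words.

    Only words with the same part-of-speech tag are considered, a crude
    stand-in for retaining syntactic similarity.
    """
    target = perturb(EMBEDDINGS[word], eps)
    pool = [w for w in EMBEDDINGS if w != word and POS[w] == POS[word]]
    return sorted(pool, key=lambda w: math.dist(target, EMBEDDINGS[w]))[:top_n]


def attack(words, max_changes=3, eps=0.5, top_n=2):
    """Greedily replace up to max_changes words to lower the surrogate score."""
    words = list(words)
    changed = set()
    for _ in range(max_changes):
        best_score, best_pos, best_word = surrogate_score(words), None, None
        for i, w in enumerate(words):
            if i in changed:
                continue
            for cand in candidates(w, eps, top_n):
                trial = words[:i] + [cand] + words[i + 1:]
                score = surrogate_score(trial)
                if score < best_score:
                    best_score, best_pos, best_word = score, i, cand
        if best_pos is None:
            break  # no remaining substitution lowers the score
        words[best_pos] = best_word
        changed.add(best_pos)
    return words
```

Calling `attack(["you", "are", "stupid"])` returns a comment of the same length with at most three substituted words and a lower surrogate score. Whether such a comment also evades the real classifier would have to be checked separately against the API, which is the step the 25% figure in the abstract refers to.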