Publication | Open Access
First- and second-order expectation semirings with applications to minimum-risk training on translation forests
Year: 2009 | Citations: 137 | References: 35 | Venue: unknown
Topics: Artificial Intelligence, Structured Prediction, Engineering, Machine Learning, Expectation Semiring, Large Language Model, Translation Forests, Natural Language Processing, Data Science, Uncertainty Quantification, Computational Linguistics, Semi-supervised Learning, Supervised Learning, Machine Translation, Computational Learning Theory, Knowledge Discovery, Weighted Logical Deduction, Probability Theory, Computer Science, Statistical Learning Theory, Neural Machine Translation, Minimum-risk Training, Automated Reasoning, Statistical Inference, Second-order Expectation Semirings
Many statistical translation models can be regarded as weighted logical deduction. Under this paradigm, we use weights from the expectation semiring (Eisner, 2002) to compute first-order statistics (e.g., the expected hypothesis length or feature counts) over packed forests of translations (lattices or hypergraphs). We then introduce a novel second-order expectation semiring, which computes second-order statistics (e.g., the variance of the hypothesis length or the gradient of entropy). This second-order semiring is essential for many training paradigms, such as minimum risk, deterministic annealing, active learning, and semi-supervised learning, in which gradient-descent optimization requires computing the gradient of entropy or risk. We use these semirings in Joshua, an open-source machine translation toolkit, where they enable minimum-risk training that improves results by up to 1.0 BLEU point.
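To make the abstract's first- and second-order statistics concrete, here is a minimal sketch in Python, assuming a small lattice (every edge has a single antecedent) whose nodes are numbered in topological order. It uses the standard second-order expectation semiring with elements ⟨p, r, s, t⟩: ⊕ adds componentwise, and ⊗ gives ⟨p₁p₂, p₁r₂ + p₂r₁, p₁s₂ + p₂s₁, p₁t₂ + p₂t₁ + r₁s₂ + r₂s₁⟩. Dropping s and t leaves the first-order semiring ⟨p, r⟩. Each edge with weight p contributing ℓ words is lifted to ⟨p, pℓ, pℓ, pℓ²⟩, so the inside value at the goal node yields Z, the expected hypothesis length, and its variance. The names `Exp2`, `edge_value`, and `inside` and the toy lattice are hypothetical illustrations, not Joshua's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exp2:
    """Second-order expectation semiring element <p, r, s, t>.

    The (p, r) components alone form the first-order expectation semiring.
    """
    p: float
    r: float
    s: float
    t: float

    def __add__(self, o: "Exp2") -> "Exp2":
        # Semiring plus: componentwise sum (combines alternative derivations).
        return Exp2(self.p + o.p, self.r + o.r, self.s + o.s, self.t + o.t)

    def __mul__(self, o: "Exp2") -> "Exp2":
        # Semiring times: combines the pieces of a single derivation.
        return Exp2(
            self.p * o.p,
            self.p * o.r + o.p * self.r,
            self.p * o.s + o.p * self.s,
            self.p * o.t + o.p * self.t + self.r * o.s + o.r * self.s,
        )


ZERO = Exp2(0.0, 0.0, 0.0, 0.0)
ONE = Exp2(1.0, 0.0, 0.0, 0.0)


def edge_value(p: float, length: int) -> Exp2:
    # Hypothetical lifting: r_e = s_e = length, so t accumulates p * length^2.
    return Exp2(p, p * length, p * length, p * length * length)


def inside(num_nodes, edges, start, goal):
    """Inside algorithm over a lattice; edges are (src, dst, p, length)
    with src < dst, i.e. nodes are already in topological order."""
    beta = [ZERO] * num_nodes
    beta[start] = ONE
    # Sorting by source guarantees every edge entering a node is
    # processed before any edge leaving it.
    for src, dst, p, length in sorted(edges):
        beta[dst] = beta[dst] + beta[src] * edge_value(p, length)
    return beta[goal]


# Toy lattice with unnormalized weights: 4 paths of lengths 2, 4, 3, 5.
edges = [
    (0, 1, 3.0, 1), (0, 1, 2.0, 2),
    (1, 2, 1.0, 1), (1, 2, 1.0, 3),
]
total = inside(3, edges, 0, 2)
Z = total.p                          # partition function
mean = total.r / Z                   # first-order: E[length]
variance = total.t / Z - mean ** 2   # second-order: Var[length]
print(f"Z={Z}, E[len]={mean:.2f}, Var[len]={variance:.2f}")
```

Running it prints `Z=10.0, E[len]=3.40, Var[len]=1.24`. A single pass of the ordinary inside algorithm thus yields the normalizer together with the first- and second-order statistics, with no enumeration of the individual translations.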