Lattice BLEU oracles in machine translation

Artem Sokolov, Universität Heidelberg; Guillaume Wisniewski and François Yvon, Université Paris Sud and LIMSI-CNRS

Publisher
Association for Computing Machinery
Copyright
Copyright © 2013 by ACM Inc.
ISSN
1550-4875
DOI
10.1145/2513147

Abstract

The search space of Phrase-Based Statistical Machine Translation (PBSMT) systems can be represented as a directed acyclic graph (lattice). By exploring this search space, it is possible to analyze and understand the failures of PBSMT systems. Indeed, useful diagnoses can be obtained by computing the so-called oracle hypotheses, which are hypotheses in the search space that have the highest quality score. For standard SMT metrics, this problem is, however, NP-hard and can only be solved approximately. In this work, we present two new methods for efficiently computing oracles on lattices: the first one is based on a linear approximation of the corpus BLEU score and is solved using generic shortest distance algorithms; the second one relies on an Integer Linear Programming (ILP) formulation of the oracle decoding that incorporates count clipping constraints. It can either be solved directly using a standard ILP solver or using Lagrangian relaxation techniques. These new decoders are evaluated and compared with several alternatives from the literature for three language pairs, using lattices produced by two PBSMT systems.
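
To make the first idea concrete, here is a minimal sketch of oracle decoding under an additive score. It is not the authors' decoder: the scoring rule (a fixed reward for each word that occurs in the reference, a fixed penalty otherwise, no count clipping), the function name `oracle_path`, and the parameters `match` and `miss` are all assumptions chosen for illustration. The point it shows is that once the score decomposes over lattice edges, the best hypothesis can be found with a single dynamic-programming pass over the directed acyclic graph.

```python
from collections import defaultdict


def oracle_path(edges, start, final, reference, match=1.0, miss=-0.5):
    """Best-scoring path in a word lattice under an additive per-edge score.

    edges: list of (src, dst, word). Node ids are assumed to be integers
    numbered in topological order (src < dst for every edge).
    The per-word reward and penalty are illustrative, not taken from the paper.
    """
    ref_words = set(reference.split())
    outgoing = defaultdict(list)
    for src, dst, word in edges:
        outgoing[src].append((dst, word))

    # best[node] = (score, words) of the best partial hypothesis reaching node
    best = {start: (0.0, [])}
    for node in sorted({n for e in edges for n in e[:2]}):
        if node not in best:
            continue  # node is not reachable from the start state
        score, words = best[node]
        for dst, word in outgoing[node]:
            cand = score + (match if word in ref_words else miss)
            if dst not in best or cand > best[dst][0]:
                best[dst] = (cand, words + [word])
    return best[final]


if __name__ == "__main__":
    lattice = [
        (0, 1, "the"), (0, 1, "a"),
        (1, 2, "cat"), (1, 2, "dog"),
        (2, 3, "sleeps"), (2, 3, "sat"),
    ]
    print(oracle_path(lattice, 0, 3, "the cat sat"))
```

Because this toy score ignores count clipping, a word that appears once in the reference is rewarded every time it occurs on a path. Handling clipping exactly is what the ILP formulation described in the abstract is for.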

Journal

ACM Transactions on Speech and Language Processing (TSLP), Association for Computing Machinery

Published: Dec 1, 2013
