Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 512–519, Prague, Czech Republic, June 2007. ©2007 Association for Computational Linguistics. Randomised Language Modelling for Statistical Machine Translation. David Talbot and Miles[r]
about 35M words of aligned texts that are also used to train the target LM. In our experiments, adding more than 580M words of Broadcast News data had no impact on the BLEU score, despite a notable decrease in the perplexity of the target LM. We therefore suggest using more complex statistical LMs that[r]
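For reference, the perplexity mentioned above is the standard per-word measure (standard definition, not taken from this excerpt): for a held-out text w_1, ..., w_N,

    \mathrm{PPL} = \exp\left(-\frac{1}{N}\sum_{i=1}^{N}\log P(w_i \mid w_1,\ldots,w_{i-1})\right)

A lower target-LM perplexity therefore need not yield a higher BLEU score, since translation quality also depends on the translation and reordering models, which is consistent with the observation above.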
Proceedings of NTCIR-9, pages 559–578. Yanqing He, Yu Zhou, Chengqing Zong, and Huilin Wang. 2010. A Novel Reordering Model Based on Multi-layer Phrase for Statistical Machine Translation. In Proceedings of the 23rd Coling, pages 447–455, Beijing, China, August.[r]
based model, the proposed model can handle non-contiguous phrases with arbitrarily large gaps by means of non-contiguous tree sequence alignment. An algorithm targeting non-contiguous constituent decoding is also proposed. Experimental results on the NIST MT-05 Chinese-English tr[r]
corpus to estimate a biased LM. We then sketched an implementation that improves the time and space efficiency of our method by pre-computing and “sparsifying” n-gram projections off-line during the training phase. Thus, our approach can be integrated within on-line, low-latency SMT systems.
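A minimal off-line sketch of the pre-computation step described above, assuming the projection maps each n-gram to a dense feature vector and that “sparsifying” means dropping zero components; the names and data layout are illustrative, not the authors' implementation:

    def precompute_projections(ngram_counts, project):
        """Pre-compute sparse n-gram projections off-line.

        ngram_counts: dict mapping n-gram tuples to corpus counts.
        project:      assumed callable returning a dense feature vector.
        Returns {ngram: {component_index: weight}} with zero components
        dropped, so on-line lookups never touch dense vectors.
        """
        table = {}
        for ngram, count in ngram_counts.items():
            dense = project(ngram)
            sparse = {i: w * count for i, w in enumerate(dense) if w != 0.0}
            if sparse:  # "sparsify": keep only n-grams with non-zero mass
                table[ngram] = sparse
        return table

Because the table is built once during training, the on-line decoder only performs hash lookups, which matches the low-latency setting mentioned above.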
2009. Combination of statistical word alignments based on multiple preprocessing schemes. In Cyril Goutte, Nicola Cancedda, Marc Dymetman, and George Foster, editors, Learning Machine Translation, chapter 5, pages 93–110. MIT Press. Nizar Habash and Fatiha Sadat. 2006. Arabic preprocess[r]
entries based on the estimation of the information redundancy encoded in phrase pairs and hierarchical rules, and thus preserve the search space of SMT decoders as much as possible. Experimental results on Chinese-to-English machine translation tasks show that our method is able to reduce[r]
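The excerpt does not show the redundancy estimate itself; purely as an illustration of the idea, the sketch below flags a phrase pair as redundant when its translation can be recomposed from two shorter entries already in the table, a crude monotone proxy for the information redundancy the authors estimate:

    def redundant_pairs(phrase_table):
        """phrase_table: set of (source, target) string pairs.
        Returns pairs whose translation can be rebuilt by splitting
        both sides once and finding both halves in the table; such
        entries carry little extra information and are candidates
        for pruning without shrinking the reachable search space.
        """
        redundant = set()
        for src, tgt in phrase_table:
            s, t = src.split(), tgt.split()
            for i in range(1, len(s)):
                for j in range(1, len(t)):
                    left = (" ".join(s[:i]), " ".join(t[:j]))
                    right = (" ".join(s[i:]), " ".join(t[j:]))
                    if left in phrase_table and right in phrase_table:
                        redundant.add((src, tgt))
        return redundant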
We have presented four techniques for handling OOV words in SMT. Our results show that we consistently improve over a state-of-the-art baseline in terms of BLEU, yet there is still room for improvement. The described system is publicly available. In the future, we plan to improv[r]
and Philip Resnik. This work was partially supported by ONR MURI contract FCPO.810548265 and Department of Defense contract RD-02-5700. S. D. G.
References
A. V. Aho and J. D. Ullman. 1969. Syntax directed translations and the pushdown assembler. Journal of Computer and System Sciences, 3:37–56. Danie[r]
scription in the case of Speech Recognition) is typically referred to as decoding. There is a fundamental difference between decoding for machine translation and decoding for speech recognition. When decoding a speech signal, words are generated in the same order in[r]
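To make the contrast concrete (an illustration, not part of the excerpt): a recognizer emits words monotonically in signal order, so each hypothesis has a single admissible word order, whereas unconstrained reordering over n source words admits n! target orders; already for n = 10 that is 10! = 3,628,800 orders, which is why MT decoders must constrain or explicitly model reordering.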
Translation. In Annual Meeting of the Association for Computational Linguistics (ACL), demonstration session, pages 177–180, Prague, Czech Republic, June. [Moore and Quirk 2007] Robert C. Moore and Chris Quirk. 2007. Faster beam-search decoding for phrasal statistical machine tra[r]
35th Annual Meeting of the Association for Computational Linguistics, pages 16–23, Madrid, Spain, July. Association for Computational Linguistics. Isao Goto, Bin Lu, Ka Po Chow, Eiichiro Sumita, and Benjamin K. Tsou. 2011. Overview of the patent machine translation task at the nt[r]
The phrase-based approach has been considered the default strategy for Statistical Machine Translation (SMT) in recent years. It is widely known that the phrase-based approach is powerful in local lexical choice and short-distance word reordering. However, long-distance reorder[r]
Michel Galley, Mark Hopkins, Kevin Knight, and Daniel Marcu. 2004. What's in a translation rule? In HLT-NAACL. Kevin Knight. 1999. Decoding complexity in word-replacement translation models. Computational Linguistics, 25(4):607–615. Philipp Koehn, Hieu Hoang, Alexandra Birch, Ch[r]
Ney. 1999. Improved alignment models for statistical machine translation. In Proceedings of the 1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP/WVLC-99), pages 20–28. Kazuteru Ohashi, Kazuhide Yamamoto, Kuniko Saito, a[r]
is the main reason that we adopt the local model in this paper.
3.3 Global versus Local Models
Both the global and the localized log-linear models described in this section can be considered maximum-entropy models, similar to those used in natural language processing, e.g., maximum-entr[r]
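For reference, both variants share the standard log-linear (maximum-entropy) form, written here in standard notation with feature functions h_i and weights \lambda_i (not copied from the excerpt):

    P(y \mid x) = \frac{\exp\left(\sum_i \lambda_i h_i(x, y)\right)}{\sum_{y'} \exp\left(\sum_i \lambda_i h_i(x, y')\right)}

On the usual reading of such a contrast, the global and local variants differ mainly in the scope of x and y, that is, over which output space the normalizer in the denominator is computed.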
ties for monotone order and non-monotone order. The two probabilities can be set to prefer monotone or non-monotone orientations depending on the language pair. In view of the content independence of the distortion and flat reordering models, several researchers (Och et al., 2004; Tillmann, 2004;[r]
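In standard notation (a sketch; the excerpt does not spell out the formula), such a flat reordering model reduces to a single parameter p_m:

    P(o) = \begin{cases} p_m & \text{if } o = \text{monotone} \\ 1 - p_m & \text{if } o = \text{non-monotone} \end{cases}

with p_m set high for largely monotone language pairs and lower for pairs with substantial reordering.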
not normalized to form probability distributions, the scores that different models assign to each phrase pair may not be on the same scale. Therefore, mixing their scores might wash out the information in one (or some) of the models. We experimented with two different ways to deal with this normalizatio[r]
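The line is cut off before the two methods are named; purely to illustrate the problem, the sketch below renormalizes one model's raw scores per source phrase into a proper distribution before any mixing (function and variable names are ours, not from the paper):

    from collections import defaultdict
    import math

    def normalize_per_source(scores):
        """scores: dict {(src, tgt): raw_score} from a single model.
        Exponentiates and renormalizes so that, for each source
        phrase, the scores over its candidate translations sum to 1,
        putting all models on a comparable scale before mixing.
        """
        z = defaultdict(float)
        for (src, _tgt), s in scores.items():
            z[src] += math.exp(s)
        return {(src, tgt): math.exp(s) / z[src]
                for (src, tgt), s in scores.items()}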
plicitly aimed at improving retrieval performance will nevertheless lead to “better” query translations when compared to the baseline. The results of this approach also allow us to observe whether, and to what extent, changes in BLEU scores are correlated with changes in MAP scores.
3.2 Reranking framework
and bigram models. The improvement in speed does not appear to impair accuracy significantly. We have implemented a version that accepts ITGs rather than BTGs, and plan to experiment with more heavily structured models. However, it is important to note that the search complexity rises exp[r]