Benjamin Marie,
researcher in Natural Language Processing (NLP)



I am a freelance research scientist in natural language processing. Previously, I was a researcher at 4i, in Sevilla (Spain), from March 2022 to July 2024, where I mainly worked on improving multimodal dialogue engines using large language models. Before that, I was a researcher at the Advanced Translation Technology Laboratory at NICT (Kyoto, Japan) from May 2016 to March 2022. My research there focused on improving Machine Translation (MT) for low-resource language pairs, especially those involving languages of East and South Asia.

Before joining NICT, I was a Ph.D. student at LIMSI-CNRS (Orsay, France), supervised by Aurélien Max and Anne Vilnat. During that time, I also worked as an engineer for the company Lingua-Et-Machina and occasionally taught at Université Paris-Saclay.

Topics of interest: multimodal dialogue, low-resource neural MT, evaluation for MT, neural MT for user-generated texts

Grants/Funding

Torres Quevedo grant (Spain). 3 years. Ended in July 2024.

NICT Tenure-track funding (Japan). 2 years. Ended in 2021.

JSPS (Japan Society for the Promotion of Science) grant for early-career scientists: Neural Machine Translation for User-Generated Contents. 2 years. Ended in 2022.

Committees

Best paper committees: ACL 2018 (Demo)

Paper reviewer: AACL (2020-), ACL (2017-), AAAI (2020-), COLING (2016-), COLM (2024-), EACL (2021-), EMNLP (2019*, 2017-), ICLR (2021-), IJCAI (2019-), IJCNLP (2017), LREC (2020, 2018), NAACL (2016-), NeurIPS (2021-), TACL standing reviewer, JMLR, ACM TALLIP, IEEE/ACM TASLP, ACL Rolling Review (since 2021)
*: outstanding reviewer

Selected Publications
Full publication list here

2021 (First/Contact author only)

Marie, B., Fujita, A., Rubino, R. (2021). Scientific Credibility of Machine Translation Research: A Meta-Evaluation of 769 Papers. ACL 2021, online.
Outstanding Paper Award

2020 (First/Contact author only)

Marie, B., Rubino, R., Fujita, A. (2020). Combination of Neural Machine Translation Systems at WMT20. In WMT20, online.
Ranked 1st (tied) for Ja-En, En-Iu, and Pl-En

Marie, B., Fujita, A. (2020). Synthesizing Parallel Data of User-Generated Texts with Zero-Shot Neural Machine Translation. In TACL Vol. 8 (2020). Presented at ACL 2021.

Marie, B., Fujita, A. (2020). Iterative Training of Unsupervised Neural and Statistical Machine Translation Systems. In TALLIP Vol. 19 issue 5 (2020).

Marie, B., Rubino, R., Fujita, A. (2020). Tagged Back-translation Revisited: Why Does It Really Work?. In ACL 2020, online.

2019

Marie, B., Kaing, H., Mon, A.M., Ding, C., Fujita, A., Utiyama, M. and Sumita, E. (2019). Supervised and Unsupervised Machine Translation for Myanmar-English and Khmer-English. In WAT 2019, Hong Kong.
Ranked 1st for En->Km and Km->En.

Marie, B., Sun, H., Wang, R., Chen, K., Fujita, A., Utiyama, M. and Sumita, E. (2019). NICT’s Unsupervised Neural and Statistical Machine Translation Systems for the WMT19 News Translation Task. In WMT19, Florence, Italy.
Ranked 1st.

Marie, B., Dabre, R., and Fujita, A. (2019). NICT’s Machine Translation Systems for the WMT19 Similar Language Translation Task. In WMT19, Florence, Italy.

Dabre, R., Chen, K., Marie, B., Wang, R., Fujita, A., Utiyama, M. and Sumita, E. (2019). NICT’s Supervised Neural Machine Translation Systems for the WMT19 News Translation Task. In WMT19, Florence, Italy.

Marie, B. and Fujita, A. (2019). Unsupervised Joint Training of Bilingual Word Embeddings. In ACL 2019, Florence, Italy.

Marie, B. and Fujita, A. (2019). Unsupervised Extraction of Partial Translations for Neural Machine Translation. In NAACL-HLT 2019, Minneapolis, USA.

2018

Marie, B., Fujita, A., Sumita, E. (2018). Combination of Statistical and Neural Machine Translation for Myanmar–English. In WAT 2018, Hong Kong.
Ranked 1st (BLEU) for My-En and En-My.

Wang, R., Marie, B., Utiyama, M., Sumita, E. (2018). NICT's Corpus Filtering Systems for the WMT18 Parallel Corpus Filtering Task. In WMT18, Bruxelles, Belgium.

Marie, B., Wang, R., Fujita, A., Utiyama, M., Sumita, E. (2018). NICT's Neural and Statistical Machine Translation Systems for the WMT18 News Translation Task. In WMT18, Bruxelles, Belgium.
Ranked 1st (BLEU) for Et-En, En-Et, En-Fi, and Fi-En.

Marie, B. and Fujita, A. (2018). A Smorgasbord of Features to Combine Phrase-Based and Neural Machine Translation. In AMTA 2018, Boston, USA.

Marie, B. and Fujita, A. (2018). Phrase Table Induction Using Monolingual Data for Low-Resource Statistical Machine Translation. In TALLIP Vol. 17 issue 3 (2018).

2017

Marie, B. and Fujita, A. (2017). Phrase Table Induction Using In-Domain Monolingual Data for Domain Adaptation in Statistical Machine Translation. In TACL Vol. 5 (2017). Presented at ACL 2018.

Marie, B. and Fujita, A. (2017). Efficient Extraction of Pseudo-Parallel Sentences from Raw Monolingual Data Using Word Embeddings. In ACL 2017, Vancouver, Canada.

2015

Marie, B. and Max, A. (2015). Touch-Based Pre-Post-Editing of Machine Translation Output. In EMNLP 2015, Lisbon, Portugal.

Marie, B., Allauzen, A., Burlot, F., Do, Q. K., Ive, J., Knyazeva, E., Labeau, M., Lavergne, T., Löser, K., Pécheux, N., Yvon, F. (2015). LIMSI@WMT'15: Translation Task. In WMT'15, Lisbon, Portugal.
Ranked 1st for En-Fr and Fr-En.

Marie, B. and Apidianaki, M. (2015). Alignment-based sense selection in METEOR and the RATATOUILLE recipe. In WMT'15, Lisbon, Portugal.
Ranked 1st for En-Fr and Fr-En.

Marie, B. and Max, A. (2015). Multi-Pass Decoding With Complex Feature Guidance for Statistical Machine Translation. In ACL-IJCNLP 2015, Beijing, China.

Apidianaki, M., Marie, B. (2015). METEOR-WSD: Improved Sense Matching in MT Evaluation. In SSST-9, Denver, US.

2014

Marie, B., Max, A. (2014). Confidence-based Rewriting of Machine Translation Output. In EMNLP 2014, Doha, Qatar.

Pécheux, N., Gong, L., Do, Q. K., Marie, B., Ivanishcheva, Y., Allauzen, A., Lavergne, T., Niehues, J., Max, A., Yvon, F. (2014). LIMSI @ WMT’14 Medical Translation Task. In WMT’14, Baltimore, US.

2013

Marie, B. and Max, A. (2013). A Study in Greedy Oracle Improvement of Translation Hypotheses. In IWSLT 13, Heidelberg, Germany.

Some Blog Posts (The Kaitchup - AI on a Budget)

Reports

2016

Ph.D. thesis: Complex Feature Guidance for Statistical Machine Translation (in French)

2013

Project ANR TRACE report, part 5.2 (in French)

2012

M.S. thesis: Improving Machine Translation Outputs by Greedy Search (in French)