Machine Translation

We started researching Machine Translation in 2000 and have followed the successive paradigms in the field: first rule-based (RBMT), then statistical (SMT), and currently neural (NMT). We have focused mainly on translation from and into Basque because, besides its commercial interest in our country, it poses an important challenge for several reasons: the complexity of Basque morphology, the free order of sentence constituents, and the scarcity of resources. The results have been very good and the tools developed are being...

Demos

Contracts

All HiTZ projects.

Projects


  • SignON - Sign Language Translation Mobile Application and Open Communications Framework
    (2021 - 2023)

  • DOMINO: Neural Machine Translation, in DOMaIn, and NO supervised
    (2019 - 2021)

  • MT4All: Unsupervised MT for low-resourced language pairs
    (2020 - 2021)

  • Building Neuronal Machine Translation methods and systems to improve coherence at paragraph and document level
    (2020 - 2021)

  • LINGUATEC: Development of cross-border cooperation and knowledge transfer in language technologies.
    (2018 - 2020)
  • UnsupNMT: Unsupervised Neural Machine Translation: a new paradigm based only on monolingual text
    (2018 - 2020)

  • MODENA: Advanced neural modeling for high-quality translation.
    (2018 - 2019)

  • TADEEP: Deep Machine Translation
    (2016 - 2018)

  • MODELA: Statistical Modeling and Deep Learning for High Quality Machine Translation
    (2016 - 2017)

  • QTLeap: Quality Translation by Deep Language Engineering Approaches
    (2013 - 2016)
  • All HiTZ projects

Patents

Matxin

Machine translation from Spanish to Basque.

EUSMT

Statistical Machine Translation from Spanish

TADEEP:

Neural machine translation system for Spanish-English and Spanish-Basque

Publications

Ander Salaberria, Jon Ander Campos, Iker García, Joseba Fernandez de Landa

Itzulpen Automatikoko Sistemen Analisia: Genero Alborapenaren Kasua (2021)

IV. Ikergazte. Nazioarteko ikerketa euskaraz. Kongresuko artikulu bilduma. Ingeniaritza eta Arkitektura

Aitor Ormazabal, Mikel Artetxe, Aitor Soroa, Gorka Labaka, Eneko Agirre

Beyond Offline Mapping: Learning Cross Lingual Word Embeddings through Context Anchoring (2021)

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics, pages 6479–6489

Sarasola, Kepa; Aranberri, Nora

Language-centered AI will allow additional EU official languages for 2025 (2021)

(In progress) Proceedings of the Conference on "Linguistic Rights and Language Varieties in Europe in the Age of AI". AI4EI project (Artificial Intelligence for European Integration). University of Turin. http://www.jmcoe.unito.it/content/linguistic-rights-and-language-varieties-euro

Prys Delyth, Sarasola Kepa, Alegria Iñaki, Perez-de-Viñaspre Olatz, Palmer Geraint, Corcoran Padraig, Arman Laura, Knight Dawn, Spasic Irena, Bryn Jones Dewi, Cooper Sarah, Prys Myfyr, Muralidaran Vigneshwaran, O’Hare Keeziah, Prys Gruffudd, Watkins Gareth, Roberts Jonathan C, Butcher Peter W. S., Lew Robert, Rees Geraint, Sharma Nirwan, Frankenberg-Garcia Ana, Farhat Leena Sarah, Teahan William John.

Language and Technology in Wales: Volume I (2021)

Language and Technology in Wales: Volume I. University of Bangor. ISBN: 978-1-84220-189-3

Prys Delyth, Sarasola Kepa, Alegria Iñaki, Perez-de-Viñaspre Olatz, Palmer Geraint, Corcoran Padraig, Arman Laura, Knight Dawn, Spasic Irena, Bryn Jones Dewi, Cooper Sarah, Prys Myfyr, Muralidaran Vigneshwaran, O’Hare Keeziah, Prys Gruffudd, Watkins Gareth, Roberts Jonathan C, Butcher Peter W. S., Lew Robert, Rees Geraint, Sharma Nirwan, Frankenberg-Garcia Ana, Farhat Leena Sarah, Teahan William John.

Iaith a Thechnoleg yng Nghymru: Cyfrol 1 (2021)

Iaith a Thechnoleg yng Nghymru: Cyfrol 1. University of Bangor. ISBN: 978-1-84220-189-6

Cristina Cumbreño, Nora Aranberri

What Do You Say? Comparison of Metrics for Post-editing Effort (2021)

In: Carl M. (eds) Explorations in Empirical Translation Process Research. Machine Translation: Technologies and Applications, vol 3. Springer, Cham. pp 57-79.

Uxoa Iñurrieta

Identification and translation of verb+noun multiword expressions: a Spanish-Basque study (2020)

Procesamiento del Lenguaje Natural, 64, pp. 123-126.

Nora Aranberri

Can translationese features help users select an MT system for post-editing? (2020)

Revista Procesamiento del Lenguaje Natural, 64, 93-100.

Xabier Soto, Dimitar Shterionov, Alberto Poncelas, Andy Way

Selecting Backtranslated Data from Multiple Sources for Improved Neural Machine Translation (2020)

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp: 3898–3908.

Mikel Artetxe, Sebastian Ruder, Dani Yogatama

On the cross-lingual transferability of monolingual representations (2020)

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Mikel Artetxe, Sebastian Ruder, Dani Yogatama, Gorka Labaka, Eneko Agirre

A Call for More Rigor in Unsupervised Cross-lingual Learning (2020)

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Nora Aranberri

With or without you? Effects of using machine translation to write flash fiction in the foreign language (2020)

Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, p. 165–174, Lisboa, Portugal, November 2020.

Ivana Kvapilíková, Mikel Artetxe, Gorka Labaka, Eneko Agirre, Ondřej Bojar

Unsupervised Multilingual Sentence Embeddings for Parallel Corpus Mining (2020)

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop. Pages 255-262

Uxoa Inurrieta, Itziar Aduriz, Arantza Díaz de Ilarraza, Gorka Labaka, Kepa Sarasola

Learning about phraseology from corpora: A linguistically motivated approach for Multiword Expression identification. (2020)

Inurrieta U, Aduriz I, Díaz de Ilarraza A, Labaka G, Sarasola K (2020) Learning about phraseology from corpora: A linguistically motivated approach for Multiword Expression identification. PLoS ONE 15(8): e0237767. https://doi.org/10.1371/journal.pone.0237767

Mikel Artetxe, Gorka Labaka, Noe Casas, Eneko Agirre

Do all roads lead to Rome? Understanding the role of initialization in iterative back-translation (2020)

Knowledge-Based Systems, Volume 206 (online first). Pre-print https://arxiv.org/abs/2002.12867

Mikel Artetxe, Gorka Labaka, Eneko Agirre

Translation Artifacts in Cross-lingual Transfer Learning (2020)

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). (Pages 7674–7684).

Xabier Soto, Olatz Perez-de-Viñaspre, Gorka Labaka, Maite Oronoz

Ixamed's submission description for WMT20 Biomedical shared task: benefits and limitations of using terminologies for domain adaptation (2020)

Proceedings of the Fifth Conference on Machine Translation, pp: 873--878.

Rachel Bawden, Giorgio Maria Di Nunzio, Cristian Grozea, Inigo Jauregi Unanue, Antonio Jimeno Yepes, Nancy Mah, David Martinez, Aurélie Névéol, Mariana Neves, Maite Oronoz, Olatz Perez-de-Viñaspre, Massimo Piccardi, Roland Roller, Amy Siu, Philippe Thomas, Federica Vezzani, Maika Vicente Navarro, Dina Wiemann and Lana Yeganova

Findings of the WMT 2020 Biomedical Translation Shared Task: Basque, Italian and Russian as New Additional Languages (2020)

Fifth Conference on Machine Translation (WMT20). Shared Task: Biomedical Translation Task

Itziar Aldabe, Josu Aztiria, Francho Beltrán, Myriam Bras, Klara Ceberio, Itziar Cortes, Jean-Baptiste Coyos, Benaset Dazeas, Louise Esher, Gorka Labaka, Igor Leturia, Kepa Sarasola, Aure Séguier, Jean Sibille

LINGUATEC: Development of cross-border cooperation and knowledge transfer in language technologies (2020)

Workshop "INTELE : INfraestructura de TEcnologías del LEnguaje" CLARIN DARIAH-EU. http://ixa2.si.ehu.eus/intele/?q=node/71

Alberto Poncelas, Kepa Sarasola, Meghan Dowling, Andy Way, Gorka Labaka, Iñaki Alegria

Adapting NMT to caption translation in Wikimedia Commons for low-resource languages (2019)

Procesamiento del Lenguaje Natural, Revista no 63, septiembre de 2019, pp. 33-40

Aitor Ormazabal, Mikel Artetxe, Gorka Labaka, Aitor Soroa and Eneko Agirre

Analyzing the Limitations of Cross-lingual Word Embedding Mappings (2019)

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4990-4995.

Xabier Soto, Olatz Perez de Viñaspre, Gorka Labaka, Maite Oronoz

Neural Machine Translation of clinical texts between long distance languages (2019)

JAMIA (Journal of the American Medical Informatics Association), Volume 26, Issue 12, December 2019, Pages 1478–1487, https://doi.org/10.1093/jamia/ocz110

Xabier Soto, Olatz Perez de Viñaspre, Maite Oronoz, Gorka Labaka

Leveraging SNOMED CT terms and relations for machine translation of clinical texts from Basque to Spanish (2019)

Proceedings of the Second Workshop on Multilingualism at the Intersection of Knowledge Bases and Machine Translation

Mikel Artetxe, Holger Schwenk

Margin-based Parallel Corpus Mining with Multilingual Sentence Embeddings (2019)

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3197-3203.

Mikel Artetxe, Gorka Labaka, Eneko Agirre

An Effective Approach to Unsupervised Machine Translation (2019)

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 194-203.

Mikel Artetxe, Gorka Labaka, Eneko Agirre

Bilingual Lexicon Induction through Unsupervised Machine Translation (2019)

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5002-5007.

Mikel Artetxe, Holger Schwenk

Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond (2019)

Transactions of the Association for Computational Linguistics 7 (2019): 597-610.

Mikel Artetxe, Gorka Labaka, Eneko Agirre

Unsupervised Neural Machine Translation, a new paradigm solely based on monolingual text (2019)

Procesamiento del Lenguaje Natural 63 (2019): 151-154.

Gamallo, Pablo, Susana Sotelo, José Ramom Pichel, Mikel Artetxe

Contextualized Translations of Phrasal Verbs with Distributional Compositional Semantics and Monolingual Corpora (2019)

Computational Linguistics. First online. DOI: 10.1162/COLI_a_00353. ISSN: 0891-2017.

Thierry Etchegoyhen, Eva Martínez, Andoni Azpeitia, Gorka Labaka, Iñaki Alegria, Itziar Cortes, Amaia Jauregi, Igor Ellakuria, Maite Martin and Eusebi Calonge

Neural Machine Translation of Basque (2018)

EAMT 2018. Alicante.

Thierry Etchegoyhen, Eva Martínez, Andoni Azpeitia, Iñaki Alegria, Gorka Labaka, Arantxa Otegi, Kepa Sarasola, Itziar Cortes, Amaia Jauregi, Igor Ellakuria, Eusebi Calonge, Maite Martin

QUALES: Estimación Automática de Calidad de Traducción Mediante Aprendizaje Automático Supervisado y No-Supervisado (2018)

Procesamiento del Lenguaje Natural, vol. 61, pp. 143-146. ISSN: 1135-5948

Itziar Aduriz, Iñaki Alegria, Olatz Arregi, Arantza Diaz de Ilarraza, Kepa Sarasola

Hizkuntza-teknologia “Datu Handien” garaian: programa bilatzaileak, itzultzaileak… (2017)

Senez, 48, pp. 191-200. ISSN: 1132-2152. 2017 https://eizie.eus/eu/argitalpenak/senez/20171102/aurkezpena/datuhandiak

Nora Aranberri, Gorka Labaka

Euskarazko Itzulpen Automatikoa - IXA Taldea (2017)

Senez, 48 (2017)

Mikel Artetxe, Gorka Labaka, Eneko Agirre

Learning principled bilingual mappings of word embeddings while preserving monolingual invariance (2016)

Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2289--2294. Austin, Texas. ISBN: 978-1-945626-25-8.

Igor Leturia, Kepa Sarasola, Xabier Arregi, Arantza Diaz de Ilarraza, Eva Navas, Iñaki Sainz, Arantza del Pozo, David Baranda, Urtza Iturraspe

The BerbaTek project for Basque: Promoting a less-resourced language via language technology for translation, content management and learning (2013)

Translation: Computation, Corpora, Cognition (TC3) journal. Vol 3, No 1, pp: 119-135 (2013). Special Issue on Language Technologies for a Multilingual Europe, ISSN: 2193-6986, https://www.researchgate.net/publication/250927257_The_BerbaTek_project_for_Basque_Promoting_a_less-resourced_language_via_language_technology_for_translation_content_management_and_learning

Igor Leturia, Kepa Sarasola, Xabier Arregi, Arantza Diaz de Ilarraza, Eva Navas, Iñaki Sainz, Arantza del Pozo, David Baranda, Urtza Iturraspe

BerbaTek: euskararako hizkuntza teknologien garapena itzulpengintza, edukien kudeaketa eta irakaskuntza arloetan (2013)

Euskalingua aldizkari digitala, 23, 66-76. http://mendebalde.eus/euskalinguak/Euskalingua%2023/Berbatek:%20euskararako%20hizkuntza%20teknologien%20garapena%20itzulpengintza,%20edukien%20kudeaketa%20eta%20irakaskuntza%20arloetan.pdf

Aingeru Mayor, Iñaki Alegria, Arantza Díaz de Ilarraza, Gorka Labaka, Mikel Lersundi, Kepa Sarasola

Matxin, an open-source rule-based machine translation system for Basque. (2011)

Machine Translation Journal: Volume 25, Issue 1 (2011), Page 53-82. ISSN: 0922-6567. DOI: 10.1007/s10590-011-9092-y. http://link.springer.com/content/pdf/10.1007%2Fs10590-011-9092-y.pdf

Gorka Labaka, Nicolas Stroppa, Andy Way, Kepa Sarasola

Comparing Rule-Based and Data-Driven Approaches to Spanish-to-Basque Machine Translation (2007)

MT-Summit XI, Copenhagen ISBN: 978-87-90708-16-0; pp.297-304

All HiTZ publications
