Speech and Language Resources

Developing language technology products and applications requires basic language resources (text and speech corpora, lexicons, and knowledge bases) and development tools (morphological and syntactic analyzers, disambiguators, corpus-processing tools, lemmatizers, integrated tool environments, etc.).

We have more than 25 years of experience in creating basic language resources of this kind and ...

Demos

Konbitzul

Database of noun+verb combination translations

e-ROLda

A tool for looking up verb entries in the BVI lexicon and examples in the EPEC-RolSem corpus

Universal Dependencies treebank for Basque

This treebank has 121 K words annotated following the guidelines proposed in the Universal Dependencies project.

Resources

Eusemcor

Corpus tagged with Basque WordNet senses.

Basque WordNet / Euskal WordNet

Basque WordNet

EDBL

Basque lexical database.

EPEC-ROLSEM

Corpus tagged with semantic roles.

EPEC-DEP (BDT)

A syntactic corpus tagged using the Dependency Grammar Theory.

Publications

Arantxa Otegi, Aitor Agirre, Jon Ander Campos, Aitor Soroa, Eneko Agirre

Conversational Question Answering in Low Resource Scenarios: A Dataset and Case Study for Basque (2020)

Proceedings of The 12th Language Resources and Evaluation Conference, pp. 429–435. European Language Resources Association. ISBN: 979-10-95546-34-4

Javier Álvez, Itziar Gonzalez-Dios, German Rigau

Towards Word Sense Disambiguation by Reasoning (2020)

Vampire 2018 and Vampire 2019. The 5th and 6th Vampire Workshops. EPiC Series in Computing. Pages 19-29. ISSN: 2398-7340

Uxoa Iñurrieta

Identification and translation of verb+noun multiword expressions: a Spanish-Basque study (2020)

Procesamiento del Lenguaje Natural, 64, pp. 123-126.

Kepa Bengoetxea, Itziar Gonzalez-Dios, Amaia Aguirregoitia

AzterTest: Open source linguistic and stylistic analysis tool (2020)

Procesamiento del Lenguaje Natural, 64, 61-68. http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6196

Rodrigo Agerri, Iñaki San Vicente, Jon Ander Campos, Ander Barrena, Xabier Saralegi, Aitor Soroa, Eneko Agirre

Give your Text Representation Models some Love: the Case for Basque (2020)

Proceedings of LREC. Also available on arXiv: https://arxiv.org/pdf/2004.00033.pdf

Jon Ander Campos, Arantxa Otegi, Aitor Soroa, Jan Deriu, Mark Cieliebak, Eneko Agirre

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7302–7314

Itziar Gonzalez-Dios, Javier Álvez, German Rigau

Towards a Model for Ontologising WordNet Adjectives (2020)

Proceedings of the Workshop on Multimodal Wordnets (MMWN-2020), pages 1–6. ISBN: 979-10-95546-41-2 https://lrec2020.lrec-conf.org/media/proceedings/Workshops/Books/MMW2020book.pdf

Jon Alkorta, Itziar Gonzalez-Dios

Exploring the Enrichment of Basque WordNet with a Sentiment Lexicon (2020)

Proceedings of the Workshop on Multimodal Wordnets (MMWN-2020), pages 20–24. ISBN: 979-10-95546-41-2 https://lrec2020.lrec-conf.org/media/proceedings/Workshops/Books/MMW2020book.pdf

Thierry Declerck, Itziar Gonzalez-Dios, German Rigau (editors)

Proceedings of the LREC 2020 Workshop on Multimodal Wordnets (MMWN-2020) (2020)

European Language Resources Association (ELRA), Paris. https://lrec2020.lrec-conf.org/media/proceedings/Workshops/Books/MMW2020book.pdf ISBN: 979-10-95546-41-2 EAN: 9791095546412

Begoña Altuna, María Jesús Aranzabe, Arantza Díaz de Ilarraza

EusTimeML: A mark-up language for temporal information in Basque (2020)

Research in Corpus Linguistics 8: 86-104. ISSN 2243-4712. Asociación Española de Lingüística de Corpus (AELINCO) DOI 10.32714/ricl.08.01.06

Elena Zotova, Rodrigo Agerri, Manuel Nuñez and German Rigau

Multilingual Stance Detection in Tweets: The Catalonia Independence Corpus (2020)

Language Resources and Evaluation Conference (LREC 2020)

Uxoa Inurrieta, Itziar Aduriz, Arantza Díaz de Ilarraza, Gorka Labaka, Kepa Sarasola

Learning about phraseology from corpora: A linguistically motivated approach for Multiword Expression identification. (2020)

PLoS ONE 15(8): e0237767. https://doi.org/10.1371/journal.pone.0237767

Ainara Estarrona, Izaskun Etxeberria, Ricardo Etxepare, Manuel Padilla-Moyano, Ander Soraluze

Sintaktikoki etiketatutako euskarazko corpus historikoa eraikitzen (2020)

Fontes Linguae Vasconum 50 urte. Ekarpen berriak euskararen ikerketari. Nuevas aportaciones al estudio de la lengua vasca

Jon Alkorta, Koldo Gojenola, Mikel Iruskieta

SentiTegi: building a semantic oriented Basque lexicon (2019)

Computación y Sistemas, 22 (4)

Igone Zabala

The elaboration of Basque in academic and professional domains. (2019)

In Grenoble, Lenore; Lane, Pia & Røyneland, Unn (eds.) Linguistic Minorities in Europe Online. De Gruyter Mouton. ISSN 2510-5361

Aitziber Atutxa, Kepa Bengoetxea, Arantza Diaz de Ilarraza, Mikel Iruskieta

Towards a top-down approach for an automatic discourse analysis for Basque: Segmentation and Central Unit detection tool (2019)

PLoS ONE 14(9): e0221639

Ander Soraluze, Olatz Arregi, Xabier Arregi, Arantza Diaz de Ilarraza

EUSKOR: End-to-end coreference resolution system for Basque (2019)

PLoS ONE 14(9): e0221801. https://doi.org/10.1371/journal.pone.0221801

Ainara Estarrona, Izaskun Etxeberria, Ander Soraluze, Manuel Padilla-Moyano

Spelling Normalisation of Basque Historical Texts (2019)

Procesamiento del Lenguaje Natural, vol. 63, pp. 59-66

Javier Álvez, Itziar Gonzalez-Dios, German Rigau

Commonsense Reasoning Using WordNet and SUMO: a Detailed Analysis (2019)

Proceedings of the Tenth Global Wordnet Conference, pp 197--205. ISBN 978-83-7493-108-3

Itziar Gonzalez-Dios, German Rigau

Textual genre based approach to use wordnets in language-for-specific-purpose classroom as dictionary (2019)

Proceedings of the Tenth Global Wordnet Conference, pp 222--227. ISBN 978-83-7493-108-3

Jon Ander Campos, Arantxa Otegi, Aitor Soroa, Jan Deriu, Mark Cieliebak, Eneko Agirre

Conversational QA for FAQs (2019)

NeurIPS 3rd Conversational AI Workshop: “Today's Practice and Tomorrow's Potential”

Igone Zabala, Izaskun Aldezabal, Maria Jesus Aranzabe

Retos actuales del desarrollo y aprendizaje de los registros académicos orales y escritos del euskera (2019)

II Simposio Internacional sobre Lenguaje Científico en el Ámbito Académico. Universidad de Jaén.

Meghan Dowling, Kepa Sarasola, Ana Zelaia, Aitzol Astigarraga

Looking for possible new articles. What Wikipedia pages are often consulted in English... but there are not defined in Gaelic? (2019)

Wikimedia+Education Conference, Donostia 2019

Begoña Altuna, Maria Jesús Aranzabe, Arantza Diaz de Ilarraza

Adapting TimeML to Basque: Event Annotation (2018)

In Gelbukh A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science (LNCS, vol 9624), 565-577. Springer, Cham. DOI https://doi.org/10.1007/978-3-319-75487-1_43; Print ISBN 978-3-319-75486-4; Online ISBN 978-3-319-75487-1

Uxoa Iñurrieta, Itziar Aduriz, Arantza Díaz de Ilarraza, Gorka Labaka, Kepa Sarasola

Konbitzul: an MWE-specific Database for Spanish-Basque (2018)

Proceedings of the 11th Language Resources and Evaluation Conference, Miyazaki, Japan, pages 2500-2504.

Uxoa Iñurrieta, Itziar Aduriz, Ainara Estarrona, Itziar Gonzalez-Dios, Antton Gurrutxaga, Ruben Urizar, Iñaki Alegria

Verbal Multiword Expressions in Basque corpora (2018)

In the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (at COLING 2018)

Igone Zabala

Euskararen terminologiaren garapena Terminologiaren Teoria Komunikatiboaren argitan (2018)

In Ruben Urizar and Itziar Aduriz (eds.) Hizkuntzalari Euskaldunen III Topaketa. Zer berri?. 349-358.

Klara Ceberio, Itziar Aduriz, Arantza Díaz de Ilarraza and Ines Garzia-Azkoaga

Coreferential Relations in Basque: The Annotation Process (2018)

J Psycholinguist Res (2018) 47, Issue 2. Pages 325-342. https://doi.org/10.1007/s10936-018-9559-6. ISSN 0090-6905. Online ISSN 1573-6555.

Izaskun Aldezabal, Xabier Artola, Arantza Diaz De Ilarraza, Itziar Gonzalez-Dios, Gorka Labaka, German Rigau and Ruben Urizar

Basque e-lexicographic resources: linguistic basis, development, and future perspectives (2018)

Workshop on eLexicography: Between Digital Humanities and Artificial Intelligence. https://lexdhai.insight-centre.org/Lex_DH__AI_2018_paper_5.pdf

Ainara Estarrona, Izaskun Aldezabal, Arantza Díaz de Ilarraza

How the corpus-based Basque Verb Index lexicon was built (2018)

Language Resources and Evaluation. First Online 05 December 2018. DOI: https://doi.org/10.1007/s10579-018-9440-0. Springer Netherlands

Itziar Aduriz, María Jesús Aranzabe, José María Arriola, Arantza Díaz de Ilarraza, Itziar Gonzalez-Dios, Ruben Urizar

Building the Gold Standard for the Surface Syntax of Basque (2017)

Procesamiento del Lenguaje Natural, 58, 125-132. Available at http://ixa.si.ehu.es/sites/default/files/dokumentuak/8825/5421-4766-1-PB.pdf (Print ISSN: 1135-5948) (Online ISSN: 1989-7553)

Itziar Aduriz, Iñaki Alegria, Olatz Arregi, Arantza Diaz de Ilarraza, Kepa Sarasola

Hizkuntza-teknologia “Datu Handien” garaian: programa bilatzaileak, itzultzaileak… (2017)

Senez, 48, pp. 191-200. ISSN: 1132-2152. 2017 https://eizie.eus/eu/argitalpenak/senez/20171102/aurkezpena/datuhandiak

Arantxa Otegi, Nora Aranberri, António Branco, Jan Hajic, Steven Neale, Petya Osenova, Rita Pereira, Martin Popel, Joao Silva, Kiril Simov, Eneko Agirre

QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages (2016)

Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), European Language Resources Association (ELRA). ISBN 978-2-9517408-9-1

Estarrona A., Aldezabal I., Díaz de Ilarraza A. and Aranzabe M.J.

A Methodology for the Semiautomatic Annotation of EPEC-RolSem, a Basque Corpus Labeled at Predicate Level following the PropBank/Verbnet Model (2016)

Edward Vanhoutte (ed.) Digital Scholarship in the Humanities (2016) 31 (3): 470-492. DOI: http://dx.doi.org/10.1093/llc/fqv001 First published online: 17 June 2015 (23 pages). Published by Oxford University Press on behalf of EADH: The European Association for Digital Humanities (Online ISSN 2055-768X - Print ISSN 2055-7671)

A. Minard, M. Speranza, R. Urizar, B. Altuna, M. van Erp, A. Schoen, and C. van Son

MEANTIME, the NewsReader Multilingual Event and Time Corpus (2016)

Proceedings of LREC 2016. Pages: 4417-4422. ISBN: 978-2-9517408-9-1

Maria Jesús Aranzabe, Aitziber Atutxa, Kepa Bengoetxea, Arantza Diaz de Ilarraza, Iakes Goenaga, Koldo Gojenola, Larraitz Uria

Automatic Conversion of the Basque Dependency Treebank to Universal Dependencies (2015)

Markus Dickinson, Erhard Hinrichs, Agnieszka Patejuk, Adam Przepiórkowski (eds), Proceedings of the Fourteenth International Workshop on Treebanks and Linguistic Theories (TLT14), 233-241. Institute of Computer Science of the Polish Academy of Sciences, Warszawa, Poland. ISBN: 978-83-63159-18-4

Iruskieta M., Aranzabe M., Diaz de Ilarraza A., Gonzalez I., Lersundi I., Lopez de Lacalle O.

The RST Basque TreeBank: an online search interface to check rhetorical relations (2013)

4th Workshop RST and Discourse Studies, 40-49, Sociedade Brasileira de Computação, Fortaleza, CE, Brasil. October 20-24 (http://encontrorst2013.wix.com/encontro-rst-2013)

Pociello E., Agirre E. and Aldezabal I.

Methodology and construction of the Basque WordNet (2011)

Language Resources and Evaluation. Springer. Volume 45, Issue 2, pp 121-142. ISSN 1574-020X. DOI 10.1007/s10579-010-9131-y.

Izaskun Aldezabal, Maria Jesús Aranzabe, Arantza Diaz de Ilarraza, Ainara Estarrona, Larraitz Uria

EusPropBank: Integrating Semantic Information in the Basque Dependency Treebank (2010)

Lecture Notes in Computer Science (LNCS) no. 6008, Alexander Gelbukh (Ed.), Computational Linguistics and Intelligent Text Processing, pp. 60-73. Springer Berlin Heidelberg New York. ISSN: 0302-9743, ISBN-10: 3-642-12115-2, ISBN-13: 978-3-642-12115-9. 11th International Conference, CICLing 2010, Iasi, Romania, March 21-27, 2010

Izaskun Aldezabal, Maria Jesús Aranzabe, Arantza Diaz de Ilarraza, Ainara Estarrona

Building the Basque PropBank (2010)

Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner and Daniel Tapias (eds.), Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC 2010), pp. 1414-1417, European Language Resources Association (ELRA), ISBN: 2-9517408-6-7. LREC 2010, Valletta (Malta), May 19-21, 2010

Uria L., Estarrona A., Aldezabal I., Aranzabe M., Díaz de Ilarraza A., Iruskieta M.

Evaluation of the Syntactic Annotation in EPEC, the Reference Corpus for the Processing of Basque (2009)

Lecture Notes in Computer Science (LNCS) no. 5449, Alexander Gelbukh (Ed.), Computational Linguistics and Intelligent Text Processing, pp. 72-85. Springer. ISSN: 0302-9743, ISBN-10: 3-642-00381-8, ISBN-13: 978-3-642-00381-3. 10th International Conference, CICLing 2009, Mexico City, Mexico, March 1-7, 2009

Izaskun Aldezabal, Maria Jesús Aranzabe, Jose Maria Arriola, Arantza Diaz de Ilarraza

Syntactic annotation in the Reference Corpus for the Processing of Basque (EPEC): Theoretical and practical issues (2009)

Corpus Linguistics and Linguistic Theory 5-2 (2009), 241-269. Mouton de Gruyter. Berlin-New York. Print ISSN: 1613-7027 Online ISSN: 1613-7035

Zabala I., Aierbe A., Aldezabal I., Aranzabe M., Arregi X., Arriola J.M., Elordui A., Elosegi A., Elosegi K., Ezeiza J., Garcia I., Garcia J., Lersundi M., San Martin I. and Ugarteburu I.

GARATERM: Diskurtso akademiko-profesionalaren didaktika eta garapena uztartzeko tresna informatikoen diseinua eta integrazioa helburu duen proiektua (2008)

In Iñaki Ugarteburu and Pello Salaburu (eds.), Espezialitate hizkerak eta terminologia III: espezialitate hizkeren didaktika eta komunikazioa, 211-219, UPV/EHUko argitalpen zerbitzua. Bilbo (Bizkaia). ISBN: 978-84-691-6424-2

Izaskun Aldezabal, Klara Ceberio, Itsaso Esparza, Ainara Estarrona, Jone Etxeberria, Elixabete Izagirre, Mikel Iruskieta, Larraitz Uria

EPEC (Euskararen Prozesamendurako Erreferentzia Corpusa) segmentazio-mailan etiketatzeko eskuliburua (2007)

UPV/EHU / LSI / TR 11-2007

Itziar Aduriz, Maria Jesús Aranzabe, Jose Maria Arriola, Aitziber Atutxa, Arantza Diaz de Ilarraza, Nerea Ezeiza, Koldo Gojenola, Maite Oronoz, Aitor Soroa, Ruben Urizar

Methodology and steps towards the construction of EPEC, a corpus of written Basque tagged at morphological and syntactic levels for the automatic processing (2006)

Corpus Linguistics Around the World. Book series: Language and Computers, Vol. 56 (pp. 1-15). ISBN 90-420-1836-4. Ed. Andrew Wilson, Paul Rayson, and Dawn Archer. Rodopi. Netherlands.

Eneko Agirre, Izaskun Aldezabal, Jone Etxeberria, Mikel Iruskieta, Elixabete Izagirre, Karmele Mendizabal, Eli Pociello

Improving the Basque WordNet by corpus annotation. (2006)

Proceedings of Third International WordNet Conference. pp. 287-290. ISBN 80-210-3915-9. Jeju Island (Korea).

Izaskun Aldezabal, Olatz Ansa, Bertol Arrieta, Xabier Artola, Aitzol Ezeiza, Gregorio Hernández, Mikel Lersundi

EDBL: a General Lexical Basis for the Automatic Processing of Basque (2001)

IRCS Workshop on linguistic databases. Philadelphia (USA).

All HiTZ publications
