Information Extraction and Retrieval

The ever-growing availability of unstructured textual resources on the Web, and their potential for use in applications for automatic knowledge acquisition, have driven a dramatic increase in research on Information Extraction (IE) and Information Retrieval (IR). Traditionally, structured content for populating databases was extracted manually, an extremely costly process. In the last d...

Principal investigator:


Demos

Demo of the NewsReader NLP pipeline

Paste in any English text to see which entities, events, and other annotations are added automatically. The result is represented in the NAF format.

Demo of the NewsReader NLP pipeline

Paste in any Spanish text to see which entities and other annotations are added automatically. The result is represented in the NAF format.

Eihera

Named entity recognizer/classifier for Basque

Eustagger

Basque lemmatizer and morphosyntactic analyzer

Contracts

Projects

Patents

EUSLEM

EUSLEM: lemmatizer for Basque

UKB

Word sense disambiguation and similarity.

KYBOT

Knowledge Yielding Robot

Resources

Publications

Eneko Agirre

Cross-Lingual Word Embeddings (Book Review) (2020)

Computational Linguistics (https://doi.org/10.1162/COLI_r_00372)

Oier Lopez de Lacalle, Ander Salaberria, Aitor Soroa, Gorka Azkune and Eneko Agirre

Evaluating Multimodal Representations on Visual Semantic Textual Similarity (2020)

Proceedings of the Twenty-fourth European Conference on Artificial Intelligence, ECAI 2020, June 8-12, 2020, Santiago de Compostela, Spain

Oscar Sainz, Oier Lopez de Lacalle, Itziar Aldabe, Montse Maritxalar

Domain Adapted Distant Supervision for Pedagogically Motivated Relation Extraction (2020)

Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020). Marseille, France

Javier Álvez, Itziar Gonzalez-Dios, German Rigau

Towards Word Sense Disambiguation by Reasoning (2020)

Vampire 2018 and Vampire 2019. The 5th and 6th Vampire Workshops. EPiC Series in Computing. Pages 19-29. ISSN: 2398-7340

Rodrigo Agerri, Iñaki San Vicente, Jon Ander Campos, Ander Barrena, Xabier Saralegi, Aitor Soroa, Eneko Agirre

Give your Text Representation Models some Love: the Case for Basque (2020)

Proceedings of LREC 2020. Also available on arXiv: https://arxiv.org/pdf/2004.00033.pdf

Begoña Altuna, María Jesús Aranzabe, Arantza Díaz de Ilarraza

EusTimeML: A mark-up language for temporal information in Basque (2020)

Research in Corpus Linguistics 8: 86-104. ISSN 2243-4712. Asociación Española de Lingüística de Corpus (AELINCO). DOI: 10.32714/ricl.08.01.06

Rodrigo Agerri, German Rigau

Language independent sequence labelling for Opinion Target Extraction (2020)

International Joint Conference on Artificial Intelligence (IJCAI 2020)

Elena Zotova, Rodrigo Agerri, Manuel Nuñez and German Rigau

Multilingual Stance Detection in Tweets: The Catalonia Independence Corpus (2020)

Language Resources and Evaluation Conference (LREC 2020)

Javier Álvez, Itziar Gonzalez-Dios, German Rigau

Applying the Closed World Assumption to SUMO-based FOL Ontologies for Effective Commonsense Reasoning (2020)

To appear in the 24th European Conference on Artificial Intelligence, ECAI 2020 (preprint). The ECAI 2020 proceedings, including the main conference and the PAIS papers, will be published open access by IOS Press in the Ebook Series Frontiers in Artificial Intelligence and Applications (FAIA) on August 29.

Rodrigo Agerri, German Rigau

Language independent sequence labelling for Opinion Target Extraction (2019)

Artificial Intelligence, 268 (2019) 85-95

Iñigo Lopez-Gazpio, Montse Maritxalar, Mirella Lapata, Eneko Agirre

Word n-gram attention models for sentence similarity and inference (2019)

Expert Systems with Applications. Volume 132, 15 October 2019, Pages 1-11. https://doi.org/10.1016/j.eswa.2019.04.054.

Aitor Ormazabal, Mikel Artetxe, Gorka Labaka, Aitor Soroa and Eneko Agirre

Analyzing the Limitations of Cross-lingual Word Embedding Mappings (2019)

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4990-4995.

Juan J. Lastra-Díaz, Josu Goikoetxea, Mohamed Ali Hadj Taieb, Ana García-Serrano, Mohamed Ben Aouicha, Eneko Agirre

Reproducibility dataset for a large experimental survey on word embeddings and ontology-based methods for word similarity (2019)

Data in Brief. DOI: https://doi.org/10.1016/j.dib.2019.104432

Juan J. Lastra-Díaz, Josu Goikoetxea, Mohamed Ali Hadj Taieb, Ana García-Serrano, Mohamed Ben Aouicha, Eneko Agirre

A reproducible survey on word embeddings and ontology-based methods for word similarity: linear combinations outperform the state of the art (2019)

Engineering Applications of Artificial Intelligence. Volume 85, October 2019, Pages 645-665. DOI: https://doi.org/10.1016/j.engappai.2019.07.010

Andrea Amelio Ravelli, Oier Lopez de Lacalle, Eneko Agirre

A comparison of representation models in a non-conventional semantic similarity scenario (2019)

Proceedings of the Sixth Italian Conference on Computational Linguistics, Bari, Italy.

Rodrigo Agerri

Doris Martin at SemEval-2019 Task 4: Hyperpartisan News Detection with Generic Semi-supervised Features (2019)

SemEval@NAACL-HLT 2019: 944-948 https://www.aclweb.org/anthology/S19-2161.pdf

Joseba Fernandez de Landa, Rodrigo Agerri, Iñaki Alegria

Euskaldun gazte eta helduen harremanak Twitterren (2019)

III. Ikergazte. Nazioarteko ikerketa euskaraz. Kongresuko artikulu bilduma. Gizarte Zientziak eta Zuzenbidea. 2, pp. 83 - 90

Mark Stevenson, Eneko Agirre

Word Sense Disambiguation (2018)

The Oxford Handbook of Computational Linguistics, 2nd edition. Edited by Ruslan Mitkov. Oxford. ISBN: 9780199573691. DOI of the chapter: 10.1093/oxfordhb/9780199573691.013.28

Josu Goikoetxea, Aitor Soroa and Eneko Agirre

Knowledge-Based Systems (KNOSYS). Volume 150, 15 June 2018, Pages 218-230. ISSN: 0950-7051. DOI: https://doi.org/10.1016/j.knosys.2018.03.017. Preprint at https://arxiv.org/pdf/1804.08316.pdf

Rodrigo Agerri, Yiling Chung, Itziar Aldabe, Nora Aranberri, Gorka Labaka, German Rigau

Building Named Entity Recognition Taggers via Parallel Corpora (2018)

In Proceedings of the 11th Language Resources and Evaluation Conference (LREC 2018), 7-12 May, 2018, Miyazaki, Japan.

Ander Barrena, Aitor Soroa, Eneko Agirre

Learning text representations for 500K classification tasks on Named Entity Disambiguation (2018)

The SIGNLL Conference on Computational Natural Language Learning (CoNLL 2018)

Rodrigo Agerri, German Rigau

Simple Language Independent Sequence Labelling for the Annotation of Disabilities in Medical Texts (2018)

Proceedings of the Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2018), Diann Track, Sevilla, Spain.

Egoitz Laparra, Rodrigo Agerri, Itziar Aldabe, German Rigau

Multi-lingual and Cross-lingual timeline extraction (2017)

Knowledge-Based Systems, 133, 77-89

Itziar Aduriz, Iñaki Alegria, Olatz Arregi, Arantza Diaz de Ilarraza, Kepa Sarasola

Hizkuntza-teknologia “Datu Handien” garaian: programa bilatzaileak, itzultzaileak… (2017)

Senez, 48, pp. 191-200. ISSN: 1132-2152. 2017 https://eizie.eus/eu/argitalpenak/senez/20171102/aurkezpena/datuhandiak

Goikoetxea J., Agirre E., Soroa A.

Single or Multiple. Combining Word Representations Independently Learned from Text and WordNet (2016)

Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence. pp. 2608-2614. ISBN: 978-1-57735-760-5. Phoenix (USA).

Rodrigo Agerri, German Rigau

Robust Multilingual Named Entity Recognition with Shallow Semi-supervised Features (2016)

Artificial Intelligence, 238 (2016) pages 63-82. http://dx.doi.org/10.1016/j.artint.2016.05.003

Goikoetxea J., Agirre E., Soroa A.

Random Walks and Neural Network Language Models on Knowledge Bases (2015)

Proceedings of the Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL HLT 2015), pages 1434-1439. ISBN: 978-1-937284-73-2. Denver (USA).

Arantxa Otegi

Hedapena informazioaren berreskurapenean: hitzen adiera-desanbiguazioaren eta antzekotasun semantikoaren ekarpenak (2012)

Lengoaia eta Sistema Informatikoak Saila, EHU/UPV. Informatika Fakultatea. 2012/03/16

