Information Extraction and Information Retrieval
The ever-increasing availability of unstructured textual resources on the Web, and their potential use in applications for the automatic acquisition of knowledge, have caused a dramatic rise in research on Information Extraction (IE) and Information Retrieval (IR). Traditionally, the annotated textual data these tasks require was produced manually by human experts in the task at hand, which is very costly in both economic and human terms. In the last decade, new techniques have been developed to (semi-)automate the annotation process and thus minimize the need for manually annotated data. Moreover, indexes, search engines and other basic information retrieval tools show several shortcomings. The current aim is to treat information not as a mere sequence of words but to understand the meaning underlying a document, while also addressing the wide variety of languages in which documents can be written. Specifically, our work so far has focused on the following topics:
1. Named Entity Recognition, including persons, organizations and locations, as well as temporal and numerical expressions.
2. Terminology Extraction, to obtain the most relevant concepts from a given corpus.
3. Relation Extraction between entities and concepts.
4. Extraction of events and sequences of events at both intra- and cross-document levels.
5. Opinion Mining applied to a variety of text genres and domains.
6. Semantic Textual Similarity.
7. Automatic Classification of multimedia content.
We have obtained state-of-the-art results for multilingual Information Extraction and Information Retrieval in each of the tasks mentioned, as can be seen from our publications in all the major Natural Language Processing conferences and journals (ACL, EMNLP, Artificial Intelligence Journal, Knowledge-Based Systems...). We have also coordinated and participated in several European (NEWSREADER, LoCloud, OpeNER, PATHS, KYOTO, MEANING) and national (CROSSTEXT, TUNER, SKATER, KNOW) projects. In addition, we have obtained a prestigious Google Research Award (Eneko Agirre), and we maintain close relationships with many companies to facilitate technology transfer from the university to industry.