Text Analysis

Natural Language Analysis Tools are software modules that perform linguistic analysis on texts at different levels. These tools are essential components of any Natural Language Processing (NLP) software that analyzes text, and text mining software is typically built by combining basic linguistic modules into complex pipelines.

The HiTZ center has a long tradition of building analysis tools for many languages. These range from basic linguistic processors, such as tokenizers, Part-of-Speech taggers and Named Entity Recognizers, to complex modules that perform sentiment analysis or event detection on news feeds. The center has also developed distributed architectures that deploy complex pipelines on clusters of machines. This makes it possible to process the vast amount of textual information produced every day through diverse channels such as traditional newspapers and social media sites.

HiTZ has developed the IXA-pipes tools, a set of ready-to-use NLP tools that provide easy access to NLP technology for several languages. They offer robust and efficient linguistic annotation, with the aim of lowering the barriers to using NLP technology, whether for research purposes or for small industrial developers and SMEs.

The Basque language is of great interest to HiTZ, and building robust and scalable processing tools for Basque is one of the center's strategic goals. HiTZ has developed the largest set of Basque linguistic processors available to date, which enables the automatic analysis of Basque texts and facilitates building text mining tools for Basque.
