News > Seminar by Emanuela Boros

Title: Information Extraction - An Overview and Thoughts On Where We Should Be Going

Abstract:

As the volume of data and the number of data sources grow, information extraction becomes an increasingly obvious need, whether the goal is to acquire knowledge or to serve a more directly operational purpose. This extraction nevertheless runs into a recurring difficulty: most of the information is found in documents in textual form, which is unstructured and difficult for a machine to interpret.

More precisely, information extraction is the task of automatically extracting entities, relations, and events from unstructured text. Scanning a text for relevant information therefore involves three main levels of extraction tasks: named entity recognition, relation extraction, and event extraction. Named entity recognition is the detection of target entities in text, and relation extraction is the identification of binary relations between entities. From the point of view of natural language processing, event extraction is the most complex form of information extraction, since it generally encompasses the extraction of named entities and of the relations that bind them.
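As an illustration of how these three levels build on one another, here is a minimal Python sketch of the structures each task might produce for a single sentence. The class names, the labels ("ORG", "acquired_by", "Acquisition"), the role names, and the example sentence are all hypothetical and chosen only for illustration; the annotations are written by hand, not produced by an extraction system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Output of named entity recognition: a labeled text span."""
    text: str
    label: str  # hypothetical label set, e.g. "ORG", "DATE"
    start: int  # character offset where the span begins
    end: int    # character offset just past the span


@dataclass(frozen=True)
class Relation:
    """Output of relation extraction: a binary relation between two entities."""
    label: str
    head: Entity
    tail: Entity


@dataclass(frozen=True)
class Event:
    """Output of event extraction: a trigger plus entities filling roles."""
    trigger: str
    label: str
    arguments: dict  # role name -> Entity


def span(sentence: str, text: str, label: str) -> Entity:
    """Build an Entity by locating `text` in `sentence` (first occurrence)."""
    start = sentence.index(text)
    return Entity(text, label, start, start + len(text))


# Hand-written annotations for a made-up sentence.
sentence = "Acme acquired Widget Corp in 2019."
buyer = span(sentence, "Acme", "ORG")
target = span(sentence, "Widget Corp", "ORG")
year = span(sentence, "2019", "DATE")

# Level 2: one binary relation between two entities.
relation = Relation("acquired_by", head=target, tail=buyer)

# Level 3: the event groups several entities around a trigger word.
event = Event(
    trigger="acquired",
    label="Acquisition",
    arguments={"buyer": buyer, "target": target, "time": year},
)
```

The event reuses the entities found at the first level and links more than two of them at once, which is one way to see why event extraction is described as the most complex of the three tasks.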

Despite the usefulness and broad potential applicability of information extraction, several challenges must be overcome before information extraction systems are widely adopted as effective tools in practice. One practical issue is the high cost of manually annotating texts. Information extraction systems, in particular, need higher-level annotations covering relations or events. While research in this field has benefited significantly from manually annotated datasets for learning text analysis patterns, the availability of these resources remains a significant problem. Another challenge lies in the nature of the approaches themselves, which either depend heavily on the coverage of expert knowledge or require large amounts of data to obtain statistically significant features.

Given the obstacles to extracting relevant information and the way information extraction systems have developed over the years, we will present several methods and strategies from recent advances in this field, with a focus on event extraction. After a look at some of the latest breakthroughs, we will also give an overview of the challenges we could face in both the near and the distant future.

Published on Friday, May 7, 2021