Document Image Understanding: Computational Image Processing in the Cultural Heritage Sector
Textual documents, such as manuscripts and historical newspapers, make up an important part of our cultural heritage. Massive digitization projects have been conducted across the globe to better preserve these often vulnerable documents and to provide easier access to them. These digital counterparts also make it possible to unlock the rich information contained within and across documents, thanks to various types of computational models for document image understanding. In this article, we shed light on the document image processing pipeline, from scan to information extraction. As it turns out, algorithms driven by human perception are among the most powerful approaches for generic document image understanding, which must deal with a myriad of layouts. In this context, we explain in particular Gestalt visioning and the linked concept of text homogeneity, which enables enhanced layout analysis and even damage recognition, the latter being especially relevant in a cultural heritage setting. We conclude with a recent, promising development, namely joint visual and language processing, which promises to take document image understanding to the next level.