AI and Heritage: Practical Skills for Extracting Information from Historical Documents

Key details

6, 11 and 13 March 2025
Online on Zoom
3pm to 5pm (CET)

About

The objective of this course is to introduce techniques and resources that support research on digitised historical documents, in particular in the context of architectural and urban history. Over three two-hour sessions, we will cover all the steps needed to move from a set of scanned historical documents to a structured geo-historical database.

The sessions introduce techniques as varied as image segmentation, text recognition, alignment on external databases, and geocoding. They combine lectures with hands-on tutorials on provided data. There will also be time for interested researchers to ask questions about their own data and how the techniques studied can be applied to it.

The lectures present general theoretical concepts of information processing at a level accessible to a professional audience without prior training in the field. Knowledge of coding in Python is expected for the practical tutorials. Other specific knowledge in computer science is also welcome.

For any questions on the content of this course, please contact: paul.guhennec@epfl.ch

Programme

March 6, 2025

From archive to segmented image

  • Introduction to the automatic processing of large digitised archives. Motivation and case studies;
  • Lecture: Image segmentation;
  • Tutorial.

March 11, 2025

From segmented image to extracted text

  • Lecture: Text segmentation, alignment on authority databases, named entity recognition;
  • Tutorial.

March 13, 2025

From extracted text to geo-historical database

  • Lecture: Geocoding and alignment on Linked Open Databases;
  • Tutorial;
  • Summary case study on corpus analysis.

Lecturers

Frédéric Kaplan

He is the director of the College of Humanities at the École Polytechnique Fédérale de Lausanne (EPFL). He also holds the Chair of Digital Humanities and is President of the Time Machine Organisation, a non-profit association of over 600 institutions. He is the author of a dozen books, translated into several languages, and over a hundred scientific publications.

Isabella di Lenardo

She is a researcher at EPFL’s Digital Humanities Institute, specialising in Art History, Digital Humanities, and Digital Urban History. She focuses on the circulation of artworks and historical cartography. Leading the Time Machine Unit at EPFL, she uses AI and digitisation to explore Europe’s cultural heritage. She is PI on European projects and co-PI on the SNSF-funded “Parcels of Venice” project.

Paul Guhennec

He is a post-doctoral researcher in the Digital Humanities Laboratory at the École Polytechnique Fédérale de Lausanne (Switzerland). His research focuses on the use of new computational methods for architectural history, and their epistemological implications. He recently completed his doctoral dissertation on Venice’s urban history, with a focus on vernacular domestic architecture.