From Collection to Search: Challenges and Strategies in Building Humanities-Oriented Generative AI

Tzong-Han Tsai

Professor, Computer Science and Information Engineering, National Central University

Abstract

With the rapid advancement of artificial intelligence technologies, language models have gradually evolved from processing text to incorporating speech. However, audio data is far more complex and difficult to handle than text. Speech not only conveys content but also includes intonation, emotion, accent, and even background noise. This talk will introduce the technological development of speech-based language models.

This talk, titled *“From Collection to Search: Challenges and Strategies in Building Humanities-Oriented Generative AI,”* will offer an in-depth look at my hands-on experience with the **Taihucais** project.

It will begin by explaining how we established structured collection and quality-control workflows for historical texts and archival materials. It will then turn to the automatic benchmark generation method we developed, which automates the construction of evaluation metrics so that retrieval accuracy and generation quality can be measured objectively.
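
As a rough illustration of what measuring retrieval accuracy against a generated benchmark can look like, the sketch below computes two common metrics, recall@k and mean reciprocal rank. The abstract does not name the project's metrics, so these two are an assumption, and the function names, the benchmark layout (query ID mapped to a set of relevant document IDs), and the sample IDs are hypothetical.

```python
from typing import Dict, List, Set


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(ranked[:k]) & relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for position, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / position
    return 0.0


def evaluate(
    benchmark: Dict[str, Set[str]],  # hypothetical: query ID -> relevant doc IDs
    run: Dict[str, List[str]],       # hypothetical: query ID -> ranked doc IDs
    k: int = 5,
) -> Dict[str, float]:
    """Average recall@k and reciprocal rank over every benchmark query."""
    if not benchmark:
        return {"recall_at_k": 0.0, "mrr": 0.0}
    recalls, rranks = [], []
    for query_id, relevant in benchmark.items():
        ranked = run.get(query_id, [])  # a missing query counts as no results
        recalls.append(recall_at_k(ranked, relevant, k))
        rranks.append(reciprocal_rank(ranked, relevant))
    n = len(benchmark)
    return {"recall_at_k": sum(recalls) / n, "mrr": sum(rranks) / n}
```

Generation quality would need a separate measure (for example, human or model-based grading), which this sketch does not attempt.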

The presentation will also cover the core technologies behind our RAG (Retrieval-Augmented Generation) system, including vectorized indexing structures, retrieval result fusion, and prompt optimization. In addition, it will discuss how to balance data privacy against strong retrieval and generation capabilities when selecting models.
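
To make "retrieval result fusion" concrete, here is a minimal sketch of reciprocal rank fusion (RRF), one widely used way to merge ranked lists from several retrievers, such as a vector index and a keyword index. The abstract does not say which fusion method the project uses, so RRF is an assumption, and the function name and document IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) in every list where it appears,
    and the scores are summed. k = 60 is a conventional default that
    dampens the influence of any single top-ranked hit.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a vector retriever and a keyword retriever.
dense_hits = ["d1", "d2", "d3"]
sparse_hits = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

Because RRF uses only rank positions, it does not require the retrievers' raw scores to be on a comparable scale, which is one reason it is a common first choice for hybrid retrieval.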

Finally, the talk will explore how this framework can be extended to the **National Central Library’s Taihu system**, as well as practical recommendations for open data platforms and industry applications. The ultimate goal is to build an efficient and sustainable AI-powered search ecosystem for humanities research in Taiwan.
 
