Using ASR Tools to Produce Automatic Subtitles for TV Broadcasting

A Cross-Linguistic Comparative Analysis

Authors

Elena Davitti, Annalisa Sandrelli, Tomasz Korybski, Yuan Zou, Constantin Orasan, Sabine Braun

DOI:

https://doi.org/10.47476/jat.v7i2.2024.305

Abstract

This paper discusses the potential use of Automatic Speech Recognition (ASR) tools to produce intralingual subtitles for broadcasting purposes. Two different ASR tools were trialled by an international broadcaster to produce automatic subtitles for pre-recorded content in English and in Italian: a British talk show and a US feature film dubbed into Italian.

A study was commissioned to compare the performance of the two tools on these materials. Our evaluation focused on two key dimensions: the accuracy of the transcript and the readability of the subtitles. Accuracy was assessed quantitatively using an adaptation of the NER and NTR models (Romero-Fresco & Martínez, 2015; Romero-Fresco & Pöchhacker, 2017), which focuses on ASR-generated errors and categorises them by error type (content- or form-related) and by level of severity (minor, standard and critical). Readability was assessed qualitatively by analysing text segmentation, namely line breaks and subtitle breaks. Our findings indicate that all the ASR outputs fell short of the 98% accuracy threshold expected in the broadcasting industry, although performance was notably better in English. Moreover, subtitle segmentation and timing were relatively poor in the subtitles produced by both tools in both languages. Therefore, the ASR-generated subtitles from the samples provided by the broadcaster can only be considered an intermediate step: substantial human input is required both before the tools are put to work (customisation) and after the ASR has generated the subtitles (human post-editing) to produce broadcast-ready subtitles.
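To give a sense of how a NER-style score is computed, the Python sketch below applies severity-weighted error penalties to a word count and checks the result against the 98% threshold mentioned above. The weights (minor = 0.25, standard = 0.5, critical = 1.0) follow the standard NER model and, like the sample figures, are illustrative assumptions rather than the exact adaptation used in the paper.

```python
# A minimal sketch of NER-style accuracy scoring (not the paper's exact
# adaptation). Severity weights follow the standard NER model and are
# assumptions here: minor = 0.25, standard = 0.5, critical = 1.0.

SEVERITY_WEIGHTS = {"minor": 0.25, "standard": 0.5, "critical": 1.0}

def ner_accuracy(word_count: int, errors: list[str]) -> float:
    """Return a NER-style accuracy percentage for a subtitle sample.

    word_count -- number of words (N) in the subtitle sample
    errors     -- one severity label per ASR-generated error
    """
    penalty = sum(SEVERITY_WEIGHTS[severity] for severity in errors)
    return (word_count - penalty) / word_count * 100

# Hypothetical sample: 1,000 words with 40 minor, 20 standard and 8 critical errors.
score = ner_accuracy(1000, ["minor"] * 40 + ["standard"] * 20 + ["critical"] * 8)
print(f"Accuracy: {score:.2f}%")              # Accuracy: 97.20%
print("Meets 98% threshold:", score >= 98.0)  # Meets 98% threshold: False
```

In this hypothetical sample the score falls just below the broadcast threshold, mirroring the paper's finding that raw ASR output needs human post-editing before it is broadcast-ready.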

Lay summary

This paper explores the use of Automatic Speech Recognition (ASR) tools to create intralingual subtitles (i.e. in the same language) for pre-recorded content. The study was commissioned by an international broadcaster to test two ASR tools for generating subtitles in English and Italian, covering a British talk show and a US film dubbed into Italian. The study compared the tools' performance on two dimensions: accuracy and readability. Accuracy was measured using a model that enabled us to categorise and weigh the errors generated by the ASR tools, while readability was assessed by considering line breaks and subtitle breaks. The evaluation revealed that both tools fell short of the industry's expected 98% accuracy, especially in Italian. Additionally, subtitle segmentation and timing were found to be subpar in both languages. Consequently, substantial human involvement, including customisation and post-editing, is necessary to produce high-quality, broadcast-ready subtitles.

Author Biographies

Elena Davitti, University of Surrey, Centre for Translation Studies

Elena Davitti is Associate Professor (Reader) at the Centre for Translation Studies, University of Surrey (UK). Her research interests include hybrid modalities of spoken language transfer, particularly methods for real-time interlingual speech-to-text and how the increasing automation of these processes would modify human-led workflows. Elena was Principal Investigator on the ‘SMART’ project (Shaping Multilingual Access with Respeaking Technology, 2020-2023, Economic and Social Research Council UK, ES/T002530/1) on interlingual respeaking, with an international consortium of academic and industrial collaborators. She is currently leading the ESRC IAA (Impact Accelerator Account) SMART-UP project on upskilling for interlingual respeaking. Elena has also published extensively on the communicative, interactional and multimodal dynamics of interpreter-mediated interaction (both face-to-face and technology-mediated), and she has been co-investigator on several EU-funded projects on technologies applied to interpreting, particularly video-mediated interpreting (WEBPSI, AVIDICUS 3, SHIFT in Orality) and innovations in interpreter education (EVIVA). Elena has been invited to serve on the boards of projects and organisations in her fields of research (e.g. GALMA, IATIS) and is co-editor of the journal Translation, Cognition & Behavior (John Benjamins).

Annalisa Sandrelli, UNINT - University of International Studies, Rome

Prior to joining UNINT (Rome) as a Lecturer in English, Annalisa Sandrelli taught at the universities of Trieste and Bologna/Forlì, and was a Marie Curie TMR Fellow and Lector in Italian at the University of Hull. She has published widely on corpus-based interpreting studies, audiovisual translation (dubbing, subtitling, respeaking), EU English, and Computer Assisted Interpreter Training (CAIT). She is a member of the European Society for Translation Studies (EST), the European Association for Studies in Screen Translation (ESIST), and GALMA (Galician Observatory for Media Accessibility). She is currently involved in a project on live subtitling for the press conferences of the Venice Film Festival (ARTS - Accessibility via Real-time Subtitling) and a project on upskilling for interlingual respeaking (SMART-UP).

Tomasz Korybski, Centre for Translation Studies, University of Surrey

Tomasz Korybski is an Assistant Professor at the Institute of Applied Linguistics, University of Warsaw, a Visiting Researcher at the Centre for Translation Studies (University of Surrey), and a conference interpreter/translator with over twenty years' experience. As a member of international research teams, Tomasz regularly publishes and presents on topics such as the evaluation of interpreting and respeaking quality and the applicability of AI-based solutions in the provision of interpreting and respeaking services.

Yuan Zou, Centre for Translation Studies, University of Surrey

Yuan Zou is a Lecturer in Translation Studies at the University of Surrey's Centre for Translation Studies (CTS). Yuan's background is in audiovisual translation (AVT), interpreting, and post-editing. She holds a PhD in AVT from Queen's University Belfast (QUB) and an MTI in Translation and Interpreting from Jilin University. Currently, Yuan's research focuses on integrating language technologies into interpreting and AVT to make digital content more inclusive and accessible.

Constantin Orasan, Centre for Translation Studies, University of Surrey

Constantin Orasan is Professor of Language and Translation Technologies at the Centre for Translation Studies, University of Surrey, and a Fellow of the Surrey Institute for People-Centred Artificial Intelligence. He has over 25 years of experience in Natural Language Processing and Artificial Intelligence (AI). His current research focuses on the use of Large Language Models and Automatic Speech Recognition (ASR) in translation and interpreting, as well as on the development of novel assessment methods for these fields. He is currently leading EmpASR, a project which aims to understand how interpreters, language service providers and other users of ASR services can benefit from the latest developments in AI. More details can be found at https://dinel.org.uk

Sabine Braun, Centre for Translation Studies, University of Surrey

Sabine Braun is Professor of Translation Studies, Director of the Centre for Translation Studies at the University of Surrey, and a Co-Director of Surrey’s Institute for People-Centred AI. Her research explores human-machine interaction and integration in different forms of cross-lingual and cross-modal mediation (e.g., interpreting, audio description) to improve access to critical information, media content, and essential public services. In 2024, she launched a Leverhulme Trust-funded Doctoral Training Network on AI-Enabled Digital Accessibility (ADA).

Published

2024-12-19

How to Cite

Davitti, E., Sandrelli, A., Korybski, T., Zou, Y., Orasan, C., & Braun, S. (2024). Using ASR Tools to Produce Automatic Subtitles for TV Broadcasting: A Cross-Linguistic Comparative Analysis. Journal of Audiovisual Translation, 7(2), 1–35. https://doi.org/10.47476/jat.v7i2.2024.305