Behind the Paper

How Can AI Empower Disabled Communities? An Experimental Study of AI-Driven Audio Description for Visual Arts

Situated at the intersection of accessibility studies and translation studies, this study exemplifies how emerging technologies can empower humanity and enrich our experience in the digital information society.

Globally, vision impairment and blindness affect a substantial and growing proportion of the population. According to the World Health Organisation, at least 2.2 billion people worldwide live with some form of near- or distance-vision impairment, and in at least 1 billion of these cases, the condition could have been prevented or has yet to be addressed. As populations age, this number is expected to rise further, intensifying the need for inclusive social and cultural participation. To foster a more equitable global society, it is therefore imperative to address the needs of blind and visually impaired communities, ensuring meaningful access to the arts. Audio Description (AD) is a service that provides verbal narration of key visual elements in media, enabling blind and visually impaired audiences to engage more fully with visual content.

My research interest in AD emerged from my teaching experience in a Digital Translation course, where I was introduced to this form of intersemiotic translation from the non-verbal visual channel to the verbal auditory channel. More importantly, I became increasingly interested in the potential of translation as a means of fostering accessibility and contributing to a more inclusive society. While based at the University of Auckland, I learned that a Computer Science Capstone Course was seeking project proposals. I recognised this as an ideal opportunity to explore the intersection of translation studies and accessibility studies, supported by emerging technologies. I therefore proposed a project aimed at developing an application capable of providing automated AD services for visual artworks, particularly paintings. The proposal was selected for the Capstone Course group project in Semester 1, 2025. Working with a team of students, we developed Chromeco, an AI-driven mobile application that generates audio descriptions for paintings.

The app is designed to enhance access to the visual arts for blind and visually impaired users by enabling them to generate audio descriptions of paintings in real time. By simply capturing an image of an artwork on a mobile device, users receive automatically generated AD that conveys the painting's key visual elements. The system has been trained on carefully developed linguistic and content-oriented guidelines. In particular, the content framework incorporates genre-specific Artwork-Type Description Guidelines, which ensure that descriptions are sensitive to the stylistic and compositional features of different genres of painting. To improve both accuracy and usability, the system adopts a human-in-the-loop approach, whereby generated descriptions are reviewed and refined by our research team. This process allows the AI to produce descriptions that are not only clear and informative but also vivid and engaging across diverse artistic genres.

Nevertheless, the application raises important ethical and legal considerations, particularly with regard to the copyright of the artworks being described. Addressing these concerns will require collaborative partnerships with museums and art galleries, so that accessibility initiatives can be developed in a manner that is both sustainable and respectful of intellectual property rights. These challenges should be a key priority for future development.
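As a rough illustration of the pipeline described above, the flow from genre-specific guidelines to a reviewed description could be sketched as follows. This is a minimal, hypothetical sketch only: the guideline texts, function names, and the word-limit review step are all my own illustrative assumptions, not Chromeco's actual code or guidelines.

```python
# Hypothetical sketch of a genre-aware AD prompting pipeline with a
# human-in-the-loop review stand-in. All names and texts are illustrative.

# Assumed genre-specific rules, standing in for the project's
# Artwork-Type Description Guidelines.
GUIDELINES = {
    "portrait": "Describe the sitter's pose, expression, and attire before the background.",
    "landscape": "Describe the scene from foreground to horizon, noting light and season.",
    "abstract": "Describe shapes, colours, and textures; avoid guessing at concrete objects.",
}

def build_prompt(genre: str) -> str:
    """Combine general AD guidance with a genre-specific guideline."""
    base = (
        "You are an audio describer for blind and visually impaired listeners. "
        "Describe the painting clearly and vividly, key elements first."
    )
    # Fall back to a generic rule for genres without a dedicated guideline.
    genre_rule = GUIDELINES.get(
        genre, "Describe composition, colour, and subject in order of salience."
    )
    return f"{base}\n{genre_rule}"

def review_description(draft: str, max_words: int = 120) -> str:
    """Stand-in for the human review pass: enforce a usable length.

    In the real workflow a researcher would also refine wording; here we
    only trim over-long drafts, as a simple automated proxy.
    """
    words = draft.split()
    if len(words) <= max_words:
        return draft
    return " ".join(words[:max_words]) + " …"
```

In a deployed system, the prompt returned by `build_prompt` would be sent with the captured image to a vision-language model, and the draft it returns would pass through the review step before being read aloud by text-to-speech.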