EECS PhD student Iacopo Ghinassi has been working with BBC Research and Development on a project that uses Artificial Intelligence (AI) to automatically segment and annotate different types of media content.
The project is part of Iacopo’s DAME doctoral training programme at the School of Electronic Engineering and Computer Science and has led to the publication of a first paper, with another to come.
The ever-growing amount of media content published each day makes it extremely challenging for human editors to consistently segment and annotate it. However, segmentation and labelling of media content are necessary to make short-form content available. For example, if someone is looking for a particular piece of news from a programme aired one month ago, some pre-segmentation and/or annotation of the news story in the show would be a massive help!
So how can we segment and annotate media content without direct human effort? It is probably no surprise that the answer is artificial intelligence (AI). PhD student Iacopo Ghinassi has been working with BBC R&D on ways to solve this problem as part of our Data Science Research Partnership.
I have been working on a fascinating project that uses AI to segment and annotate TV and radio programmes automatically. The project is part of my DAME doctoral training programme at Queen Mary University of London that the BBC sponsors. The BBC has provided valuable data and continuous support that allowed me and my supervisors (Dr Huy Phan and Prof Matthew Purver) to investigate new ways of automatically understanding the content of media.
'Understanding' is, in fact, crucial to solving the problem of segmenting an otherwise undivided piece of content, such as a news show or a podcast. We aim to segment content by topic, meaning that an automatic system needs to 'understand' when the topic changes. To achieve this, we turn to the branch of AI that is concerned with understanding human language. Sounds and acoustic elements are also explored, but understanding language is crucial if we want to isolate a self-contained section of the programme on one topic and, eventually, label the segment with the topic itself.
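To give a flavour of the idea, here is a minimal sketch of topic-change detection, not the project's actual system: it compares each pair of adjacent transcript sentences as bag-of-words vectors and places a segment boundary wherever their cosine similarity drops below a threshold (all names, the threshold value, and the example transcript are illustrative assumptions).

```python
import math
import re
from collections import Counter


def bow(sentence: str) -> Counter:
    """Bag-of-words vector: lowercase word counts, punctuation stripped."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def segment_by_topic(sentences: list[str], threshold: float = 0.2) -> list[list[str]]:
    """Start a new segment before each sentence whose similarity to the
    previous sentence falls below the threshold (a topic change)."""
    vectors = [bow(s) for s in sentences]
    segments, current = [], [sentences[0]]
    for prev_vec, vec, sent in zip(vectors, vectors[1:], sentences[1:]):
        if cosine(prev_vec, vec) < threshold:
            segments.append(current)
            current = []
        current.append(sent)
    segments.append(current)
    return segments


transcript = [
    "The government announced new rail funding today.",
    "The rail funding will cover upgrades to regional lines.",
    "In sport, the final was decided on penalties.",
    "Penalties decided the final after a goalless game.",
]
print(segment_by_topic(transcript))  # two segments: rail news, then sport
```

Real systems replace the bag-of-words vectors with learned sentence embeddings, which capture meaning rather than exact word overlap, but the boundary-detection logic is the same.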
In a sense, this is not too different from what a search engine does when trying to return results relevant to your query. That's why our research takes a different direction from previous research on the topic - by investigating models and techniques from AI that are closely connected to advances in fields such as semantic search. A general understanding of language like this could be a unique way to segment and label content - recognising different topics and the way they appear within the programme. If our algorithm has a good understanding of the content, we can then potentially adapt it for things like automatic summarisation at little or no cost!
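The search-engine analogy can be sketched the same way, again as an illustrative toy rather than the project's method: score each candidate topic's description against a segment and return the best match (the topic names, descriptions, and example segment are all made-up assumptions).

```python
import math
import re
from collections import Counter


def bow(text: str) -> Counter:
    """Bag-of-words vector: lowercase word counts, punctuation stripped."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def label_segment(segment_text: str, topics: dict[str, str]) -> str:
    """Return the topic whose description is most similar to the segment,
    much as a search engine ranks documents against a query."""
    seg = bow(segment_text)
    return max(topics, key=lambda t: cosine(seg, bow(topics[t])))


topics = {
    "politics": "government parliament minister policy election",
    "sport": "match final goal penalties team league",
}
segment = "The final was decided on penalties after a goalless game."
print(label_segment(segment, topics))  # prints "sport"
```

Swapping the word-count vectors for semantic embeddings turns this into exactly the kind of semantic-search-style labelling the paragraph above describes.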
We have presented our research at an important academic workshop about the broadcasting industry’s use of data science, which led to the publication of a first paper. Another paper is on its way, documenting the latest system built with this approach, which correctly segmented a set of 270 news programmes from the BBC News Channel more than 90% of the time. This system has been adopted by R&D in a prototype news segmentation system called Yuzu, which will be used to explore potential applications for automatic segmentation.
Much more is yet to come, though! The potential that AI and data science have in helping shape processes and media consumption is, if not limitless, very far-reaching. I’m glad to have had an opportunity to lay a (small) tile on that path.