EECS researchers are using AI techniques to make it easier for journalists to write news, and for people to access it, in lesser-known European languages.
In today's society, accessing the Internet is considered a necessity for everyday life and community engagement. However, with 24 official languages in use across the EU, language barriers are preventing fair access to necessary information for all citizens, and making it hard for the news media industry to provide it. Most websites and online services for citizens are developed in the local, national language. Translations into a second language (usually English) might be created when critically necessary, but the constant increase in generated web content, multiple and fast-changing content streams, and an expanding user interest base make this strategy unsustainable.
English speakers are used to having a range of automated tools to help them search and produce content. Advanced natural language processing tools can help readers search for the content they want, help journalists write stories automatically, and help editors link content across stories and media. However, these only exist for a few dominant languages (e.g. English, French, and German). They haven’t been developed for many of Europe's smaller language communities or the news media industries that serve them, and would be expensive and time-consuming to build from scratch.
EECS researchers are developing tools to address these problems and level the linguistic playing field by applying innovations in AI and cross-lingual embeddings: a representational model of language that allows word meanings to be mapped between languages automatically.
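For readers curious about what mapping word meanings between languages can look like in practice, here is a minimal sketch of the general idea. It is not the EMBEDDIA project's code. It assumes that English and Slovenian words already sit in one shared vector space, and it uses tiny three-dimensional vectors invented purely for illustration (real embeddings are learned from large amounts of text and have hundreds of dimensions). The dictionaries and the `nearest_word` helper are hypothetical names chosen for this example.

```python
import math

# Toy vectors, invented for illustration only (not real embedding data).
# Words with similar meanings are given similar directions in the space.
ENGLISH = {
    "news": [0.9, 0.1, 0.0],
    "language": [0.1, 0.9, 0.1],
    "journalist": [0.7, 0.0, 0.7],
}
SLOVENIAN = {
    "novice": [0.8, 0.2, 0.1],   # "news"
    "jezik": [0.0, 1.0, 0.2],    # "language"
    "novinar": [0.6, 0.1, 0.8],  # "journalist"
}


def cosine(u, v):
    """Cosine similarity: close to 1.0 when two vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_word(word, source, target):
    """Return the target-language word whose vector is most similar to `word`."""
    query = source[word]
    return max(target, key=lambda candidate: cosine(query, target[candidate]))


if __name__ == "__main__":
    for english_word in ENGLISH:
        print(english_word, "->", nearest_word(english_word, ENGLISH, SLOVENIAN))
```

Because both vocabularies share one space, a tool that works on English vectors can, in principle, be pointed at Slovenian vectors without translating the text first. This toy lookup only illustrates that principle.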
Dr Matthew Purver, Reader in Computational Linguistics at EECS, and Prof Massimo Poesio, Professor of Computational Linguistics, are respectively the QMUL Principal Investigator and Co-Investigator of the EMBEDDIA project, a €3m EU Horizon 2020 project.
EMBEDDIA investigates the use of deep neural networks to develop systems that can learn how to transfer existing tools developed for well-resourced languages such as English, French and German to less-resourced languages such as Croatian, Slovenian and Estonian, without needing extensive computational resources or relying on error-prone machine translation.
The novel solutions created for under-represented languages will be tested in real-world applications by the project partners, including the largest news media organisations in Finland, Estonia and Croatia.
So the question arises: was this news article written by a human or a machine?
Tweet us your thoughts!