Terminology in the Brain of the Machine



Ensuring Consistency and Quality of Neural Machine Translation 

The European Institutions employ some 5,000 translators across 10 Institutions, translating thousands of pages into 24 languages. The 552 possible language combinations (each of the 24 languages paired with the 23 others) cover 110 fields of legislation, which make up to 80% of the national legal systems of the 28 Member States.

These figures underline how important the quality of the translated texts is, and how much they need to be absolutely consistent, since they can become the subject of legal disputes or form part of international agreements.

At the same time, the European Parliament alone translates on average more than 2 million pages per year, and this huge volume of work, combined with very strict deadlines, makes any kind of technical assistance very attractive.

The European Institutions have always used the most advanced tools for computer-assisted translation, translation memories and term extraction. They have also often needed to develop tools internally, or to customise extensively those bought off the shelf, because of the specific needs of such multilingual work.

One major tool for the linguistic consistency and quality assurance of EU translation has always been its common terminology, collected interactively and provided through IATE, the biggest multilingual terminology database, which is managed and fed jointly by all the Institutions and contains some 8 million terms. Its new version can connect to any software through APIs and web services, and offers very advanced features such as term recognition and the download of domain-specific, filtered term bases for each text to be translated.
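To make the two features just mentioned more concrete, here is a minimal sketch of domain filtering followed by term recognition. It runs on a tiny in-memory term base and does not call IATE: the `TermEntry` fields, the sample entries and the function names are all assumptions made up for illustration, not IATE's actual data model or API.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TermEntry:
    """One bilingual term pair; the fields are illustrative, not IATE's schema."""
    source: str
    target: str
    domain: str


# Hypothetical English-French sample data, invented for this sketch.
TERMBASE = [
    TermEntry("regulation", "règlement", "law"),
    TermEntry("member state", "État membre", "law"),
    TermEntry("fishing quota", "quota de pêche", "fisheries"),
]


def filter_by_domain(entries, domain):
    """Return the sub-term-base for a single domain."""
    return [e for e in entries if e.domain == domain]


def recognise_terms(text, entries):
    """Return the entries whose source term occurs in the text as whole words."""
    found = []
    for entry in entries:
        pattern = r"\b" + re.escape(entry.source) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(entry)
    return found


law_terms = filter_by_domain(TERMBASE, "law")
sentence = "Each Member State shall apply this Regulation."
hits = recognise_terms(sentence, law_terms)
hit_sources = [e.source for e in hits]
print(hit_sources)
```

This naive matching only finds exact word sequences, so a plural such as "Member States" would be missed; a real term recognition feature would presumably need to handle inflected forms.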

As the Terminology Coordination service, we have always made sure that the terminology resources and platforms are connected to the CAT tools and available on the translators’ desktops.

This has been technically easy both with statistical machine translation tools and with CAT tools, where you can dictate which resources the machine uses to produce the automated translation. The human keeps total control over teaching the machine.

Neural MT, which uses artificial intelligence, runs the risk of being “too clever”. This self-taught tool hardly accepts human instructions to filter or select given terminology. Of course, everything is feasible in IT development, as our IT colleagues like to say, but it becomes ever more complex and costly in resources and time.

Nevertheless, we are very concerned about the quality of our final output, given the unavoidable increase in the use of tools based on artificial intelligence.

We realise that NMT can be made to use only very high-quality terminology resources, i.e. highly reliable, normative terms. This has led us to step up our efforts to eliminate noise from our database, to clear it of old and outdated data, and to create domain-related collections offering advanced filtering possibilities.
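The clean-up described here can be pictured as a simple filtering pass. The sketch below is only an illustration under stated assumptions: the `TermRecord` fields, the 1 to 4 reliability scale, the threshold of 3 and the sample records are hypothetical and do not describe how the database is actually cleaned.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TermRecord:
    """Illustrative record; field names are assumptions, not a real schema."""
    term: str
    domain: str
    reliability: int  # assumed scale: 1 (unverified) to 4 (very reliable)
    obsolete: bool


# Hypothetical sample data, invented for this sketch.
RECORDS = [
    TermRecord("Member State", "law", 4, False),
    TermRecord("member country", "law", 2, False),
    TermRecord("European Economic Community", "law", 4, True),
    TermRecord("total allowable catch", "fisheries", 3, False),
]


def clean(records, min_reliability=3):
    """Drop obsolete records and those below the reliability threshold."""
    return [
        r for r in records
        if not r.obsolete and r.reliability >= min_reliability
    ]


def group_by_domain(records):
    """Build domain-related collections: domain -> list of terms."""
    collections = defaultdict(list)
    for record in records:
        collections[record.domain].append(record.term)
    return dict(collections)


collections = group_by_domain(clean(RECORDS))
print(collections)
```

Only the collections that survive this pass would then be offered to the machine, which is why the reliability threshold is the key design choice: set too low, noise reaches the NMT engine; set too high, useful terms are withheld.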

Facing this new challenge for terminology, we also have to reconsider our policy of inserting our resources and features in the pre-translation phase, and to focus more on post-editing and quality-checking tools.
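A terminology check at the post-editing stage could look roughly like the following sketch. It flags approved source terms whose expected target term is missing from the machine output. The term pairs, the example sentences and the function names are hypothetical, and the exact-match logic is a deliberate simplification rather than a description of any existing quality-checking tool.

```python
import re

# Hypothetical approved English-French term pairs, invented for this sketch.
APPROVED = {
    "member state": "État membre",
    "regulation": "règlement",
}


def contains(text, term):
    """True if the term occurs in the text as whole words, ignoring case."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def check_terminology(source, translation, approved):
    """Return (source term, expected target) pairs missing from the translation."""
    issues = []
    for src_term, tgt_term in approved.items():
        if contains(source, src_term) and not contains(translation, tgt_term):
            issues.append((src_term, tgt_term))
    return issues


source = "Each Member State shall apply this Regulation."
machine_output = "Chaque pays membre applique le présent règlement."
issues = check_terminology(source, machine_output, APPROVED)
print(issues)
```

Here the machine output renders "Member State" as "pays membre" instead of the approved "État membre", so that pair is reported to the post-editor, while "règlement" passes. Checking after translation in this way does not require the NMT engine to accept any terminology constraints.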

In this dizzying evolution of the tools that facilitate translation, we notice that whenever the machine demonstrates a new way of “thinking”, there is concern about quality and linguistic consistency, and there is always a pressing request for us to ensure easy and efficient access to, and use of, terminology. Terminology is and will always remain the means to ensure quality.

From now on, and at every future step in the evolution of linguistic tools, IT professionals need to be aware of the importance of terminology and also to cooperate closely with linguists, in order to integrate it in the best way and to benefit from the terminology resources at the most appropriate stage of the process.

Written by Rodolfo Maslias

Terminology Coordination

European Parliament