Catalan terminology for IATE


Collaborative work as a means of upgrading IATE’s terminological records


The Terminology Coordination Unit of the European Parliament (TERMCOORD) and the Catalan Center for Terminology (TERMCAT) have begun a collaboration to enrich the contents of IATE, the European Union's multilingual terminological database. Specifically, TERMCAT will supply new Catalan terminology to the internal version of IATE, in a joint effort to improve and extend it.

TERMCOORD and TERMCAT have reached an agreement aimed at enriching and updating the contents of IATE with relevant terminological data from TERMCAT. As a first step in implementing this agreement, TERMCAT will provide TERMCOORD with Catalan designations for concepts already included in IATE. Existing terminological entries will thus be completed with Catalan terms.

Institutional cooperation

TERMCOORD and TERMCAT share a common task: managing and disseminating terminology. TERMCOORD is in charge of coordinating terminology work in the European Parliament’s translation units and co-managing the IATE database together with other EU institutions. TERMCAT's mission is to coordinate terminological activities in Catalan, make Catalan terminology available and promote its use. To fulfill this mission, TERMCAT creates terminological products, standardizes neologisms and provides advisory services.

However, TERMCOORD and TERMCAT are not the only institutions taking part in this project. The challenging goal of introducing Catalan terms into IATE could not be achieved without the expertise of the Universitat Oberta de Catalunya (Open University of Catalonia, UOC) and its research group specialized in natural language processing and automatic terminology extraction.

Providing Catalan terms to IATE

IATE is a vast terminological database with approximately 8.7 million entries. Given this volume of data, it has been necessary to limit the number of terminological entries to be completed with Catalan designations. For that reason, only the most reliable entries have been selected: those rated reliable or very reliable (a 3-star or 4-star reliability code). As a result, around 35,000 terminological entries have been selected to receive Catalan terms.

The process of providing Catalan terms consists of two major steps: first, the automatic search for and extraction of Catalan designations equivalent to English terms, and second, the validation of the data obtained.

Catalan equivalents have been filled in automatically by UOC, using TERMCAT’s dictionaries and terminological resources as the source of Catalan terms. Matches between English and Catalan terms have been established using English as the main source language and Spanish as a secondary source language (that is to say, as a pivot language). This means that the match between an English term and a Catalan term has been established either through the English designation or through the equivalent Spanish designation, where that designation appears in both the IATE and TERMCAT databases. The equivalents have been obtained regardless of domain. Where no match has been found, Catalan equivalents have been extracted from Wikipedia articles that exist in both English and Catalan versions.
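The matching order described above (English first, then Spanish as a pivot) can be illustrated with a minimal Python sketch. It is not UOC's actual implementation: the record layout, the function names and the sample terms are all assumptions made for illustration, matching is reduced to a case-insensitive exact comparison, and the Wikipedia fallback is left out.

```python
from typing import Optional


def build_index(records: list[dict], lang: str) -> dict[str, str]:
    """Map each term in `lang` (case-folded) to its Catalan equivalent."""
    return {r[lang].casefold(): r["ca"] for r in records if r.get(lang)}


def find_catalan_equivalent(
    entry: dict, en_index: dict[str, str], es_index: dict[str, str]
) -> Optional[tuple[str, str]]:
    """Return (catalan_term, route), or None when neither language matches."""
    # 1. English is the main source language: try a direct match first.
    en = entry.get("en")
    if en and en.casefold() in en_index:
        return en_index[en.casefold()], "english"
    # 2. Otherwise use the Spanish designation of the entry as a pivot.
    es = entry.get("es")
    if es and es.casefold() in es_index:
        return es_index[es.casefold()], "spanish-pivot"
    # 3. No match: in the project, a Wikipedia lookup would follow (omitted).
    return None


# Hypothetical sample records standing in for TERMCAT resources.
termcat_records = [
    {"en": "climate change", "es": "cambio climático", "ca": "canvi climàtic"},
    {"en": None, "es": "desarrollo sostenible", "ca": "desenvolupament sostenible"},
]
en_index = build_index(termcat_records, "en")
es_index = build_index(termcat_records, "es")

# Hypothetical entries standing in for IATE records.
iate_entries = [
    {"en": "Climate change", "es": "cambio climático"},
    {"en": "sustainable development", "es": "Desarrollo sostenible"},
    {"en": "widget"},
]
for entry in iate_entries:
    print(entry["en"], "->", find_catalan_equivalent(entry, en_index, es_index))
```

Recording the route alongside each match is a design choice of this sketch: it would let validators see whether a candidate came from a direct English match or through the Spanish pivot, where ambiguity is more likely.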

This first step has already been completed, and a team of specialized terminologists coordinated by TERMCAT is now verifying the appropriateness and accuracy of the automatically extracted Catalan designations. In other words, the terminologists' role is to check the Catalan terms one by one and make sure that each is correct for its concept and domain. So far, approximately 8,000 IATE terms have a validated Catalan equivalent.

Once the validation process is complete, the Catalan terms will be incorporated into IATE. They will be inserted in the internal version of IATE; that is, the data will not be open to the general public, as the public site of IATE contains terminology exclusively in the 24 official languages of the European Union. Catalan terms will thus be available for internal use only, and only translators and terminologists of the European institutions will have access to them. This contribution is nonetheless significant, since Catalan terminology could become part of the open content of IATE if Catalan ever becomes an official language of the European Union. In the meantime, TERMCOORD plans to publish TERMCAT terminology on a separate page of the TERMCOORD website.

A view towards the future

IATE is a widely used terminological database that serves as a reference for many language professionals, including not only the European Union translation units but also a great number of translators in different countries. Thanks to its multilingual content in a wide range of domains, IATE has become a must in any specialized translator’s bookmark list. For that reason, providing Catalan terms is an opportunity for IATE to reach new potential users.

As an example, having Catalan terms would be a plus for translators who use computer-assisted translation tools, now that IATE terminology can be downloaded in the TBX (TermBase eXchange) format. Translators could add IATE terms to their own terminological database and would thus be able to find accurate terminology in any language pair (with Catalan being either the source or the target language) in a direct and simple manner, without using a relay language.

The cooperation between TERMCOORD and TERMCAT can also be seen as a first step towards giving Catalan greater visibility in the European context, which would have special significance for institutional and commercial relations between Catalan-speaking territories and the European Union.

Anna Nin Aranda