Interview with Koen Kerremans


Koen Kerremans is an assistant professor at the department of Linguistics and Literary Studies (Faculty of Arts and Philosophy) of Vrije Universiteit Brussel (VUB) where he teaches courses on research methodologies in applied linguistics and translation studies, terminology, technical and scientific translation and translation technology.


1. You are an assistant professor at the department of Linguistics and Literary Studies (Faculty of Arts and Philosophy) of Vrije Universiteit Brussel (VUB) where you teach courses on research methodologies in applied linguistics and translation studies, terminology, technical and scientific translation and translation technology. What is your background? When did you discover your passion for these fields and how did your interest in terminology start?

I obtained my master’s degree in Germanic Philology (English-Dutch) at the University of Antwerp. During my studies, I became interested in language technology (I had always been fond of technology) and wanted to learn more about the underlying mechanisms of natural language processing (NLP). I decided to pursue an advanced inter-university master’s degree in language sciences (KU Leuven, University of Antwerp, University of Ghent, Vrije Universiteit Brussel) in which I took several theoretical and practical courses on computational linguistics, machine learning, software programming and NLP. During this period, I remember that I was already fascinated by translation technology – which I only associated with machine translation at that time – and by doing research. I was very keen on the idea of working as a researcher at the university (although I knew that job opportunities were scarce and that my chances of obtaining a position were therefore slim).

Soon after my graduation, however, I received news from my supervisor about a job offer at the department of applied linguistics of Erasmushogeschool Brussel. Prof. dr. Rita Temmerman – who was the research director of the Centre for Special Language Studies and Communication (which is now part of the VUB research group ‘Brussels Institute for Applied Linguistics’) – was looking for a linguist with programming skills to conduct research on multilingual terminology in an EU-funded project about ontology-based knowledge systems in the domains of fraud detection and prevention. I must admit that my understanding of terminology as a research discipline was very limited when I started working as a researcher in this project. But Rita Temmerman pointed me to the important books and other publications I needed to read about terminology and introduced me to many interesting researchers in the field. I went to conferences and started writing about my research and, gradually, my research interest focused more on terminology and technology applied to translation.

The former department of applied linguistics of Erasmushogeschool Brussel is now part of the department of linguistics and literary studies of Vrije Universiteit Brussel. As a member of the teaching staff, I’m very happy to be able to share my interests in terminology, translation and translation technology and my research expertise and methods with students enrolled in the master programmes of translation and interpreting.

2. Your doctoral research focused on terminological variation in multilingual Europe. What are your observations about the EU’s Interinstitutional Terminology Database, IATE?

In my PhD research, I studied how intralingual terminological variants (i.e. different ways of referring to specialised concepts) in a specialised corpus of (English) source texts tend to be translated into French and Dutch. I concentrated on EU texts related to environmental topics. My main argument throughout my dissertation was that choices regarding intralingual variation as well as the different translations of these variants in the target texts (interlingual variation) are determined by several contextual factors (which I described in more detail in my dissertation). I explained that for translators, it is important to know these different linguistic options when translating terms and to know in which situational contexts or registers certain options are more likely to be used.

To this end, translators can consult structured bi- or multilingual resources of several types, such as bilingual or multilingual glossaries, specialised dictionaries, terminological databases, etc. Such structured resources, however, can never fully cover the wealth of linguistic options that appear in specialised texts, nor do they represent the many translation choices that one can find for a source language term in specialised corpora. In order to illustrate these points, I compared the data derived from my corpus-based comparative analysis with data extracted from terminological records (for a selection of concepts) in IATE. I must say that this was quite a time-consuming process because, during my research, it was not yet possible to download IATE in the TermBase eXchange format (TBX), so I had to look up relevant terminological records manually, based on the terms extracted from the research corpus, and save them one by one as separate HTML files. Fortunately, this task has become superfluous and IATE is easier to use for research or translation purposes now that it is available as a downloadable resource.

The IATE database is a successful product of the inter-institutional cooperation between the Translation Centre for the Bodies of the EU (CdT) – which launched the project in the beginning of 2000 – and other EU translation services. The aim of the project was to centralise the terminological activities within the EU and to reduce or eliminate the duplication of effort, which was due to the fact that, before IATE was launched, different EU institutions managed their own terminological databases. These databases were merged into the new inter-institutional terminology base IATE, which explains why up until today, the database still contains many overlapping or duplicate entries. This was one important observation during my research, which allowed me to study in more detail the variation between terminological entries created by translators/terminologists from different EU institutions. For my research purposes, it was therefore very interesting to be able to include these overlapping terminological entries in my study. I can imagine, however, that from the perspective of possible end-users of the database, such as translators, the overlapping entries may in some cases cause some confusion when choosing a proper term to use in a specific context.

3. In your study ‘Terminological variation in multilingual Europe. The case of English environmental terminology translated into Dutch and French’ you examine the intra- and interlingual variants that can be found in parallel corpora of European texts. Could you please comment on the representation of variants in the IATE database?

The IATE database is an example of a conventional (onomasiologically-structured) terminological database. This means that it consists of concept-oriented terminological records which aim to describe or define specialised concepts. Ideally, each concept corresponds to one terminological record and information about the concept is considered to be language-independent. However, given the fact that IATE is used as one of the instruments to harmonise the EU conceptual system and its corresponding terminology, users can also find in it information about possible conceptual differences between institutions or member states in separate fields and language dependent information, such as terminological variants in different languages and specifications of register use or preferences. All this information, which has been manually added to the database and verified by terminologists, proves to be of important value for any user aiming at acquiring (contrastive) knowledge about specialised concepts and terms used in specialty areas of the EU.

Several possibilities are offered in IATE to structure or represent different types of terminological variants in different languages. Overall, a distinction can be made between information fields simply listing the possible forms – such as the ‘Term’ and ‘LookUp Form’ fields – fields providing extra information about the contextual usage of each variant – such as the ‘Note’, ‘Language usage’ and ‘Regional usage’ fields – and fields providing additional features of the different variants – such as the ‘Grammatical info’ or the ‘Reliability’ field. I studied how terminological variation was represented in a selection of more than 1000 IATE terminological records and was, for instance, able to observe considerable differences in the way different types of terminological variants (e.g. morphological variants, syntactic variants, formulas, shortened forms, etc.) were added to the terminological records. These differences are most likely due to the fact that all terminological entries are manually created by different people (working in different institutions) who probably have different opinions about what types of terminological variants should be included and to what fields these variants should be added.

One can also see that – due to the concept-oriented structure of the database – each terminological record lists terminological variants in several languages that are direct (or cognitive) equivalents (in the sense that they refer to the same concept). Consequently, this type of database structure does not provide translators with suggestions for potential ‘alternative’ translations for specific source language terms (other than direct equivalents in the target language). In my view, this is a limitation of such terminological databases because in reality, and depending on the context, translators may sometimes decide to translate a source language term by means of a conceptually-related term in the target language (a term which is not a direct equivalent of the term in the source language). Examples that I encountered in my corpus were, amongst others, the English term ‘air pollution’ translated in a staff working document of the European Commission as ‘qualité de l’air’ in French (‘air quality’) or the English term ‘biological invasions’ translated as ‘IS’ in Dutch – the abbreviated form of ‘Invasieve Soorten’ (‘invasive species’) in Dutch – in an opinion of the European Economic and Social Committee. In a conventional (onomasiologically-structured) terminological database, it is difficult to link a term to potential translations in a specific target language, other than the direct equivalents appearing in the same terminological record.
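The limitation described above can be illustrated with a minimal Python sketch of a concept-oriented lookup. It is not IATE's actual schema: the record layout, the field names (`concept_id`, `terms`) and the two toy records are assumptions made purely for illustration.

```python
# Toy concept-oriented records: one record per concept, terms grouped by language.
# The layout and the identifiers C1/C2 are hypothetical, not taken from IATE.
RECORDS = [
    {"concept_id": "C1",
     "terms": {"en": ["air pollution"], "fr": ["pollution atmosphérique"]}},
    {"concept_id": "C2",
     "terms": {"en": ["air quality"], "fr": ["qualité de l'air"]}},
]


def equivalents(term, source_lang, target_lang, records=RECORDS):
    """Return target-language terms listed in the same record(s) as `term`."""
    found = []
    for record in records:
        if term in record["terms"].get(source_lang, []):
            found.extend(record["terms"].get(target_lang, []))
    return found


# Only the direct equivalent stored in record C1 is returned.
print(equivalents("air pollution", "en", "fr"))
```

Because the lookup never leaves the record in which the source term is stored, a conceptually related rendering such as ‘qualité de l’air’ cannot be suggested for ‘air pollution’, even though such a translation was attested in the corpus.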

4. You suggested presenting intra- and interlingual terminological variants in graphs to visualize data in a more flexible and dynamic way. Could you tell us more about these graphs?

In a chapter of the recently published book entitled ‘Multiple Perspectives on Terminological Variation’ (edited by Patrick Drouin, Aline Francœur, John Humbley and Aurélie Picton and published by John Benjamins), I proposed a new type of multilingual terminological resource presenting terms and their possible translations in a dynamic graph visualisation. The best way to picture this is to think of a mindmap in which terms are linked to one another (within the same language and across several languages). Research has shown that graph visualisations facilitate learning processes in different domains. Thanks to technological innovations, such representations are now also increasingly used to visualise data in lexicographical and terminographical resources. An example of the former is Visual Thesaurus (for English general language). EcoLexicon is an example of a multilingual terminological knowledge base dealing with environment concepts, developed by the LexiCon Research Group at the University of Granada.

In the graph visualisation that I propose, links in the network represent relations of intralingual variation as well as interlingual variation. These links are derived from what I call ‘term-based translation units’ that are extracted from specialised translation (or parallel) corpora. Computational linguists have worked out several methods and approaches to automatically extract clusters or sets of terminological variants within the same language or even across several languages. Pattern-based systems, for instance, are able to identify intralingual variants based on a set of linguistic markers or patterns – for instance a pattern ‘[X] is also called [Y]’ whereby X and Y are variables – whereas statistical methods are able to identify patterns of intra- and interlingual variation based on the number of times X and Y co-occur, either in the same text (in order to recognise patterns of intralingual variation) or in a source text and its translation (in order to determine patterns of interlingual variation). I know that I am simplifying things here but the essence is that such methods or tools can help us in developing resources of term-based translation units, based on (large-scale) multilingual corpora.
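The two families of methods mentioned above can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the tooling used in the research: the regular expression, the function names and the toy sentence pairs are all assumptions, and the counting step uses plain substring matching on sentence pairs that are assumed to be aligned already.

```python
import re
from collections import Counter

# Hypothetical marker pattern for "<X> is also called <Y>".
ALSO_CALLED = re.compile(
    r"\b(?:the\s+)?(\w[\w\s-]*?)\s+is\s+also\s+called\s+"
    r"(?:the\s+)?(\w[\w\s-]*?)(?=[.,;]|$)",
    re.IGNORECASE,
)


def pattern_variants(text):
    """Pattern-based step: return (X, Y) pairs matched by the marker pattern."""
    return [(x.strip().lower(), y.strip().lower())
            for x, y in ALSO_CALLED.findall(text)]


def cooccurrence_counts(aligned_pairs, source_terms, target_terms):
    """Statistical step: count how often a source term and a target term
    occur in the same aligned (source, target) sentence pair."""
    counts = Counter()
    for source, target in aligned_pairs:
        source_l, target_l = source.lower(), target.lower()
        for s in source_terms:
            if s not in source_l:
                continue
            for t in target_terms:
                if t in target_l:
                    counts[(s, t)] += 1
    return counts


# Intralingual variants from a marker pattern.
print(pattern_variants("The Spanish slug is also called the killer slug."))

# Interlingual variants from co-occurrence; the sentence pairs are invented toy data.
aligned = [
    ("Air pollution harms health.",
     "La pollution atmosphérique nuit à la santé."),
    ("Measures against air pollution are needed.",
     "Des mesures pour la qualité de l'air sont nécessaires."),
    ("Air pollution remains high.",
     "La pollution atmosphérique reste élevée."),
]
print(cooccurrence_counts(
    aligned,
    ["air pollution"],
    ["pollution atmosphérique", "qualité de l'air"],
))
```

Real systems would add tokenisation, lemmatisation and an association measure rather than raw counts, but the sketch shows how both approaches yield candidate pairs that can feed a resource of term-based translation units.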

Each term-based translation unit consists of a source language term and its translation and features additional metadata about the concept which the source language term refers to, and the text from which it was extracted (e.g. the author, text type or register). This meta-information is quite essential in the proposal because it allows for the mindmap to become ‘dynamic’. Metadata are needed to search for or filter out information in the mindmap in a more advanced way. A user could for instance be interested in looking at variants occurring in a specific subregister of the corpus or appearing in texts from a specific source. He or she could zoom in on or out of certain regions in the mindmap depending on a selection of contextual parameters. In other words, changing the contextual conditions causes direct changes in the network of intra- and interlingual variants.
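A term-based translation unit and the metadata-driven filtering described above could be modelled roughly as follows. This is a sketch under stated assumptions: the class name, the field names and the concept labels are hypothetical, and the third unit is an invented example added only so that the filter has something to exclude. The first two units reuse the corpus examples mentioned earlier in this interview.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranslationUnit:
    """One source term, one translation, plus contextual metadata (hypothetical fields)."""
    source_term: str
    target_term: str
    target_lang: str
    concept: str
    institution: str
    text_type: str


UNITS = [
    TranslationUnit("air pollution", "qualité de l'air", "fr",
                    "AIR_POLLUTION", "European Commission", "staff working document"),
    TranslationUnit("biological invasions", "IS", "nl",
                    "INVASIVE_SPECIES", "European Economic and Social Committee", "opinion"),
    # Invented unit, for illustration only.
    TranslationUnit("air pollution", "pollution atmosphérique", "fr",
                    "AIR_POLLUTION", "European Commission", "directive"),
]


def filter_units(units, **conditions):
    """Keep only the units whose metadata match every given condition."""
    return [u for u in units
            if all(getattr(u, key) == value for key, value in conditions.items())]


def build_graph(units):
    """Map each source term to the set of (language, translation) nodes linked to it."""
    graph = {}
    for u in units:
        graph.setdefault(u.source_term, set()).add((u.target_lang, u.target_term))
    return graph


# Changing the contextual conditions changes the network that is displayed.
print(build_graph(UNITS))
print(build_graph(filter_units(UNITS, text_type="opinion")))
```

The graph is rebuilt from the filtered units each time, which is one simple way to make the visualisation respond directly to the selected contextual parameters.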

5. In ‘Illusion of terminological precision and consistency: a closer look at EU terminology and translation practices’, you carried out a comparative study on denominative variation in EU source texts and translations. What were your main findings? Did you notice any specific patterns or trends that appear in specialized texts? How can the results of this study be incorporated in multilingual terminological resources for the benefit of future translations?

That article was presented during the conference ‘Meaning in translation: illusion of precision’, held in Riga (in 2012). I argued in my article that adhering to the principles of terminological precision and consistency is difficult to maintain in the context of EU translation practices. Many efforts have been made at the EU level in promoting terminological precision and consistency, mainly in a context of EU legal drafting. There are for instance EU guidelines on terminology use – e.g. the Joint Practical Guide for Persons involved in the drafting of EU legislation – or on how EU translators/terminologists need to enter data into the IATE database, with a view to harmonising terminology.

Despite these efforts, terminological variation is a common phenomenon in EU texts and this is what I tried to show on the basis of a comparative study of EU source texts and their translations. This variation can be explained on different grounds. Studies emphasising the cognitive aspects of terminology, for instance, have shown that our knowledge of a subject field is not built up from concepts with clear-cut boundaries but rather from (what Rita Temmerman in her book ‘Towards New Ways of Terminology Description: The Sociocognitive Approach’ has called) units of understanding. Seen from this perspective, for instance, an author might introduce multiple (cognitively-motivated) terminological variants in a text to emphasise different features or aspects of the same unit of understanding. Think of the example of ‘killer slug’ – emphasising the destructive features of this type of invasive species – or the variant ‘Spanish slug’ – whereby emphasis is placed on its original location. Studies focusing on the communicative aspects of terminology show that variation may also be the result of different communicative intentions that are associated with text registers and genres. One can for instance hypothesise that an academic article is more likely to contain ‘neutral terms’ to refer to a phenomenon (e.g. ‘biodiversity reduction’ or ‘biodiversity loss’) in comparison to an opinion text or a policy document describing the same phenomenon (e.g. ‘biodiversity destruction’).

Observations resulting from my comparative study of EU texts were that patterns of variation in (English) source texts tend to be reflected in the target texts (in other words: terminological variation in the source texts tends to be translated) and that, due to different contextual parameters that translators need to take into consideration, more interlingual variation was encountered in the translations.

As I explained earlier, I think that intra- and interlingual variants, obtained on the basis of corpus-based analyses, can be used to develop (what I consider) a new type of translation resource – because it is not a conventional terminological database or a translation memory. On the basis of this resource, we would be able to show to translators what typical terminological variants are used to refer to a specific unit of understanding in a specific register and how these can be rendered in the target language.

6. In your opinion, what is the current role of terminology in applications related to machine translation or software localization?

Correct and consistent use of terminology has always been an important parameter for assessing the quality of localised/translated software products. I don’t think that the role or function of terminology (and terminological databases) has changed with the emergence of these new products and accompanying technologies.

Setting up and maintaining a terminological database can be a very time-consuming process but is essential in any software localisation project, not only because of the many releases of new versions of the same product but also because of the fact that the same terminology also needs to appear in the software manuals or help files, as well as in the marketing materials related to the software product (e.g. the company’s webpages). A terminological database is needed to make sure that all these localised products are, terminologically speaking, aligned.

Whereas in software localisation projects – involving human translators – a terminological database is used for speeding up the translation process but also for consultation purposes, I think that these aims are becoming less relevant in machine translation projects. Previous knowledge-based machine translation systems heavily relied on the availability of high quality dictionaries and grammars to produce high quality output. However, the new MT systems based on machine learning are able to learn translation patterns on the basis of large-scale (specialised) corpora, which means that terminological databases are more likely to be used as (post-translation) sources for quality assessment, rather than sources to be used during the translation process.

7. You are involved in the work of the Brussels Institute for Applied Linguistics. Could you please tell us about the projects the research group is currently focusing on?

We are a research group within the department of linguistics and literary studies, mainly conducting research within the domains of translation, interpreting and foreign language acquisition. One important line of research, within the context of translation and interpreting, pertains to multilingual terminology, special language and communication. With respect to this line of research, we are currently involved in the Termraad Academy project, which involves a collaboration between the Dutch language units of the Directorate-General for Translation of the European Commission (EC) and the European Council, and the bachelor/master programmes in applied linguistics, translation and/or interpreting in Flanders and the Netherlands. The aim of this collaboration is to further enrich the IATE database by conducting terminological research in areas that are important for the EU. A list of topics is updated every year at the start of the academic year. The nice thing about this project is that it involves the participation of students who can contribute to the IATE database by carrying out terminology research in the context of a short-term traineeship or a master’s thesis.

Setting up research collaborations between our research group and professional organisations in the translation and interpreting sector is an important way for us to valorise and to continue our research. We have recently conducted a small-scale study in collaboration with Brussel Onthaal (a public organisation providing interpreting and translation services) in which we examined how organisations in Brussels providing public services manage to communicate with ‘customers’ speaking a different language. Very often, this communication is hampered because of the language barrier. The situation has become even more complex in Brussels due to the recent massive influx of refugees to the EU. Consequently, service providers more often need to rely on the use of different types of solutions (e.g. using public service interpreters/translators; using translation technologies; etc.) to bridge the communication gap. Further building on this research, we are now planning to launch a research project on multilingual terminology in the domain of public service provision because we noticed that, especially for public service interpreters, it is important to identify the different knowledge areas and topics that are associated with this broad domain.

8. How do you think terminology will develop as a discipline in the future?

A question about the future of any discipline is always difficult to answer, I guess, and so I do not have a straightforward answer, unfortunately. I think it is important to first make a distinction between terminology as an academic discipline and as a practice-oriented discipline. In terminology as an academic discipline, research is carried out to broaden our understanding of terminology (its features and functions) by developing theories or theoretical models that are able to account for different aspects of terminology or typical phenomena related to terminology (such as domain metaphors, neologisms or variation). In terminology as a practice-oriented discipline, terminology research pertains to the search, creation, organisation and management of domain-specific vocabulary. Such research usually leads to the creation of a terminological product, either prescriptive or descriptive, such as a terminological database or specialised dictionary.

Terminology research was for a long time mainly carried out from a prescriptive point of view, in which the conviction reigned that terms should be used unambiguously to refer to clearly delineated concepts. Descriptive terminology approaches have jointly contributed to a more pragmatic view of the relationship between terms and concepts and these views have considerably opened up the possibilities for studying terms from multiple perspectives (e.g. social, communicative, cognitive, cultural). Theories resulting from these descriptive approaches benefit from empirical methods in which technology has played an important role. It can be expected that the impact of technology on terminology as an academic discipline will only increase as current and new technologies are deployed to empirically test and verify the theoretical hypotheses postulated within the field. Our research methods for studying phenomena such as neologisms or terminological variation, for instance, are improving because more linguistic data (in several languages) are available and easier to process, thanks in part to technologies supporting the analysis of Big Data and technologies for data visualisation.

Such technologies also have an unmistakable impact on terminology as a practice-oriented discipline. Concordancers have been around for quite some time now and help users (translators/terminologists) to study terms in contexts in order to detect patterns of term usage. Different types of tools for extracting terms or additional terminological information have been developed and support users in creating terminological products. Terminology cloud services support the creation of terminological databases as a collaborative effort. Linked Data technology facilitates the automatic creation of new terminological databases, based on combining existing ones.

With machine learning being used more frequently in different types of applications, I expect that future ‘intelligent’ technologies in practice-oriented terminology will take over some of the tasks when creating terminological databases, such as creating definitions for new concepts. Perhaps we will witness a meaning extension of the term ‘post-editing’ in the near future as it will probably also become partly a reality in practice-oriented terminology.

9. What advice would you give to your students and to young professionals who would like to start a career in terminology?

First of all, I try to show my students why terminology matters in the practice of translation and interpreting and I teach them how they can carry out terminological research and the skills they need to acquire. Training students to become professional terminologists, however, is not the ultimate aim of the master programmes at Vrije Universiteit Brussel. Nevertheless, we do offer our students opportunities for terminology internships in organisations and I always advise them to do so because it is an excellent chance for them to start a career in the field of terminology.


Interviewed by Serena Grementieri – Terminology trainee at the Terminology Coordination Unit of the European Parliament, BA in Intercultural Linguistic Mediation and MA degree in Specialized Translation at the School of Foreign Languages and Literatures, Interpreting and Translation (ex-SSLMIT) of Forlì, University of Bologna.

Born in 1990 in Faenza (RA), Italy, Serena Grementieri completed her BA in Intercultural Linguistic Mediation and MA in Specialized Translation at the School of Languages, Literature, Interpreting and Translation (former SSLMIT) in Forlì, University of Bologna. During her degrees, she had the opportunity to study under the Erasmus Programme at the University of Exeter (UK) and at Moscow State University (Russia) thanks to a university exchange. Interested in terminology and translation in international organizations, she wrote her MA thesis on terminology in collaboration with the FAO. She knows Italian, English, French, German, Russian and a bit of Chinese.

Prepared by Pedro Ramos – Translator, Social Media and Content Manager, Communication Trainee at the Terminology Coordination Unit of the European Parliament (Luxembourg).