Tatiana Gornostay is an experienced researcher in Computational Linguistics and Terminology. She completed her PhD studies at the Applied Linguistics Department of the Herzen University of Russia in 2009 (PhD thesis “Latvian-Russian Machine Translation in the System of Social Communication”, 2010). Since 2005 she has been working at Tilde as a Russian language system project manager, senior researcher and Terminology service manager. She has participated in several national and international R&D projects related to language technologies (FP6 TRIPOD, FP7 ACCURAT, TTC and TaaS, ICT PSP META-NORD, as well as EUREKA Eurostars and ERDF projects). She is the author or co-author of more than 30 scientific publications. Her research interests lie in human, computer-assisted and rule-based machine translation, terminology, terminography and comparative linguistics (studies in Germanic, Baltic and Slavic languages), as well as linguistic resources.
1. Iulianna van der Lek: First, could you tell me a bit about your academic background and your relationship with terminology?
Terminology is everywhere: we come across terms every day! I am very grateful to my father, who cultivated my love of new words, languages and dictionaries of various kinds when I was a child. Since then my life has been devoted to linguistics. During my university years I was interested in specialised languages and was inspired by computational terminography. That resulted in my Bachelor’s research on the description of ambiguous terminology in electronic dictionaries and then my Master’s thesis on thesaurus-based terminology modelling. Later on I was actively involved in machine translation (MT) projects and defended my PhD thesis on MT of specialised texts in the domain of social communication, with great support from my research advisor and my colleagues at Tilde. These days I continue to work on computational terminology and terminography, trying to find innovative models for acquiring, processing and using terminology in various applications and usage scenarios.
2. Iulianna van der Lek: What is the latest or most important research that you have done in the field of computational terminology and/or terminography?
There are several projects related to computational terminology and terminography that I have actively participated in. For example, TTC is a three-year FP7 project on Terminology Extraction, Translation Tools and Comparable Corpora. The main result of the project is the TTC web-based platform for comparable corpora collection and terminology processing and management. You are also welcome to join us in our LinkedIn TTC group. Our team participates in the project as a partner and is mainly responsible for research tasks related to the Latvian language and for the development of a terminology management platform based on EuroTermBank. Another project is TaaS: Terminology as a Service, a two-year FP7 project coordinated by Tilde that started in June 2012. The main goal of the project is to create an innovative cloud-based platform for acquiring, processing, sharing and reusing multilingual terminological data, one of the most important language resources for industry, academia and society in general. I was also involved in the project ACCURAT: Analysis and Evaluation of Comparable Corpora in Under-Resourced Areas of Machine Translation. You are welcome to access the ACCURAT toolkit for collecting and analysing comparable corpora and extracting data from them on the project website.
3. Iulianna van der Lek: Could you please give some examples of the terminology services that will be provided via the TaaS platform, and tell us what your role in the project is?
TaaS will provide a variety of core terminology services for key terminology tasks:
- Automatic extraction of monolingual term candidates from the documents uploaded by users
- Automatic recognition of translation equivalents for the extracted terms from different public and industry terminology resources (e.g., TAUS, IATE, EuroTermBank and others)
- Automatic acquisition of translation equivalents for terms not found in existing terminology resources from parallel and/or comparable web data
- Facilities for users to clean up automatically acquired terminology
- Facilities for sharing terminology with major term banks and reusing it in various applications and usage scenarios
My role in the project is to lead the team and manage terminology products and services.
4. Iulianna van der Lek: How will you control the direct user involvement in the clean-up process? Who should be allowed to edit and validate the data available through such a platform?
Registered TaaS users will be involved in cleaning up the terminology stored in the TaaS Shared Term Repository. Any user will be allowed to edit and validate terminology extracted from his/her documents and then to share it (make it public) with other users, as well as to browse other users’ public terminology. Term banks, in turn, will evaluate the public terminology they may be interested in and can also share it via their own interfaces. TaaS will serve the terminology needs of different user groups within a wide range of applications. Three usage scenarios are being elaborated within the project, in which TaaS will demonstrate the efficacy of reusing the acquired and user-cleaned terminological data. Multilingual consolidated and harmonised terminology is already used as data in human translation, and language workers are one of the main TaaS user groups. TaaS will simplify the preparation, storage and sharing of task-specific multilingual term glossaries. Nowadays, terminology is also being developed as a web-based service with machines as users (e.g., MT systems, indexing systems, search engines and others). Thus, within the project TaaS will provide professional translators with instant access to term translation equivalents and translation candidates via APIs for computer-assisted translation tools, or translation environment tools, and will enhance the domain adaptation of MT systems through dynamic integration with TaaS-provided terminological data. Beyond the EU-funded project, TaaS will broaden the application of its services for both human and machine users.
5. Iulianna van der Lek: I know that you participated in setting up EuroTermBank, one of the most important terminology resources for translators and terminologists. What does it mean for you and are there any important updates that you could share with us?
EuroTermBank was an eContent Programme project (2005-2007) initiated and coordinated by Tilde. EuroTermBank was one of the major efforts in the consolidation of multilingual terminology resources: a centralised, publicly available term bank for the languages of the European Union that also provides federated access to five interlinked external term bases, including IATE. Since the EU-funded project ended, EuroTermBank has been hosted and administered by Tilde and is being actively developed to enhance the consolidation and harmonisation of multilingual terminology resources, to broaden their language coverage and to introduce advanced and innovative terminology services. A new version of EuroTermBank was released this year; it is targeted at a broad audience of translators, terminologists and other language workers.
6. Iulianna van der Lek: What are the biggest challenges that a coordinator of such large-scale terminology projects faces?
Terminology is developing rapidly, and every day the volume of terminology grows along with the explosion of information available on the web. Current static models for the acquisition, processing and sharing of terminological data cannot keep up with this increasing demand. Moreover, in the context of multilingual Europe, terminology is more important than ever to ensure that people communicate efficiently and precisely. One of the main goals we set in large-scale projects is to monitor developments at the scientific and industrial levels in order to grasp the overall picture of terminology practice. We also keep in touch with our end users and clients to understand their needs and meet their requirements.
7. Iulianna van der Lek: What advice could you give to aspiring terminology managers?
Terminology is the spine of a document throughout its life cycle and of professional communication in general. Furthermore, terminology is of vital importance for brand consistency and customer satisfaction within businesses. According to recent surveys, terminology inconsistency is the main challenge in the translation and localisation industry, and translation and terminology project managers should pay attention to it at the very first stage of a document’s life cycle. Another aspect I would like to emphasise is terminology processing: unfortunately, most terminology extraction work is still performed manually. Nowadays a number of tools are available for text and terminology processing, and I would encourage translation and terminology project managers to make wide use of such tools in their teams’ work. The developers of such tools could design and offer tutorials and workshops to demonstrate the power of new techniques in translation and terminology work. These materials could also be made available via TaaS.
8. Iulianna van der Lek: How do you see the future of terminology as a discipline and what innovations do you expect in the future?
Terminology is multidisciplinary and comprises a wide range of tasks. We always have to keep this interdisciplinary status of terminology in mind and collaborate across its various theoretical and applied directions. Future initiatives should be oriented towards collaborative, customised and cloud-based solutions that meet the requirements of different user groups. Terminology resources per se are of vital importance; however, terminology will also be widely used to support language technology applications. TaaS will analyse the needs of machine users by studying several sample systems to identify the type, structure and format of the terminological data such users need. In MT, for example, an MT system will be able to integrate with TaaS and use its terminology to adapt to a specific domain. We foresee collaboration between two platforms, TaaS and LetsMT!, the latter being a platform for building your own MT system in the cloud, a Do-It-Yourself MT solution.
“I would like to thank the TermCoord team, and Iulianna van der Lek and Rodolfos Maslias in particular, for their kind invitation to this interview, which I accepted with great pleasure. I hope the interview will be interesting for your trainees and for all specialists in the terminology field. Thank you and best wishes for your work!”
Interviewer: Iulianna van der Lek
Iulianna van der Lek-Ciudin is a researcher at KU Leuven, Faculty of Arts, Campus Sint-Andries in Antwerp, and an independent translation tools trainer. Her research focuses on translation technologies, translators’ workflows and terminology strategies.
Iulianna has a BA in English and French Language and Literature, an MA in English Language and Culture (University of Amsterdam) and a Postgraduate European Master in Specialised Translation (KU Leuven, Faculty of Arts). Before joining the translation industry and academia, she worked for several years for a biotech multinational in the Netherlands.