The ARTES database


The ARTES database (Aide à la Rédaction de TExtes Scientifiques / Dictionary-assisted writing tool for scientific communication) [1] is a multilingual, multidomain language resource that offers a comprehensive approach to lexical resources: terminological, phraseological, domain-specific, domain-free, semasiological and onomasiological (Pecman 2004, 2007, Pecman & Kübler 2011, Kübler & Pecman 2012, Gledhill & Kübler 2015, Pecman & Gledhill 2018). Designed in 2010 by the CLILLAC-ARP [2] team of researchers from Paris Diderot University, the ARTES DB is the outcome of several previous projects aimed at creating resources for LSPs (languages for specific purposes) and specialised translation, namely BasTet, designed in 2006 by Claudie Juilliard; Terminom1, developed in 2004 by Kübler; a domain-free phraseology resource developed by Pecman in 2004; and the LangYeast combinatory dictionary, developed by Mestivier Volanschi in 2008 (cf. Kübler 2004, Pecman 2004, Mestivier Volanschi 2008).

The DB is edited by Master’s-level students in translation [3], who create records as part of their Master’s dissertation. The information the students store in the DB records is based on corpus analysis and is put to use in a practical translation project. Although mainly used by students, ARTES also targets other LSP users, namely professional translators and domain experts, who are likewise involved in specialised communication. Researchers from Paris Diderot University also use the DB to explore terminology and phraseology, and to conduct linguistic analyses that improve language description in the DB and help solve various specialised translation problems. Thus, one of the objectives of the ARTES DB project is to explore the interaction between research and teaching in areas of applied linguistics such as specialised translation, LSP communication, corpus-based studies, and discourse analysis.


ARTES provides information on technical terms in a range of specialised domains and on the collocations found in those domains. There is no restriction on the domains or languages represented in the database; however, most term records are in English, French, Spanish or German. The terms collected in the DB generally come from scientific and technical subject fields, but also from legal and commercial ones. Each record provides the following information fields: domain(s), definitions, contexts, synonyms or variants, collocations, equivalents, semantically related terms, notes / observations, and sources. The DB also offers a collection of items relating to discourse phraseology, that is, a list of the most frequent (and productive) lexico-grammatical patterns or structures common to several domains, e.g. “the most complete account of this problem is found in…”, “our conclusions focus on aspects such as…”, “in this paper/report/study we conclude/show/suggest that X”, “Failure to (comply/follow these instructions, etc.) may result in (damage/malfunction/injury, etc.)”. Another important feature of the DB is its information on semantic or conceptual relations between terms, all of which are recorded in the DB and also made visible via tree diagrams.
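To make the record structure described above more concrete, here is a minimal Python sketch of what a single term record might look like in memory. It is an illustration only: the class name `TermRecord`, the attribute names, the helper `relation_tree` and the sample entry are all assumptions introduced for this example, and they do not reflect the actual ARTES schema or data. Only the list of information fields is taken from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class TermRecord:
    """Hypothetical model of one term record (not the real ARTES schema)."""
    term: str
    language: str
    domains: list[str] = field(default_factory=list)
    definitions: list[str] = field(default_factory=list)
    contexts: list[str] = field(default_factory=list)
    variants: list[str] = field(default_factory=list)       # synonyms or variants
    collocations: list[str] = field(default_factory=list)
    # Equivalents keyed by language code, e.g. {"fr": ["levure"]}.
    equivalents: dict[str, list[str]] = field(default_factory=dict)
    # Semantically related terms keyed by relation type, e.g. {"hypernym": [...]}.
    related_terms: dict[str, list[str]] = field(default_factory=dict)
    notes: list[str] = field(default_factory=list)
    sources: list[str] = field(default_factory=list)


def relation_tree(record: TermRecord, indent: str = "  ") -> str:
    """Render the record's semantic relations as a simple indented text tree."""
    lines = [record.term]
    for relation in sorted(record.related_terms):
        lines.append(f"{indent}{relation}")
        for target in record.related_terms[relation]:
            lines.append(f"{indent * 2}{target}")
    return "\n".join(lines)


# Invented sample entry, for illustration only (not taken from the database).
example = TermRecord(
    term="yeast",
    language="en",
    domains=["biology"],
    equivalents={"fr": ["levure"]},
    related_terms={"hypernym": ["fungus"]},
)

if __name__ == "__main__":
    print(relation_tree(example))
```

Keying equivalents by language and related terms by relation type is one possible design choice; it keeps multilingual lookups and tree-style displays of semantic relations straightforward, but other layouts would serve equally well.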

The ARTES database was thus designed to cater both for teaching and learning needs in specialised translation at the Department of Applied Languages of Paris Diderot University and for the research needs of the CLILLAC-ARP team. Students on the Master’s in Specialised Translation (Master ILTS) are introduced to the theories, methods and applications of terminology, lexical resource creation, and corpus linguistics, with an emphasis on corpus linguistic tools and information retrieval. Together, these courses allow the students to develop the skills and knowledge crucial for achieving high-quality translations of LSP texts. The ARTES database offers students a template for analysing specialised lexical resources in relation to the texts they translate. Master’s-level students are also invited to question the theoretical and methodological premises on which the description of language data in ARTES is based, by testing these notions against “real life” translation problems. In turn, the database offers teachers features that help them follow students’ work in progress and evaluate the resources students compile, as well as conduct research on terminology, phraseology and specialised discourse analysis.

Currently, 81,590 terms are recorded in ARTES, along with 60,422 collocations related to these terms and 2,345 generic collocations related to different discourse types. Some 700 terms and some 200 generic collocations are added every year. The data recorded in the database is accessible via an online application designed to take into account various LSP communication contexts.

The design of the ARTES database is supported by CLILLAC-ARP and the Department for Applied Linguistics (UFR EILA) [4] of Paris Diderot University, and benefits from maintenance and development work by Brice Bricaud and Pascal Cabaud of the ‘Direction du Système d’Information (DSI)’ at Paris Diderot University.

[1] Web page of the ARTES project
[2] Web page of the CLILLAC-ARP research center
[3] Web page of the Master’s Studies in Specialised Translation at the EILA Department
[4] Web page of the EILA Department of Paris Diderot University


Gledhill (C.) & Kübler (N.) 2015, “How Trainee Translators Analyse Lexico-Grammatical Patterns”, in Phraseology, Phraseodidactics and Construction Grammar(s), M. I. González-Rey ed, Special issue of Journal of Social Sciences 11/3, pp. 162-178.

Kübler (N.) 2004, “Using Webcorp for building specialized dictionaries”, in Advances in Corpus Linguistics, K. Aijmer ed., Proceedings of the ICAME Conference, May 2002, Göteborg, Sweden, Amsterdam, Rodopi, pp. 387-400.

Kübler (N.) & Pecman (M.) 2012, “The ARTES bilingual LSP dictionary: from collocation to higher order phraseology”, in Electronic lexicography, S. Granger and M. Paquot eds, Oxford, Oxford University Press, pp. 186-209.

Mestivier Volanschi (A.) 2008, Étude et modélisation des phénomènes collocationnels : Implémentation dans un système d’aide à la rédaction en anglais scientifique, doctoral thesis, 5 December 2008, supervised by Natalie Kübler, Université Paris Diderot.

Pecman (M.) 2004, Phraséologie contrastive anglais-français : analyse et traitement en vue de l’aide à la rédaction scientifique, doctoral thesis, supervised by Henri Zinglé, Université de Nice-Sophia Antipolis, defended 9 December 2004, 467 p.

Pecman (M.) 2007, « Approche onomasiologique de la langue scientifique générale », in Lexique des écrits scientifiques, A. Tutin ed., Revue française de linguistique appliquée XII/2, pp. 79-96.

Pecman (M.) 2008, “Compilation, formalisation and presentation of bilingual phraseology: problems and possible solutions”, in Phraseology in language learning and teaching, S. Granger and F. Meunier eds, Amsterdam/Philadelphia, John Benjamins, pp. 203-222.

Pecman (M.) 2012, « Etude lexicographique et discursive des collocations en vue de leur intégration dans une base de données terminologiques », Terminology, Phraseology and Translation, M. Rogers ed, special issue of The Journal of specialised translation (JoSTrans) 18, pp. 113-138.

Pecman (M.) & Gledhill (C.) 2018, “How trainee translators and their teachers deal with phraseological units in the ARTES database”, « Des unités de traduction à l’unité de la traduction », 7ème édition de la Traductologie de plein champ, Université Paris Diderot, 7 July 2017, Paris, France; ISTI, 21 October 2017, Brussels, Belgium; FTI, 9 December 2017, Geneva, Switzerland, in Balliu, Christian (ed.), Université libre de Bruxelles, Equivalences 45/1-2, pp. 237-259.

Pecman (M.) & Kübler (N.) 2011, “ARTES: an online lexical database for research and teaching in specialized translation and communication”, Proceedings from International Workshop on Lexical Resources (WoLeR) 2011 at ESSLLI. August 1-5 2011, Ljubljana, Slovenia, pp. 86-93.

Written by Mojca Pecman, Maître de Conférences, UFR EILA, Université Paris Diderot