In 2007, the European Commission’s Directorate-General for Translation (DGT) made its translation memory (TM) publicly available, with the aim of supporting multilingualism and linguistic diversity and of making Commission information reusable. This first version of the DGT-TM contained documents up to 2006, while the second version, DGT-TM-2011, was released in 2012 with data up to 2011. Since then, the DGT-TM data has been updated annually; the latest version, DGT-TM-2015, contains documents from 2014.
The TM contains most of the documents that make up the Acquis Communautaire, the entire body of EU legislation (i.e. the treaties, regulations and directives adopted by the EU), as well as documents that are not part of the Acquis. The Acquis Communautaire is translated into all 24 official EU languages, with the exception of Irish, for which special rules apply. The result is a large parallel corpus with 276 possible language pairs. Besides its growing size and the large number of languages available, the TM also covers rare language combinations (e.g. Finnish-Maltese).
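The figure of 276 language pairs follows directly from the number of languages: with n = 24 languages, the number of unordered pairs is n(n-1)/2 = 24 × 23 / 2 = 276. The short Python sketch below checks this arithmetic. The list of ISO 639-1 codes is our own illustrative assumption and is not taken from the DGT-TM files.

```python
from itertools import combinations
from math import comb

# Illustrative assumption: the 24 official EU languages as ISO 639-1 codes.
EU_LANGUAGES = [
    "bg", "cs", "da", "de", "el", "en", "es", "et", "fi", "fr", "ga", "hr",
    "hu", "it", "lt", "lv", "mt", "nl", "pl", "pt", "ro", "sk", "sl", "sv",
]


def language_pairs(languages):
    """Return every unordered pair of distinct languages, e.g. ('fi', 'mt')."""
    return list(combinations(sorted(languages), 2))


pairs = language_pairs(EU_LANGUAGES)
print(len(pairs))                   # number of unordered pairs
print(comb(len(EU_LANGUAGES), 2))   # same value via n(n-1)/2
print(("fi", "mt") in pairs)        # the rare Finnish-Maltese combination
```

Counting each translation direction separately (Finnish→Maltese and Maltese→Finnish as two pairs) would double the figure to 552.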
The legislative documents in the TM are aligned across languages to produce translation segments, in the form provided by EURAMIS (European Advanced Multilingual Information System). In this file, you can find information on the number of translation segments, words and characters available in each language, as well as on the updates performed each year.
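As a rough illustration of how per-language figures like these could be derived from aligned segments, here is a minimal Python sketch using the standard library's ElementTree. It assumes TMX 1.4 markup (`<tu>` units containing one `<tuv xml:lang="…">` per language, each with a `<seg>`), language codes such as `EN-GB`, and whitespace-based word counting. The sample content and the function names `segment_statistics` and `tuv_language` are hypothetical, and this is not necessarily how DGT computed its published statistics.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# ElementTree exposes xml:lang under the XML namespace URI (TMX 1.4);
# older TMX versions used a plain "lang" attribute instead.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tuv_language(tuv):
    """Return the base language code of a <tuv>, e.g. 'EN-GB' -> 'en'."""
    code = tuv.get(XML_LANG) or tuv.get("lang") or ""
    return code.split("-")[0].lower()


def segment_statistics(tmx_text):
    """Count segments, words and characters per language in a TMX document.

    Words are approximated by splitting on whitespace (an assumption).
    """
    stats = defaultdict(lambda: {"segments": 0, "words": 0, "characters": 0})
    root = ET.fromstring(tmx_text)
    for tuv in root.iter("tuv"):
        seg = tuv.find("seg")
        if seg is None:
            continue  # skip language variants that carry no segment
        text = "".join(seg.itertext()).strip()
        counts = stats[tuv_language(tuv)]
        counts["segments"] += 1
        counts["words"] += len(text.split())
        counts["characters"] += len(text)
    return dict(stats)


# Hypothetical two-language sample in TMX 1.4 markup.
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="EN-GB"/>
  <body>
    <tu>
      <tuv xml:lang="EN-GB"><seg>This Regulation shall enter into force.</seg></tuv>
      <tuv xml:lang="FI-FI"><seg>Tämä asetus tulee voimaan.</seg></tuv>
    </tu>
  </body>
</tmx>"""

stats = segment_statistics(SAMPLE_TMX)
for language, counts in sorted(stats.items()):
    print(language, counts)
```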
Although the DGT-TM database is the exclusive property of the European Commission, it can be downloaded free of charge under specific Conditions for Use. The translation segments can be extracted in Translation Memory eXchange (TMX) format. The TM is distributed as a collection of zip files of no more than 100 MB each, containing the TMX files. For more detailed information and access to the files, which are grouped by release year, see the official DGT-TM download page.
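A minimal sketch of reading such archives, assuming each zip file simply contains `.tmx` files in TMX 1.4 markup: the hypothetical helper `extract_pairs` opens a zip file with Python's `zipfile` module, parses every TMX file inside it, and yields the aligned segments for one language pair. The language-code handling (`FI-FI` → `fi`) is an assumption about the markup, not a documented property of the DGT-TM files.

```python
import xml.etree.ElementTree as ET
import zipfile

# xml:lang as seen by ElementTree; fall back to plain "lang" for older TMX.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def base_language(tuv):
    """Map a <tuv> language attribute such as 'FI-FI' to 'fi'."""
    code = tuv.get(XML_LANG) or tuv.get("lang") or ""
    return code.split("-")[0].lower()


def extract_pairs(zip_path, source, target):
    """Yield (source_text, target_text) for each <tu> holding both languages."""
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".tmx"):
                continue  # ignore any non-TMX members of the archive
            with archive.open(name) as handle:
                # The XML parser detects the encoding (e.g. UTF-8 or UTF-16
                # with a byte-order mark) from the file itself.
                root = ET.parse(handle).getroot()
            for tu in root.iter("tu"):
                segments = {}
                for tuv in tu.iter("tuv"):
                    seg = tuv.find("seg")
                    if seg is not None:
                        segments[base_language(tuv)] = "".join(seg.itertext()).strip()
                if source in segments and target in segments:
                    yield segments[source], segments[target]
```

For example, `extract_pairs("DGT-TM-sample.zip", "fi", "mt")` (a hypothetical file name) would yield Finnish-Maltese segment pairs. `ET.parse` loads each TMX file fully into memory, which is acceptable for files of this size; for very large files, `ET.iterparse` combined with `elem.clear()` would keep memory use lower.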
Written by Ioanna Kotsia
Translation Trainee at TermCoord