Sharing Multilingual Terminologies with a Semantic-based Model


lsp2015
Terminological resources provide rich multilingual descriptions of the knowledge needed for various applications, such as machine translation or information extraction. They frequently comprise large sets of manually curated, well-designed entries. Examples of such resources are biomedical terminologies, such as the formalized and semantic-based GALEN; SKOS-based controlled vocabularies, such as GEMET; and TBX-based term bases, such as the InterActive Terminology for Europe (IATE). However, the proliferation of representation formats and data semantics, together with the varying linguistic complexity of terminologies, makes them difficult to share and reuse. The semantic-based Terminology Interchange Model (T-Mint) interlinks terminologies and ontologies so that rich natural language descriptions are grounded in formal semantics. The model uses a domain ontology as a common point of reference for interchanging complex terminological data from different resources. T-Mint is based on standards and best practices of both the terminology and the ontology communities.

The idea of basing heterogeneous language-related data on ontologies – shared and language-independent conceptualizations – as a common point of reference is not new. In fact, natural language resources and ontologies have already been combined, for instance, for lexicons in the form of the ontolex format. Such ontology-lexicon models benefit simultaneously from the formal semantics of the ontology and the rich natural language descriptions of the lexicon.

T-Mint aims at interchanging heterogeneous, multilingual data in reference to existing domain ontology entities by adapting the ontology-lexicon approach to terminologies. When terminologies are transformed into ontologies, the epistemological aspect of the data is lost. It is therefore important to keep the two levels separate while still allowing references between them, so that the advantages of both kinds of knowledge management resource are retained. With T-Mint, the terminological data are transformed into the Resource Description Framework (RDF) format, which facilitates linking them to data from the Linguistic Linked Open Data (LLOD) cloud. Terminology standards of the International Organization for Standardization (ISO) serve as the starting point for designing and evaluating the ontology-terminology model. This solid basis has been enhanced with existing approaches, research findings, and best practices from the ontology and terminology communities. Because it is a modular ontology, T-Mint can be freely extended and reused. To apply it, only the core-structure module, which resembles the metamodel of the Terminological Markup Framework (TMF), and the data category module are required.
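
The separation of the two levels can be illustrated with a minimal sketch. It is not the T-Mint vocabulary itself: the namespaces, the property names (`reference`, `languageSection`, `termSection`, `term`), and the sample terms are assumptions made for illustration. Only the nesting of terminological entry, language section, and term section follows the TMF metamodel. The entry keeps its own terminological structure and merely points to an ontology entity; the output is serialized as N-Triples using the standard library only.

```python
from dataclasses import dataclass, field

# Hypothetical namespaces; these are NOT the published T-Mint IRIs.
TMINT = "http://example.org/tmint#"
DATA = "http://example.org/termbase/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class TermSection:
    term: str


@dataclass
class LanguageSection:
    lang: str
    terms: list = field(default_factory=list)


@dataclass
class TerminologicalEntry:
    entry_id: str
    ontology_entity: str  # IRI of the domain ontology entity referred to
    languages: list = field(default_factory=list)


def iri(value):
    """Wrap an IRI in angle brackets as N-Triples requires."""
    return f"<{value}>"


def literal(text, lang):
    """Build a language-tagged literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def to_ntriples(entry):
    """Serialize one entry; the ontology entity is referenced, not redefined."""
    e = iri(DATA + entry.entry_id)
    lines = [
        f"{e} {iri(RDF_TYPE)} {iri(TMINT + 'TerminologicalEntry')} .",
        f"{e} {iri(TMINT + 'reference')} {iri(entry.ontology_entity)} .",
    ]
    for ls in entry.languages:
        l = iri(f"{DATA}{entry.entry_id}/{ls.lang}")
        lines.append(f"{e} {iri(TMINT + 'languageSection')} {l} .")
        for i, ts in enumerate(ls.terms):
            t = iri(f"{DATA}{entry.entry_id}/{ls.lang}/{i}")
            lines.append(f"{l} {iri(TMINT + 'termSection')} {t} .")
            lines.append(f"{t} {iri(TMINT + 'term')} {literal(ts.term, ls.lang)} .")
    return lines


# Illustrative entry with invented identifiers and terms.
entry = TerminologicalEntry(
    entry_id="e1",
    ontology_entity="http://example.org/onto#Bond",
    languages=[
        LanguageSection("en", [TermSection("bond")]),
        LanguageSection("de", [TermSection("Anleihe")]),
    ],
)
triples = to_ntriples(entry)
print("\n".join(triples))
```

Because the link to the ontology is a single triple per entry, the terminological description can grow (definitions, contexts, further data categories) without altering the ontology.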

T-Mint is specified independently of any specific format but is also implemented in the Web Ontology Language (OWL). The implementation is available at this link, which also presents an example use case for the model in the domain of finance. The use case shares terminological data from three terminological resources – Termium, IATE, and Investopedia – in relation to the Financial Industry Business Ontology (FIBO) of the Object Management Group (OMG). The data from the three resources are harmonized in relation to individual FIBO entities, which enriches the originally English ontology with complex multilingual natural language descriptions.
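
The harmonization step of the use case can be pictured roughly as grouping term records from several resources under the ontology entity they refer to. The sketch below assumes a flat record layout, placeholder entity IRIs under `example.org` rather than real FIBO IRIs, and invented sample terms that are not taken from Termium, IATE, or Investopedia; the function name `harmonize` is likewise hypothetical.

```python
from collections import defaultdict

# Placeholder IRIs standing in for ontology entities (not real FIBO IRIs).
BOND = "http://example.org/onto#Bond"
SHARE = "http://example.org/onto#Share"

# Invented sample records; only the resource names come from the use case.
records = [
    {"source": "Termium", "entity": BOND, "lang": "fr", "term": "obligation"},
    {"source": "IATE", "entity": BOND, "lang": "de", "term": "Anleihe"},
    {"source": "Investopedia", "entity": BOND, "lang": "en", "term": "bond"},
    {"source": "IATE", "entity": SHARE, "lang": "de", "term": "Aktie"},
]


def harmonize(records):
    """Group (term, source) pairs by ontology entity and then by language."""
    by_entity = defaultdict(lambda: defaultdict(set))
    for r in records:
        by_entity[r["entity"]][r["lang"]].add((r["term"], r["source"]))
    # Convert to plain dicts with sorted lists for stable, readable output.
    return {
        entity: {lang: sorted(pairs) for lang, pairs in langs.items()}
        for entity, langs in by_entity.items()
    }


harmonized = harmonize(records)
for entity, langs in harmonized.items():
    print(entity, langs)
```

Keeping the source alongside each term preserves provenance, so an ontology entity gains labels in several languages without losing track of which resource contributed them.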


Mag. Dagmar Gromann

Research Assistant
Institute for English Business Communication
Vienna University of Economics and Business