Research on Typographical Errors in Dictionaries

June 25, 2019 5:30 pm

Research on typographical errors in twenty-two Spanish specialised bilingual paper dictionaries (Ariel and Gestión 2000). An overview

Academic and professional background

My name is Santiago Rodríguez-Rubio. I live in Seville (southern Spain). I have a bachelor’s degree in English Philology, a three-year undergraduate degree in Tourism, and a master’s degree in Translation and Interpretation.

I work as an English-Spanish and French-Spanish translator and interpreter. My fields of expertise are tourism and gastronomy, travel, adventure, the natural world, finance and accounting, and car restoration shows. I have worked as a translator and liaison interpreter (both in-house and freelance) for several Spanish engineering companies with subsidiaries abroad. Those engineering companies operate in the following fields: wastewater and drinking water treatment plants, and power generation (electric power, solar power, and wind power).

I have always been greatly interested in formal correctness, in both written and spoken language. I suppose this tendency comes from my taste for reading, and from a mindset built upon the satisfaction you get from work well done.

My research on typographical errors in specialised bilingual dictionaries (materials)

Since 2016, I have been conducting research on typographical errors in a corpus of twenty-two Spanish specialised bilingual paper dictionaries. The corpus is around 15,000 pages long, and the works are mainly bidirectional English-Spanish. The dictionaries were published by two Spanish publishing houses belonging to the “Grupo Planeta” media group, namely “Ariel” and “Gestión 2000”.

The dictionaries were chosen according to their academic standing and the prestige of the publisher. Fourteen of those twenty-two works make up the so-called “Alicante Dictionaries” (Mateo 2018, in Fuertes-Olivera, The Routledge Handbook of Lexicography), including works on international trade, international taxation, banking, insurance, the stock market, advertising and marketing, tourism, the footwear industry, etc. These fourteen dictionaries were published by Ariel. They are linked to the IPA (“Professional and Academic English”) research group at the University of Alicante, as well as to IULMA (“Institute of Applied Modern Languages”), in the Community of Valencia (eastern Spain). Some of those fourteen dictionaries are landmarks in Spanish specialised bilingual p-lexicography and in English for Specific Purposes academia. The first two dictionaries of the Alicante series were Diccionario de Términos Jurídicos/A Dictionary of Legal Terms (Alcaraz and Hughes, 1993), and Diccionario de Términos Económicos, Financieros y Comerciales/A Dictionary of Economic, Financial, and Commercial Terms (Alcaraz and Hughes, 1996). These two works were last re-edited in 2012 and reprinted in 2014.

Origin and development of my research

My research originated by chance. In my translation work, I used (and continue to use) the two dictionaries mentioned above on a regular basis. It was not long before I began to spot various typographical errors. At first, I assumed there would be just a few occasional errors, but very soon I noticed there was a sort of “model of errors”: errors appeared pervasively in all positions of the text (lemma, definition, illustration, cross-reference, top of page, etc.), and some errors were repeated in different places within the same section or in another section of the works. As my research progressed, I found not only that some errors were repeated, but also that some of them were repeated in similar sentences, or even in the same sentences. I also found that some errors were repeated not only within a particular dictionary, but also across different works. For instance, a particular mistyped term (variants not included) appeared eleven times in three different dictionaries. Another mistyped term (variants not included) appeared sixteen times in three different dictionaries. A mistyped term such as accomodation (or the variants accomodations, acommodation, accomodate, accomodated) appeared more than forty times in eight different dictionaries.

The entire corpus of twenty-two dictionaries was analysed, page by page, in a homogeneous way, that is, using the same error detection criteria. Apart from the bodies (“English-Spanish”/“Spanish-English”), the rest of the hyperstructure of the works was also analysed (i.e. cover pages, forewords, introductions, etc.). Given the length of the corpus, different sub-corpora were established for analysis and comparison purposes. More precisely, the establishment of sub-corpora or subgroups of dictionaries allowed us to describe the mechanisms through which errors were repeated or reproduced in related works. Related dictionaries were identified according to different criteria: works having the same authors, works belonging to the same fields of study, or works belonging to the same editorial sub-collection (e.g. The Dictionary of Terms of the Natural Stone and Allied Industries and the Dictionary of Terms of the Footwear and Allied Industries make up the series “Dictionaries of Industrial Terms” by Ariel, and, therefore, they form a relevant sub-corpus).

Typographical errors were classified into categories and subcategories, such as “non-word errors” (letter omission, insertion, substitution, or transposition), and “real-word errors” (word omission, insertion, substitution, or transposition). Within each category or subcategory, errors were grouped according to repetition or similarity criteria. This allowed us to establish the intratextual error repetition rate for each dictionary, from which a number of relevant conclusions were drawn.
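The four non-word subcategories can be illustrated with a minimal sketch. It assumes that the intended form of each mistyped term is already known, and it only recognises errors involving a single letter (or one pair of adjacent letters); anything else is returned as "other". The function name is hypothetical, and this is not the procedure actually used in the research, which was carried out by reading the dictionaries page by page.

```python
def classify_non_word_error(typo: str, correct: str) -> str:
    """Name the single-letter operation that turns `correct` into `typo`.

    Illustrative only: handles one omitted, inserted, substituted or
    transposed letter; more complex errors are reported as "other".
    """
    if typo == correct:
        return "no error"

    # Find the first position at which the two forms differ.
    i = 0
    while i < min(len(typo), len(correct)) and typo[i] == correct[i]:
        i += 1

    # One letter missing from the typo.
    if len(typo) == len(correct) - 1 and typo[i:] == correct[i + 1:]:
        return "omission"
    # One extra letter in the typo.
    if len(typo) == len(correct) + 1 and typo[i + 1:] == correct[i:]:
        return "insertion"
    if len(typo) == len(correct):
        # One letter replaced by another.
        if typo[i + 1:] == correct[i + 1:]:
            return "substitution"
        # Two adjacent letters swapped.
        if (i + 1 < len(typo)
                and typo[i] == correct[i + 1]
                and typo[i + 1] == correct[i]
                and typo[i + 2:] == correct[i + 2:]):
            return "transposition"
    return "other"


# Examples quoted in this article.
print(classify_non_word_error("accomodation", "accommodation"))
print(classify_non_word_error("alucionaciones", "alucinaciones"))
print(classify_non_word_error("hypnosys", "hypnosis"))
# Hypothetical example, not taken from the corpus.
print(classify_non_word_error("recieve", "receive"))
```

A term such as terminatation (for termination) contains two extra letters, so this simplified sketch labels it "other" rather than "insertion".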

Roughly speaking, two types of similar errors were distinguished. On the one hand, there were similar errors featuring the same underlying term (e.g. terapeutics, therapeutc, therpeutically, therapeutic or spectrotophometer, spectophotometry). On the other hand, there were similar errors featuring different underlying terms (e.g. objetive, subjetive). In the latter example, there is a relationship of antonymy between the terms involved, but other mistyped terms are related in other ways, such as word structure (e.g. accomplisment, establihment or markmanship, guardianshp, censorhip, sponshorship), or the fact that they belong to the same semantic field (e.g. terminatation, liquidatation, exhaustation).

Among the most striking real-word errors are those in which one word is substituted for another. Two types of word substitution were described in my research: intralingual substitution (e.g. gastronomy for gastrostomy), and interlingual substitution (e.g. dispense as griten for dispense as written). Some of those substitution errors are likely to have been generated automatically. In fact, the data I compiled may have an application in the area of machine learning for spell-checkers, as the latter could learn from the set of errors detected, provided that the correct terms are also entered alongside them, as in a parallel corpus.
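As a rough illustration of how paired data of this kind could be used, the sketch below builds a plain lookup table from (mistyped form, intended form) pairs and applies it to a sentence. This is only the simplest possible baseline, not a machine learning model, and the function names are hypothetical. The pairs are mistyped terms quoted in this article, matched with their evident intended forms.

```python
import re

# (mistyped form, intended form) pairs, as in a small parallel corpus.
ERROR_PAIRS = [
    ("accomodation", "accommodation"),
    ("objetive", "objective"),
    ("subjetive", "subjective"),
    ("establihment", "establishment"),
    ("censorhip", "censorship"),
]


def build_correction_table(pairs):
    """Map each mistyped form (lower-cased) to its intended form."""
    return {wrong.lower(): right for wrong, right in pairs}


def correct_text(text, table):
    """Replace every word found in the table; leave all other words alone."""
    def fix(match):
        word = match.group(0)
        right = table.get(word.lower())
        if right is None:
            return word
        # Keep an initial capital if the mistyped word had one.
        return right.capitalize() if word[0].isupper() else right

    # [^\W\d_]+ matches runs of letters, including accented ones.
    return re.sub(r"[^\W\d_]+", fix, text)


table = build_correction_table(ERROR_PAIRS)
print(correct_text("The objetive is cheap accomodation.", table))
```

A lookup of this kind can only handle non-word errors. A real-word error such as gastronomy for gastrostomy cannot be corrected this way, because the mistyped form is itself a valid word and only the context reveals the error.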

Sequences of errors are worth mentioning from a qualitative perspective. Sometimes, several errors were found in a particular sentence or in a particular entry. For instance, in the entry “hypnosis (… es una estado similar al sueño… inducida… en el que pueden surgir… alucionaciones… ◊… a full range of options from aspirin hypnosys)”, four errors were found: two examples of gender disagreement (“una estado… inducida” for “un estado… inducido”), an insertion error probably caused by psychomotor anticipation of the segment “cio” (“alucionaciones” for “alucinaciones”), and a substitution error probably caused by psychomotor perseveration of the letter “y” (hypnosys for hypnosis).

Interlingual interference is another psycholinguistic aspect of the compiled errors that could be studied in the future. For example, a mistyped term such as iniciatives (for initiatives) could have resulted from the interference of the Spanish counterpart “iniciativas”. Similarly, infectiones (for infections) could have derived from “infecciones”, inestability (for instability) from “inestabilidad”, fragance and fragant (for fragrance and fragrant) from “fragancia” and “fragante”, and so on.

Follow-up. Communication with the publisher and the authors of the dictionaries

I would like to clarify that there is no professional or academic relationship between the publishing houses or the authors and me. I am conducting independent research, within the framework of a doctoral programme at the Pablo de Olavide University (Seville, Spain). In any case, I contacted the managing editor of Ariel and the director of IULMA, as well as the authors. I gave all of them thorough information about my research, and, needless to say, I offered my collaboration in correcting the errors found in their dictionaries.

I would recommend that the more relevant dictionaries under study be corrected. “Zero errors” is simply unattainable in such complex works. Any lexicographer would tell you so. It is also a matter of common sense. Still, I think a reference work should feature a very high degree of correctness. Besides, what I found is not a mere accumulation of errors, but something I have referred to as a “model” or “paradigm” of errors. In A Terminological Dictionary of the Pharmaceutical Sciences (Domínguez-Gil, Alcaraz and Martínez, 2007), I found (among many other errors belonging to other categories that were also recorded) nearly seven hundred non-word errors and real-word errors; in other words, one error belonging to any of those subcategories every 1.58 pages. Now, one error every 1.58 pages in a text 1.5 pages long would mean just one error; however, the same frequency of errors in a work 1,000 pages long (such as the dictionary involved) reveals a sort of “error consistency”.
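The arithmetic behind this comparison can be sketched as follows. The rate of one error every 1.58 pages is taken from the paragraph above, and 1,000 pages is the round figure used there rather than the exact page count of the dictionary; the function name is hypothetical.

```python
PAGES_PER_ERROR = 1.58  # rate reported above for the pharmaceutical dictionary


def expected_errors(pages: float, pages_per_error: float = PAGES_PER_ERROR) -> float:
    """Number of errors expected in a text of `pages` pages at a constant rate."""
    return pages / pages_per_error


short_text = expected_errors(1.5)     # a text 1.5 pages long
long_work = expected_errors(1000)     # a work about 1,000 pages long (round figure)

print(round(short_text, 2))  # 0.95, i.e. roughly one error
print(round(long_work))      # 633
```

At the same rate, a short text contains about one error, whereas a work of around 1,000 pages contains several hundred, which is the order of magnitude of the figure reported above.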

See details about the errors found in A Terminological Dictionary of the Pharmaceutical Sciences in my paper “A quantitative analysis of typographical errors in A Terminological Dictionary of the Pharmaceutical Sciences, English-Spanish/Español-Inglés (Ariel, 2007)”, published in Panace@, 2018:

http://www.tremedica.org/panacea/IndiceGeneral/n47-analisis.pdf

In the corpus of twenty-two dictionaries, I found (among many other errors that were also recorded) more than 4,000 non-word errors and real-word errors, including a wide range of grammatical errors (e.g. gender disagreement, number disagreement, wrong verb tense, etc.). See the title and year of publication of those twenty-two works in the Appendix to my paper.

Conclusions

“The Alicante Dictionaries” make up a relevant collection of specialised bilingual paper dictionaries, not only in Spain, but also at an international level (see “The Alicante Dictionaries”, in Fuertes-Olivera 2018, The Routledge Handbook of Lexicography). Based on our findings in terms of frequency of errors and error repetition rate, we would recommend that the more prominent works of the corpus be revised, including A Terminological Dictionary of the Pharmaceutical Sciences (Ariel, 2007).

Our research is unprecedented, and we believe it provides clear added value. It could contribute to expanding knowledge on typographical errors in dictionaries from a twofold perspective: error generation and error detection/correction. More specifically, it offers valuable insights into the mechanisms of repetition or reproduction of errors in related dictionaries.

As previously mentioned, our data could also be applied to Natural Language Processing (NLP), more precisely, to the field of machine learning for spell-checkers.


Written by Santiago Rodríguez-Rubio, translator (Spain)
