Francesca Chiusaroli (Professor of General and Applied Linguistics at the University of Macerata), Johanna Monti (Professor of Computational Linguistics and Translation Studies at the University “L’Orientale” of Naples) and Federico Sangati (independent researcher) have created a unique research project.

In February 2016, F. Chiusaroli launched a project on Twitter to translate Pinocchio into emoji. At the same time, the three researchers developed EmojitalianoBot, the first Emoji-Italian translation bot on Telegram, the popular instant messaging platform.


The translation of the famous children’s novel is carried out by the followers of the blog “Scritture brevi”, run by F. Chiusaroli and Fabio Massimo Zanzotto.

Every day F. Chiusaroli tweets sentences taken from Pinocchio, and the followers suggest their translations in emoji; at the end of each day, the official version of the translation is validated and published.

For their work, translators can use EmojitalianoBot, an open and free tool that contains an Emoji-Italian dictionary, Emoji-English descriptions based on Unicode, and a glossary of all the uses of emojis in the translation of Pinocchio. The project is associated with the Emojitalia discussion group, where users meet to “communicate” in emoji, discuss problems and solutions, and suggest improvements to the bot, in addition to the translation choices for Pinocchio.

The project has been a success from the moment of its release on Telegram. It became a viral web phenomenon thanks to the “Scritture brevi” community and the Pinocchio translation in emoji, and the bot now has over 500 users.

The project integrates different methodologies drawn from the research backgrounds of its three founders (general linguistics, historical linguistics, translation studies, language teaching, computational linguistics and computer science). This makes @Emojitalianobot an ideal test bench for experimenting with new approaches, such as crowdsourcing and gamification, in the field of Natural Language Processing (NLP).

Crowdsourcing, or the use of knowledge, ideas, and content obtained by soliciting contributions from large groups of people (particularly from online communities), is increasingly used to collect large quantities of linguistic data. It allows linguistic information to be collected in a short time for a variety of applications, from machine translation and assisted translation to the compilation and updating of lexical and terminological data. In this context, gamification, or the use of games, helps to involve large groups of people in a way that is active, challenging and fun. @Emojitalianobot offers games and features for learning or guessing the meaning of emoji.

The researchers are also working on projects that aim to use @Emojitalianobot within inclusive educational approaches for people with disabilities, in teaching contexts for the reception of migrants, and in sentiment analysis applications.



Building on their linguistic and technological experience with @Emojitalianobot, the three Italian researchers, together with Martin Benjamin and Sina Mansour of the Kamusi International Project and EPFL (Switzerland), designed a new bot on Telegram in April 2016: @EmojiWorldBot, a multilingual dictionary that uses emoji as a pivot language between dozens of different languages. Currently the emoji-word and word-emoji functions are available for 72 languages imported from the Unicode tables (see http://www.unicode.org/cldr/charts/29/annotations). They give users an easy way to search for the emojis that correspond to a word in each of these languages, and vice versa. All tags are linked to their corresponding WordNet senses.
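The idea of using emoji as a pivot can be illustrated with a minimal Python sketch. It is not the bot's actual code: the `TAGS` table, the function names and the sample tags are all assumptions made up for illustration, and they are not copied from the Unicode annotation data.

```python
from collections import defaultdict

# Toy annotation tables: language -> emoji -> set of tags.
# Invented for illustration; NOT taken from the Unicode CLDR annotations.
TAGS = {
    "en": {"🐣": {"chick", "hatching"}, "🍎": {"apple", "fruit"}},
    "it": {"🐣": {"pulcino"}, "🍎": {"mela", "frutta"}},
}


def build_reverse_index(tags):
    """Map (language, word) -> set of emojis tagged with that word."""
    index = defaultdict(set)
    for lang, table in tags.items():
        for emoji, words in table.items():
            for word in words:
                index[(lang, word.lower())].add(emoji)
    return index


REVERSE = build_reverse_index(TAGS)


def word_to_emoji(lang, word):
    """Word-emoji lookup: emojis tagged with `word` in `lang`."""
    return sorted(REVERSE.get((lang, word.lower()), set()))


def emoji_to_words(lang, emoji):
    """Emoji-word lookup: tags attached to `emoji` in `lang`."""
    return sorted(TAGS.get(lang, {}).get(emoji, set()))


def translate_via_emoji(word, source, target):
    """Pivot through emoji: source word -> emoji -> target-language tags."""
    return {
        emoji: emoji_to_words(target, emoji)
        for emoji in word_to_emoji(source, word)
    }


print(translate_via_emoji("apple", "en", "it"))
```

With these toy tables, an English word is first resolved to the emojis that carry it as a tag, and each emoji is then resolved to its Italian tags. The result is a set of candidate translations rather than a single exact equivalent, because one emoji may carry several tags in each language.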


J. Monti also uses the bot in the Translation Studies classes at the University “L’Orientale” of Naples, together with a large group of students from four universities in the People’s Republic of China, to reflect from a theoretical point of view on the use of emoji in intercultural contexts.

The bot implements two simple games: a tagging game and a translation game. In the first, users can suggest additional tags for single emojis in any language (for example, adding “egg” to the tag list for 🐣 in English). In the second, users are asked to map tags from a source language (e.g. English) to a target language (e.g. Swahili). Data is validated through a consensus model: an answer is accepted as correct once the same result has been provided by a threshold number of respondents.
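The consensus model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the default threshold of 3, the normalisation of answers and the rule that each respondent counts once per answer are all hypothetical, since the bot's real settings are not described here.

```python
from collections import Counter


def validate_by_consensus(answers, threshold=3):
    """Return the answers given by at least `threshold` distinct respondents.

    answers: iterable of (respondent_id, answer) pairs.
    threshold: assumed value; the bot's real threshold is not stated.
    """
    # Normalise case and whitespace, and deduplicate so that a respondent
    # repeating the same answer is only counted once (an assumption).
    unique_votes = {(rid, ans.strip().lower()) for rid, ans in answers}
    counts = Counter(ans for _, ans in unique_votes)
    return {ans for ans, n in counts.items() if n >= threshold}


# Hypothetical responses to the tagging game for one emoji.
votes = [
    ("u1", "egg"), ("u2", "Egg"), ("u3", "egg "),
    ("u1", "egg"),                  # repeated vote, counted once
    ("u4", "bird"), ("u5", "bird"),
]
print(validate_by_consensus(votes, threshold=3))
```

In this example only "egg" reaches three distinct respondents, so "bird" stays unvalidated until more users propose it. Raising the threshold trades speed of data collection for confidence in each accepted tag.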

The goal is to reach a uniform and comprehensive list of tags across multiple languages with a precise mapping between any language pair, which may serve to bootstrap a massive multilingual dictionary.

A new version of the bot is already under development. It will make it possible to:

  1. add new languages
  2. add new terms to the current languages (including the names of the countries for national flags)
  3. compare the languages, using the bot to collect information by means of very accurate techniques of crowdsourcing and gamification related to bilingual dictionaries in any language.

This may be the first crowdsourcing project to use bots for data collection and validation; it is almost certainly the only such effort for linguistics, and is unique in its attempts at engaging participants for hundreds of under-resourced languages using the cheap and simple technologies available to their billions of speakers around the world.

This project attempts to address the data chasm for natural language processing for most languages by distilling data collection to simple micro-tasks using techniques adapted to least-common-denominator technology. Emojis are an important starting point because they are a standardized global iconography that has been installed on every smartphone. Emojis also represent the resurgence of images as a communications medium and are essential elements of contemporary methods of conveying information in multilingual contexts.

