Applying Natural Language Processing to Localization - Jonckers


Natural Language Processing

One of the most useful areas of Artificial Intelligence for localization is Natural Language Processing (NLP). We deal in translation; our aim is to provide users with translations of predictable quality (in terms of accuracy, language, style, formatting, etc.). NLP provides tools to assess language in many ways, and can therefore support this process.

How can it support localization? Take Google’s Natural Language API. Because it is an Application Programming Interface, content is uploaded to Google’s servers for processing; this might not be suitable for all content, but it offers a glimpse of what is possible with NLP. With just a few lines of code you can report on the types of information contained within the text (for example, names, locations and organizations) and its sentiment (on a scale from negative to positive). This information has multiple uses for the localizer, for example:

  • For person names, we can extract all names before translation to make sure they are handled correctly after translation (for example, in transliteration to Asian scripts).
  • Extract key terms for glossaries.
  • Locations: we can make sure that locations are localized appropriately (for example, a support centre in the US can be highlighted as needing to be changed to one in China). We can also ensure that tricky geo-political place names are localized correctly by highlighting them for special attention.
  • Content in translation memories (TMs) can be evaluated to ensure that a match with the same position on the positive/negative scale is suggested over a match with a different sentiment.
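As a sketch of how such annotations might feed the workflow: assuming entity and sentiment output of the general shape an NLP service returns (the sample data, entity labels and function name below are invented for illustration), a short script can turn the analysis into a pre-translation checklist:

```python
# Build a pre-translation checklist from (hypothetical) NLP annotations.
# The entity types mirror the categories discussed above:
# person names, key terms, and locations.

def build_checklist(entities, sentiment_score):
    """Group annotated entities into actionable localization notes."""
    groups = {"PERSON": [], "LOCATION": [], "TERM": []}
    for name, entity_type in entities:
        if entity_type in groups:
            groups[entity_type].append(name)

    notes = []
    if groups["PERSON"]:
        notes.append("Verify transliteration of: " + ", ".join(groups["PERSON"]))
    if groups["LOCATION"]:
        notes.append("Check locale suitability of: " + ", ".join(groups["LOCATION"]))
    if groups["TERM"]:
        notes.append("Add to glossary: " + ", ".join(groups["TERM"]))

    tone = "positive" if sentiment_score > 0 else "negative or neutral"
    notes.append(f"Prefer TM matches with {tone} sentiment (score {sentiment_score:+.1f})")
    return notes

# Invented sample annotations, in the spirit of an entity-analysis response.
sample_entities = [("David Brown", "PERSON"), ("Seattle", "LOCATION"),
                   ("translation memory", "TERM")]
for note in build_checklist(sample_entities, 0.6):
    print("-", note)
```

The point of the sketch is that the NLP output itself is only raw material; the value comes from mapping each entity category to a concrete reviewer action.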

It really is quite remarkable what can be done with NLP, a process that allows for more focused translation. Google is one of several companies providing a Natural Language API, and these services let us prototype what we need. Creating our own solutions without third-party APIs takes more coding, but it is still achievable. An industry-favoured language, Python, has a module called the Natural Language Toolkit (NLTK) which allows for fast and easy results; other languages have their own packages. Sentiment analysis would require some work with machine learning algorithms, and even though this may mean more tagging and coding, it can provide a more satisfactory outcome, as the corpus and requirements can be tailored to your specific needs rather than relying on someone else’s algorithms.
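To make the "tailored to your own corpus" idea concrete, here is a deliberately tiny sketch of a bag-of-words Naive Bayes sentiment classifier in plain Python. The four training sentences are invented; a real system would need a much larger labelled corpus from your own domain, which is exactly where the tailoring advantage comes from:

```python
import math
from collections import Counter, defaultdict

class TinySentimentClassifier:
    """Minimal Naive Bayes over bag-of-words features, with Laplace smoothing."""

    def train(self, labelled_docs):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.label_counts = Counter()            # label -> document count
        self.vocab = set()
        for text, label in labelled_docs:
            words = text.lower().split()
            self.word_counts[label].update(words)
            self.label_counts[label] += 1
            self.vocab.update(words)

    def classify(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, float("-inf")
        for label in self.label_counts:
            # log prior + smoothed log likelihood of each word under this label
            score = math.log(self.label_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in text.lower().split():
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Invented training data -- a real corpus would be domain-specific and larger.
training = [
    ("great product works perfectly", "positive"),
    ("love the fast friendly support", "positive"),
    ("terrible broken waste of money", "negative"),
    ("slow buggy and disappointing", "negative"),
]
clf = TinySentimentClassifier()
clf.train(training)
print(clf.classify("friendly support and great product"))  # -> positive
```

Even at this toy scale, the design choice is visible: because you control both the labels and the vocabulary, the classifier reflects your content rather than a general-purpose model trained on someone else's data.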

Is this hard?

Actually, no. To get valuable instant insights using the Google Natural Language API, you can begin with only a few lines of Python: a basic script of just 16 lines of code is enough to get a report on content. If you want to do it yourself (for example with Python’s Natural Language Toolkit) it will require some additional lines of code, but you can still access valuable information with fewer than 100 lines (and a few hours of trial and error). The key to success here is not the technical aspect; it is how you use that information to drive quality: what to extract from which content, and how to turn these insights into actionable information that drives fast, accurate, high-quality translations. NLP gives you the basic tools, and once you start using these new-found insights, the fun really begins.
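As an illustration of the "few lines of Python" claim, a minimal sketch against Google's Cloud Natural Language client library might look like the following. It assumes the `google-cloud-language` package is installed and application-default credentials are configured (the import is kept inside the function so the sketch can be read without the dependency); the function name is our own:

```python
def nlp_report(text):
    """Return (entities, sentiment score) for `text` via Google's NL API.

    Assumes the google-cloud-language package is installed and
    Google Cloud credentials are configured in the environment.
    """
    from google.cloud import language_v1

    client = language_v1.LanguageServiceClient()
    document = language_v1.Document(
        content=text, type_=language_v1.Document.Type.PLAIN_TEXT
    )
    # One call for entities (names, locations, organizations, ...),
    # one for the document-level sentiment score (-1.0 to +1.0).
    entities = client.analyze_entities(request={"document": document}).entities
    sentiment = client.analyze_sentiment(
        request={"document": document}
    ).document_sentiment
    return [(e.name, language_v1.Entity.Type(e.type_).name) for e in entities], \
        sentiment.score
```

Note that, as the article points out, the text is sent to Google's servers, so this route needs a confidentiality check before it is used on client content.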

Blog written by: David Brown, Quality Director