For most of the last decade, big data was the technological phenomenon that companies tried hardest to harness to their benefit. The idea that the thousands of actions people take every day in their online lives could be exploited to target advertising, identify trends and even predict behavior was a tantalizing one. Marketing departments and entire companies built themselves around the premise of data as the ‘oil of the 21st century’.
These companies mined our search history, our browsing behavior and our internet viewing habits, and some of them made huge profits by selling that data on or using it to target their marketing.
Now there’s a new phenomenon coming. Recent data misuse scandals have taken the shine off the big data movement, shifting the industry’s focus to a more sustainable model. Enter Big Content.
Think about how big the internet is. No, really. There are 55 billion web pages out there. If each of those pages has, say, 100 words, that’s 5.5 trillion words. And that’s probably a conservative estimate. Obviously not all of that content is useful to the companies that own it. Many of those pages contain filler text or legalese. The pages that really interest marketers from a big content point of view are the ones that can be harvested and put to use by data analysts. eCommerce product descriptions and reviews are the best examples here – there are millions of products for sale on the internet, and each of them needs a well-written description to sell effectively. Similarly, the vast majority of those products have one or more customer reviews available on the site. If those reviews are informative and clear, they can help buyers make their decisions, and also help the retailer take any feedback on board.
So there’s a lot of content there, waiting to be useful. One important aspect of this mountain of content from a translation point of view is that it exists and is needed in multiple languages – so to sell products in multiple countries, localization is needed. However, the huge word counts involved rule out the traditional 2,000 words/day approach to translation as far too time-consuming.
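To make the scale concrete, here is a quick back-of-the-envelope sketch using the article’s own rough figures (55 billion pages, roughly 100 words each, 2,000 words per translator per day); these are illustrative assumptions, not measurements:

```python
# Back-of-the-envelope estimate of the "big content" scale described above.
# All figures are the article's own rough assumptions, not measured data.

pages = 55_000_000_000         # ~55 billion web pages
words_per_page = 100           # assumed average words per page
total_words = pages * words_per_page

# Traditional human translation throughput cited in the article.
words_per_translator_day = 2_000
translator_days = total_words / words_per_translator_day
translator_years = translator_days / 365

print(f"Estimated total words:  {total_words:,}")
print(f"Translator-days needed: {translator_days:,.0f}")
print(f"Translator-years:       {translator_years:,.0f}")
```

Even translating only a tiny fraction of that total would overwhelm a purely manual workflow, which is the point the next paragraph picks up.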
This is where MTPE and crowd-sourcing come in. The only way to efficiently process such a mass of content is to machine-translate it wholesale and to use a large crowd of vetted linguists to post-edit it and optimize its impact on readers.
This is why crowd-sourcing continues to be an important focus for Jonckers and for the industry in general – in the age of big content, it’s the only sustainable solution.