Wednesday 15 February 2012

Mind your language, Hitchhiker: Imagining a universal translator


As Ford Prefect landed his Improbability Drive*-powered rocket in the suburbs of London, he heard a whirr: the Babel Fish** in his ear shuddered and died, suffocated by the smell of the strange blue planet. Ford’s plans of breeding billions of Babel Fish to start a universal translation service were rendered irrelevant. He now had to resort to technology to start the service, aided by the gazillion-speed computer, the Improbability Drive. Enter Hitchhiker Corporation.

Language Connections
The service should take input in any language and instantaneously translate it into any other. Should the translation service attempt to cover all ~6800 languages in the world? Should the system have the flexibility to translate from Chamicuro (a Peruvian language spoken by 8 people) to Ongota (an Ethiopian language with 6 native speakers)? Will Lord God help Ford Prefect cut this problem to a manageable level?

He leafed through the pages of a well-worn King James Bible. The popular 1611 masterpiece had emerged from the earlier Hebrew, Greek, Latin, German and English versions. From Jamaica to Ethiopia, from Bombay to the Maldives, it travelled well with the English sailors and pastors, and was the root of most biblical translations. He found his answer: the translator service would work in the 2270 languages into which the Holy Bible has been translated.

“If you map every language to every other, it is 5,150,630 combinations (2270 × 2269, i.e. 2270P2),” the computer demurred. “Figure out a simpler business model.” Ford toyed with the idea of creating a “neutral language mark-up language (NLML)” that would serve as the base for all translation. He then abandoned it: it sounded corny, and the idea of creating a world-neutral language didn’t appeal. He decided on a hub-and-spoke concept.

World languages, he realized, were bunched into about 30 groups. The “core language” of each group provides the root words and grammar for all its “derived languages”. English, much to his chagrin, owed its roots to the Germanic family, along with German, Dutch, Icelandic and Swedish.

He created the groups and designated the tongue with the maximum number of speakers as the “hub”. The others were “spokes”. In his definition, English (with 1.5b speakers, native and non-native) was the hub and German (with 190m speakers) was a spoke. To translate across groups, go through the hubs. If a user wants a translation from Bengali to Icelandic (they connect via fish anyway), these are the steps:

Bengali (spoke) → Hindi (hub) → English (hub) → Icelandic (spoke).
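The hub-and-spoke steps above can be sketched in a few lines of Python. This is only an illustration, not Ford's actual engine: the group names (`Indo-Aryan`, `Germanic`), the two dictionaries and the `route` function are assumptions made up for the example, seeded with the languages the post mentions.

```python
# Hypothetical lookup tables; only the languages named in the post are included.
GROUP_OF = {
    "Bengali": "Indo-Aryan",
    "Hindi": "Indo-Aryan",
    "English": "Germanic",
    "German": "Germanic",
    "Icelandic": "Germanic",
}
HUB_OF = {"Indo-Aryan": "Hindi", "Germanic": "English"}


def route(source, target):
    """Return the chain of languages a translation passes through."""
    if source == target:
        return [source]
    source_hub = HUB_OF[GROUP_OF[source]]
    target_hub = HUB_OF[GROUP_OF[target]]
    path = [source]
    # Visit the source hub, then the target hub, then the target,
    # skipping any step that repeats the language we are already at.
    for step in (source_hub, target_hub, target):
        if step != path[-1]:
            path.append(step)
    return path


print(" -> ".join(route("Bengali", "Icelandic")))
```

Within a single group the route collapses to spoke, hub, spoke, and hub-to-hub requests take one hop.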

The permutations were far fewer, within solvable proportions. He cut out 270 more languages: they weren’t part of any group and were orphans of the linguistic world.
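How much smaller does the problem get? A rough count, under assumptions the post does not spell out: exactly 30 groups with one hub each, 2000 languages left after the 270 orphans are dropped, and each direction of translation counted separately (as the computer did). The function names are invented for this sketch.

```python
def full_mesh_pairs(n_languages):
    """Ordered (source, target) pairs when every language maps to every other."""
    return n_languages * (n_languages - 1)


def hub_and_spoke_pairs(n_languages, n_groups):
    """Ordered pairs needed when spokes talk only to their hub and hubs talk to each other."""
    spokes = n_languages - n_groups            # every non-hub language is a spoke
    spoke_links = 2 * spokes                   # spoke -> hub and hub -> spoke
    hub_links = n_groups * (n_groups - 1)      # every hub to every other hub
    return spoke_links + hub_links


print(full_mesh_pairs(2270))            # the computer's figure: 5150630
print(hub_and_spoke_pairs(2000, 30))    # 4810 under the assumptions above
```

The price of the smaller number is that a cross-group translation passes through up to three hops instead of one.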

The ingenuity of the human brain
Ford next had to handle the problem of understanding a statement in a given language. Human ingenuity makes this especially difficult for a computer. What William (Wordsworth) means with:

“She is a phantom of delight,
When first she gleam’d upon my sight;
A lovely apparition, sent,
To be a moment’s ornament;
A dancing shape, an image gay,
To haunt, to startle, and waylay.”

is the same as what his namesake, Will.I.Am of Black Eyed Peas means with:

“Girl, you know you got me, got me, With your pistol shot me, shot me...
No, no, no, don’t phunk with my heart...”

How can the computer understand that both mean the same: “Hi, you are my hot babe...”?

Ford had other problems to solve. He had to handle homonyms: words with the same spelling but multiple meanings. And of course synonyms: different words with the same meaning. Figures of speech were the next tripping point: “kicked the bucket” does not translate literally in any language, including English. What about hyperbole: “I told you a billion times = main tumhe hazaar bar bol chuka huun”, even though a billion is not a thousand (hazaar)? Metaphor, alliteration, sarcasm, insult, simile: these were veritable battlegrounds for the language translation system.

Slang was hard to keep track of, developing daily in the streets of Beijing, Chicago and Mumbai. Grammar codes for popular languages were already available; he simply adopted them.

Becoming Context Sensitive
All these were handled under the umbrella of "context". Ford had to build a mathematical model for being “context aware”. Context is determined by the words in the neighbourhood of the text. Using “Operation” in the neighbourhood of “Theatre” is different from using “Operation” in the neighbourhood of “Commando”.


The problem had strong parallels with quantum physics: the "state" of any one particle can be guessed with a reasonable amount of certainty if you know the "state" of the particles around it. Similarly, the meaning of one word can be understood only if you understand the meaning of the other words around it. Without that context, a written word has no meaning, no "state".

The computer picked up the key words and the neighbourhood words, compared across multiple texts and encoded them as different contexts. Millions of contexts. Mapped against the relevant figure of speech. Mapped across languages.
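A toy version of "context from the neighbourhood" might look like the sketch below. It is a deliberately naive overlap count, not the model Ford built: the `SENSES` table, its cue words and the `guess_sense` function are all invented for illustration, and a real system would learn such cues from text rather than hard-code them.

```python
# Hypothetical sense inventory: word -> {sense: cue words seen near that sense}.
SENSES = {
    "operation": {
        "surgery": {"theatre", "surgeon", "patient", "hospital"},
        "military mission": {"commando", "troops", "raid", "mission"},
    }
}


def guess_sense(word, sentence, window=3):
    """Pick the sense whose cue words overlap most with the neighbouring words."""
    tokens = [t.strip(".,!?").lower() for t in sentence.split()]
    if word not in tokens or word not in SENSES:
        return None
    i = tokens.index(word)
    # The "neighbourhood": up to `window` words on either side of the key word.
    neighbours = set(tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window])
    scores = {sense: len(cues & neighbours) for sense, cues in SENSES[word].items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None  # no cue nearby means no "state"


print(guess_sense("operation", "The surgeon began the operation in the theatre"))
print(guess_sense("operation", "The commando operation began at dawn"))
```

With no cue words in the window the function returns `None`, which is the "no state" case described above.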

To further optimize, Ford fed the computer the text of the Holy Bible in 2000 languages. Feeding in the Harry Potter series was good: rich in context and sub-text, and available in 66 languages. He then fed in the latest movies with their sub-titles, and YouTube videos for slang. The Improbability Drive hummed on, understanding the subtlety of human language, cross-referencing meanings and contexts across languages.

Ford programmed the hierarchy as:
Word ← Relevant meaning from the dictionary ← Determined by the context

The translation worked as follows:
Origin language (Word ← Meaning ← Context) → Target language (Word ← Meaning ← Context)
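The two-step idea (resolve the meaning from context in the origin language, then pick the word for that meaning in the target language) can be sketched as follows. Everything here is an assumption for the example: the `MEANINGS` and `WORDS` tables, the meaning labels and the `translate_word` function are hypothetical, and the single English-to-German entry stands in for the millions of contexts the story imagines.

```python
# Hypothetical tables. (language, word) -> {meaning: context cues}
MEANINGS = {
    ("en", "bank"): {
        "FINANCIAL_INSTITUTION": {"money", "loan", "account"},
        "RIVER_EDGE": {"river", "water", "fishing"},
    },
}
# (meaning, language) -> word used for that meaning in that language
WORDS = {
    ("FINANCIAL_INSTITUTION", "de"): "Bank",
    ("RIVER_EDGE", "de"): "Ufer",
}


def translate_word(word, context_words, source, target):
    """Word <- meaning <- context in the source, then meaning -> word in the target."""
    candidates = MEANINGS.get((source, word.lower()), {})
    context = {w.lower() for w in context_words}
    best, best_score = None, 0
    for meaning, cues in candidates.items():
        score = len(cues & context)   # how many cue words appear in the context
        if score > best_score:
            best, best_score = meaning, score
    if best is None:
        return None                   # the context did not settle the meaning
    return WORDS.get((best, target))


print(translate_word("bank", ["river", "fishing"], "en", "de"))
print(translate_word("bank", ["loan"], "en", "de"))
```

Keeping the meaning as a language-neutral label in the middle is what lets the same context tables serve every target language.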

The Hitchhiker Services
The first service from the company – the “Hitchhiker Voice2Text Translator” – was enabled through “Translation Goggles”. The translation engine was on the cloud, powered by the Improbability Drive.

The user’s phone app logs into the cloud and feeds in the raw material from a video, movie, scanned document or real-life speaker. This is translated and sent back as a sub-title to the “Translation Goggles” the user wears. With enhanced connectivity and great computational power, the service is near instantaneous. So, though Khan Academy videos are in English, you can get Tulu sub-titles on your Translation Goggles.

In about a year, Ford plans to launch the holy grail: the Hitchhiker Voice 2 Voice translator. Here, the input voice is translated and played back in the voice and tone of the original speaker! Ford is working on the “Deep Throat” system that reconstructs the larynx pattern of the speaker from the spoken words. This will help create translations in the same voice and tone.

So, when Amitabh thunders in Hindi with his characteristic voice, it can be instantaneously heard, in the same baritone, in Malagasy (the language of Madagascar). Imagine talking to a customer on the phone in English while he hears it in his native German, in your original voice and tone. Great possibilities open up.

The beginning of the end  
Just then, the Improbability Drive strafed across the song “Kolaveri di” on YouTube. It pondered for a moment. The absurdity was stupefying. The lack of context was complete. It was an assault on all its 9 senses in 7 dimensions. A fit of distortion ran through its cerebral cortex. It evoked memories of the Vogons%.

The computer froze. Forever.

The Vogon ship was trudging along 42# weeks away from Earth, laying the inter-galactic highway. Any trespassers would have to be evicted.

NEXT: JADES → Just Another Desperate Education Start-up



----------------------------------------------------------- 
Attribution: Many white papers on context aware systems and language groups have been referred to. I confess these ideas are neither original nor mine. They have been happily cross-pollinated from different areas.

The following references are from the book: "The Hitchhiker's Guide to the Galaxy" by Douglas Adams:  
* The Improbability Drive: One of the most powerful computers in the universe. It powers a rocket that can travel at intergalactic speeds.
** The Babel fish: A yellow, leech-like fish found on more developed planets. If you stick one in your ear, you can understand anything said to you in any other language. For inter-galactic hitch-hiking.
%: Vogons: The baddies in the inter-galactic play. They write bad poetry. They eventually destroy the planet known to us as Earth to build an intergalactic highway.

Wednesday 11 January 2012

Siroomba, can you come here with the mop?

How about incorporating Siri, the iPhone's personal secretary, inside your family's cleaning robot, the Roomba? You can yell, "Yeah Siroomba, come here, Divya just dropped some milk," and you will have him there in just a few seconds, with the right mop for cleaning up the mess.

A huge array of technologies will have to work in sync before the floor is clean. A sound sensor deduces the origin of your scream. The language recognizer understands the language (or falls back on the Google Translate engine on the web, just in case you choose to scream in your mother tongue). The language layer gives the right command to the Robot OS. With the care of a fighter pilot selecting the right missile for his mission, the right mop is picked up. The physics engine kicks in to move Siroomba along the optimal route. Finally, the work done, he stands there and cheekily flashes the clean floor on the HDTV.
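Of the steps above, route-finding is the easiest to make concrete. The sketch below is one simple way to do it, a breadth-first search over a floor plan drawn as a grid, and is not a claim about how any real Roomba navigates. The grid format (`#` for furniture, `.` for open floor) and the `shortest_route` function are assumptions for the example.

```python
from collections import deque


def shortest_route(grid, start, goal):
    """Breadth-first search on a grid of strings; '#' cells are obstacles.

    Returns the list of (row, col) cells from start to goal, or None if blocked.
    """
    rows, cols = len(grid), len(grid[0])
    came_from = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:    # walk back to the start
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and (nr, nc) not in came_from:
                came_from[(nr, nc)] = cell
                queue.append((nr, nc))
    return None


# A hypothetical dining room: Siroomba starts top-left, the milk is bottom-right.
floor = [
    "..#",
    "..#",
    "...",
]
print(shortest_route(floor, (0, 0), (2, 2)))
```

Breadth-first search finds a route with the fewest moves when every move costs the same; a real robot would also weigh turning, carpet and battery.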

The world is clearly a befuddling place even to our own cerebral processor, which has evolved over millions of years. How will the robot understand the circumstance it is in? No single store can hold all the experiences the robotic processor needs in order to respond. Sensors will take in visual, audio and nasal inputs and make sense of them using a giant cloud of world experiences. What the brain knows as a result of evolution, the cloud will know as a result of the learning engine.

Imagine Mr. Cook, your own cooking assistant. Your chicken will smell and look just about right, thanks to the Robotic Recipe on the cloud from Khana Khazana. Mr. Cook will make it and verify it against the giant cloud of cooking experiences.

And if you had imagined that the robotic Mr. Cook looks like Honda's ASIMO, a biped, you are likely to be mistaken. The cooking robot is more likely to look like a spider, with its eight arms tossing around the ingredients. And of course, in the center is the robotic brain with the positronic pathways that Asimov described.

Instead of the laws of robotics, we will have the most critical experiences embedded in the brain. Physics. Language. Movements, with flexible axis. Color. Smell. Geography. How about robots with a sense of history? A robot which says, "Last time when I made the chicken, you just about nibbled... are you sure you want this recipe again?" or "How about some Pasta, Siri told me about the run tomorrow morning?"

Imagine combining the input of full-body scanners and gesture recognition applications with the art of tailoring. Entire industries can shift geographies. You walk through the scanner, select your cut, select your fabric, and somewhere in a tailoring center in Arizona a robot will have stitched and shipped your dress before you leave the store in NY. This is almost applying the Dell manufacturing model to the apparel industry. I will be shorting the shares of the Bangladesh shipping industry.

Imagine the compression this will cause in the electronics supply chain. LED/LCD, Smart/Normal, Touch/Non-Touch, all made in a robotic manufacturing center just before you leave the store. The end of labor arbitrage in manufacturing. In a world of near-shore production, the Baltic Dry Index will be rendered meaningless.

As the cloud's ability to respond to real-world experiences improves, as the mechanics incorporate more degrees of freedom inspired by nature, as we cross-pollinate the possibilities, a whole new exciting world will emerge. Countries and companies that start accumulating IP in these areas will win.

Siroomba will not be cleaning up just your dining table. Entire industries will be wiped clean and rebuilt. 

Next: Universal Translator.