Friday, April 1, 2016

Building a better ‘Tay’


Conversation will be our new User Interface. – Satya Nadella

My day is filled with goal-directed tasks. They range from simple (scheduling, search) to complex (negotiation, persuasion). The more complex the task, the more likely it is to involve a dialog: passing messages to exchange information, signal agreement, and assign actions.

True conversations are held between people, and they are built from exchanges involving language, non-verbal cues, emotional nuance, and storytelling asides. When I turn to the computer, it is usually for simple command/respond tasks (search, compose, compute, connect). I can’t imagine how making those conversational would make them any easier.

Microsoft envisions conversational apps with three levels of capability, supporting three different kinds of user tasks. The first level mediates people-to-people conversations, helping with spelling, making connections, and translating languages. One step up, digital assistants know your context and can take on tasks like scheduling appointments, planning travel, discovering music, and finding nearby places to eat. ‘Bots add machine learning and natural language understanding, so they are able to inform, advise, and anticipate users’ needs: enhanced versions of Cortana or Siri. (Somehow, they are always depicted as a bit creepy… and too often as young female servants.)


Tay was intended to be one of the latter apps, but failed miserably when released into the wild to learn about people.  Set upon by trolls, the algorithm was taught a set of wild untruths, which it happily used without any contextual understanding or moral discrimination.


At this year’s Build conference, Microsoft released the libraries and APIs that allow anyone to have a go at making a better version of ‘Tay’. There are basic tools for recognizing speech and faces, libraries that sense emotions, and others that identify faces and voices. Some claim to “explore relationships among academic papers, journals, and authors” or to “contextually extend knowledge of people, locations, and events.”

I confess to being especially intrigued by the Text Analytics API, which promises to “detect sentiment, key phrases, topics, and language from your text.” Imagine getting writing and pitching advice at that level…

If only it were true. I’ve been playing with the Linguistic Analysis tools, which parse structure from sentences, and they’re pretty good on straightforward text.

But throw it some good colloquialisms, maliciously, and it’s a much bigger challenge.


Arguably, it’s assigned the tokens in some justifiable way, but I don’t know how you could make sense of the returned array.
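The parser screenshots aren’t reproduced here, but a toy example gives the flavor of the problem. Below is a minimal Python sketch of Penn Treebank-style tokenization, where contractions are split into a stem and a clitic. It is an illustrative assumption about how such tokenizers commonly behave, not the Linguistic Analysis API itself; the `tokenize` function and its splitting rules are hypothetical simplifications.

```python
import re

# Hypothetical, simplified tokenizer in the spirit of Penn Treebank
# conventions; this is NOT the Microsoft Linguistic Analysis API.
# A "word" is letters/digits with optional internal apostrophes (don't, y'all);
# any other non-space character becomes its own punctuation token.
WORD_OR_PUNCT = re.compile(r"\w+(?:'\w+)*|[^\w\s]")


def tokenize(sentence: str) -> list[str]:
    """Split a sentence into word, clitic, and punctuation tokens."""
    sentence = sentence.replace("\u2019", "'")  # treat curly apostrophes as straight
    tokens: list[str] = []
    for word in WORD_OR_PUNCT.findall(sentence):
        if word.lower().endswith("n't") and len(word) > 3:
            # Negative contraction: "don't" -> "do" + "n't", "won't" -> "wo" + "n't"
            tokens += [word[:-3], word[-3:]]
        elif "'" in word:
            # Other clitics: "I'm" -> "I" + "'m", "y'know" -> "y" + "'know"
            head, _, tail = word.partition("'")
            tokens += [head, "'" + tail]
        else:
            tokens.append(word)
    return tokens


if __name__ == "__main__":
    print(tokenize("Don't you want to go with me?"))
    print(tokenize("I won't, y'know."))
```

Run on a question like “Don’t you want to go with me?”, this returns `Do`, `n't`, `you`, `want`, `to`, `go`, `with`, `me`, `?`, and “won’t” becomes `wo` plus `n't`. Each split is defensible by convention, yet nothing in the array tells you what the speaker actually means.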

I know I stagger through replying to ‘Don’t you want to go with me?’: Yes, I don’t…not…want to go with you…there, maybe.

And this is, I think, where the project, grandly named Microsoft Cognitive Services, goes wrong. There is nothing cognitive about deterministic logic isolated from intuitive thought, consciousness, and free will. Computers win chess games through deep search, translate languages by statistical matching, and win at Go through neural network learning.

These are marvelous technical advances, but they are pattern-matching exercises within closed problem spaces with well-defined rules. Release the same algorithms onto the road in an autonomous car, and it hits a bus when the situation exceeds its experience. It can learn, but it can’t intuit.

2 comments:

Austin Wilkinson said...

Really interesting post, Dave.

I agree with your point regarding linguistics: there's a huge gulf between what can currently be achieved and the layers of sophistication we humans acquire over our first 20 years or so of life... Whether this is simply a matter of complexity (as a physical materialist would argue) or down to something transcendent in humans (consciousness, intuition), as you suggest, is moot.

Where I'm not so sure is your allusion to autonomous vehicles. Humans driving cars, with all of our vagaries of emotion, ego, distraction, and even "intuition" as you describe it, has to be literally a "car crash".

Given sufficient sophistication, I think that the safety of vehicle autonomy will outstrip that of humans: easily and soon. Anyway, we'll see!

Take care, Austin

David Hampton said...

Hi, and thanks! I've been watching the TechNet tutorials on how to interact with the cognitive APIs, and they turned out to be kind of neat once I started feeding them photos and texts. I'd like to try building them into some analytics for the data sets I'm working with, to see if sentiment adds anything to the discriminant I'm working on.

I don't disagree about humans being imperfect drivers, perhaps worse than the autonomous logic in Google's cars. But if each makes its own sort of errors, different from the other's, which is 'right' in a given situation? You've seen the articles about how people and robots would reach different conclusions on whether to hit one child running into the road when the alternative is to veer into several bystanders. I would likely just react, steering to the side: I couldn't (even on reflection) justify driving ahead. But objective logic might.

It's kind of a good 'Shrink and Sage' question. I hope that things are going well with all of you.