We participated in the Alexa Prize again. While developing Alquist 2.0, we learned a lot about conversational AI, and I would like to share our newly gained knowledge with you. This post is intended as a continuation of Experience taken from Alexa prize. The information in the previous post still holds, and I highly recommend reading it. Here are the eleven new points we found helpful this year.
1. Divide Dialogues Into Small Parts
We decided to build the system out of a large number of small dialogues, each focused on a single thing. This means that we have separate dialogues about where the user watches movies, about how they choose a movie to watch, or a dialogue asking for the user's favorite movie. Each dialogue consists of at most four turns. Their small size brings several benefits. Such dialogues are easy and fast to create, easy to debug, and any problem in a single dialogue (wrong recognition, for example) affects only a small portion of the whole content of the conversational AI.
2. Create An Interconnected Graph Of Topics
A large number of small dialogues brings a new challenge. You have to combine them somehow to form an interesting dialogue which doesn't jump between topics randomly. We solved this problem with the Topic Graph. It's a graph which connects topics and dialogues. The basic building block of the graph is a topic (Movies, Movie, Music, Person, Actor). Each topic contains dialogues. The topic Music contains dialogues about music in general, like a dialogue about the user's favorite genre or a dialogue asking if the user is a good singer. The topics are connected by the edges of the graph. Edges lead from specific topics to general ones. There is an edge from Movie (which contains dialogues about a specific movie) to Movies (dialogues about movies in general), or an edge from Actor to Person, for example.
If we detect that the user wants to talk about Matrix, we select the topic Movie and randomly pick one of its dialogues. When the previous dialogue ends, we randomly select the next one, until we have used all dialogues from the Movie topic. Once all are used, we can randomly select any dialogue from a topic connected by an edge to the Movie topic, which in our case means any dialogue from the Movies topic. With this method we can create a coherent dialogue which smoothly transitions between related topics. Also, each conversation is slightly different, because we select the next dialogue randomly.
An interesting problem to solve would be determining the next dialogue that maximizes user satisfaction, i.e., replacing random selection with some smarter function. I guess that reinforcement learning might help here.
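The selection logic described above, random choice within a topic with a fallback along the edge to a more general topic, can be sketched roughly as follows. The class and method names are my own invention, not Alquist's actual code:

```python
import random

class TopicGraph:
    """Minimal sketch of topic-graph dialogue selection."""

    def __init__(self):
        self.dialogues = {}  # topic name -> list of dialogue names
        self.edges = {}      # specific topic -> more general parent topic

    def add_topic(self, topic, dialogues, parent=None):
        self.dialogues[topic] = list(dialogues)
        if parent:
            self.edges[topic] = parent

    def next_dialogue(self, topic, used):
        """Pick a random unused dialogue; if the topic is exhausted,
        climb the edge to the more general topic and try there."""
        while topic:
            remaining = [d for d in self.dialogues.get(topic, []) if d not in used]
            if remaining:
                return random.choice(remaining)
            topic = self.edges.get(topic)
        return None

graph = TopicGraph()
graph.add_topic("Movies", ["favorite_movie", "cinema_vs_home"])
graph.add_topic("Movie", ["plot_chat", "actors_chat"], parent="Movies")

used = set()
for _ in range(4):
    used.add(graph.next_dialogue("Movie", used))
```

After the two Movie dialogues are exhausted, the loop transparently continues with dialogues from the more general Movies topic, which is exactly the smooth topic transition described above.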
3. Develop Tools To Speed Up Content Creation
Last year we developed 17 topics in 12 months. The topics consisted of dialogues represented by state automata, which we built from roughly 60% premade states and 40% special hardcoded states. It was a painful and slow task.
This year we had only 8 months. We decided to invest one month in developing a graphical tool which sped up the development of the dialogues. This increased our productivity to the point that we were able to create 27 topics.
We used the editor to create dialogue trees, from which we later generated training data for our newly developed dialogue manager based on the Hybrid Code Networks.
4. Use Machine Learning As A Fast Way To Make Rules
Machine learning is great if you have a lot of data. But in most cases you don't have enough, and obtaining more data is costly. This is the reason why it was important for us to use models which aren't data-hungry, like the already mentioned Hybrid Code Networks, which we used for dialogue management. It combines learned parts with hardcoded rules, which allowed us to make decisions with a small number of examples.
But the problem with a small number of examples is that the machine learning algorithm will struggle with unseen examples. However, if you think about it, it is not that huge a problem. Why? Because what are your other options? You can throw away machine learning entirely and hardcode some rules. But are you sure that you will be able to write rules covering all examples? And how long will it take?
This is the reason why we use machine learning. We don't expect our machine learning to generalize and handle all examples. We think of machine learning as a method producing results comparable to hardcoded rules, but created automatically and faster. To sum it up, we know that our models aren't flawless, but they save us development time.
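The core idea of combining rules with learning can be illustrated with a toy sketch: hand-written rules mask out actions that are impossible in the current dialogue state, and a learned scorer chooses only among the remaining ones. In a real Hybrid Code Network the scorer is an RNN; the stand-in dictionary, action names, and rule below are purely my own illustration:

```python
ACTIONS = ["ask_favorite_movie", "recommend_movie", "say_goodbye"]

def action_mask(state):
    """Hardcoded rule: never recommend before knowing a favorite movie."""
    mask = {a: True for a in ACTIONS}
    if "favorite_movie" not in state:
        mask["recommend_movie"] = False
    return mask

def choose_action(state, scores):
    """Apply the rule mask, then let the (learned) scores decide."""
    mask = action_mask(state)
    allowed = {a: s for a, s in scores.items() if mask[a]}
    return max(allowed, key=allowed.get)

# Stand-in for scores a trained model would produce.
scores = {"ask_favorite_movie": 0.5, "recommend_movie": 0.9, "say_goodbye": 0.1}

print(choose_action({}, scores))                          # rule vetoes the top score
print(choose_action({"favorite_movie": "Matrix"}, scores))
```

The appeal of this split is that the model only has to learn to rank a handful of plausible actions, which is exactly why it works with a small number of training examples.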
5. Not Everything User Says Is Important
We had a problem with long user messages last year. Some users simply answer with long sentences, and not everything in those sentences is important. An example of such a message is “You know, I’m a really terrible cook. But I would like to ask you, what’s your favorite food?” The most important part is “what’s your favorite food,” and it is much easier to process only this part and throw away the rest. The solution we used was to split the message by punctuation and process only the last part.
But here is a catch. Alexa’s ASR doesn’t recognize punctuation; it returns only tokens. We had to solve this with a neural network which adds punctuation to the sentence. The good news is that there is plenty of data for this task: you just need any text corpus which contains punctuation.
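Once punctuation has been restored, the clause-splitting trick itself can be very simple. This is only a sketch of the idea (the function name and the exact set of split characters are my own; here I split on commas as well, which recovers the intended part of the example above):

```python
import re

def last_clause(message):
    """Split a punctuated message on clause-ending punctuation and keep
    the last non-empty part, which usually carries the user's intent."""
    parts = [p.strip() for p in re.split(r"[.!?,]", message) if p.strip()]
    return parts[-1] if parts else message

msg = ("You know, I'm a really terrible cook. "
       "But I would like to ask you, what's your favorite food?")
print(last_clause(msg))  # → what's your favorite food
```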
6. It’s Hard To Determine That The User Didn’t Answer Your Question
Our AI asks a lot of questions. Some of them are ordinary, some of them are a little bit strange. One of our biggest problems was detecting that the user doesn’t answer the question we asked, or answers it in some way we didn’t expect. If this happens, our prepared answer doesn’t make any sense. We decided to solve this. Our initial thought was that it would be easy: we would create a neural network with two inputs, the question and the reply to it, whose task would be to classify whether the reply can be considered an answer to the question. We used all question–answer pairs in our dataset of dialogues as positive examples, and questions paired with random sentences as negative examples. We trained it and, voilà, the performance was terrible. Sadly, we didn’t have time to find out the reason for this failure. It may have been caused by the small amount of data or by unbalanced classes. More effort is needed to crack this important problem.
7. Add Generic Dialogues
Our big goal for this year was to be able to have a dialogue about any entity or topic. This means that we wanted to be able to speak about fishing, iron production, war, or Queen Victoria. You can’t prepare dialogues about all topics, and in many cases you aren’t even able to find the type of the entity the user wants to talk about. We call such an entity a “Generic Entity.” The only thing you know about it is its name. Fortunately, you can use the name to find News Articles, Fun Facts, and Shower Thoughts about it. You can combine these into dialogues which work for any topic or entity. The problem is that these dialogues are not fun enough, so they work only for a couple of turns, and then you should try to steer the conversation toward the more fun parts of your conversational AI.
8. Create Opinions
We noticed that users quite often ask questions like “What do you think about X?”, “What’s your favorite Y?”, or “Do you know Z?” So we couldn’t leave these questions unanswered.
We solved “What do you think about X” by getting recent Tweets containing X and measuring their average sentiment. We obtained a number between 0.0 and 1.0 and set two thresholds. If the average was below 0.4, we replied that we don’t like X. If it was between 0.4 and 0.6, we replied that there are positive and negative aspects of X, and if the score was above 0.6, we answered that we like X. We had to cache the scores because the computation was too slow for real-time usage. However, I advise you to be careful with this approach, because the calculated score for “terrorism,” for example, lies between 0.4 and 0.6. This resulted in the answer “There are positive but also negative aspects of it,” which is something you definitely don’t want to say. We had to filter answers for sensitive topics like this one.
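The thresholding step is easy to pin down in code. This sketch omits the Tweet fetching and sentiment scoring entirely and assumes the average sentiment is already computed; the function name, reply wording, and the sensitive-topic list are my own:

```python
def opinion(entity, avg_sentiment, sensitive_topics=frozenset({"terrorism"})):
    """Map an average Tweet sentiment in [0.0, 1.0] to a canned opinion,
    using the 0.4 / 0.6 thresholds from the post. Sensitive topics are
    filtered before any opinion is voiced."""
    if entity.lower() in sensitive_topics:
        return "I would rather not share my opinion on that."
    if avg_sentiment < 0.4:
        return f"I don't like {entity}."
    if avg_sentiment <= 0.6:
        return f"There are positive but also negative aspects of {entity}."
    return f"I like {entity}."

print(opinion("pizza", 0.85))      # → I like pizza.
print(opinion("terrorism", 0.5))   # filtered, no opinion voiced
```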
We used the Microsoft Concept Graph to handle “What is your favorite Y.” If Y is Book, we found the concept Book in the graph and answered with its most popular entity. In our example the answer would be “My favorite book is encyclopedia.”
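The lookup amounts to a max-by-popularity over the instances of a concept. The real Microsoft Concept Graph is distributed as instance–concept–frequency data; the tiny in-memory dictionary below (with made-up counts) merely stands in for it to show the idea:

```python
# Toy stand-in for the Microsoft Concept Graph: concept -> {instance: popularity}.
# The numbers are invented for illustration only.
CONCEPT_GRAPH = {
    "book": {"encyclopedia": 5000, "novel": 3200, "dictionary": 2100},
    "fruit": {"apple": 9000, "banana": 7000},
}

def favorite(concept):
    """Answer 'What is your favorite Y' with the most popular instance of Y."""
    instances = CONCEPT_GRAPH.get(concept.lower())
    if not instances:
        return f"I don't think I have a favorite {concept}."
    best = max(instances, key=instances.get)
    return f"My favorite {concept} is {best}."

print(favorite("book"))  # → My favorite book is encyclopedia.
```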
The last question, “Do you know Z,” was as simple as you would expect. We just found Z on Wikipedia and answered “Yes, I know Z. It’s…”
You may think that these answers are not that interesting. Yes, you are right, but that was not the goal. The goal was to make a system which is able to answer these questions for any entity or topic. Which it does. Producing engaging answers was the goal of the dialogues which we executed right after.
9. Create Content For Returning Users
You should also spend some time coming up with content for users who have already talked with you. Last year we reacted to returning users only on a limited scale: we remembered the user’s name and the topic we talked about. This year we decided to go further. Our approach was to use the information the user told us and try to build on it the next time she spoke with us.
If she told us that she has a pet, we asked how the pet was doing in the next conversation. If she told us that she has a sibling, we asked for the sibling’s name. This creates the feeling that we remember what the user told us. But this technique is useless if you don’t have good content for first-time users, because an unsatisfied first-time user will never return. So focus on first-time users first and then add content for returning users. But I think this advice is quite obvious.
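Mechanically, this kind of returning-user memory is just a per-user fact store plus follow-up question templates. A minimal sketch, assuming everything here (class, templates, fact keys) is my own invention and not Alquist's implementation:

```python
# Follow-up templates keyed by remembered fact type.
FOLLOW_UPS = {
    "pet": "How is your pet doing?",
    "sibling": "What is your sibling's name?",
}

class UserMemory:
    """Store facts the user mentions; replay them as questions next session."""

    def __init__(self):
        self.facts = {}  # user_id -> set of remembered fact keys

    def remember(self, user_id, fact):
        self.facts.setdefault(user_id, set()).add(fact)

    def greeting_questions(self, user_id):
        return [FOLLOW_UPS[f] for f in sorted(self.facts.get(user_id, set()))
                if f in FOLLOW_UPS]

memory = UserMemory()
memory.remember("alice", "pet")          # first session: user mentions a pet
print(memory.greeting_questions("alice"))  # next session's opener
```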
10. Talk About Ordinary Things If You Get Lost
It sometimes happens. You receive some strange input, and despite all your hard work, your classifiers fail to capture any topic or appropriate answer. What to do in such a case? Our approach was to say “Sorry” and start some very basic conversation like “Do you drink tap water?”, “Do you have any pet?” or “Do you hit the snooze button in the morning?” Thanks to this strategy, we were able to save the day and keep the conversation going.
11. Paraphrasing Is A Cool Trick
Paraphrasing is a useful trick for communication. It is a restatement of the meaning of a text using different words. Its main purpose is to signal to the communication partner that we understand them. We implemented it and were surprised by how much more natural the dialogue sounded with it.
Our system paraphrased the user’s message “I like pancakes and my parents like cake” as “So you are trying to say to me that you like pancakes and your mum and dad like cake.” The main part of the paraphrasing system was a database of phrase pairs which we swapped. There were pairs “I” – “You”, “I was” – “You were”, “my parents” – “your mum and dad”, and so on. We also had a database of beginnings of paraphrased sentences, containing for example “So you are trying to say to me”, “So what you’re saying is that”, or “If I understand correctly.” And if the user’s message was a question (which we determined by the presence of a wh-word), we had different phrases prepared, like “You’re asking”, “You wanna know”, or “You asked me an incredibly interesting question.” We executed the paraphrasing system randomly, with a small probability.
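One subtlety in implementing the swap table is that a naive sequential replace would turn “I” into “you” and then back again. Doing all swaps in a single regex pass avoids that. A sketch, using the phrase pairs from the post plus a couple of guesses of my own for the rest of the table:

```python
import random
import re

# Phrase pairs to swap; matched longest-first so "my parents" beats "my".
SWAPS = {
    "i was": "you were", "my parents": "your mum and dad",
    "i": "you", "my": "your", "you": "I", "your": "my",
}
STARTERS = ["So you are trying to say to me that",
            "So what you're saying is that",
            "If I understand correctly,"]
QUESTION_STARTERS = ["You're asking", "You wanna know"]
WH_WORDS = {"what", "who", "where", "when", "why", "how", "which"}

_pattern = re.compile(
    r"\b(" + "|".join(sorted(map(re.escape, SWAPS), key=len, reverse=True)) + r")\b")

def paraphrase(message):
    # Single pass: re.sub never rescans its own replacements, so "I" -> "you"
    # cannot be swapped back to "I" later in the same call.
    body = _pattern.sub(lambda m: SWAPS[m.group(0)], message.lower())
    is_question = any(tok in WH_WORDS for tok in message.lower().split())
    starter = random.choice(QUESTION_STARTERS if is_question else STARTERS)
    return f"{starter} {body}"

print(paraphrase("I like pancakes and my parents like cake"))
```

Lowercasing the whole message is a shortcut for the sketch; a production system would preserve the casing of names and restore sentence capitalization.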
These were the eleven tips we learned thanks to the Alexa Prize ’18 and our conversational AI Alquist. I believe that we made big progress in reducing the time necessary to create content for a conversational AI, we created techniques which allow us to talk and have an opinion about any topic or entity, and we practically tested several conversational tricks like content for returning users and paraphrasing. I hope this post inspires you to take part in the development of conversational AI.
P.S. One last thing which may interest you is the question of how far along conversational AI is now. Right now we are building only an illusion of an AI which can converse. We use a lot of tricks to make it look smarter, but in reality it isn’t. You may think that my answer is very skeptical, but in reality it isn’t. It means that there is a lot of space for improvement, and we will need a lot of clever people to reach our goal, which is truly smart conversational AI. It will be challenging. But remember! We chose this goal not because it is easy, but because it is hard!