There is one rule of cinematography that has held since the beginning of the medium: you can’t talk to the actors on the screen. We broke that rule, thanks to AI, in our experimental short movie The Story of Alquist.
Last year, we decided to set ourselves a challenging task. We dreamed about the first artificial actor, and we decided to make that dream a reality. The result of our work is the first short film in which artificial intelligence plays the main role. All of this was possible thanks to an unusual cooperation between the Czech Technical University in Prague, Rebel & Glory, and the famous Czech director Jiří Sádek. You can download The Story of Alquist to your smartphone.
How Do We Use AI In The Story of Alquist
The movie is inspired by the Czech theatre play R.U.R. by Karel Čapek, in which the word robot was used for the very first time. The story follows two young people who discover the long-forgotten mysteries of a robot manufacturing plant. The conversational AI called Alquist takes the role of the narrator, who introduces the story and talks with the audience. You can talk with Alquist about the events happening in the movie, the backstory of R.U.R., or the life of Karel Čapek. To the best of our knowledge, it is the first use of AI in a movie in this way.
What Consequences Can It Have
Our new application of conversational AI can bring new interactivity into movies, which can lead to the creation of a new branch of the movie industry. Viewers can now take an active role in the experience. They can ask additional questions that interest them in documentaries, interrogate suspects in murder mystery games, or ask a virtual reviewer about a product they are considering buying.
We don’t think that all movies in the future will have virtual actors and interactive elements. However, we believe that there is a huge opportunity for a new type of experience that can be discovered thanks to them. And that’s why we did it in the first place.
Which Tools Did We Use
Alert, technical mumbo-jumbo approaching!
We decided to pack the whole experience into a mobile app, which gives us the necessary tools. However, it also brings a whole lot of technical challenges. The most important tools were speech recognition and text-to-speech services. We used the native Android speech recognition developed by Google; it was an easy choice. We had a harder time picking the right text-to-speech technology. We initially experimented with the native Android text-to-speech, but it sounded robotic, and the final result wasn’t as good as we aimed for. Eventually, after some testing, we selected Amazon Polly. Amazon provides its TTS to third-party developers, and the quality is significantly better.
We also didn’t want to run the AI on cloud infrastructure. Why? We would have to pay for it, and more people using it means higher demands on the cloud, which means more money. So there was only one place left for our AI: we had to fit it into mobile hardware. It’s a challenging task. Instead of the nearly unlimited resources of cloud infrastructure, we had only a tiny amount of memory and a mobile CPU available. The loading time of the models was also a big issue, one we don’t usually face. But we did it!
How Is It Technically Possible
These constraints forced us to use the latest techniques in NLP and question answering. We built our conversational AI on top of Sent2Vec word embeddings. We also used quantization and a significant dictionary reduction to decrease the memory requirements of the word vectors.
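To give a feel for these two tricks, here is a minimal sketch in Python. This is not our production code: the symmetric int8 quantization scheme and the toy vocabulary are illustrative assumptions, but they show why the combination shrinks the embedding table so much (fewer rows, and 1 byte per value instead of 4).

```python
import numpy as np

def quantize_embeddings(vectors: np.ndarray) -> tuple[np.ndarray, float]:
    """Quantize float32 embeddings to int8 with one symmetric scale."""
    scale = np.abs(vectors).max() / 127.0
    quantized = np.round(vectors / scale).astype(np.int8)
    return quantized, scale

def dequantize(quantized: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 embeddings at lookup time."""
    return quantized.astype(np.float32) * scale

def reduce_dictionary(vocab: list[str], vectors: np.ndarray,
                      keep_words: set[str]) -> tuple[list[str], np.ndarray]:
    """Keep only the embeddings for words that actually occur in our data."""
    idx = [i for i, w in enumerate(vocab) if w in keep_words]
    return [vocab[i] for i in idx], vectors[idx]

# Toy example: 5 words with 4-dimensional embeddings.
rng = np.random.default_rng(0)
vocab = ["robot", "alquist", "movie", "rur", "capek"]
emb = rng.standard_normal((5, 4)).astype(np.float32)

vocab_small, emb_small = reduce_dictionary(vocab, emb, {"robot", "movie"})
q, s = quantize_embeddings(emb_small)
print(q.nbytes, "bytes instead of", emb_small.nbytes)  # int8 is 4x smaller
```

The rounding error of this scheme is bounded by half the scale per value, which is more than precise enough for a similarity search over averaged vectors.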
These quantized vectors are the main part of our AI. We wrote a lot of example message-response pairs in the format described in 13 Lessons We Have To Learn From Amazon Alexa Prize (point 11). We expand all messages from this format and average the word embeddings of each message, so for each expanded message we obtain one vector. We then compute the cosine similarity between each of these vectors and a vector created from the user’s input in the same fashion, and we select the answer assigned to the vector with the highest cosine similarity to the user’s input.
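The retrieval step above can be sketched in a few lines of Python. The embeddings and example pairs here are made-up stand-ins for our quantized Sent2Vec vectors and expanded message set, but the mechanics (average the word vectors, then pick the response with the highest cosine similarity) are the same.

```python
import numpy as np

def sentence_vector(message: str, emb: dict[str, np.ndarray]) -> np.ndarray:
    """Average the word embeddings of all known words in a message."""
    words = [emb[w] for w in message.lower().split() if w in emb]
    return np.mean(words, axis=0)

def best_response(user_input: str, examples: list[tuple[str, str]],
                  emb: dict[str, np.ndarray]) -> str:
    """Return the response whose example message is most similar to the input."""
    u = sentence_vector(user_input, emb)
    u = u / np.linalg.norm(u)
    best, best_sim = "", -1.0
    for message, response in examples:
        v = sentence_vector(message, emb)
        sim = float(u @ (v / np.linalg.norm(v)))  # cosine similarity
        if sim > best_sim:
            best, best_sim = response, sim
    return best

# Toy 2-dimensional embeddings standing in for the real word vectors.
emb = {
    "who": np.array([1.0, 0.0]), "wrote": np.array([0.9, 0.2]),
    "rur": np.array([0.8, 0.4]), "what": np.array([0.0, 1.0]),
    "is": np.array([0.1, 0.9]), "a": np.array([0.2, 0.8]),
    "robot": np.array([0.3, 0.7]),
}
examples = [
    ("who wrote rur", "Karel Capek wrote R.U.R."),
    ("what is a robot", "The word robot first appeared in R.U.R."),
]
print(best_response("who wrote rur", examples, emb))
```

A production version would precompute and normalize all example vectors once at load time instead of inside the loop; the brute-force scan shown here is just the simplest correct form.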
Our main problem was that our example messages, which are divided into 43 classes, expanded into 1.7M vectors. This number was unmanageable; we had to reduce it to thousands. Our solution was to run k-means clustering on each class of vectors. The algorithm selected the 100 most representative vectors for each class, which reduced the total to 4.3k vectors at the cost of a roughly 6% drop in accuracy. When we combined quantization, dictionary reduction, and k-means clustering, we obtained an AI that requires just 250 kB for the embeddings and 2 MB for the message vectors. This is a little engineering marvel. A little, do you get it? 😀
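For readers who want to see the reduction step concretely, here is a minimal k-means in plain numpy applied to one toy "class" of expanded message vectors. The sizes are illustrative (2,000 vectors instead of tens of thousands), and in practice we would reach for an off-the-shelf implementation; the point is only that each class collapses to 100 centroid vectors.

```python
import numpy as np

def kmeans(vectors: np.ndarray, k: int, iters: int = 20,
           seed: int = 0) -> np.ndarray:
    """Plain k-means; returns k centroids that stand in for the class."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k random distinct vectors.
    centroids = vectors[rng.choice(len(vectors), size=k, replace=False)]
    for _ in range(iters):
        # Assign each vector to its nearest centroid.
        dists = np.linalg.norm(vectors[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned vectors.
        for j in range(k):
            members = vectors[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids

# Toy stand-in: one class of 2,000 expanded message vectors -> 100 centroids.
rng = np.random.default_rng(1)
class_vectors = rng.standard_normal((2000, 16)).astype(np.float32)
centroids = kmeans(class_vectors, k=100)
print(class_vectors.shape, "->", centroids.shape)
```

Running this per class, a 20x reduction like ours (1.7M down to 4.3k) follows directly from fixing the number of centroids per class rather than keeping every expanded message.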
Don’t forget to download The Story of Alquist to your smartphone.