The uncanny Mr. Turing and his deep learning with LSTM neural networks

I (kwatters) recently attended some training classes on a deep learning framework called Deeplearning4j. They were provided by SkyMind.IO, the company that created, maintains, and supports the dl4j open source project.
The class covered many topics around neural networks and how to train them. As background, a neural network is a data structure loosely modeled on how a brain works: it models individual neurons and the connections between them. Each neuron has a bias, and each of its connections to other neurons has a weight. With training data and some fancy math, you can adjust those biases and weights so that the network gets really good at modeling the training data. You can then use the trained network to classify new data that it hasn’t seen yet. There are many types of neural networks, such as feed-forward networks and convolutional networks.
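To make the weights-and-bias idea concrete, here is a minimal sketch of a single artificial neuron in plain Java. It does not use the Deeplearning4j API. The neuron multiplies each input by its connection weight, adds its own bias, and squashes the result with a sigmoid activation. The class name `Neuron` and the choice of sigmoid are illustrative assumptions. Training is the process of nudging `weights` and `bias` until the outputs match the training data.

```java
/** A single artificial neuron: output = sigmoid(sum_i(w_i * x_i) + bias). */
public class Neuron {
    private final double[] weights; // one weight per incoming connection
    private final double bias;      // the neuron's own bias term

    public Neuron(double[] weights, double bias) {
        this.weights = weights.clone();
        this.bias = bias;
    }

    /** Logistic sigmoid: squashes any real number into the range (0, 1). */
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Weighted sum of the inputs plus the bias, passed through the activation. */
    public double activate(double[] inputs) {
        if (inputs.length != weights.length) {
            throw new IllegalArgumentException("expected " + weights.length + " inputs");
        }
        double z = bias;
        for (int i = 0; i < weights.length; i++) {
            z += weights[i] * inputs[i];
        }
        return sigmoid(z);
    }

    public static void main(String[] args) {
        Neuron n = new Neuron(new double[] {0.5, -1.0}, 0.25);
        // z = 0.25 + 0.5 * 1.0 + (-1.0) * 0.75 = 0.0, and sigmoid(0) = 0.5
        System.out.println(n.activate(new double[] {1.0, 0.75})); // prints 0.5
    }
}
```

A full network is many of these wired together in layers, with one neuron's output becoming another neuron's input.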
One network I found particularly interesting is a type of recurrent neural network (RNN) called a Long Short-Term Memory (LSTM) network. LSTM networks change the model of the neuron slightly, so that each neuron has a little bit of short-term memory that carries information from previous inputs forward in time. Because of that memory, these networks are good at handling data that has a definite direction in time, such as temperature measurements over time. Textual data has a similar temporal nature, in that sentences are read in order from left to right (except in a few languages).
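The "memory" in an LSTM can be illustrated with one scalar LSTM unit that follows the standard gate equations. A forget gate decides how much old memory to keep. An input gate decides how much new information to write. An output gate decides how much of the memory to expose. This is a plain-Java sketch for illustration only. It is not how Deeplearning4j implements its LSTM layers, and the class name `TinyLstmCell` and its parameter layout are hypothetical.

```java
/** One LSTM unit with a scalar input and scalar state (illustrative sketch only). */
public class TinyLstmCell {
    // Per-gate parameters: input weight w, recurrent weight u, bias b.
    // Index 0 = forget gate, 1 = input gate, 2 = output gate, 3 = candidate value.
    private final double[] w, u, b;
    private double c = 0.0; // cell state: the memory carried between time steps
    private double h = 0.0; // hidden state: the output from the previous step

    public TinyLstmCell(double[] w, double[] u, double[] b) {
        if (w.length != 4 || u.length != 4 || b.length != 4) {
            throw new IllegalArgumentException("need parameters for exactly 4 gates");
        }
        this.w = w.clone();
        this.u = u.clone();
        this.b = b.clone();
    }

    private static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Pre-activation of one gate, using the current input and the previous output. */
    private double pre(int gate, double x) {
        return w[gate] * x + u[gate] * h + b[gate];
    }

    /** Feeds one input value through the cell and returns the new hidden state. */
    public double step(double x) {
        double f = sigmoid(pre(0, x));   // how much of the old memory to keep
        double i = sigmoid(pre(1, x));   // how much new information to write
        double o = sigmoid(pre(2, x));   // how much of the memory to expose
        double g = Math.tanh(pre(3, x)); // the candidate new information
        c = f * c + i * g;               // update the memory
        h = o * Math.tanh(c);            // the output depends on the memory
        return h;
    }

    public static void main(String[] args) {
        double[] ones = {1, 1, 1, 1};
        TinyLstmCell cell = new TinyLstmCell(ones, ones, new double[] {0, 0, 0, 0});
        // The same input gives different outputs because the cell remembers the first step.
        System.out.println(cell.step(1.0));
        System.out.println(cell.step(1.0));
    }
}
```

Feeding the same input twice produces two different outputs, because the second step sees the memory left behind by the first step. A feed-forward neuron would return the same value both times.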
So I began looking for training data that I could use to train a chatbot with one of these LSTM networks as its brain. Luckily, Captain GroG has about 300k messages from the shoutbox over the past few years!
I made a few changes to one of the examples so that it could read its training data from the shoutbox history. It works like this: we give the model some text and ask it what it thinks the next character in the sequence is. We repeat this until the model has generated a few hundred characters.
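The predict-one-character-and-repeat loop can be sketched without a neural network. In this sketch, a simple character bigram counter (the hypothetical `CharBigramModel`) stands in for the trained LSTM. It is a swapped-in technique, not the model used here. Given the previous character, it samples the next character in proportion to how often that pairing appeared in the training text. It then appends the sample and repeats. A real LSTM conditions on the whole preceding sequence rather than just one character, but the generation loop around it has the same shape.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

/** Stand-in "model": counts which character follows which in the training text. */
public class CharBigramModel {
    private final Map<Character, Map<Character, Integer>> counts = new HashMap<>();

    public CharBigramModel(String trainingText) {
        for (int k = 0; k + 1 < trainingText.length(); k++) {
            char cur = trainingText.charAt(k);
            char next = trainingText.charAt(k + 1);
            counts.computeIfAbsent(cur, key -> new HashMap<>()).merge(next, 1, Integer::sum);
        }
    }

    /** Samples the next character in proportion to how often it followed prev in training. */
    char sampleNext(char prev, Random rng) {
        Map<Character, Integer> followers = counts.get(prev);
        if (followers == null || followers.isEmpty()) {
            return '\n'; // this character was never followed by anything: end the line
        }
        int total = 0;
        for (int count : followers.values()) total += count;
        int r = rng.nextInt(total);
        for (Map.Entry<Character, Integer> e : followers.entrySet()) {
            r -= e.getValue();
            if (r < 0) return e.getKey();
        }
        throw new IllegalStateException("unreachable: r always falls inside one bucket");
    }

    /** Starts from a non-empty seed string and appends numChars sampled characters. */
    public String generate(String seed, int numChars, Random rng) {
        if (seed.isEmpty()) throw new IllegalArgumentException("seed must not be empty");
        StringBuilder out = new StringBuilder(seed);
        for (int n = 0; n < numChars; n++) {
            char prev = out.charAt(out.length() - 1);
            out.append(sampleNext(prev, rng));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String training = "\"msg\": \"I am GroG\",\n\"msg\": \"GroG I am\",\n";
        CharBigramModel model = new CharBigramModel(training);
        System.out.println(model.generate("\"msg\": \"", 40, new Random(42)));
    }
}
```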
At first the model just generates garbage, but after a few iterations of training it begins generating things that look like words. A few more iterations and the words start being spelled correctly; a few more and some loose grammar begins appearing in its responses.
One other thing that’s very interesting is that the input training data was in a JSON format that looked like the following:
  "msg": "I am GroG",
  "msg": "GroG I am",
Very early in training, the model was able to generate sequences of text with valid JSON syntax. Imagine that: not only did the model learn how to generate words and sentences, it did so while producing valid JSON, including the closing quote mark and comma.
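A character-level model sees the raw text, so JSON quotes, colons and commas are simply more symbols in its vocabulary, just like letters. This helps explain how it can pick up the syntax. Below is a minimal sketch of that kind of preprocessing. It assumes the shoutbox export is fed in verbatim. It collects the distinct characters, assigns each one an index, and turns each character into a one-hot vector. The class name `CharVocabulary` is hypothetical, and this is not the code from the example the model was built on.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Maps each distinct character in the training text to an index and one-hot encodes text. */
public class CharVocabulary {
    private final Map<Character, Integer> indexOf = new TreeMap<>();

    public CharVocabulary(String text) {
        TreeSet<Character> distinct = new TreeSet<>(); // sorted, so indices are deterministic
        for (char ch : text.toCharArray()) distinct.add(ch);
        int next = 0;
        for (char ch : distinct) indexOf.put(ch, next++);
    }

    public int size() {
        return indexOf.size();
    }

    public int index(char ch) {
        Integer i = indexOf.get(ch);
        if (i == null) throw new IllegalArgumentException("unknown character: " + ch);
        return i;
    }

    /** One row per character of text, one column per vocabulary entry, a single 1.0 per row. */
    public double[][] oneHot(String text) {
        double[][] out = new double[text.length()][size()];
        for (int t = 0; t < text.length(); t++) {
            out[t][index(text.charAt(t))] = 1.0;
        }
        return out;
    }

    public static void main(String[] args) {
        String training = "\"msg\": \"I am GroG\",\n\"msg\": \"GroG I am\",\n";
        CharVocabulary vocab = new CharVocabulary(training);
        System.out.println("vocabulary size: " + vocab.size());
        System.out.println("index of '\"': " + vocab.index('"'));
    }
}
```

For the two sample messages above, the vocabulary has 13 symbols. The quote mark gets its own index, just like any letter.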
So we started training, and along the way we would pass it the following seed string:
  "msg": "ahoy!",
  "msg": "
and ask it what comes next. Early in training (after iteration 39), it spat out its first words:

Here you can see that everything after the first quote mark on the second line was generated by the model, and it’s basically gibberish.
After iteration 359, it has learned that each line should start with “msg” and end with a closing quote mark followed by a comma.

But in deep learning, as long as you’re not overfitting, the model generally gets better the more you train it. So let’s fast forward: at iteration 759, we see the model say InMoov for the first time.

Around iteration 1599, it (almost) says MyRobotLab for the first time.

At iteration 1759, it says Gael for the first time.

At iteration 5239, it seems to think GroG is a bike rider?! What? How?!

After 20,000 iterations, it almost seemed like it was talking.

Keep in mind that the first message, “ahoy!”, is the seed that generates the rest of the text. In the examples above, it usually generated about 4 or 5 additional messages. As you ask it to generate more text, it starts losing its mind a bit. That is why the last message in each of these outputs looks like the bot got drunk somewhere between the first thing it said and the last.
I just wanted to share some of these responses. I’ll be playing with this technology a bit more and seeing how we can make it more useful. Another interesting idea would be to train it on the blog posts here, so that it generates a blog post rather than a shoutbox message. I’m still blown away that the network figured out the JSON syntax…
This LSTM network is largely modeled after the work documented here:
I for one…