We will make a Python AI that generates Shakespeare-like poetry.
These are the functions we will use:
Sequential is the barebones neural-network template. We add layers to it to build the complete network.
The different types of layers we need. The biggest difference is that they are made for different kinds of networks: LSTM is recurrent, while Dense and Activation are core layers.
RMSprop is the optimizer we compile the model with. Last time we used Adam; this time we use RMSprop. DeepMind loves RMSprop.
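Roughly, the imports look like this (a sketch; the exact import paths vary a bit between TensorFlow/Keras versions):

    import random
    import numpy as np
    import tensorflow as tf
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import LSTM, Dense, Activation
    from tensorflow.keras.optimizers import RMSprop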
keras.utils.get_file()
We will get the data from here:
https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt
We could download it by hand, but we can also just use TensorFlow to fetch the data directly.
Use tf.keras.utils.get_file()
fname is the filename we give it, so you can give it any name; I just name it "shakespeare.txt". origin is the URL it downloads the file from. It will be downloaded onto your PC anyway.
And yes, it downloads it to your computer, so when you run the program again it is loaded from your cache and doesn't download again.
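So the fetch is just one call (a sketch; fname and the variable name filepath are our own choices):

    filepath = tf.keras.utils.get_file(
        fname="shakespeare.txt",
        origin="https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt",
    )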
https://www.tensorflow.org/api_docs/python/tf/keras/utils/get_file
this is where it stores it:
C:\Users\Digit\.keras\datasets
Note that the filepath variable only stores the file path. To get the actual data, we need to open that path and read it in binary.
So we open it in read-binary mode, read it to get the bytes, decode them as UTF-8 (which is common for these files), and lowercase everything. The reason we lowercase it is that we want the model to predict the next character, and it would be much more of a hassle to train it with capitalization, so we make it all lowercase for now to keep things easy.
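In code, that whole step is one line (a sketch of exactly what was just described):

    # read the raw bytes, decode as UTF-8, lowercase everything
    text = open(filepath, "rb").read().decode("utf-8").lower()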
OK, reading the text gives this:
lots and lots of Shakespeare text.
The text is a bit too much and will take too long for the neural network to process.
It's around a million characters.
Let's shorten it down to about half a million.
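Something like this; the exact slice bounds are a judgment call, these just grab a 500k-character chunk from the middle:

    text = text[300000:800000]  # keep roughly half a million characters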
ASCII-to-number lookup tables
OK, but the problem is that the neural network cannot read ASCII text directly; it must read numerical values.
So we grab all the characters that the text uses. Remember that there are a couple hundred ASCII characters, and we probably won't even use a lot of them.
So we take only the characters we will actually use and put them all in a set. Characters don't repeat in a set, so we keep only one copy of each character.
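For example (the sorted() is my addition; it keeps the enumeration order reproducible between runs):

    characters = sorted(set(text))  # one copy of each character the text uses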
Surprisingly, the only digit that appears is 3.
To map them, my first idea was to use ord() and chr() to convert characters to numbers and back. However, I learned later that these numbers will be used as coordinate indices, and it doesn't make sense for indices to jump from, say, 10 to 194. So instead, NeuralNine has this solution of making 2 dictionaries: one to convert a character to a number, another to convert a number back to a character.
enumerate assigns each character a number. He made the dictionary in 1 line of code.
I'm going to do it in a few lines instead. I don't like that hyper-compactness; it makes the code harder to understand, y'know.
I did the same for the other lookup table.
This is what the two look like: char_to_index is the first dictionary, index_to_char is the second.
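Here's my multi-line version (equivalent to the one-liner dict comprehensions):

    char_to_index = {}
    index_to_char = {}
    for i, c in enumerate(characters):
        char_to_index[c] = i  # character -> number
        index_to_char[i] = c  # number -> character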
Predict next character
We want the model to be able to predict the next character. So we need a list of sentences and a list of next characters. The sentences are the inputs; the next characters are the targets the model learns to predict.
We feed the AI 40 characters from a sentence and we want it to give us the 41st character by looking at each input character we give it. The number of input characters is always a fixed value, so if we give it sentences with 40 characters, there are 40 input positions. I think 40 is a good number; NeuralNine thinks 40 is a good number too. Make a variable that stores 40.
As well, we will have a variable called STEP_SIZE. The step size tells us how far we move forward to start the next sentence.
We move 3 steps, so if we had a SEQ_LENGTH-5 sentence taken from the word "shakespeare", like: shake
a step size of 3 would shift the starting character 3 to the right, and then it's: kespe
Notice that 3 steps from "s" lands on "k" in the text.
So this is how we implement the algorithm:
it steps by STEP_SIZE, grabs the current sentence from the point it is at, and saves the character that comes right after that sentence.
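A sketch of that loop (SEQ_LENGTH and STEP_SIZE are the variables described above):

    SEQ_LENGTH = 40
    STEP_SIZE = 3

    sentences = []
    next_characters = []
    for i in range(0, len(text) - SEQ_LENGTH, STEP_SIZE):
        sentences.append(text[i:i + SEQ_LENGTH])      # 40-character input window
        next_characters.append(text[i + SEQ_LENGTH])  # the 41st character: the target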
The next step is to express the x and y data as numbers.
This next part is difficult for me to understand; call it 70% understood, with the meaning transcribed from the video.
Let's make our x and y data now.
x is a 3D numpy array. We have one dimension for all the sentences we have, one dimension for each individual position inside a sentence, and one dimension for every possible character we can have (from the lookup table).
Whenever, in a specific sentence, at a specific position, a specific character occurs, we want that cell to be True and all the other cells to be False. So for example, take sentence #5 at position #7 and check whether the character there has enum #8 (from the lookup tables). If it does, then that specific cell is True and all the others are False.
y is a 2D numpy array. This is for the future values: it says that for sentence #5, the next character is the one with enum #8.
Then we fill up the x and y data (see the sketch after the breakdown below).
For x:
So for every sentence, you have a 2D array of positions × characters. The 2D array spans SEQ_LENGTH, so 40, and the number of possible characters, which is around 39 in my case.
So it's a 40×39 2D array per sentence.
Every row in that 2D array has length 39; it is the character-enum row, so there is always exactly one True value per row.
Again, this is just turning the strings into numbers, in a very complicated way, but it is organized, I guess.
Above is an example of one of those character-enum rows.
For y:
It just stores, for each sentence, the enum of the next character. It's pretty similar to the x array except that it's missing one dimension (the position within the sentence), and it holds the future value to predict.
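Putting the x and y parts together, the fill-up looks roughly like this:

    x = np.zeros((len(sentences), SEQ_LENGTH, len(characters)), dtype=bool)
    y = np.zeros((len(sentences), len(characters)), dtype=bool)

    for i, sentence in enumerate(sentences):
        for t, character in enumerate(sentence):
            x[i, t, char_to_index[character]] = 1    # which character sits at position t
        y[i, char_to_index[next_characters[i]]] = 1  # which character comes next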
I don't understand this completely, but I will revisit it later.
Building the network
We will build the neural network now.
We get the barebones framework,
then we decorate it. The first layer is the LSTM input layer. This is what makes our model recurrent. LSTM (long short-term memory) is the memory of our network: it will remember the past input characters. If we didn't have it, our model would only look at the input it has now and forget the ones that came before.
I believe the 128 is the number of LSTM units (the size of its hidden state). The input shape is (SEQ_LENGTH, number of characters), so (40, 39) for me: 40 time steps of 39-dimensional one-hot vectors, rather than a flat 40*39 = 1560 input layer.
We add a Dense layer for more sophistication; it maps the LSTM output to one value per possible character.
And finally the Activation output layer.
We softmax it so the output is a probability distribution over all the output neurons.
at last we compile the model.
The loss function is categorical cross-entropy, and the optimizer we use is RMSprop with a learning rate of 0.01.
Finally, feed it the training data with .fit().
We fit the x and y data; the batch size is 256, so we show it 256 examples at the same time, and epochs is how many times it reads over the same data.
We now want to save the model, model.save() style.
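The whole build/compile/train/save sequence, as a sketch (the epoch count and the save filename are my own choices here):

    model = Sequential()
    model.add(LSTM(128, input_shape=(SEQ_LENGTH, len(characters))))  # recurrent memory layer
    model.add(Dense(len(characters)))   # one output per possible character
    model.add(Activation("softmax"))    # turn outputs into probabilities

    model.compile(loss="categorical_crossentropy",
                  optimizer=RMSprop(learning_rate=0.01))

    model.fit(x, y, batch_size=256, epochs=4)  # 4 epochs is just a reasonable starting point
    model.save("shakespeare.keras")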
run the program and wait…
The loss is:
- not too bad
Now, instead of building the model over and over again, we can just load it from the saved Shakespeare model file.
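So on later runs, the load is one line (assuming the filename used above):

    model = tf.keras.models.load_model("shakespeare.keras")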
Generation
What our model does is take a sequence and predict the next character. We need to make it actually generate text somehow. To do that, we need a helper function:
we create a helper function that picks a likely outcome from a probability array.
Remember that the output of our network is a softmax over the likely characters.
The helper function takes one of them depending on the temperature: it will pick either very conservatively (the highest activation value) or somewhat experimentally (giving the 2nd or 3rd highest activation values a chance).
again, it only picks 1 character.
A higher temperature will pick a character that's a little riskier; a lower temperature sticks more closely to the most likely predictions.
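The helper is essentially the standard temperature-sampling function from the Keras text-generation examples:

    def sample(preds, temperature=1.0):
        preds = np.asarray(preds).astype("float64")
        preds = np.log(preds) / temperature    # low temp sharpens, high temp flattens
        exp_preds = np.exp(preds)
        preds = exp_preds / np.sum(exp_preds)  # re-normalize into a distribution
        probas = np.random.multinomial(1, preds, 1)  # draw one character index
        return np.argmax(probas)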
the last function we need is the function to generate text.
sample() requires a prediction; the text-generation function will give it that prediction.
It has 2 arguments:
the length of the text we want to generate, and the temperature, which we pass directly to the sample function.
What we want to do in this function is start with a starting text. The starting text is the input for our neural network; the network will predict the first next character from it. We can make this input the first 40 characters of Shakespeare's text. Everything that follows those 40 characters is generated by our neural network. If you want text completely generated by the network, you will need to cut off the first 40 characters; it's impossible for it to generate anything on its own without input.
So we will give it the starting text now. Let's first slice the text to get those starting 40 characters.
We grab it directly from the text. The start index is a random position in the text; it can be anywhere except within the final 40 characters, because we want to grab the 40 characters after it too. Then we slice the text from the start index to start index + 40 to get our sentence.
Then let's generate the text.
We have a generated string which will hold the final generated text.
Add the sentence from the original text so we can generate from it.
Now let's generate, inside a for loop that runs length times, so we generate length characters.
Inside this loop, we turn our sentence text into a numpy array, one-hot encoded the same way as the training x.
Now let's feed that into our model.
model.predict() gives us the predictions for our x value. Then sample() gives us the next index, which is the number representation of our character, and we convert that back to the character with the lookup table.
We do a few touch-ups to make the loop work for the next iteration:
save the next character in our generated text, then shift the sentence one character to the left to include the next character, and finally return the generated text. The full sketch is below.
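Putting the whole thing together, a sketch of the generation function (the variable names are my own):

    def generate_text(length, temperature):
        # random 40-character window from the original text as the seed
        start_index = random.randint(0, len(text) - SEQ_LENGTH - 1)
        sentence = text[start_index:start_index + SEQ_LENGTH]
        generated = sentence
        for _ in range(length):
            # one-hot encode the current window, same layout as the training x
            x_pred = np.zeros((1, SEQ_LENGTH, len(characters)))
            for t, character in enumerate(sentence):
                x_pred[0, t, char_to_index[character]] = 1
            predictions = model.predict(x_pred, verbose=0)[0]
            next_index = sample(predictions, temperature)
            next_character = index_to_char[next_index]
            generated += next_character
            sentence = sentence[1:] + next_character  # slide the window one step right
        return generated

    print(generate_text(300, temperature=0.5))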