Monday, June 27, 2016

Stateful Parameter in Keras

According to the Keras documentation, a stateful recurrent model is one for which the internal states (memories) obtained after processing a batch of samples are reused as initial states for the samples of the next batch. This allows processing of longer sequences while keeping computational complexity manageable. To me, this is not completely clear, so I wanted to test this parameter to see how it affects a neural net.

I used lstm_text_generation.py as a baseline and modified it to test both stateful=True and stateful=False. In order to make the code run with stateful=True, I needed to make some changes.


First, I needed to define the model with the parameter batch_input_shape instead of simply input_shape, which means the batch size is fixed when the model is built, and the same batch size has to be passed to model.fit, like this:
model.fit(X, y, batch_size=batch_size, nb_epoch=1, callbacks=[history])
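
For context, the relevant part of the model definition ends up looking roughly like this. This is only a sketch in the style of the two-layer, 512-node lstm_text_generation.py model and the Keras 1.x API of the time; the maxlen and chars values are placeholders standing in for the script's text preprocessing, not my exact code:

from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation, Dropout

# Placeholders so the sketch stands alone; in the real script maxlen and
# chars come from slicing the input text into overlapping sequences.
maxlen = 40
chars = sorted(set("abcdefghijklmnopqrstuvwxyz .,\n"))
batch_size = 518  # must match the batch_size passed to model.fit

model = Sequential()
# stateful=True requires a fixed batch size, so batch_input_shape
# replaces the original input_shape=(maxlen, len(chars)).
model.add(LSTM(512, return_sequences=True, stateful=True,
               batch_input_shape=(batch_size, maxlen, len(chars))))
model.add(Dropout(0.2))
model.add(LSTM(512, return_sequences=False, stateful=True))
model.add(Dropout(0.2))
model.add(Dense(len(chars)))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')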
Another requirement of using stateful=True is that the number of samples passed as input needs to be evenly divisible by the batch size. This was problematic since it limited which batch sizes I could use, but I tried a batch size of 518, which is a factor of the number of samples in each iteration, 15022.
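
One simple way to satisfy this requirement, sketched here as a hypothetical workaround rather than what I actually did, is to trim the training arrays down to the nearest multiple of the batch size instead of hunting for batch sizes that divide the sample count exactly:

# X and y are the training arrays built by the script.
# Drop the last few samples so the count is an exact multiple of batch_size.
n_samples = (len(X) // batch_size) * batch_size
X = X[:n_samples]
y = y[:n_samples]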

This solution didn't quite work, because I got another error:
ValueError: non-broadcastable output operand with shape (1,512) doesn't match the broadcast shape (518,512) 
512 refers to the number of nodes in each hidden layer of the neural net. The error seems to be a shape mismatch in a matrix operation. When I changed the number of nodes in each layer from the original 512 to 518 to match the batch size, I received a similar error:
ValueError: non-broadcastable output operand with shape (1,518) doesn't match the broadcast shape (518,518) 
I could not figure out a proper solution to this problem, so I settled on a batch size of 1 so the matrix shapes would match. This dramatically increased the computation time, so in order to run the test in a matter of hours instead of days I reduced the number of nodes in each layer from 512 to 256. I also ran only 10 iterations rather than the original 60. My results are below:

Notice that in both graphs the loss is increasing. For stateful=True, the loss increases monotonically, whereas with stateful=False there are some downward changes, but the overall trend is still upward. This is the opposite of what I would expect, because the loss should decrease as the network learns. It's possible it needed more iterations, or that the smaller network kept it from learning, but I'm not yet sure of the reason for the loss increasing.
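One more thing worth noting, though I have not verified that it explains the behavior above: stateful training loops are usually written so that model.reset_states() is called between passes over the text, so that state carried over from one epoch doesn't leak into the next. A rough sketch of that pattern, using the same variables as above:

# Run one pass over the data at a time and clear the carried-over LSTM
# states before the next pass.  shuffle=False keeps the batches in order,
# which is what carrying state between them assumes.
# history is the callback used in the fit call earlier.
for iteration in range(1, 11):
    model.fit(X, y, batch_size=batch_size, nb_epoch=1,
              shuffle=False, callbacks=[history])
    model.reset_states()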

1 comment:

  1. You wrote: "Another requirement of using stateful=true is that the number of samples passed as input need to be evenly divisible by the batch size. This was problematic"

    So why not simply change the sample size? Doesn't that mean change the total number of characters in the input text? Much easier to do.
