Multi-layer Recurrent Neural Networks (LSTM, RNN) for character-level language models in Python using TensorFlow, modified to work with TensorFlow.js and ml5.js.
Based on char-rnn-tensorflow.
- Set up a Python environment with TensorFlow installed. This repo is compatible with Python 3.6.x and TensorFlow 1.x only! This video tutorial about Python virtualenv may help.
RNNs work well when you want to predict sequences or patterns from your inputs. Try to gather as much input text data as you can: the more, the better. Compile all of the text data into a single text file and make note of where the file is stored (its path) on your computer.
(A quick tip to concatenate many small disparate .txt files into one large training file: `ls *.txt | xargs -L 1 cat >> input.txt`)
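If you'd rather stay in Python, the same concatenation can be sketched as follows (a minimal sketch, assuming the script runs inside the folder containing your .txt files; it skips `input.txt` itself so the output file is never read back in):

```python
import glob

# Collect every .txt file in the current directory, excluding the output file.
paths = sorted(p for p in glob.glob("*.txt") if p != "input.txt")

# Write all of them, in sorted order, into one training file.
with open("input.txt", "w", encoding="utf-8") as out:
    for path in paths:
        with open(path, encoding="utf-8") as f:
            out.write(f.read())
```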
This first step of using a Python "virtual environment" (venv video tutorial) is recommended but not required.
$ python3 -m venv your_venv_name
$ source your_venv_name/bin/activate
Note: you can also download this repo as an alternative to `git clone`.
$ git clone https://github.com/ml5js/training-charRNN
$ cd training-charRNN
$ pip install -r requirements.txt
$ python train.py --data_path /path/to/data/file.txt
Optionally, you can specify hyperparameters depending on the size and nature of your training data:
$ python train.py --data_path ./data \
--rnn_size 128 \
--num_layers 2 \
--seq_length 50 \
--batch_size 50 \
--num_epochs 50 \
--save_checkpoints ./checkpoints \
--save_model ./models
When training is complete, a JavaScript version of your model will be available in a folder called `./models` (unless you specify a different path).
Once the model is ready, you just need to point to it in your ml5 sketch. For more, visit the charRNN() documentation.
const charRNN = new ml5.charRNN('./models/your_new_model');
That's it!
Given the size of the training dataset, here are some hyperparameters that might work:
- 2 MB:
- rnn_size 256 (or 128)
- num_layers 2
- seq_length 64
- batch_size 32
- output_keep_prob 0.75
- 5-8 MB:
- rnn_size 512
- num_layers 2 (or 3)
- seq_length 128
- batch_size 64
- dropout 0.25
- 10-20 MB:
- rnn_size 1024
- num_layers 2 (or 3)
- seq_length 128 (or 256)
- batch_size 128
- output_keep_prob 0.75
- 25+ MB:
- rnn_size 2048
- num_layers 2 (or 3)
- seq_length 256 (or 128)
- batch_size 128
- output_keep_prob 0.75
Note: `output_keep_prob 0.75` is equivalent to a dropout probability of 0.25.
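As a rough rule of thumb, the table above can be encoded as a small helper (a hypothetical sketch, not part of this repo; the `dropout 0.25` row is expressed as `output_keep_prob 0.75`, per the note above, and the first value of each "(or …)" pair is used):

```python
def suggest_hyperparams(size_mb):
    """Map training-data size in MB to the starting hyperparameters
    suggested in the table above. These are starting points, not guarantees."""
    if size_mb <= 2:
        return {"rnn_size": 256, "num_layers": 2, "seq_length": 64,
                "batch_size": 32, "output_keep_prob": 0.75}
    if size_mb <= 8:
        return {"rnn_size": 512, "num_layers": 2, "seq_length": 128,
                "batch_size": 64, "output_keep_prob": 0.75}
    if size_mb <= 20:
        return {"rnn_size": 1024, "num_layers": 2, "seq_length": 128,
                "batch_size": 128, "output_keep_prob": 0.75}
    return {"rnn_size": 2048, "num_layers": 2, "seq_length": 256,
            "batch_size": 128, "output_keep_prob": 0.75}
```

For example, `suggest_hyperparams(6)` returns the 5-8 MB row, with `rnn_size` 512 and `batch_size` 64.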
- Blog post describing how to train and use an LSTM network in ml5.js
- Video showing how to train an LSTM network using Spell and ml5.js to generate text in the style of a particular author.