mpi4tf is a basic platform for distributed TensorFlow training, built on an MPI backend.
First install OpenMPI 4.x.x or another MPI implementation, then install mpi4tf with pip:
pip3 install mpi4tf
Alternatively, clone the mpi4tf repository and, from its root directory, run:
python3 setup.py install
For development, build the libraries in development (editable) mode instead:
python3 setup.py develop
To test the MPI setup, run the test script on 4 processes:
mpirun -n 4 python3 test/test_mpi.py
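`mpirun -n 4` starts four copies of the script, each with its own rank. As a minimal sketch, assuming OpenMPI (which exports `OMPI_COMM_WORLD_RANK` and `OMPI_COMM_WORLD_SIZE` to every process it launches), a script can read its rank and the world size from the environment. The helper `mpi_rank_and_size` is hypothetical and not part of mpi4tf; other MPI implementations use different variable names.

```python
import os


def mpi_rank_and_size(env=None):
    """Return (rank, world_size) for the current process.

    Hypothetical helper: assumes OpenMPI, which sets OMPI_COMM_WORLD_RANK
    and OMPI_COMM_WORLD_SIZE for every process started by mpirun. Falls back
    to a single-process run (rank 0 of 1) when the variables are absent.
    """
    env = os.environ if env is None else env
    rank = int(env.get("OMPI_COMM_WORLD_RANK", "0"))
    size = int(env.get("OMPI_COMM_WORLD_SIZE", "1"))
    return rank, size


if __name__ == "__main__":
    rank, size = mpi_rank_and_size()
    print(f"process {rank} of {size}")
```

Launched with `mpirun -n 4` under OpenMPI, each of the four processes prints a distinct rank between 0 and 3, in no guaranteed order; run directly with `python3`, it reports rank 0 of 1.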
To run the distributed MNIST example with a parallelism of 4, pass the parallelism as the script's argument:
./bin/run_mnist_dist.sh 4
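With a parallelism of 4, a typical data-parallel setup gives each process a distinct shard of the training data and averages gradients across processes after each step. The pure-Python sketch below simulates that pattern in a single process, with no MPI or TensorFlow involved. It illustrates the general technique under that assumption and is not a description of what `run_mnist_dist.sh` or mpi4tf actually does; the helpers `shard_indices` and `allreduce_mean` are hypothetical.

```python
from typing import List


def shard_indices(num_examples: int, rank: int, world_size: int) -> List[int]:
    """Strided sharding: rank r takes examples r, r + world_size, r + 2*world_size, ..."""
    return list(range(rank, num_examples, world_size))


def allreduce_mean(per_rank_grads: List[List[float]]) -> List[float]:
    """Element-wise mean across ranks: the value every rank would hold after an
    MPI allreduce with a sum operation, divided by the number of ranks."""
    world_size = len(per_rank_grads)
    return [sum(values) / world_size for values in zip(*per_rank_grads)]


if __name__ == "__main__":
    world_size = 4  # same parallelism as in ./bin/run_mnist_dist.sh 4
    shards = [shard_indices(10, r, world_size) for r in range(world_size)]
    print(shards)  # [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]
    grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]  # one gradient vector per rank
    print(allreduce_mean(grads))  # [4.0, 5.0]
```

Strided sharding keeps the shards disjoint and covers every example exactly once, so averaging the per-rank gradients approximates one large-batch step over the combined data.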
Because mpi4tf uses an MPI backend, you can pass any flag supported by your MPI implementation (for example, to mpirun) to control how experiments are run.