This is a simple demonstration of combining BERT with Elasticsearch to improve search quality.
The whole setup is composed with Docker. To replicate the project, just follow the steps below:
- Download the HackerNews public data from the Google BigQuery Public Dataset, save it locally, and set the path to the dataset as an environment variable:

```
export DATA_PATH=path_to_your_csv
```
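For reference, a minimal sketch of how the exported CSV might be read in Python (the `title` column name is an assumption; adjust it to match your BigQuery export):

```python
import os
import pandas as pd

# DATA_PATH is the environment variable set above.
# The "title" column is an assumption about the BigQuery export schema.
df = pd.read_csv(os.environ["DATA_PATH"])
print(df["title"].head())
```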
- Download the BERT pre-trained embeddings. There are many pre-trained models available; for instance, you could fetch one with wget:

```
wget https://storage.googleapis.com/bert_models/2018_10_18/cased_L-12_H-768_A-12.zip
```

Then unzip the archive and set the absolute path of the unzipped folder as the environment variable MODEL_PATH:

```
export MODEL_PATH=path_to_your_pretrained_model
```
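If you want to sanity-check the checkpoint outside Docker, one option is bert-as-service, which serves exactly this Google checkpoint format; this is an illustrative assumption, not necessarily the encoder this repo's containers run:

```python
# Assumes a bert-as-service server is already running against the checkpoint:
#   pip install bert-serving-server bert-serving-client
#   bert-serving-start -model_dir $MODEL_PATH -num_worker=1
from bert_serving.client import BertClient

bc = BertClient()  # connects to localhost:5555 by default
vecs = bc.encode(["machine learning", "deep learning"])
print(vecs.shape)  # (2, 768) for an L-12_H-768 checkpoint
```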
- Choose a search index name for Elasticsearch. Elasticsearch needs an index to store and retrieve search items, so set one as an environment variable:

```
export SEARCH_INDEX=any_search_index_name
```
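If you are curious what such an index could look like, here is a minimal sketch of a mapping that pairs text with a BERT vector, using the official elasticsearch Python client (the field names, `dims=768`, and the Elasticsearch 7.x `body=` style are assumptions; the repo may create the index differently):

```python
import os
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# "title" / "title_vector" are hypothetical field names; dims=768 matches
# the hidden size of the L-12_H-768 BERT checkpoint downloaded above.
mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "title_vector": {"type": "dense_vector", "dims": 768},
        }
    }
}
es.indices.create(index=os.environ["SEARCH_INDEX"], body=mapping)
```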
- Move into the cloned repo, then build and run the Docker containers; the docker-compose file composes several services:

```
cd HackerBERT
docker-compose build
docker-compose up
```
- Create the search indexes:

```
python main.py
```
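As a rough sketch, an indexing script in this kind of setup usually reads the CSV, encodes each document with BERT, and bulk-indexes text plus vector into Elasticsearch; everything below (column and field names, the bert-as-service client) is an assumption about what main.py actually does:

```python
import os
import pandas as pd
from bert_serving.client import BertClient
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")
bc = BertClient()

# Column name "title" and the field names are assumptions, as above.
df = pd.read_csv(os.environ["DATA_PATH"])
titles = df["title"].fillna("").tolist()

# Encode all titles with BERT, then bulk-index text + vector together.
vectors = bc.encode(titles)
actions = (
    {"_index": os.environ["SEARCH_INDEX"],
     "_source": {"title": t, "title_vector": v.tolist()}}
    for t, v in zip(titles, vectors)
)
helpers.bulk(es, actions)
```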
- Play with it at http://127.0.0.1:1111.
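Under the hood, a BERT-backed search of this kind typically embeds the query text and ranks documents by vector similarity. Here is a minimal sketch against the hypothetical mapping above, using Elasticsearch 7.x `script_score` syntax (the field names and client calls are assumptions, not the app's actual code):

```python
import os
from bert_serving.client import BertClient
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
bc = BertClient()

query_vector = bc.encode(["favorite programming language"])[0].tolist()

# cosineSimilarity in script_score requires Elasticsearch 7.3+;
# "+ 1.0" keeps scores non-negative, as script_score requires.
body = {
    "query": {
        "script_score": {
            "query": {"match_all": {}},
            "script": {
                "source": "cosineSimilarity(params.qv, 'title_vector') + 1.0",
                "params": {"qv": query_vector},
            },
        }
    }
}
res = es.search(index=os.environ["SEARCH_INDEX"], body=body, size=5)
for hit in res["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["title"])
```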