Third project of the Modern Information Retrieval course.
Crawls ResearchGate and indexes the papers found on the site. Clusters papers and authors, and ranks papers based on their citations and references.
The crawler is written from scratch. Indexing and retrieval are handled by Elasticsearch 2.1, the web interface is powered by Flask and Bootstrap, and NumPy does most of the heavy lifting in the ranking and clustering calculations.
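For a rough picture of the NumPy side, here is a minimal power-iteration PageRank sketch over a toy citation matrix. The matrix and all names below are hypothetical illustrations, not the project's actual code.

import numpy as np

# Hypothetical toy citation matrix: citations[i, j] = 1 means paper i cites paper j.
citations = np.array([
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
], dtype=float)

n = citations.shape[0]
damping = 0.85

# Row-normalize outgoing references; a paper that cites nothing is
# treated as linking to every paper uniformly (dangling-node fix).
out = citations.sum(axis=1, keepdims=True)
transition = np.where(out > 0, citations / np.maximum(out, 1.0), 1.0 / n)

# Power iteration until the rank vector stops changing.
rank = np.full(n, 1.0 / n)
for _ in range(100):
    prev = rank
    rank = (1.0 - damping) / n + damping * transition.T.dot(prev)
    if np.abs(rank - prev).sum() < 1e-9:
        break

print(rank)  # higher score = more "important" paper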
Install the requirements from the requirements.txt file. Creating a Python virtual environment first is strongly recommended.
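One way to create and activate one, on a Unix-like shell (assuming python3 is on your PATH):

python3 -m venv venv
source venv/bin/activate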
pip install -r requirements.txt
python ui/ui.py # requires python3.4 or higher
Then open http://127.0.0.1:5000/admin/ in your browser. Crawl, calculate page ranks, perform clustering, and finally add the documents to the index. Your mini version of Google is now ready: just point your browser to http://127.0.0.1:5000/search.
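Under the hood, searches go through Elasticsearch. As a sketch of the kind of query the search page might issue, here is a minimal example using the official Python client; the index name "papers" and the field names are assumptions, not necessarily the project's actual mapping.

from elasticsearch import Elasticsearch

es = Elasticsearch()  # assumes Elasticsearch on the default localhost:9200

# Hypothetical index and field names for illustration.
result = es.search(index="papers", body={
    "query": {"match": {"title": "information retrieval"}},
    "size": 10,
})

for hit in result["hits"]["hits"]:
    print(hit["_score"], hit["_source"].get("title"))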
Note: You must set up Elasticsearch before adding documents to the index. For more information, read here.
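A quick way to verify that Elasticsearch is up before indexing, assuming it listens on its default port 9200:

curl http://127.0.0.1:9200/  # should return a JSON blob with the cluster name and version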