Third project of the Modern Information Retrieval course.
Crawls ResearchGate and indexes the papers found on the site. Clusters papers and authors, and ranks papers based on their citations and references.
The crawler is written from scratch. Indexing and retrieval are handled by Elasticsearch 2.1, the web interface is powered by Flask and Bootstrap, and NumPy does most of the heavy lifting in the ranking and clustering calculations.
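For a rough picture of the NumPy side, here is a minimal power-iteration PageRank sketch over a toy citation matrix. The matrix and all names below are hypothetical illustrations, not the project's actual code.

import numpy as np

# Hypothetical toy citation matrix: citations[i, j] = 1 means paper i cites paper j.
citations = np.array([
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
], dtype=float)

n = citations.shape[0]
damping = 0.85

# Row-normalize outgoing references; a paper that cites nothing is
# treated as linking to every paper uniformly (dangling-node fix).
out = citations.sum(axis=1, keepdims=True)
transition = np.where(out > 0, citations / np.maximum(out, 1.0), 1.0 / n)

# Power iteration until the rank vector stops changing.
rank = np.full(n, 1.0 / n)
for _ in range(100):
    prev = rank
    rank = (1.0 - damping) / n + damping * transition.T.dot(prev)
    if np.abs(rank - prev).sum() < 1e-9:
        break

print(rank)  # higher score = more "important" paper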
Install the requirements from the requirements.txt file. Creating a Python virtual environment first is strongly recommended.
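One way to create and activate one, on a Unix-like shell (assuming python3 is on your PATH):

python3 -m venv venv
source venv/bin/activate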
pip install -r requirements.txt
python ui/ui.py # requires python3.4 or higher
Then open http://127.0.0.1:5000/admin/ in your browser. Crawl, calculate page ranks, perform clustering, and finally add the documents to the index. Your mini version of Google is now ready: just point your browser to http://127.0.0.1:5000/search.
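Under the hood, searches go through Elasticsearch. As a sketch of the kind of query the search page might issue, here is a minimal example using the official Python client; the index name "papers" and the field names are assumptions, not necessarily the project's actual mapping.

from elasticsearch import Elasticsearch

es = Elasticsearch()  # assumes Elasticsearch on the default localhost:9200

# Hypothetical index and field names for illustration.
result = es.search(index="papers", body={
    "query": {"match": {"title": "information retrieval"}},
    "size": 10,
})

for hit in result["hits"]["hits"]:
    print(hit["_score"], hit["_source"].get("title"))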
Note: You must set up Elasticsearch before adding documents to the index. For more information, read here.
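A quick way to verify that Elasticsearch is up before indexing, assuming it listens on its default port 9200:

curl http://127.0.0.1:9200/  # should return a JSON blob with the cluster name and version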