trim conceptnet's ~34,000,000 multilingual assertions (about 10gb of tsv) into a tidy ~3,400,000 english-language assertions (in json format).
- clone this repo
- download the latest version of conceptnet (5.7.0 at the time of writing)
- extract it to `data/assertions.csv` in the root of this repo
- run `cargo run -r` to run in release mode. the trimmed assertions will be written to `data/trimmed.json`.
or download a pre-trimmed file from the releases page.
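
for a rough idea of what the trimming step involves, here is a minimal sketch of the filtering logic. it is not this repo's actual code. it assumes the conceptnet dump has five tab-separated columns per line (edge uri, relation, start concept, end concept, json metadata) and that "english-language" means both concepts live under `/c/en/`. the names `Assertion`, `is_english`, `parse_english` and `trim_english` are hypothetical, and writing the result out as json is left out.

```rust
/// One ConceptNet edge, borrowed from a single line of the dump.
/// (Hypothetical type for this sketch, not taken from the repo.)
#[derive(Debug, PartialEq)]
struct Assertion<'a> {
    relation: &'a str,
    start: &'a str,
    end: &'a str,
}

/// Returns true if a concept URI such as `/c/en/dog/n` is an English concept.
fn is_english(concept: &str) -> bool {
    concept.starts_with("/c/en/")
}

/// Parses one tab-separated line, assuming the column order
/// uri, relation, start, end, metadata.
/// Returns None for malformed lines and for non-English edges.
fn parse_english(line: &str) -> Option<Assertion<'_>> {
    let mut fields = line.split('\t');
    let _uri = fields.next()?;
    let relation = fields.next()?;
    let start = fields.next()?;
    let end = fields.next()?;
    if is_english(start) && is_english(end) {
        Some(Assertion { relation, start, end })
    } else {
        None
    }
}

/// Keeps only the English assertions from a chunk of the dump.
fn trim_english(input: &str) -> Vec<Assertion<'_>> {
    input.lines().filter_map(parse_english).collect()
}
```

for the full 10gb file, the same `parse_english` check could be applied line by line through a `std::io::BufReader` instead of loading everything into one string.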