Web: https://www.reddit.com/r/LanguageTechnology/comments/sa9n1f/problem_downloading_wikipedia_split_used_for/

Jan. 22, 2022, 7:10 p.m. | /u/shreddedcheese893


Hi all,

So I've been trying to download the Wikipedia split that is used for evaluating Dense Passage Retrieval (DPR).

I just opened a Google Colaboratory session and followed their instructions to download the dataset, as shown below.

    python data/download_data.py \
        --resource {key from download_data.py's RESOURCES_MAP} \
        [optional --output_dir {your location}]

I don't know why, but Colab exits the download cell on its own (the output shows "^C"), and all I get is a .tmp file. …
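
For context, one plausible reading of the leftover .tmp file (an assumption, not something confirmed from the DPR code here) is that the downloader writes to a temporary file and only renames it once the transfer completes, so an interrupted cell leaves the partial file behind. A minimal sketch of that pattern using only the Python standard library; `download_atomic` and its parameters are hypothetical names for illustration, not part of DPR:

```python
import os
import shutil
import urllib.request


def download_atomic(url: str, dest: str, chunk_size: int = 1 << 20) -> str:
    """Stream `url` into `dest + '.tmp'`, then rename it to `dest` when done.

    Hypothetical helper: if the process is interrupted mid-transfer
    (for example by a "^C"), the rename never runs and only the
    partial .tmp file remains on disk.
    """
    tmp_path = dest + ".tmp"
    # Copy in fixed-size chunks so a multi-gigabyte file is never held in memory.
    with urllib.request.urlopen(url) as response, open(tmp_path, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    # Reached only after the whole body has been written.
    os.replace(tmp_path, dest)
    return dest
```

If this is what is happening, the .tmp file is an incomplete download rather than the dataset itself, and it would need to be deleted or re-fetched rather than renamed by hand.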
