Task Details


All of our datasets (both the released and the evaluation datasets) are sampled from the same billion-scale vector dataset, which consists of Bing queries encoded by Turing AGI v5, a model that trains Transformers to capture similarity of intent in web search queries.


#  Name                          Description                                              Size
1  dummy-data.bin                dummy data for packing submission in reprozip            10^4
2  contest-data-release-1m.bin   medium scale released data                               10^6
3  contest-data-release-10m.bin  large scale released data                                10^7
4  secret-1m.bin                 medium scale data, used for evaluation before March 10   10^6
5  secret-10m.bin                large scale data, used for evaluation after March 10     10^7

You can use AzCopy to download the large-scale datasets.
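
As a rough illustration of the AzCopy route, the sketch below wraps the standard `azcopy copy <source> <destination>` invocation in a small Python helper. The helper names (`build_azcopy_command`, `download`) are hypothetical. The download URL is not given here, so the one in the usage comment is a placeholder and must be replaced with the real link. It assumes `azcopy` is already installed and on your PATH.

```python
import shutil
import subprocess


def build_azcopy_command(source_url: str, destination: str) -> list[str]:
    """Return the argument list for `azcopy copy <source> <destination>`."""
    return ["azcopy", "copy", source_url, destination]


def download(source_url: str, destination: str) -> None:
    """Run azcopy to fetch one file; raises if azcopy is missing or fails."""
    if shutil.which("azcopy") is None:
        raise RuntimeError("azcopy was not found on PATH; install it first")
    # check=True raises CalledProcessError on a non-zero exit code.
    subprocess.run(build_azcopy_command(source_url, destination), check=True)


# Example usage (placeholder URL, an assumption; substitute the real dataset link):
# download("https://<account>.blob.core.windows.net/<container>/contest-data-release-10m.bin",
#          "contest-data-release-10m.bin")
```

Building the command as a list rather than a shell string avoids quoting problems if the URL carries a query string such as an access token.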