Task Details
All of our datasets (both the released and the evaluation datasets) are sampled from the same billion-scale vector dataset, which consists of Bing queries encoded by Turing AGI v5, which trains Transformers to capture similarity of intent in web search queries.
# | Name | Description | Size (number of vectors) |
---|---|---|---|
1 | dummy-data.bin | dummy data for packing submission in reprozip | 10^4 |
2 | contest-data-release-1m.bin | medium scale released data | 10^6 |
3 | contest-data-release-10m.bin | large scale released data | 10^7 |
4 | secret-1m.bin | medium scale data, used for evaluation before March 10 | 10^6 |
5 | secret-10m.bin | large scale data, used for evaluation after March 10 | 10^7 |
You can use AzCopy to download the large-scale datasets.
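
As a rough illustration of the AzCopy route, the sketch below wraps the `azcopy copy <source> <destination>` command in a small Python helper. The helper names (`build_azcopy_command`, `download`) and the URL are hypothetical placeholders, not part of the contest materials; substitute the download link published by the organizers.

```python
import shutil
import subprocess
from typing import List


def build_azcopy_command(source_url: str, destination: str) -> List[str]:
    """Return the argument list for `azcopy copy <source> <destination>`."""
    return ["azcopy", "copy", source_url, destination]


def download(source_url: str, destination: str) -> None:
    """Run AzCopy if it is installed; raise a clear error otherwise."""
    if shutil.which("azcopy") is None:
        raise RuntimeError("azcopy was not found on PATH; install it first")
    # check=True raises CalledProcessError if azcopy exits with a non-zero status.
    subprocess.run(build_azcopy_command(source_url, destination), check=True)


if __name__ == "__main__":
    # Hypothetical placeholder URL: replace with the link provided by the organizers.
    url = "https://<storage-account>.blob.core.windows.net/<container>/contest-data-release-10m.bin"
    # Print the command instead of running it, so it can be inspected first.
    print(" ".join(build_azcopy_command(url, "contest-data-release-10m.bin")))
```

Calling `download(url, "contest-data-release-10m.bin")` would then run the transfer, assuming AzCopy is installed and the URL is valid.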