Task Details
All of our datasets (both the released and the evaluation datasets) are sampled from the same billion-scale vector dataset, which consists of Bing queries encoded by Turing AGI v5, a Transformer-based model trained to capture the similarity of intent between web search queries.
| # | Name | Description | Size (number of vectors) |
|---|---|---|---|
| 1 | dummy-data.bin | dummy data for packaging a submission with ReproZip | 10⁴ |
| 2 | contest-data-release-1m.bin | medium-scale released data | 10⁶ |
| 3 | contest-data-release-10m.bin | large-scale released data | 10⁷ |
| 4 | secret-1m.bin | medium-scale data, used for evaluation before March 10 | 10⁶ |
| 5 | secret-10m.bin | large-scale data, used for evaluation after March 10 | 10⁷ |
You can use AzCopy to download the large-scale datasets.
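The description above does not specify the on-disk layout of the `.bin` files. The sketch below is a minimal reader and writer that assumes a layout common in public ANN benchmark datasets: a little-endian `uint32` vector count, a `uint32` dimension, and then `count × dim` values stored row by row. The element type (`float32` by default, `int8` via `typecode="b"`) is also an assumption, and `read_bin`/`write_bin` are hypothetical helper names, not part of any contest tooling. Check the actual file format before relying on this.

```python
import struct
import sys
from array import array

# ASSUMED layout (not confirmed by the contest description):
#   uint32 count, uint32 dim (little-endian), then count*dim values row by row.
# typecode "f" = float32 (assumed default); use "b" for int8 vectors.


def write_bin(path, vectors, typecode="f"):
    """Hypothetical helper: write vectors in the assumed header+payload layout."""
    n = len(vectors)
    d = len(vectors[0]) if n else 0
    values = array(typecode, [x for row in vectors for x in row])
    if sys.byteorder == "big":
        values.byteswap()  # store the payload as little-endian
    with open(path, "wb") as f:
        f.write(struct.pack("<II", n, d))
        f.write(values.tobytes())


def read_bin(path, typecode="f"):
    """Hypothetical helper: read vectors written in the assumed layout."""
    with open(path, "rb") as f:
        n, d = struct.unpack("<II", f.read(8))
        values = array(typecode)
        expected = n * d * values.itemsize
        raw = f.read(expected)
    if len(raw) != expected:
        raise ValueError(f"truncated file: expected {expected} bytes, got {len(raw)}")
    values.frombytes(raw)
    if sys.byteorder == "big":
        values.byteswap()  # convert the little-endian payload to native order
    # Split the flat payload into one list per vector.
    return [values[i * d:(i + 1) * d].tolist() for i in range(n)]
```

For a 10⁷-vector file this pure-Python approach loads everything into memory at once, so a memory-mapped array (for example with NumPy) would likely be preferable in practice. The version above only keeps the assumed layout explicit.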