lh42

Smaller subset of the images to train on

More of a note for new folks entering the competition, but it would be incredibly helpful if there were a smaller sample training set that we could download and test things out on before needing to write 80GB to disk!

2 Replies

Alexander Kreuzer
moderator
Hi lh42,

Thank you for your suggestion.
The training set consists of only a part of the tobacco documents set.
It is about 2.3GB in size.
For legal reasons, we cannot provide it to you directly. Therefore, you have to download all the documents and create it yourself.
If that takes too much disk space or data transfer, you could try creating the dataset on a virtual machine.

- Alex
lh42
Understood! I'm all set now but thought that I should at least ask.