This is more of a note for new folks entering the competition, but it would be incredibly helpful if there were a smaller sample training set that we could download and test things on before needing to write 80GB to disk!
Alexander Kreuzer
Hi lh42,

Thank you for your suggestion.
The training set consists of only a part of the tobacco documents set.
It is about 2.3GB in size.
For legal reasons, we cannot provide it to you directly, so you have to create it yourself by downloading all the documents.
If that takes too much disk space or bandwidth, you could try creating the dataset on a virtual machine.

- Alex

Understood! I'm all set now but thought that I should at least ask.
