Thank you for your suggestion.
The training set consists of only a subset of the tobacco documents collection.
It is about 2.3 GB in size.
For legal reasons, we cannot provide it to you directly, so you will have to create it yourself by downloading all of the documents.
If this requires too much disk space or bandwidth, you could try creating the dataset on a virtual machine.