We are the ML Challengers Team and are currently working on SAP's Data Anonymization Challenge.
We need some clarification in order to complete the task.
1. We have developed a generalized ML algorithm that identifies personal information and the corresponding bounding boxes.
However, we have the following concerns:
1) The dataset in the labels.zip file contains train, test and val files, and we are quite confused about how they are meant to be used. Each file contains only the image file path and the file type,
which does not seem sufficient for making predictions that identify personal information in a document. It would be useful only if we had to identify the file type, and that task is not mentioned in the challenge description or scope.
2) Regarding handwritten text, can we assume it is in English only?
3) Do we have to handle only invoices? They are the only document type referred to in the challenge.
Please provide us with the necessary input so that we can proceed.
Thanks in advance,
ML Challengers Team