We encourage you to integrate these into your solution. We are not expecting the winning solution to be built from scratch.
The terms of the challenge require you to grant us a royalty-free, perpetual license to commercially use what you submit for this challenge. The license also includes transfer and assignment rights, in case we want to use your work in a subsidiary or a joint venture. These rights also matter if our company is acquired: the acquiring company will want us to assign the license so it can continue to run the business seamlessly. The license must also be exclusive for any use in the Energy industry.
We are not taking ownership of any part of your submission. You continue to own it and can exploit it as you wish so long as it doesn't violate our license.
In the Challenge agreement, all competitors are required to grant Dynamic Risk an exclusive license for the Energy industry for their submission. This does not mean that Dynamic Risk will own the solution; the competitor will still own all of the IP and the code for the solution itself.
This does not mean that Dynamic Risk will accept all license grants either. It is unlikely that we will want a license for a submission that is not the winner, but it is a possibility, so we want to leave that option open.
The intent of the exclusive license is to discover a usable solution for our business while at the same time we do not want to encourage the development of a solution that will be available to our competitors.
The HeroX team also has standard competition terms that require all competitors to grant a license as a condition of entry. In their experience with past challenges, this is necessary to ensure that a competitor's intent is to win the challenge, rather than to join in order to access data and collect feedback at no cost while intentionally coming in second. The exclusive license requirement is one extra condition we added because we do not want to lose the competitive advantage we gain by sponsoring this Challenge.
Some have raised concerns that we could obtain a license without awarding the prize. That is not the intent. If a solution meets our business needs, we will certainly award the prize. However, the problem is quite difficult, and it is possible that no solution will meet our needs. In that case, we will consider extending the timeline for the challenge or funding some of the best approaches separately. Our intent is to incentivize the development of an appropriate solution.
If you wish to suggest a licensing solution that meets our competitive advantage concerns we are willing to consider it. Perhaps a solution is an exclusive license only for the winning solution and the standard non-exclusive license grant for the non-winning solutions? I would love to hear your thoughts.
The fundamental research and papers presented are open source and in the public domain. However, what we are looking for is a usable system that a reasonably skilled user can operate. An exclusive license for that executable can be granted to us without violating open-source agreements. Please contact us directly at firstname.lastname@example.org should you have any questions or concerns.
The judging is split into three parts:
1. Accuracy (60%)
Your trained solution will be used to process a separate set of documents of the same type as the training documents provided. The output will be compared, field by field, against a database we have created that represents the correct result of the data extraction, yielding Precision and Recall statistics. A simple definition of these measures can be found here: http://en.wikipedia.org/wiki/Precision_and_recall. These statistics will be combined into a single number using an F-measure weighted 2 to 1 in favor of Precision. That number will serve as the value for this element of the competition.
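To make the scoring concrete, here is a minimal sketch of the comparison described above: extracted field values are matched against a ground-truth set, and Precision and Recall are combined with an F-measure that favors Precision. Note the exact formula the judges will use is not stated here; beta = 0.5 in the standard F-beta measure is the conventional way to weight Precision twice as heavily as Recall, and the field names below are purely illustrative.

```python
def precision_recall(extracted: set, truth: set) -> tuple:
    """Field-by-field comparison of extracted values against the correct database."""
    true_positives = len(extracted & truth)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall

def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """F-beta measure; beta < 1 weights Precision more than Recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Illustrative example: 10 fields extracted, 12 fields in the truth
# database, 8 of them in common.
extracted = {f"field_{i}" for i in range(10)}
truth = {f"field_{i}" for i in range(2, 14)}
p, r = precision_recall(extracted, truth)   # p = 0.8, r = 8/12
score = f_beta(p, r)
```

Because beta = 0.5 discounts Recall, a solution that extracts fewer fields but gets them right will outscore one that extracts everything noisily, which matches the stated emphasis on Precision.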
2. Usability (15%)
We intentionally left the user interface requirements vague because we wanted competitors to focus on the data processing accuracy for the challenge rather than build an elegant UI that is consistent with a finished commercial product. The weighting of this is relatively low compared to the quality of the data extracted to reinforce that. The user interface can be anything you wish. Command line is fine. A simple graphical UI is fine as well even if it is rough around the edges. We are a dev shop so it doesn't need to be pretty. But it does need to work reliably and it should be reasonably easy to figure out by someone with programming skills. The scoring for Usability will be based on how quickly one of our engineers can get up to speed. Therefore a pretty design in the UI won't likely result in an advantage in the scoring.
3. Questionnaire responses (25%)
We are specifically grading for the extensibility of the solution: to other domains, other document types, other languages, the ability to extend to charts/drawings/graphs, the ability to derive context from the text, etc. This functionality is not required for the challenge, but if the possibility is there to extend and improve the solution over time to cover these things, then we are more interested in it than in one that cannot. In other words, the more generic the solution, the higher it will grade. A custom solution that works only for the datasets in this challenge will not grade well and is unlikely to pass judging.
The documents for this challenge will be presented in English. However, submissions will be evaluated for their ability to process in other languages.
We have found that the following browsers will connect successfully to our FTP server:
Windows 7: Firefox (v39), Chrome (v32.x), and IE 11.x
OSX: Chrome (v43.x). Note: Safari has trouble connecting; we do not recommend using it.
We chose two datasets that we thought a layperson could understand. The first is real estate listings, which everyone should be familiar with. The second is incident reports, which are technical, but not so technical that the average person couldn't understand the complete content. We intentionally did not annotate the fields, because identifying them is part of what the Precision and Recall scoring measures. In some cases, the adjectives, adverbs, and interjections used in the sentences convey data as well; we are curious to see which solutions can capture those subtleties.
For judging, we prefer that you provide remote access to your working solution; web-based access is great too. This streamlines the evaluation process by eliminating the need for us to duplicate your environment in order to install your deliverables.
We are targeting a week to review each submission. It will depend on the timing of the submissions by the competitors. If there is a flood all at once, it may take a little longer. We will be in contact with each team to let them know when to expect a response.