Geographic Information Systems (GIS) power our world. GIS helps us find the quickest way to a destination, map out property boundaries across a county, and even helps emergency responders better prepare for natural disasters. GIS underpins so many functions we rely on, and yet it still has a long way to go before it is fully optimized, reliable, and efficient for daily tasks or major problem-solving.
The GIS Solutions Challenge is seeking innovators to build a set of tools which the open source GIS community can use to discover specific, scalable, useful, and reliable business insights.
Extremely large organizations using GIS have developed internal systems to increase the accuracy, efficiency, and reliability of their GIS processes when handling large amounts of data. While many large organizations utilise expensive GIS systems, many resource-constrained organizations and individual innovators turn to open source and affordable platforms. Although small organizations have access to open source GIS tools, these technologies do not allow for the analysis of large datasets. Be it a lack of computational power, speed, or accuracy, current open source tools for smaller organizations are lacking. Bringing the open source and GIS communities together to solve this issue can not only help us at Tax Management Associates derive new business insights for local governments, but can also help to improve the very systems of direction, safety, and business we all rely on.
The GIS Solutions Challenge asks innovators to develop scalable, efficient, and effective open source tools that generate useful business insights from geospatial data, which can solve three specific GIS problems for large datasets (please see the challenge guidelines for a complete description).
Innovators will be provided three sample data sets to solve the above challenge. In Phase 1, they will be asked to create and share a proof-of-concept. This can then be used in Phase 2, where innovators will need to develop a fully functional GIS solution that will be tested against a number of technical requirements, such as efficiency, effectiveness, usefulness, innovativeness, and accuracy, among other factors. Competitors can enter Phase 2 even if they did not enter Phase 1. Beyond a cash prize, the winners will have contributed to creating an open source GIS solution that can benefit people and organizations globally.
The GIS Solutions Challenge asks innovators to develop scalable, efficient, and effective open source tools that generate useful business insights from geospatial data, which can solve the three specific GIS problems listed below.
Competitors will need to develop a solution to answer one or more of the following questions using an open source analytics platform:
1) What is the geodesic distance between two features?
E.g., A particular street corner in Detroit is known to be a crime hotspot. How far is this hotspot from the area the police actively patrol? This distance would be measured as a straight line from the point to the edge of a polygon.
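As a rough illustration of question 1, the sketch below computes the great-circle (haversine) distance between two longitude/latitude points. It treats the Earth as a sphere, so it only approximates a true ellipsoidal geodesic, and it covers the point-to-point case only; measuring to a polygon edge would additionally require finding the nearest point on the boundary. The function name and the radius constant are illustrative choices, not part of the challenge materials.

```python
import math

# Mean Earth radius in metres; a spherical approximation (assumption).
EARTH_RADIUS_M = 6_371_008.8


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    # Haversine formula: a is the squared half-chord length between the points.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
```

For higher accuracy over long distances, an ellipsoidal method would be a better fit than this spherical sketch.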
2) What is the network distance between two features?
E.g., What is the actual distance police must travel from the edge of their patrol to reach a crime hotspot? This distance would take into account the specific route the police must travel to reach the hotspot.
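One common way to approach question 2 is a shortest-path search over a street graph. The sketch below uses Dijkstra's algorithm on a small, entirely hypothetical network; the node names and edge lengths are made up for illustration, and a real solution would first have to build the graph from street data.

```python
import heapq


def shortest_path_length(graph, source, target):
    """Dijkstra's algorithm on a dict-of-dicts graph: graph[u][v] = edge length."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry; a shorter route was already found
        for neighbour, length in graph.get(node, {}).items():
            nd = d + length
            if nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                heapq.heappush(heap, (nd, neighbour))
    return float("inf")  # target is not reachable from source


# Hypothetical street network: nodes are intersections, weights are metres.
streets = {
    "patrol_edge": {"a": 200.0, "b": 500.0},
    "a": {"patrol_edge": 200.0, "b": 100.0, "hotspot": 700.0},
    "b": {"patrol_edge": 500.0, "a": 100.0, "hotspot": 300.0},
    "hotspot": {"a": 700.0, "b": 300.0},
}

print(shortest_path_length(streets, "patrol_edge", "hotspot"))  # 600.0
```

Here the shortest route goes through both intermediate intersections (200 + 100 + 300 metres), which is shorter than either more direct-looking alternative.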
3) Is a point inside or outside a polygon?
E.g., Is the crime hotspot within a police patrol area?
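Question 3 is often answered with the ray-casting test: count how many polygon edges a horizontal ray from the point crosses, and treat an odd count as "inside". The following is a minimal planar sketch, assuming longitude is used as x and latitude as y, a single polygon ring without holes, and no special handling of points that lie exactly on the boundary.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices in order."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical square patrol area and two test points.
patrol_area = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_polygon(2, 2, patrol_area))  # True
print(point_in_polygon(5, 2, patrol_area))  # False
```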
The timeline for the challenge can be viewed here.
Provide a proof-of-concept for your GIS Solution. This needs to include a diagram explaining the solution, a sample workflow of your solution, and descriptive information, including why the business insight generated from your tool will be useful to your target audience.
Develop a fully functional GIS Solution. This involves submission of open source code for the solution and finalised documentation, including metadata on each aspect of your solution. Solutions will be evaluated based on speed, scalability, overall architecture, and the business insight they produce. Please see complete judging criteria below.
All solutions must adequately meet the Phase 2 Minimum Requirements in order to be eligible for a prize. Solutions which meet these Phase 2 Requirements will be ranked against the Judging Criteria below.
Java, R, and Python are the acceptable languages for the challenge. These languages have been selected for their broad open source community in relation to data science, as well as their compatibility with the KNIME Analytics platform. It is possible to use another language so long as that language can be run in a Java, R, or Python environment (for example, Scala can be compiled into a JAR and run with Java).
Submissions should be contained in a git repository hosted on https://github.com. The repository can be public, or marked as private and shared with tma1-dev. At the root of the repository should be a README file that contains instructions for configuring and running the solution (see Documentation below for README requirements).
The solution must be able to run headlessly in a Linux terminal. The headless part of the solution will be used to create evaluations for quantitative judging metrics.
Although not a requirement, it is strongly recommended that your solution integrates with the KNIME platform. Integration with the KNIME platform makes GIS tools more accessible to local governments and other end users. Integration with the KNIME platform can be done as a new KNIME node, as a workflow containing the solution, or simply as a set of instructions that explains how to integrate the source code of the solution into a given set of KNIME nodes.
At a minimum, the solution must accept CSV files and shapefiles as input and generate CSV files as output. Additional marks will be given to solutions that accept other types of data input and generate more user-friendly output (e.g., map plots such as a PNG, or a shapefile) as detailed in the judging criteria.
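To make the CSV-in, CSV-out requirement concrete, here is a minimal sketch of a headless command-line tool using only the Python standard library. It is an illustration under stated assumptions rather than a reference implementation: the column names `longitude` and `latitude`, the `in_bbox` output column, and the `--bbox` option are all hypothetical, and a simple bounding box stands in for a polygon read from a shapefile, which would need a third-party library.

```python
import argparse
import csv


def tag_points(in_stream, out_stream, min_lon, min_lat, max_lon, max_lat):
    """Copy a CSV of points to out_stream, adding a 1/0 'in_bbox' column.

    Assumes hypothetical 'longitude' and 'latitude' columns in the input.
    """
    reader = csv.DictReader(in_stream)
    fieldnames = list(reader.fieldnames or []) + ["in_bbox"]
    writer = csv.DictWriter(out_stream, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        lon, lat = float(row["longitude"]), float(row["latitude"])
        row["in_bbox"] = int(min_lon <= lon <= max_lon and min_lat <= lat <= max_lat)
        writer.writerow(row)


def main(argv=None):
    """Parse command-line arguments and run the tagging step on files."""
    parser = argparse.ArgumentParser(
        description="Tag CSV points that fall inside a bounding box.")
    parser.add_argument("input_csv")
    parser.add_argument("output_csv")
    parser.add_argument("--bbox", nargs=4, type=float, required=True,
                        metavar=("MIN_LON", "MIN_LAT", "MAX_LON", "MAX_LAT"))
    args = parser.parse_args(argv)
    with open(args.input_csv, newline="") as src, \
            open(args.output_csv, "w", newline="") as dst:
        tag_points(src, dst, *args.bbox)
```

A script built this way would call `main()` from its entry point, so it can be run from a terminal with no graphical environment. Keeping the row logic in `tag_points` separate from argument parsing also makes it straightforward to test with in-memory streams.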
All components of the solution must be freely available for commercial use or licensed as LICENSE_LIST_GOES_HERE.
Submissions must be well documented for ease of use and ease of understanding. A submission with poor documentation will not be eligible for a prize.
Innovators must have a README file at the root of their git repository that contains instructions for setting up the solution. The README must contain the following:
Innovators must also provide USAGE documentation either in the README file or separately. Code should be well commented.
All solutions will be tested headlessly in a Linux terminal on Google Cloud Compute Engine using n1-standard-4 instances in the us-east1-c region. The instances will run Ubuntu 18.04 and KNIME Analytics Platform (Desktop) 3.6. For the terminal-based installation, any relevant dependencies will be installed based on the solution’s README in order to run the solution in a headless fashion. We reserve the right to reject dependencies that require insecure configurations to run (such as adding an unknown apt repository).
Solutions will be evaluated for speed against a baseline and against all other submissions. Each solution will also be evaluated for accuracy and scalability. For the Detroit crime dataset, computing geodesic distances from patrol area centroids to 10,000 crime data points took 145 seconds on average in our baseline. Performing a spatial join of 162,449 points from the Africa conflict dataset to a shapefile of the African continent took an average of 6.4 seconds.
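Innovators who want to compare their own runs against averages like these could time a workload over several repetitions. The helper below is a hypothetical sketch using the standard library; it is not the organizers' benchmarking method, and timings taken on other hardware will not be directly comparable to the baseline figures.

```python
import statistics
import time


def average_runtime_s(func, *args, repeats=5):
    """Mean wall-clock time in seconds of func(*args) over several runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)


# Example with a stand-in workload; replace with the solution's own entry point.
print(average_runtime_s(sum, range(1_000_000), repeats=3))
```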
Quantitative evaluation will be performed using the criteria provided below. Your solution will be ranked out of all available solutions and your position in the ranking will determine your score. In order to be eligible for a prize, your solution must meet or exceed the baseline for speed, accuracy, and scalability as detailed in the Performance section above.
Qualitative evaluation will be performed by the judging panel. Solutions will be scored based on their ability to meet or exceed the judging criteria.
Ease of Understanding
We have included two sets of sample data for use when testing your submission. These sets are for sampling and testing purposes, and other datasets are welcome and encouraged. If possible, we strongly recommend providing any additional data sets you tested with your solution.
The Africa Excel file dataset contains 150,000+ incidents of conflict that have a geolocation available as a longitude and latitude. We have also provided an ESRI shapefile of the African continent, which was sourced from http://www.maplibrary.org/library/stacks/Africa/index.htm. These sets of data have been provided to explore point-in-polygon, resource leveling (optimal distance to center of most conflicts), and other solutions.
The Detroit CSV file contains 130,000+ crimes committed in Detroit, each with a geolocation available as a longitude and latitude. These can be juxtaposed with the Detroit patrol area shapefiles that have also been provided. Some of the solutions that can be explored in this dataset are point-in-polygon as well as various distances. These distances can be point to point or point to closest polygon edge. They can be computed using Euclidean, geodesic, or network distance algorithms. (In this case, the network is Detroit streets, a dataset that has not been provided.) All of the Detroit data was gathered from https://data.detroitmi.gov/.
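For the point-to-closest-polygon-edge case with Euclidean distance, one simple approach is to measure the distance from the point to every boundary segment and keep the minimum. The sketch below assumes planar coordinates in a projected system (for example metres), not raw longitude and latitude, and a single ring with no holes; the function names are illustrative.

```python
import math


def point_segment_distance(px, py, ax, ay, bx, by):
    """Euclidean distance from point P to the segment A-B."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)  # degenerate segment (A == B)
    # Projection of P onto the line through A and B, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_polygon_edge(px, py, polygon):
    """Smallest distance from P to any edge of a polygon given as (x, y) vertices."""
    n = len(polygon)
    return min(
        point_segment_distance(px, py, *polygon[i], *polygon[(i + 1) % n])
        for i in range(n)
    )


# Hypothetical square patrol area, 4 units on a side.
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(distance_to_polygon_edge(6, 2, square))  # 2.0
```

This checks every edge for every point, which is fine for a sketch but would likely need a spatial index to scale to datasets of the size described here.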
The challenge is open to all adult individuals, private teams, public teams, and collegiate teams. Teams may originate from any country. Submissions must be made in English. All challenge-related communication will be in English.
No specific qualifications or expertise in the field of GIS is required. Prize organizers encourage outside individuals and non-expert teams to compete and propose new solutions.
To be eligible to compete, you must comply with all the terms of the challenge as defined in the Challenge-Specific Agreement.
Registration and Submissions:
Submissions must be made online (only), via upload to the HeroX.com website, on or before the deadlines outlined in the Timeline. Please see the submission form for any document upload format requirements. No late submissions will be accepted.
Intellectual Property Rights:
If an innovator is awarded a prize, the Sponsor will require all content and assets submitted as part of a Finalist’s Submission to be released under open source licenses that permit free distribution, derivative works, and use in commercial and non-commercial settings. Please see the Challenge-Specific Agreement for complete details.
All Innovators are welcome and encouraged to depend on or make use of other components, libraries, content, assets, and code. All such materials must be available under any Open Source Initiative (OSI) or Creative Commons license compatible with the OSI or Creative Commons license under which the Submission will be released. “Compatible” means that each Innovator’s entire Submission must be usable without violating the license terms of those components licensed under the CC BY 4.0 license, Apache License 2.0, or respective OSI license for the components. Source code licensed under the LGPL, BSD, MIT, or Apache licenses currently meets this criterion; other open source licenses may also meet it. If Innovators make modifications to existing open source projects, they are strongly encouraged to submit patches upstream and work to have them accepted. Patches that are not accepted upstream may be submitted as part of the code developed by the Innovator, under the same Apache License 2.0. Content and assets must be licensed under terms that permit commercial usage. The Creative Commons CC BY and CC-BY-SA licenses currently meet this criterion. Innovators cannot submit entries that include or rely on software or content that is either closed-source, proprietary, illegally sourced, or depends on per-seat licensing.
Selection of Winners:
Based on the winning criteria, prizes will be awarded per the Judging Criteria section above. In the case of a tie, the winner(s) will be selected based on the highest votes from the Judges.