You have some experience with R or Python and machine learning basics. This is a perfect competition for data science students who have completed an online course in machine learning and are looking to expand their skill set before trying a featured competition.
Ask a home buyer to describe their dream house, and they probably won't begin with the height of the basement ceiling or the proximity to an east-west railroad. But this playground competition's dataset proves that much more influences price negotiations than the number of bedrooms or a white-picket fence.
With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, this competition challenges you to predict the final price of each home.
The Ames Housing dataset was compiled by Dean DeCock for use in data science education. It's an incredible alternative for data scientists looking for a modernized and expanded version of the often cited Boston Housing dataset.
Photo by Tom Thain on Unsplash.
You cannot sign up to Kaggle from multiple accounts and therefore you cannot submit from multiple accounts.
Privately sharing code or data outside of teams is not permitted. It's okay to share code if made available to all participants on the forums.
Team mergers are allowed and can be performed by the team leader. In order to merge, the combined team must have a total submission count less than or equal to the maximum allowed as of the merge date. The maximum allowed is the number of submissions per day multiplied by the number of days the competition has been running.
There is no maximum team size.
You may submit a maximum of 5 entries per day.
You may select up to 2 final submissions for judging.
Start Date: 8/30/2016 1:08 AM UTC
Merger Deadline: None
Entry Deadline: None
End Date: None
It is your job to predict the sales price for each house. For each Id in the test set, you must predict the value of the SalePrice variable.
Submissions are evaluated on Root-Mean-Squared-Error (RMSE) between the logarithm of the predicted value and the logarithm of the observed sales price. (Taking logs means that errors in predicting expensive houses and cheap houses will affect the result equally.)
The file should contain a header and have the following format:
Id,SalePrice
1461,169000.1
1462,187724.1233
1463,175221
etc.
You can download an example submission file (sample_submission.csv) on the Data page.
Kaggle Learn offers hands-on courses for most data science topics. These short courses prepare you with the key ideas to build your own projects.
The Machine Learning Course will give you everything you need to succeed in this competition and others like it.
Ensemble Modeling: Stack Model Example
A Clear Example of Overfitting
Comprehensive Data Exploration with Python
A Study on Regression Applied to the Ames Dataset