Competitive programming is a lucrative business if you have the chops for it. As far as I know, however, it has been subject to little analysis, partly because no consolidated dataset has been available to support it. To fix that, I scraped a little over 1 million programs submitted to the Codechef website. You can get the dataset on Kaggle (link to data).
The purpose of this work is to get started with the OpenAI special project, which states:
"Build an agent to win online programming competitions. A program that can write other programs would be, for obvious reasons, very powerful."
To solve a problem, the first step is always to understand it and know its quirks. This blog post is possibly the first of many exploring this dataset, with the ultimate aim of generating a solution program from the statement of a problem.
An exploratory data analysis has already been done by Swaraj Khadanga. You can see it at Simple EDA.
A brief description of the data
The dataset is a table with variables as columns and observations as rows, which makes it tidy. The columns we are interested in are:
- QCode : The Question Code, denoting a unique question
- Solution : The program written by the user
- Status : Was it a correct attempt or not?
- Language : The language that the code was written in.
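To make the column layout concrete, here is a minimal sketch of reading those four columns and computing the share of correct attempts per language. It uses only the Python standard library and a tiny made-up sample. The sample rows, the `accepted` status label, the language codes, and the helper names `load_rows` and `accepted_share_by_language` are all assumptions for illustration; the real Kaggle files may be organized and labeled differently.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; the real dataset's file layout and Status values may differ.
SAMPLE_CSV = """QCode,Solution,Status,Language
FLOW001,"print(sum(map(int, input().split())))",accepted,PYTH
FLOW001,"int main(){return 0;}",wrong answer,C
FLOW002,"print(int(input()) % 10)",accepted,PYTH
"""

COLUMNS = ("QCode", "Solution", "Status", "Language")


def load_rows(handle):
    """Read the four columns of interest from a CSV file object."""
    reader = csv.DictReader(handle)
    return [{name: row[name] for name in COLUMNS} for row in reader]


def accepted_share_by_language(rows, accepted_label="accepted"):
    """Return {language: fraction of rows whose Status equals accepted_label}."""
    totals = Counter(row["Language"] for row in rows)
    accepted = Counter(
        row["Language"] for row in rows if row["Status"] == accepted_label
    )
    return {lang: accepted[lang] / totals[lang] for lang in totals}


rows = load_rows(io.StringIO(SAMPLE_CSV))
shares = accepted_share_by_language(rows)
```

To try this on the actual data, replace the `io.StringIO` object with an open file handle and adjust `accepted_label` to whatever value the Status column really uses.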
Program Visualization