Last week I pushed to GitHub a repo of the 2013-2014 NBA Schedule by team in JSON and CSV formats. The schedules were announced a few days earlier and at the time they weren’t available anywhere to download. So I created a few Python scripts that scraped the schedule from ESPN.com and included information about where each game would be played, including coordinates geocoded from the Google Maps API.
Not entirely sure why I did this! Well, I’m a total basketball nerd for one. And I was annoyed that the data wasn’t available in a format that it could easily be used. Since then basketball-reference.com and the team sites on nba.com have included CSV downloads, but the formatting differs from team-to-team and sometimes game times are in the local time of the team, which is annoying. But my repo is standardized and would be useful if you were making a NBA app. And I happen to have one that is due for a re-write for the next season.
Storing shareable, scraped, static data (the three S’s!) on GitHub has me thinking of what else I can do. What’s stopping me from updating these schedules weekly with results once the season starts? What about scraping ALL of the data for each game? Why not? Storage is free and once I write the script it would be a snap to update.
I’ll have to think about this as the season approaches. In the meantime hopefully someone comes across this repo out of the millions on GitHub and finds it useful.Send a pull request for this post on GitHub.