Where I’m getting the data I’m mining
I’ll have a longer post on what I’ve been working on; for now, here is the short version:
I have a good data set from retrosheet.org that I have converted into a database in Excel, built a pivot table from, and have been working with in Jupyter Notebook.
So far I’ve taken the data through scraping with PowerShell, then Access, Excel, and Python, and I’m working on R and KNIME next.
While combing through the data, I’ve been looking for the easiest way to add headers. I needed to wrap each value in quotes and add a trailing comma via VBA for an Excel macro I was working on. I found the easiest way to do that on this page:
Go to Format -> Custom
and enter this in the “Type” field: \"@\",
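Since I’m also working in Python, here is a rough sketch of the same idea outside Excel: wrap each value in double quotes and add a trailing comma. The function name quote_with_comma and the sample header names are made up for illustration; they are not from the Retrosheet files. One thing to keep in mind is that an Excel custom format only changes how a cell is displayed, not the value stored in it, whereas this builds the actual text.

```python
def quote_with_comma(value):
    # Mimics the display produced by the Excel custom format above:
    # a double quote, the cell text, a double quote, then a comma.
    return f'"{value}",'


# Hypothetical header names, used only to show the output shape.
headers = ["date", "visiting_team", "home_team"]
quoted = [quote_with_comma(h) for h in headers]

for line in quoted:
    print(line)
```

Each item comes out in the form "date", which is the shape I needed for pasting into the macro.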
The beautiful thing about using Major League Baseball data is that you can cross-reference it against a ton of different variables and fields.
I’ve recently downloaded a bunch of tables from http://www.retrosheet.org/gamelogs/index.html
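For the header problem, a minimal sketch in Python might look like the following. It assumes the downloaded tables are comma-separated text with no header row. The function name add_header, the three column names, and the one-line sample are placeholders I made up to keep the example short; a real file has many more fields, and the real column list should come from the field descriptions on the Retrosheet site.

```python
import csv
import io


def add_header(raw_text, column_names):
    """Return raw_text as CSV with a header row placed on top."""
    reader = csv.reader(io.StringIO(raw_text))
    out = io.StringIO()
    # QUOTE_ALL keeps every field wrapped in double quotes.
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(column_names)
    for row in reader:
        # Fail loudly if a row does not match the header width.
        if len(row) != len(column_names):
            raise ValueError(f"expected {len(column_names)} fields, got {len(row)}")
        writer.writerow(row)
    return out.getvalue()


# Made-up sample row and hypothetical column names, for illustration only.
sample = '"20190328","0","Thu"\n'
columns = ["date", "game_number", "day_of_week"]

result = add_header(sample, columns)
print(result)
```

To run it on a real download, read the file into a string first and pass in the full list of column names.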
Keep an eye out for what I do with the data via SQL, Python, R, etc.