Access and SQL

basic excel data manipulation in sql, access, python, R and KNIME

I had something verified the other day: I needed to know and understand “database 101” more. I haven’t done enough basic manipulation to really understand R, Python, etc.

Thanks to being told that at my new day job, i decided to do a lot of work on my access, sql and excel basic skills…and then translate that to Python, R and KNIME.

First: I am currently working with a vast array of tables in different databases trying to do a reconciliation. I have managed a large database before but it was a totally different software structure with different set processes.

I decided to create a situation as close to what I work during the day as I can at home.

I downloaded the game logs from Major League Baseball history stats site retrosheet.org. I have different sets of data in different relational tables that stretch back as far as 1871.

manipulating xlsx files in powershell, access, sql, excel, python, R and KNIME

I downloaded them all through powershell using the invoke web-request command. Information on it is available here and here. Why use it? I wanted to use new (to me) tools and scripts to manipulate the data from retrosheet. That’s the only way I learn.

Once I downloaded the sheets, I saved them in .xlsx format. I also have some copies in CSV.

Due to the day job, I needed to focus more on Access, sql and excel more then anything else.

If you want the data and retrosheet.org gives an open license for their datasets here. They only ask that this disclaimer be used:

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at “www.retrosheet.org”.

sql, access and excel

To start, I have two tables and excel sheets I’m working with: data from the 1871 and 1960 major league baseball seasons. Here is an example:

dateday of weekvisiting teamhome teamvisiting scorehome score
18710504ThuCL1FW102
18710505FriBS1WS32018
18710506SatCL1RC1124
18710508MonCL1CH11214
18710509TueBS1TRO95
18710511ThuCH1CL11810
18710513SatWS3CL1128
18710513SatCH1FW1145
18710515MonWS3FW1612
18710516TueTROBS12914
18710516TueWS3CH1414
18710517WedWS3RC11215
18710518ThuNY2TRO143
18710519FriWS3CH179

Of course…the tables are much larger than this. Over 100 columns and rows. They have the same headers…but of course not the same information in each column.

First, what kind of sql joins can I use in Access to mimic what I’m doing at work?

Here is a fairly simple left join sql script:

SELECT

GL1871.ID,
GL1960.date,
GL1960.[visiting team],
GL1960.[home team],
GL1960.[visiting score],
GL1960.[home score],
GL1871.[forfeit info]

FROM GL1871 LEFT JOIN GL1960 ON GL1871.ID = GL1960.ID
;

I am joining data from 1871 and 1960 on the unique identifier ID. I am maintaining the ID column from 1871 and matching all the data from 1860 to the same column because the unique identifiers are the same. (I added the forfeit column since there is so little data) Here is the output in the xlsx file format:

IDdatevisiting teamhome teamvisiting scorehome scoreforfeit info
119600412SLNSFN13
219600413CHNLAN40
319600413SLNSFN16
419600414SLNLAN23
519600414MLNPHI45
619600414CINPIT013V
719600414CHNSFN65
819600416SLNLAN23
919600416MLNPHI133
1019600416CINPIT113
1119600416CHNSFN16
1219600417SLNLAN57
1319600417MLNPHI84H
1419600417CINPIT05
1519600417CINPIT56
1619600417CHNSFN34
1719600418BOSWS1110
1819600419WS1BAL23
1919600419NYABOS84
2019600419KC1CHA910
2119600419DETCLE42
2219600419PHIPIT43
2319600419LANSFN40
2419600419CHNSLN25
2519600420WS1BAL87
2619600420NYABOS17
2719600420DETCLE64H
2819600420CINMLN105
2919600420PHIPIT24
3019600420LANSFN01
3119600421WS1BAL65
3219600421NYABOS40
3319600421KC1CHA56H
3419600421CINMLN02
3519600421PHIPIT511
3619600422CHADET56H
3719600422CLEKC114
3819600422BALNYA05
3919600422BOSWS154
4019600422SFNCHN108
4119600422CINPHI710
4219600422MLNPIT26
4319600422LANSLN711
4419600423CHADET26
4519600423CLEKC156
4619600423BALNYA23
4719600423BOSWS183
4819600423SFNCHN182
4919600423CINPHI23
5019600423MLNPIT45
5119600423LANSLN59
5219600424CHADET412
5319600424CLEKC170
5419600424BALNYA915
5519600424BOSWS11011
5619600424SFNCHN49
5719600424CINPHI59
5819600424CINPHI104
5919600424MLNPIT37
6019600424LANSLN27
6119600425CHAKC116
6219600425BALWS132
6319600425SFNSLN98
6419600426CLEDET21
6519600426CHAKC180
6619600426BOSNYA75
6719600426BALWS11110
6819600426LANCHN81
6919600426MLNCIN43
7019600426SFNSLN24
7119600427BOSBAL38
7219600427DETKC113
7319600427WS1NYA54
7419600427LANCHN94
7519600427MLNCIN85
7619600427PITPHI32
7719600428BOSBAL16
7819600428CLECHA13
7919600428DETKC112
8019600428SFNLAN75
8119600428PITPHI30
8219600429NYABAL12
8319600429DETCHA13
8419600429KC1CLE45
8519600429SFNLAN210
8619600429PHIMLN35
8719600429CHNSLN616
8819600430NYABAL160
8919600430WS1BOS12
9019600430KC1CLE12
9119600504CINSFN32
9219600504PHISLN35
9319600430PITCIN127
9419600430SFNLAN63
9519600430CHNSLN45
9619600501NYABAL59
9719600501DETCHA36
9819600501DETCHA25
9919600501KC1CLE46
10019600501KC1CLE13
10119600501PITCIN132
10219600501PHIMLN45
10319600501CHNSLN54
10419600502CHABAL611
10519600502CLEWS132
10619600502PHICHN78
10719600502CINLAN65
10819600502MLNSFN05
10919600502PITSLN34
11019600503CHABAL53
11119600503KC1BOS34
11219600503DETNYA310
11319600503CLEWS167
11419600503PHICHN109
11519600503CINLAN53
11619600504CHABAL46
11719600504KC1BOS53
11819600504DETNYA24
11919600504CLEWS167
12019600504PITCHN15
12119600504MLNLAN21
12219600505CLEBAL116
12319600505CHAWS153
12419600505PITCHN97
12519600505MLNLAN34
12619600505CINSFN43
12719600505PHISLN31V

Since 1960 was a world series year for my hometown Pittsburgh Pirates, I wanted to run a few formulas:

My data has Pittsburgh playing at home 10 times in the query and scoring 63 runs.

I also wanted to create an access search form…

 

Leave a Reply