Wednesday, April 20, 2011

Retrosheet MySQL Queries

Here are a few Retrosheet queries I have found helpful. You can download Retrosheet using the steps outlined here. These queries assume you followed those exact steps. If your columns are named something different, you'll have to change that part of the code a bit. All of this was done by trial and error, so there may be a better way to do the code.




Add Season Column:
This should add in a column on the events table that puts the year next to the Game ID.
ALTER TABLE events
ADD Season YEAR AFTER GAME_ID;
UPDATE events
SET Season = (SELECT MID(GAME_ID,4,4))

Add Pitcher Splits:
This should add pitcher splits per year for Left/Right matchups.
CREATE TABLE Pitcher_Splits(
SELECT PIT_ID,
CONCAT(id.first," ",id.last) AS Pitcher, Season, BAT_HAND_CD AS Batter_Hand,
COUNT(IF(BAT_EVENT_FL='T',1,NULL)) AS PA,
COUNT(IF(AB_FL='T',1,NULL)) AS AB,
SUM(EVENT_RUNS_CT) AS R,
COUNT(IF(H_CD<>'0',1,NULL)) AS H,
COUNT(IF(EVENT_CD='20',1,NULL)) AS 1B,
COUNT(IF(EVENT_CD='21',1,NULL)) AS 2B,
COUNT(IF(EVENT_CD='22',1,NULL)) AS 3B,
COUNT(IF(EVENT_CD='23',1,NULL)) AS HR,
COUNT(IF(RUN1_SB_FL='T',1, NULL)) AS SB1,
COUNT(IF(RUN2_SB_FL='T',1, NULL)) AS SB2,
COUNT(IF(EVENT_CD='6',1,NULL)) AS CS,
COUNT(IF(EVENT_CD='14',1,NULL)) AS BB,
COUNT(IF(EVENT_CD='3',1,NULL)) AS K,
SUM(H_CD) AS TB,
COUNT(IF(DP_FL='T',1,NULL)) AS GDP,
COUNT(IF(EVENT_CD='16',1,NULL)) AS HBP,
COUNT(IF(SH_FL='T',1,NULL)) AS SH,
COUNT(IF(SF_FL='T',1,NULL)) AS SF,
COUNT(IF(EVENT_CD='15',1,NULL)) AS IBB,
COUNT(IF(EVENT_CD='18',1,NULL)) AS RBOE
FROM EVENTS, id
WHERE EVENTS.PIT_ID=id.id AND (BAT_HAND_CD="L" OR BAT_HAND_CD="R")
GROUP BY Season, PIT_ID, BAT_HAND_CD);

Add more splits:
This should add some empty new columns to the new Pitcher_Splits table.
ALTER TABLE Pitcher_Splits
ADD SB SMALLINT AFTER SB2;
ALTER TABLE Pitcher_Splits
ADD KPERBB FLOAT(3,2) AFTER K;
ALTER TABLE Pitcher_Splits
ADD BA FLOAT(4,3) AFTER KPERBB;
ALTER TABLE Pitcher_Splits
ADD OBP FLOAT(4,3) AFTER BA;
ALTER TABLE Pitcher_Splits
ADD SLG FLOAT(4,3) AFTER OBP;
ALTER TABLE Pitcher_Splits
ADD OPS FLOAT(4,3) AFTER SLG;
ALTER TABLE Pitcher_Splits
ADD BABIP FLOAT(4,3) AFTER RBOE;

Fix SB:
This creates a new column that adds the two partial stolen base columns and then drops the useless one.

UPDATE Pitcher_Splits
SET SB=SB1+SB2;
ALTER TABLE Pitcher_Splits
DROP COLUMN SB1;
ALTER TABLE Pitcher_Splits
DROP COLUMN SB2

Fill Calculated Columns:
This fills in the columns you created above.
UPDATE Pitcher_SPLITS
SET KPERBB = K/BB;
UPDATE Pitcher_SPLITS
SET BA=H/AB;
UPDATE Pitcher_SPLITS
SET OBP=((H+BB+HBP)/(AB+BB+HBP+SF));
UPDATE Pitcher_SPLITS
SET SLG=((1B+2*2B+3*3B+4*HR)/AB);
UPDATE Pitcher_SPLITS
SET OPS=OBP+SLG;
UPDATE Pitcher_SPLITS
SET BABIP=((H-HR)/(AB-K-HR+SF));

Add Hitter Splits: 
Add yearly L/R splits:
CREATE TABLE Hitter_Splits(
SELECT BAT_ID,
CONCAT(id.first," ",id.last) AS Hitter, Season, PIT_HAND_CD AS Pitcher_Hand,
COUNT(IF(BAT_EVENT_FL='T',1,NULL)) AS PA,
COUNT(IF(AB_FL='T',1,NULL)) AS AB,
COUNT(IF(H_CD<>'0',1,NULL)) AS H,
COUNT(IF(EVENT_CD='20',1,NULL)) AS 1B,
COUNT(IF(EVENT_CD='21',1,NULL)) AS 2B,
COUNT(IF(EVENT_CD='22',1,NULL)) AS 3B,
COUNT(IF(EVENT_CD='23',1,NULL)) AS HR,
SUM(RBI_CT) AS RBI,
COUNT(IF(EVENT_CD='14',1,NULL)) AS BB,
COUNT(IF(EVENT_CD='3',1,NULL)) AS K,
SUM(H_CD) AS TB,
COUNT(IF(DP_FL='T',1,NULL)) AS GDP,
COUNT(IF(EVENT_CD='16',1,NULL)) AS HBP,
COUNT(IF(SH_FL='T',1,NULL)) AS SH,
COUNT(IF(SF_FL='T',1,NULL)) AS SF,
COUNT(IF(EVENT_CD='15',1,NULL)) AS IBB,
COUNT(IF(EVENT_CD='18',1,NULL)) AS RBOE
FROM EVENTS, id
WHERE EVENTS.BAT_ID=id.id AND (PIT_HAND_CD="L" OR PIT_HAND_CD="R")
GROUP BY Season, BAT_ID, PIT_HAND_CD);


Add more splits:
This should add some empty new columns to the new Splits table.
ALTER TABLE Hitter_Splits
ADD Batter_Hand varchar(1) after season;
ALTER TABLE Hitter_Splits
ADD BA FLOAT(4,3) AFTER K;
ALTER TABLE Hitter_Splits
ADD OBP FLOAT(4,3) AFTER BA;
ALTER TABLE Hitter_Splits
ADD SLG FLOAT(4,3) AFTER OBP;
ALTER TABLE Hitter_Splits
ADD OPS FLOAT(4,3) AFTER SLG;
ALTER TABLE Hitter_Splits
ADD BABIP FLOAT(4,3) AFTER RBOE;

Fill Calculated Columns:
This fills in the columns you created above.
UPDATE Hitter_SPLITS
SET BA=H/AB;
UPDATE Hitter_SPLITS
SET OBP=((H+BB+HBP)/(AB+BB+HBP+SF));
UPDATE Hitter_SPLITS
SET SLG=((1B+2*2B+3*3B+4*HR)/AB);
UPDATE Hitter_SPLITS
SET OPS=OBP+SLG;
UPDATE Hitter_SPLITS
SET BABIP=((H-HR)/(AB-K-HR+SF));

2 comments: