Author: abc12812 (Sudba tseloveka)
Board: MLB
Title: [Translation] Player Value Evaluation: Introduction
Time: Sun Oct 5 15:02:52 2008
http://tinyurl.com/3h6ou9
This is the first part of a multi-part series on how to estimate player value.
I've been doing an awful lot of reading, thinking, and discussing these issues
over the past several weeks, which is part of the reason that it's been
relatively quiet around here. Because writing things out is the best way that
I know to master a complicated topic like this, my hope is that this series
will help me crystallize my thinking on player valuation and get up to speed
on the most significant research to date. It will also serve as a nice set of
papers to which I can refer to justify my methods moving forward...and who
knows, maybe it'll be useful to others who are working through similar issues
as well.
To be clear, few of the big ideas that follow are based on my own work,
though I may supplement them with a small study here and there. Because this
is supposed to be a Reds blog, I will often use the Reds in case studies. In
general, though, you can think of this as a popular science review article ...
or maybe a college term paper, given that I'm still new to much of this info.
:)
(Translator's note: the above is filler and can be skipped.)
General Principles of Player Valuation
Wins vs. Runs
What do we mean by value? The answer can be fairly nuanced, as evidenced by
essays like this (http://tinyurl.com/4fy3r7) or this
(http://tinyurl.com/4a5kwz). I'm going to be a bit generic about it, however:
I want to know how much players did to help their team win.
Based on that definition, the ultimate goal would be to have statistics that
quantify player value in terms of wins. There have been several efforts on that
front, including Win Shares, WARP, and WPA. At this point, however, I'm not
satisfied with how any of those stats handle fielding (among other things),
so I'm not ready to make the leap to those stats.
The alternative, then, is to use stats that give their units in runs, for
which we have many "good" stats that can quantify hitting, pitching, and
fielding. How much are we losing by going with runs instead of wins? To get
some handle on this, I ran a quick regression of all teams from 1996-2004 with
data pulled from the Lahman database and looked at one-variable models that
predict wins:
Predictor of Wins    R-Square    MSE
Run Differential     0.90        14.78
Runs Allowed         0.43        86.32
Runs Scored          0.35        97.92
R-Square, in this case, indicates the proportion of variation in wins explained
by the different predictors. Therefore, this quick 'n dirty analysis indicates
that we can explain 90% of variation in the number of team wins by just knowing
a team's run differential (the difference between a team's runs scored and its
runs allowed). The remaining 10% is presumably due to the timing of when those
runs are scored, or variation in run environments (e.g. a run at Coors Field
is worth less in terms of wins than a run at PETCO Park, simply because more
runs are scored at Coors than at PETCO, so each one contributes less to wins).
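As a rough illustration of what a one-variable model like the ones in the table involves, here is a minimal Python sketch. It is not the original analysis: the six team seasons are made-up numbers rather than Lahman data, `fit_one_variable` is a hypothetical helper name, and the MSE is computed as the residual mean square (n - 2 degrees of freedom), which is only an assumption about how the table's MSE column was defined.

```python
import numpy as np

def fit_one_variable(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, r_squared, mse)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b, a = np.polyfit(x, y, 1)               # degree 1: slope first, then intercept
    residuals = y - (a + b * x)
    ss_res = float(np.sum(residuals ** 2))   # variation left unexplained
    ss_tot = float(np.sum((y - y.mean()) ** 2))  # total variation in y
    r_squared = 1.0 - ss_res / ss_tot
    mse = ss_res / (len(y) - 2)              # assumed definition: residual mean square
    return a, b, r_squared, mse

# Hypothetical team seasons (NOT real data): runs scored, runs allowed, wins.
runs_scored = np.array([850, 780, 700, 900, 650, 720])
runs_allowed = np.array([700, 760, 800, 820, 640, 690])
wins = np.array([95, 83, 70, 88, 82, 84])

results = {
    "Run Differential": fit_one_variable(runs_scored - runs_allowed, wins),
    "Runs Allowed": fit_one_variable(runs_allowed, wins),
    "Runs Scored": fit_one_variable(runs_scored, wins),
}

for name, (a, b, r2, mse) in results.items():
    print(f"{name:<18} R-Square={r2:.2f}  MSE={mse:.2f}")
```

With real team seasons in place of the made-up arrays, the same three calls would produce a table in the shape of the one above; the numbers printed here mean nothing beyond showing the mechanics.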
Research to date indicates that most, though not all, of timing-based events
tend to be associated with events that involve very little unique player
skill -- clutch hitting and pitching, for example, have very low repeatability
in and of themselves, meaning that clutch performances are best predicted by a
player's overall stats. Other "timing" events, like those having to do with
baserunning (SB's and CS's happen more often in close games than in blowouts),
tend to result in relatively few net runs per year. Finally, we can make
adjustments for variation in the run environment of games via park factors
and other techniques. Therefore, I'm ok with using runs instead of wins, at
least for the time being, because of the gains in precision that we get from
using the available runs-based statistics.
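What is value? The answer can be somewhat subtle, as the two essays that try
to define it show
(http://tinyurl.com/4fy3r7 http://tinyurl.com/4a5kwz). I take a more general
view of it: I want to know how many wins a player brought his team.
Starting from that definition, the ultimate goal is to find stats that express
a player's value directly in wins. Efforts on this front have produced some
results, such as Win Shares (WS), WARP, and WPA. However, I'm not satisfied
with how these stats handle fielding, so I won't be using them here for now.
The alternative is to express value in runs. Here we already have many solid,
ready-made stats for measuring hitting, pitching, and fielding. The question
is how much we lose by using runs instead of wins. I ran a quick regression on
team data from 1996-2004: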
Predictor of Wins    R-Square    MSE
Run Differential     0.90        14.78
Runs Allowed         0.43        86.32
Runs Scored          0.35        97.92
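This rough analysis shows that run differential explains about 90% of the
variation in team wins (R-Square = 0.90). The remaining 10% is probably due to
the timing of runs or to differences in run environment. (The same run is
worth a different amount at Coors Field than at PETCO Park, because runs are
easier to come by at Coors.)
Research to date shows that most, though not all, clutch performance does not
reflect a distinct player skill. For clutch pitching and hitting, the data
show that these "abilities" have very low repeatability, meaning clutch
performance is basically a reflection of a player's overall ability. Other
timing-related events, such as baserunning, produce only a very small
difference in runs each year.
As for run environment, we can adjust for it with park factors and other
methods. So I think expressing player value in runs is a workable approach.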
Offense vs. Defensive Contributions
You'll note that in the table above, runs allowed alone predicts wins better
than runs scored alone. This is interesting: it indicates that winning teams
are slightly more likely to have good run prevention than good run scoring.
This could mean two things: a) runs prevented are more important than runs
scored, or b) it's "easier" to build an offense-oriented team than a
defense-oriented team.
One way to get at this question is to use a two-variable model that includes
both runs scored and runs allowed by teams, instead of just run differential
alone. Doing so on this same dataset results in a model that predicts wins
just as well as the run differential model (model R2 = 0.90), and assigns
coefficients that tell you roughly how many wins you get from a run scored vs.
a run allowed. It turns out that these coefficients are virtually identical in
size: +0.099 wins per run scored, and -0.101 wins per run allowed, indicating
that the result in the table is most likely attributable to the "easier to
build good offensive team" hypothesis.
Furthermore, this means that preventing a run from scoring on defense is
worth just as much as scoring a run on offense.
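To make the two-variable model concrete, here is a small Python sketch of the fitting mechanics. It does not reproduce the 1996-2004 regression: the runs scored and allowed are made-up values, the wins are generated synthetically from the coefficients quoted above, and the intercept of 82.5 is purely an assumption (none is reported here), chosen so that a team with 750 runs scored and 750 allowed lands at 81 wins.

```python
import numpy as np

# Hypothetical team seasons (NOT real data): runs scored and runs allowed.
runs_scored = np.array([850.0, 780.0, 700.0, 900.0, 650.0, 720.0])
runs_allowed = np.array([700.0, 760.0, 800.0, 820.0, 640.0, 690.0])

# Synthetic wins built from the quoted coefficients (+0.099, -0.101) and an
# ASSUMED intercept of 82.5; this only serves to exercise the fitting step.
wins = 82.5 + 0.099 * runs_scored - 0.101 * runs_allowed

# Design matrix: a column of ones (intercept), runs scored, runs allowed.
X = np.column_stack([np.ones_like(runs_scored), runs_scored, runs_allowed])

# Ordinary least squares: wins ~ intercept + b1 * RS + b2 * RA
coefs, *_ = np.linalg.lstsq(X, wins, rcond=None)
intercept, wins_per_run_scored, wins_per_run_allowed = coefs

print(f"wins per run scored:  {wins_per_run_scored:+.3f}")
print(f"wins per run allowed: {wins_per_run_allowed:+.3f}")

# Coefficients of nearly equal size and opposite sign mean that a run scored
# and a run prevented move the win total by about the same amount.
print(f"runs needed per extra win (offense): {1 / wins_per_run_scored:.1f}")
print(f"runs saved per extra win (defense):  {-1 / wins_per_run_allowed:.1f}")
```

Because the wins here are generated without noise, the fit simply recovers the coefficients that went in. On real data the recovered values would carry sampling error, and the interesting question is whether the two coefficients differ by more than that error.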
What does this mean for how we evaluate players? Well, clearly, we need to
consider both aspects of a player's performance: offense and defense. For
position players, this means that we need to know both how many runs they
generated on offense, as well as how many runs they saved on defense, both
relative to some baseline. If you only consider offense--and let's face it,
that's what just about everyone does...at best, defense is used as a
tiebreaker--you're likely to severely overvalue players that are offensive
standouts but defensive disasters.
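A toy Python sketch of that point: add offensive and defensive runs relative to a baseline, then see how the ranking changes versus looking at offense alone. Both players and all of their run values are hypothetical, and the 0.1 wins-per-run conversion is just a rounding of the roughly 0.099 and 0.101 coefficients from the regression above, not a separately derived figure.

```python
from dataclasses import dataclass

# Rounded from the ~0.099 / ~0.101 wins-per-run coefficients; an approximation.
WINS_PER_RUN = 0.1

@dataclass
class PositionPlayer:
    name: str
    offense_runs: float   # runs generated above baseline on offense
    defense_runs: float   # runs saved above baseline on defense (negative = runs cost)

    def total_runs(self) -> float:
        """Offense and defense count equally, run for run."""
        return self.offense_runs + self.defense_runs

    def approx_wins(self) -> float:
        """Rough conversion of total runs to wins."""
        return self.total_runs() * WINS_PER_RUN

# Hypothetical players (NOT real data).
slugger = PositionPlayer("Hypothetical Slugger", offense_runs=30.0, defense_runs=-20.0)
all_rounder = PositionPlayer("Hypothetical All-Rounder", offense_runs=15.0, defense_runs=10.0)

players = [slugger, all_rounder]
by_offense = sorted(players, key=lambda p: p.offense_runs, reverse=True)
by_total = sorted(players, key=lambda p: p.total_runs(), reverse=True)

print("Ranked by offense only:", [p.name for p in by_offense])
print("Ranked by total runs:  ", [p.name for p in by_total])
for p in by_total:
    print(f"{p.name}: {p.total_runs():+.1f} runs, about {p.approx_wins():+.1f} wins")
```

In this made-up case the slugger looks twice as valuable on offense alone, but once the runs given back on defense are counted, the all-rounder comes out ahead.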
With pitchers, at least in the National League, we probably should consider
offense and defense as well. However, the offensive contributions of pitchers
these days are generally so meager, and involve such a small number of plate
appearances, that I tend to just ignore them. I recognize, though, that for
pitchers like Micah Owings in '07, this might miss a substantial amount of
value. Consider it something to look at in the future.
In future articles, we'll go over more specifically how to go about evaluating
players.
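You may have noticed that runs allowed predicts wins more accurately than runs
scored, which is interesting. It suggests that winning teams are slightly more
likely to be good at preventing runs than at scoring them. There are two
possible explanations: 1. Run prevention is more important than run scoring.
2. It is easier to build a strong offensive team than a strong defensive one.
One way to settle this is to use runs scored and runs allowed as two separate
variables, rather than run differential alone. The resulting model predicts
wins just as well as the run differential model (R-Square = 0.90), and it also
tells us how many wins each run scored or allowed is worth: +0.099 wins per
run scored, and -0.101 wins per run allowed. This suggests that the "easier to
build a strong offensive team" hypothesis is the more likely one. Furthermore,
it means that scoring a run and saving a run are worth the same.
What does this have to do with our topic? Clearly, to measure a player's
value we have to consider both offense and defense. If you consider only
offense, which is what almost everyone does, with defense serving as a
tiebreaker at best (Beane: heh heh), you are likely to badly overvalue a
player who hits well but fields poorly.
For pitchers, at least in the National League, we should consider their
hitting as well. Usually their offensive contribution is small enough to
ignore. However, for a pitcher who can both pitch and hit like Micah Owings,
doing so may undervalue him. I may look into this further in the future.
In the rest of the series, I'll lay out in more detail how to evaluate
players.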
--
※ Posted from: PTT (ptt.cc)
◆ From: 140.112.5.3
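1F: push appshjkli: Owings seems to have lost the strong pitching this year; the bat is still there 10/05 15:03
2F: push dashboy: Push 10/05 15:20
3F: push Poleaxe: Push 10/05 15:31
4F: push eaquson: 10/05 16:26
5F: push Geel: Recommended 10/05 16:30
6F: push NPLNT: Push 10/05 16:34
7F: push bbbruce: Push 10/05 17:03
8F: push abing75907: Push 10/05 17:48
9F: push Paparra: I read the first two paragraphs before I saw the line.. "the above is filler" 10/05 21:52
10F: push jayin07: XD 10/05 21:54
11F: push gaga19900329: Push 10/05 22:05
12F: push airmike: Thanks to the board moderator for the hard work translating 10/16 19:10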