Author: abc12812 (Sudba tseloveka)
Board: MLB
Title: [Translation] Player Value Estimation: Introduction
Time: Sun Oct 5 15:02:52 2008
http://tinyurl.com/3h6ou9
This is the first part of a multi-part series on how to estimate player value.
I've been doing an awful lot of reading, thinking, and discussing these issues
over the past several weeks, which is part of the reason that it's been
relatively quiet around here. Because writing things out is the best way that
I know to master a complicated topic like this, my hope is that this series
will help me crystallize my thinking on player valuation and get up to speed
on the most significant research to date. It will also serve as a nice set of
papers to which I can refer to justify my methods moving forward...and who
knows, maybe it'll be useful to others who are working through similar issues
as well.
To be clear, few of the big ideas that follow are based on my own work,
though I may supplement them with a small study here and there. Because this
is supposed to be a Reds blog, I will often use the Reds in case studies. In
general, though, you can think of this as a popular science review article ...
or maybe a college term paper, given that I'm still new to much of this info.
:)
(Translator's note: the above is filler and can be skipped.)
General Principles of Player Valuation
Wins vs. Runs
What do we mean by value? The answer can be fairly nuanced, as evidenced by
essays like this one (http://tinyurl.com/4fy3r7) or this one
(http://tinyurl.com/4a5kwz). I'm going to be a bit generic about it, however: I
want to know how much players did to help their team win.
Based on that definition, the ultimate goal would be to have statistics that
quantify player value in terms of wins. There have been several efforts on that
front, including Win Shares, WARP, and WPA. At this point, however, I'm not
satisfied with how any of those stats handle fielding (among other things),
so I'm not ready to make the leap to those stats.
The alternative, then, is to use stats that give their units in runs, for
which we have many "good" stats that can quantify hitting, pitching, and
fielding. How much are we losing by going with runs instead of wins? To get
some handle on this, I ran a quick regression of all teams from 1996-2004 with
data pulled from the Lahman database and looked at one-variable models that
predict wins:
Predictor of Wins   R-Square   MSE (wins^2)
Run Differential      0.90        14.78
Runs Allowed          0.43        86.32
Runs Scored           0.35        97.92
R-Square, in this case, indicates the proportion of variation in wins explained
by the different predictors. Therefore, this quick 'n dirty analysis indicates
that we can explain 90% of variation in the number of team wins by just knowing
a team's run differential (the difference between a team's runs scored and its
runs allowed). The remaining 10% is presumably due to the timing of when those
runs are scored, or variation in run environments (e.g. a run in Coors' field
is worth less in terms of wins than a run in PETCO Park, simply because more
runs are scored in Coors' than in PETCO, so each one contributes less to wins).
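For anyone who wants to build this kind of table themselves, here is a minimal Python sketch of a one-variable least-squares fit that reports R-Square and MSE. The six team records are hypothetical placeholders, not Lahman data. Dividing the squared error by n - 2 to get MSE is an assumption about how the figures above were computed.

```python
# Sketch: one-variable OLS fit of wins on a single predictor, reporting
# R-square and MSE. The team records are HYPOTHETICAL, not Lahman data.

def fit_one_variable(x, y):
    """Return (r_square, mse, slope, intercept) for an OLS fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Squared residuals around the fitted line, and total variation around the mean
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    r_square = 1 - sse / sst   # share of the variation in wins that is explained
    mse = sse / (n - 2)        # assumption: n - 2 degrees of freedom
    return r_square, mse, slope, intercept


# Hypothetical teams: runs scored, runs allowed, wins
runs_scored = [800, 690, 760, 720, 650, 780]
runs_allowed = [700, 600, 790, 800, 690, 740]
wins = [92, 89, 78, 73, 77, 86]
run_diff = [rs - ra for rs, ra in zip(runs_scored, runs_allowed)]

results = {}
for label, predictor in [("Run Differential", run_diff),
                         ("Runs Allowed", runs_allowed),
                         ("Runs Scored", runs_scored)]:
    r_square, mse, _, _ = fit_one_variable(predictor, wins)
    results[label] = (r_square, mse)
    print(f"{label:<18} {r_square:5.2f} {mse:7.2f}")
```

On real data, you would replace the three lists with per-team totals (runs, runs allowed, wins) for each season in the study window.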
Research to date indicates that most, though not all, timing-based effects
are associated with events that involve very little unique player
skill -- clutch hitting and pitching, for example, have very low repeatability
in and of themselves, meaning that clutch performances are best predicted by a
player's overall stats. Other "timing" events, like those having to do with
baserunning (SBs and CSs happen more often in close games than in blowouts),
tend to result in relatively few net runs per year. Finally, we can make
adjustments for variation in the run environment of games via park factors
and other techniques. Therefore, I'm ok with using runs instead of wins, at
least for the time being, because of the gains in precision that we get from
using the available runs-based statistics.
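Here is a rough Python sketch of the park-factor idea mentioned above. It uses one simple form: home-park runs per game divided by road runs per game, applied to the half of a team's schedule played at home. This is an illustrative assumption, not the specific adjustment the author uses, and the run totals are hypothetical.

```python
# Sketch of a basic park-factor adjustment. This simple form is an
# illustrative assumption, not the author's method; numbers are hypothetical.

def basic_park_factor(home_runs_total, home_games, road_runs_total, road_games):
    """Runs per game (both teams combined) in home games divided by runs per
    game in road games. Above 1.0 means the park inflates scoring."""
    return (home_runs_total / home_games) / (road_runs_total / road_games)


def park_adjust(runs, park_factor, home_share=0.5):
    """Express a run total in a neutral environment. Only the home share of
    games is played in the park, so the factor is blended with 1.0 (neutral)."""
    effective_factor = home_share * park_factor + (1 - home_share) * 1.0
    return runs / effective_factor


# Hypothetical hitter-friendly park: 1050 runs in 81 home games, 810 in 81 road games
pf = basic_park_factor(1050, 81, 810, 81)
neutral_runs = park_adjust(850, pf)
print(f"park factor {pf:.3f}, 850 raw runs -> {neutral_runs:.0f} neutral runs")
```

Blending with 1.0 reflects the assumption that a team plays about half its games in its own park. Published park factors typically use several years of data and more refined corrections.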
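What is value? The answer can be somewhat nuanced, as these two articles try to
define it: (http://tinyurl.com/4fy3r7 http://tinyurl.com/4a5kwz). I look at it
from a more general angle: I want to know how many wins a player brought to
his team.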
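Starting from this definition, the ultimate goal is to find stats that express a
player's value directly in wins. There has already been some progress on this
front, such as WS, WARP, and WPA. However, I'm not very satisfied with how these
stats handle defense, so I won't discuss them here for now.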
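From another angle, (value) can be expressed in runs. Here we already have many
well-developed, ready-made stats to measure hitting, pitching, and defense. The
question is: how much is lost between runs and wins? I ran the data from
1996-2004: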
Predictor of Wins   R-Square   MSE (wins^2)
Run Differential      0.90        14.78
Runs Allowed          0.43        86.32
Runs Scored           0.35        97.92
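This rough analysis shows that run differential explains as much as 90% of the
variation in wins (R-Square = 0.90). As for the remaining 10%, I think the
likely causes are the timing of runs or differences in run environment. (The
same run is worth a different amount at Coors Field than at PETCO, because
Coors is an easier place to score.)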
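Research to date shows that clutch ability is mostly (though not entirely) not a
distinct player skill. For clutch pitching and hitting, the data show these
"abilities" have very low repeatability, meaning clutch performance is basically
just an expression of the player's overall ability. Other clutch skills, such as
baserunning, produce only a very small difference in runs each year.
As for environmental factors, we can adjust for them with park factors and other
methods. Therefore, I think expressing player value in runs is a workable
approach.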
Offensive vs. Defensive Contributions
You'll note that in the table above, runs allowed alone predicts wins better
than runs scored alone. This is interesting: it indicates that winning teams
are slightly more likely to have good run prevention than good run scoring.
This could mean two things: a) runs prevented are more important than runs
scored, or b) it's "easier" to build an offense-oriented team than a
defense-oriented team.
One way to get at this question is to use a two-variable model that includes
both runs scored and runs allowed by teams, instead of just run differential
alone. Doing so on this same dataset results in a model that predicts wins
just as well as the run differential model (model R-Square = 0.90), and assigns
coefficients that tell you roughly how many wins you get from a run scored vs.
a run allowed. It turns out that these coefficients are virtually identical in
size: +0.099 wins per run scored, and -0.101 wins per run allowed,
indicating that the result in the table is most likely
attributable to the "easier to build a good offensive team" hypothesis.
Furthermore, this means that preventing a run from scoring on defense is
worth just as much as scoring a run on offense.
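The two-variable model can be sketched in Python as ordinary least squares on two centered predictors, solving the 2x2 normal equations directly. The function name and the synthetic data are hypothetical. They were built so that the true coefficients are known: +0.1 wins per run scored and -0.1 per run allowed. On real team data, you would expect estimates near the +0.099 and -0.101 quoted above.

```python
# Sketch: OLS fit of wins = a + b * runs_scored + c * runs_allowed, solved
# from the 2x2 normal equations on centered predictors. Data are synthetic.

def fit_two_variable(x1, x2, y):
    """Return (a, b, c) for y ~ a + b*x1 + c*x2 by ordinary least squares."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(u * u for u in d1)
    s22 = sum(u * u for u in d2)
    s12 = sum(u * v for u, v in zip(d1, d2))
    s1y = sum(u * v for u, v in zip(d1, dy))
    s2y = sum(u * v for u, v in zip(d2, dy))
    det = s11 * s22 - s12 * s12  # zero only if the two predictors are collinear
    b = (s1y * s22 - s2y * s12) / det
    c = (s2y * s11 - s1y * s12) / det
    a = my - b * m1 - c * m2
    return a, b, c


# Synthetic teams: each run scored is worth +0.1 wins, each run allowed -0.1 wins
runs_scored = [800, 690, 760, 720, 650, 780]
runs_allowed = [700, 600, 790, 800, 690, 740]
wins = [81 + 0.1 * rs - 0.1 * ra for rs, ra in zip(runs_scored, runs_allowed)]

a, b, c = fit_two_variable(runs_scored, runs_allowed, wins)
print(f"intercept {a:.1f}, wins per run scored {b:+.3f}, wins per run allowed {c:+.3f}")
```

Inverting the coefficients reported above gives 1/0.099 ≈ 10.1 runs scored, or 1/0.101 ≈ 9.9 runs allowed, per win. This is consistent with the familiar rule of thumb of about ten runs per win.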
What does this mean for how we evaluate players? Well, clearly, we need to
consider both aspects of a player's performance: offense and defense. For
position players, this means that we need to know both how many runs they
generated on offense, as well as how many runs they saved on defense, both
relative to some baseline. If you only consider offense--and let's face it,
that's what just about everyone does...at best, defense is used as a
tiebreaker--you're likely to severely overvalue players who are offensive
standouts but defensive disasters.
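To make the offense-plus-defense point concrete, here is a small Python sketch. It adds a player's offensive runs and defensive runs, each assumed to be already measured against some baseline, and converts them to wins using the coefficients from the two-variable team model above. The class name, the players, and their run totals are all hypothetical.

```python
from dataclasses import dataclass

# Coefficients from the two-variable team model quoted in the text.
WINS_PER_RUN_SCORED = 0.099
WINS_PER_RUN_PREVENTED = 0.101  # sign flipped: a run saved is one fewer run allowed


@dataclass
class PlayerSeason:
    """Hypothetical container for one player's season, in runs above baseline."""
    name: str
    offense_runs: float  # runs generated above some baseline (assumed precomputed)
    defense_runs: float  # runs saved above some baseline; negative = runs cost

    def total_runs(self) -> float:
        return self.offense_runs + self.defense_runs

    def wins(self) -> float:
        return (self.offense_runs * WINS_PER_RUN_SCORED
                + self.defense_runs * WINS_PER_RUN_PREVENTED)


# Hypothetical players: a bat-first slugger and a glove-first fielder
slugger = PlayerSeason("bat-first slugger", offense_runs=25.0, defense_runs=-20.0)
glove = PlayerSeason("glove-first fielder", offense_runs=10.0, defense_runs=12.0)

# Ranked by offense alone, the slugger comes first; ranked by total value, he does not.
for p in sorted([slugger, glove], key=PlayerSeason.wins, reverse=True):
    print(f"{p.name:<20} offense {p.offense_runs:+5.1f}  "
          f"total {p.total_runs():+5.1f}  wins {p.wins():+.2f}")
```

The hard part in practice is estimating `defense_runs` credibly, which is exactly where the fielding stats discussed above come in.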
With pitchers, at least in the National League, we probably should consider
offense and defense as well. However, the offensive contributions of pitchers
these days are generally so meager, and involve such a small number of plate
appearances, that I tend to just ignore them. That said, I recognize that for
pitchers like Micah Owings in '07, this might miss a substantial amount of
value. Consider it something to look at in the future.
In future articles, we'll go over more specifically how to go about evaluating
players.
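You may have noticed that runs allowed predicts wins more accurately than runs
scored; this is interesting. It suggests that winning teams are slightly more
likely to be good at preventing runs than at scoring them. There are two
possible reasons: 1. Run prevention matters more than run scoring.
2. It is easier to build a strong-hitting team than a strong-defense team.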
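The way to resolve this is to use runs scored and runs allowed as two separate
variables, rather than run differential alone. The resulting model predicts
wins just as well as the run differential model (R-Square = 0.90), and it also
tells us how many wins each run scored / run allowed is worth: each run scored
= +0.099 wins, each run allowed = -0.101 wins. This suggests that the "easier to
build a strong offensive team" hypothesis is more likely correct. Going further,
it means that scoring a run and preventing a run are of equal value.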
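What does this have to do with our topic? Clearly, to measure a player's value
we must consider both hitting and defense. If you only consider hitting, at best
you can only do as well as everyone else. Defense is the key (to evaluating
players) (Beane: heh heh). You are very likely to overvalue a player who hits
well but fields poorly.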
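For pitchers, at least in the NL, we should also consider their hitting. Usually
their offensive output is so small that it can practically be ignored. However,
for a strong pitcher who also swings a big bat, like Micah Owings, doing so could
seriously undervalue him. I may look into this further in the future.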
In the rest of this series, I will explain methods for evaluating players in
more detail.
--
※ Posted from: PTT (ptt.cc)
◆ From: 140.112.5.3
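1F: Push appshjkli: Owings seems to have lost the "strong pitcher" part this year; the bat is still there though 10/05 15:03
2F: Push dashboy: +1 10/05 15:20
3F: Push Poleaxe: +1 10/05 15:31
4F: Push eaquson: 10/05 16:26
5F: Push Geel: Recommended 10/05 16:30
6F: Push NPLNT: +1 10/05 16:34
7F: Push bbbruce: +1 10/05 17:03
8F: Push abing75907: +1 10/05 17:48
9F: Push Paparra: Only after reading the first two paragraphs did I notice the line.."the above is filler" 10/05 21:52
10F: Push jayin07: XD 10/05 21:54
11F: Push gaga19900329: +1 10/05 22:05
12F: Push airmike: Thanks to the board moderator for the hard work translating 10/16 19:10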