Author: eyelace (Your thoughts are within me)
Board: FBaseball
Title: [Chat] When Samples Become Reliable
Time: Sun May 24 13:03:40 2009
http://www.fangraphs.com/blogs/index.php/when-samples-become-reliable
by Eric Seidman - May 22, 2009 · Filed under Research
One of the most difficult tasks a responsible baseball analyst must take on
is avoiding the use of small samples of data to make definitive claims about
a player. If Victor Martinez goes 4-for-10, that does not automatically make
him a .400 hitter. We have enough information about Martinez from previous
seasons to know that his actual abilities fall well short of that mark. Not
everything, however, should merit a house call from the small-sample-size
police, because some stats stabilize more quickly than others. Additionally,
many small-sample-size criticisms stem from how the information is used, not
from the information itself. If Pat Burrell struggled mightily after the
All-Star break last season and started this season with similarly poor
numbers, we can infer that his skills may be eroding. Isolating either of
those two stretches can prove inaccurate, but taking them together offers
some valuable information.
The question asked most often with regard to small sample sizes is
essentially: when are the samples not small anymore? That is, at what
juncture does the data become meaningful? Martinez at 4-for-10 is
meaningless. Martinez at 66-for-165, as he is right now, tells us much,
much more, but it is still not enough playing time. What are the
plate-appearance benchmarks at which certain statistics become reliable?
Before giving the actual numbers, let me point out that the results come
from this article by a friend of mine, Pizza Cutter, over at Statistically
Speaking. Warning: that article is very research-heavy, so you must put on
your 3D Nerd Goggles before journeying into the land of reliability and
validity. Also, Cutter mentioned that he would be able to answer any
methodological questions here, so ask away. Half of my statistics background
is from school or independent study and the other half is from Pizza Cutter,
so do not be shy.
Cutter basically searched for the point at which split-half reliability
tests produced a correlation of 0.70 or higher. A split-half reliability
test partitions one dataset in two and correlates the halves: for instance,
taking all of Burrell's even-numbered plate appearances, separating them
from the odd-numbered ones, and then correlating the two halves. When the
halves agree closely, the statistic is considered reliable at that sample
size. Though a 1.0 correlation indicates a perfect relationship, 0.70 is the
usual benchmark in statistical studies, especially in baseball, where DIPS
theory was derived from correlations of lesser strength. Without further
delay, here are the results of his article as far as when certain statistics
stabilize for individual hitters:
50 PA: Swing %
100 PA: Contact Rate
150 PA: Strikeout Rate, Line Drive Rate, Pitches/PA
200 PA: Walk Rate, Groundball Rate, GB/FB
250 PA: Flyball Rate
300 PA: Home Run Rate, HR/FB
500 PA: OBP, SLG, OPS, 1B Rate, Popup Rate
550 PA: ISO
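The split-half procedure behind these benchmarks can be illustrated with a
small simulation. The sketch below is not Cutter's actual code; it assumes a
toy population of hitters with true strikeout rates drawn from a Beta
distribution (an assumption for illustration only), simulates each plate
appearance as a Bernoulli trial, and correlates each player's even-numbered
PAs against his odd-numbered PAs. The correlation climbs toward 0.70 as the
PA total grows, which is the stabilization effect described above.

```python
import numpy as np

rng = np.random.default_rng(42)

def split_half_reliability(per_pa_outcomes):
    """per_pa_outcomes: one array per player, each a 0/1 sequence of
    PA outcomes (e.g. 1 = strikeout). Correlate, across players, the
    rate in even-indexed PAs against the rate in odd-indexed PAs."""
    even = [p[0::2].mean() for p in per_pa_outcomes]
    odd = [p[1::2].mean() for p in per_pa_outcomes]
    return np.corrcoef(even, odd)[0, 1]

def simulate(n_pa, n_players=300):
    """Toy league: true K rates centered near 18%, n_pa PAs each."""
    true_rates = rng.beta(18, 82, size=n_players)
    players = [rng.random(n_pa) < r for r in true_rates]
    return split_half_reliability(players)

for n in (50, 150, 650):
    print(f"{n:>3} PA: split-half r = {simulate(n):.2f}")
```

Because half the sample noise averages out as plate appearances accumulate,
the even/odd correlation rises with PA, mirroring the benchmark table above.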
Cutter went to 650 PA as his maximum, meaning that the exclusion of
statistics like BA, BABIP, WPA, and context-neutral WPA indicates that they
did not stabilize by that point. So, there you go; I hope this assuages
certain small-sample misconceptions and provides some insight into when we
can discuss a given metric from a skills standpoint. There are certain red
flags with an analysis like this, primarily that playing time is not
assigned randomly; by using 650 PA, a selection bias may shine through, in
that the players given that many plate appearances tend to be the more
consistent ones. Cutter avoids the brunt of this by comparing players to
themselves. Even so, these benchmarks are, at the very least, tremendous
estimates.
==Brief comment==
Which stats to trust at which PA counts: a 0.7 positive correlation, but there can still be bias >.^
--
※ Posted from: PTT (ptt.cc)
◆ From: 140.114.23.222