作者anovachen ( )
看板R_Language
标题[问题] 使用R辅助基础机率论教学
时间Sat Dec 14 22:12:27 2013
[问题类型]:
程式谘询(我想用R 做某件事情,但是我不知道要怎麽用R 写出来)
[软体熟悉度]:
入门(写过其他程式,只是对语法不熟悉)
[问题叙述]:
弱大数法则我还勉强能用丢铜板来"实验"给学生看...
但是CLT要怎麽做呢= =
所以想用R来做电脑辅助教学...
而且我想示范:
1.不是n>=30就叫做大样本。
所以除了常态分配和长得比较对称的均匀分配以外,
其他都选取很极端的机率分布,
结果n=30之下作Jarque Bera test都拒绝虚无假设。
2.期望值和变异数存在且有限才可适用CLT。
所以我故意选了Cauchy分配来示范。
(抽自柯西分配母体的样本其平均仍然是柯西分配)
不过,我最後用ks.test的结果怪怪的,
是不是参数没设定好?(难道没办法让他自己抓常态分配的参数吗= =)
(用SAS和SPSS都不用调参数....是不是用R作KS检定一定要指定参数?)
[程式范例]:
#请记得先安装TSA package
library(TSA)
#X~N(0,1)
set.seed(1)
par(mfrow=c(1,2))
popnorm<-rnorm(n=100000,mean=0, sd=1)
mean(popnorm)
var(popnorm)
plot(density(popnorm))
meannorm<-array(0,dim=c(1000))
for (i in 1:1000) {
meannorm[i]=mean(sample(popnorm,size=30,replace=TRUE))
}
mean(meannorm)
var(meannorm)
plot(density(meannorm))
ks.test(meannorm,"pnorm")
jarque.bera.test(meannorm)
结果:
> ks.test(meannorm,"pnorm")
One-sample Kolmogorov-Smirnov test
data: meannorm
D = 0.3406, p-value < 2.2e-16
alternative hypothesis: two-sided
> jarque.bera.test(meannorm)
Jarque Bera Test
data: meannorm
X-squared = 4.9045, df = 2, p-value = 0.0861
最後改用这个:
mean<-mean(meannorm)
sd<-sqrt(var(meannorm))
ks.test(meannorm,"pnorm",mean,sd)
结果:
One-sample Kolmogorov-Smirnov test
data: meannorm
D = 0.029, p-value = 0.3692
alternative hypothesis: two-sided
没有违背预期的结果。
[关键字]:
电脑辅助教学, 机率, 中央极限定理
附注:以下试验其他机率分布
http://ideone.com/UkhaJO
为节省版面,贴到网路上供参考。
--
※ 发信站: 批踢踢实业坊(ptt.cc)
◆ From: 111.255.26.132
1F:→ clickhere:忘了 sqrt(30) ? 12/15 08:05
2F:→ clickhere:再把 CLT 看一遍. 12/15 08:07
3F:→ andrew43:ks.test()若要检验分配是要指定参数没错。 12/15 11:50
4F:→ andrew43:至於其它软体不用指定参数,我猜是自动用样本估的。 12/15 11:50
5F:→ clickhere:no. sqrt(n) 才会给你正确的 CLT. n=30是猜不到的. 12/15 12:02
mean<-mean(meannorm)
sd<-sqrt(var(meannorm))
↑这里的SD=sqrt(样本变异数)=sqrt(母体变异数/30)=母体标准差/sqrt(30)
ks.test(meannorm,"pnorm",mean,sd)
结果:
One-sample Kolmogorov-Smirnov test
data: meannorm
D = 0.029, p-value = 0.3692
alternative hypothesis: two-sided
这个部份的sd应该不用再除sqrt(30)
因为样本的变异数已经是母体变异数除以30了。
> mean(popnorm)
[1] -0.002244083
> var(popnorm)
[1] 1.007059
> mean(meannorm)
[1] 0.002210398
> var(meannorm)
[1] 0.03560481
其中抽出样本的变异数约为母体变异数/30
> var(popnorm)/30
[1] 0.03356863
所以使用CLT的时候,标准差不用再除以sqrt(30)罗...
主要的问题是,不晓得有没有用KS检定却不需指定参数的方法?
类似SAS和SPSS的那种...
虽然用JB检定是满方便的,可是初等统计教科书大多没写这方法...
另外,抽样的时候设定replace=TRUE应该没问题吧?
我想说这样比较能确保每个样本都iid於母体分配。
---
程式又大幅度改写:
#X~N(0,1)
set.seed(1)
par(mfrow=c(1,2))
popnorm<-rnorm(n=100000,mean=0, sd=1)
meanpopnorm<-mean(popnorm)
varpopnorm<-var(popnorm)
print(paste("The mean of the population is",round(meanpopnorm,5)))
print(paste("The variance of the population is", round(varpopnorm,5)))
plot(density(popnorm))
cltnorm<-function(n){
meannorm<-array(0,dim=c(1000))
for (i in 1:1000) {
meannorm[i]=mean(sample(popnorm,size=n,replace=TRUE))
}
print("----------------------------------------------------------")
print(paste("The mean of samples is",round(mean(meannorm),5)))
print(paste(", and the variance of samples is",round(var(meannorm),5)))
print(paste("According to CLT, the theoretical mean is",
round(meanpopnorm,5)))
print(paste(", and the theoretical variance of samples
is",round(varpopnorm/n,5)))
print("----------------------------------------------------------")
plot(density(meannorm))
jarque.bera.test(meannorm)
}
接下来重覆五次试验(n=30):
> cltnorm(30)
[1] "----------------------------------------------------------"
[1] "The mean of samples is 0.00355"
[1] ", and the variance of samples is 0.03627"
[1] "According to CLT, the theoretical mean is -0.00224"
[1] ", and the theoretical variance of samples is 0.03357"
[1] "----------------------------------------------------------"
Jarque Bera Test
data: meannorm
X-squared = 4.4989, df = 2, p-value = 0.1055
> cltnorm(30)
[1] "----------------------------------------------------------"
[1] "The mean of samples is -0.00778"
[1] ", and the variance of samples is 0.03286"
[1] "According to CLT, the theoretical mean is -0.00224"
[1] ", and the theoretical variance of samples is 0.03357"
[1] "----------------------------------------------------------"
Jarque Bera Test
data: meannorm
X-squared = 0.4475, df = 2, p-value = 0.7995
>
> cltnorm(30)
[1] "----------------------------------------------------------"
[1] "The mean of samples is 0.00405"
[1] ", and the variance of samples is 0.03348"
[1] "According to CLT, the theoretical mean is -0.00224"
[1] ", and the theoretical variance of samples is 0.03357"
[1] "----------------------------------------------------------"
Jarque Bera Test
data: meannorm
X-squared = 3.2785, df = 2, p-value = 0.1941
> cltnorm(30)
[1] "----------------------------------------------------------"
[1] "The mean of samples is -0.0048"
[1] ", and the variance of samples is 0.03026"
[1] "According to CLT, the theoretical mean is -0.00224"
[1] ", and the theoretical variance of samples is 0.03357"
[1] "----------------------------------------------------------"
Jarque Bera Test
data: meannorm
X-squared = 3.2825, df = 2, p-value = 0.1937
> cltnorm(30)
[1] "----------------------------------------------------------"
[1] "The mean of samples is 5e-05"
[1] ", and the variance of samples is 0.03214"
[1] "According to CLT, the theoretical mean is -0.00224"
[1] ", and the theoretical variance of samples is 0.03357"
[1] "----------------------------------------------------------"
Jarque Bera Test
data: meannorm
X-squared = 10.1117, df = 2, p-value = 0.006372
最後一次试验错误的拒绝虚无假设...
我该怎麽解释呢= ="
---
刚这样写:
testing<-function(n, m){
set.seed(1)
popnorm<-rnorm(n=100000,mean=0, sd=1)
error<-array(0,dim=c(m))
for (i in 1:m){
meannorm<-array(0,dim=c(1000))
for (i in 1:1000) {
meannorm[i]=mean(sample(popnorm,size=n,replace=TRUE))
}
test<-jarque.bera.test(meannorm)
print(test)
p<-as.numeric(test$p.value)
print(p)
error[i]<-ifelse(p<0.05,1,0)
print(error[i])
}
return(list(matrix=error))
}
使用testing(30,50)测试,
可是传回的矩阵很怪...
变成有1000个元素的一维阵列,
而且第五笔资料应该要存入1,却反而是0。
※ 编辑: anovachen 来自: 111.255.24.52 (12/15 23:48)
6F:→ andrew43:可再抽出没错,不过你有大母体了也没什麽影响。 12/16 08:49
set.seed(1)
popnorm<-rnorm(n=100000,mean=0, sd=1)
error<-array(0,dim=c(30))
testing<-function(n){
meannorm<-array(0,dim=c(1000))
for (i in 1:1000) {
meannorm[i]=mean(sample(popnorm,size=n,replace=TRUE))
}
test<-jarque.bera.test(meannorm)
print(test)
p<-as.numeric(test$p.value)
print(p)
return(ifelse(p<0.05,1,0))
}
for (i in 1:30){
error[i]<-testing(30)
}
这样写就OK了。
不晓得上面写法问题出在哪?? >"<
----
以下是某不愿具名者提供的程式码:
n <- 30
ret <- NULL
for(i in 1:100){
x <- rnorm(n)
x.hat <- mean(x) * sqrt(n) # CLT stat. goes to N(0, 1) in d.
ret[i] <- x.hat
}
ks.test(ret, "pnorm")
# Test 100 x.hat's against N(0, 1)
他指出我原本的写法是类似bootstrap,在有限的样本(N=100000)自助重抽样,
以模拟从真正的母体里抽样的过程。
而他的写法则比较贴近我想表达的。
依据他的写法,我们可以如此改写:
n <- 100
ret <- NULL
for(i in 1:100){
x <- runif(n,min=0,max=1)
x.hat <- mean(x)
ret[i] <- x.hat
}
hist(ret)
ks.test(ret, "pnorm", mean=1/2, sd = sqrt(1/12/n))
如此可验证从U(0,1)中抽样,n=100,则X-bar的分布
是否会渐近N(1/2, 1/1200)。
※ 编辑: anovachen 来自: 111.255.13.49 (12/19 00:47)
8F:→ adgjlsfhk123:希望对你有帮助 12/22 03:36
9F:→ anovachen:谢谢! 12/29 16:25