[问题] 使用R辅助基础机率论教学

时间Sat Dec 14 22:12:27 2013

[问题类型]: 程式谘询(我想用R 做某件事情，但是我不知道要怎麽用R 写出来) [软体熟悉度]: 入门(写过其他程式，只是对语法不熟悉) [问题叙述]: 弱大数法则我还勉强能用丢铜板来"实验"给学生看... 但是CLT要怎麽做呢= = 所以想用R来做电脑辅助教学... 而且我想示范： 1.不是n>=30就叫做大样本。所以除了常态分配和长得比较对称的均匀分配以外，其他都选取很极端的机率分布，结果n=30之下作Jarque Bera test都拒绝虚无假设。 2.期望值和变异数存在且有限才可适用CLT。所以我故意选了Cauchy分配来示范。 (抽自柯西分配母体的样本其平均仍然是柯西分配) 不过，我最後用ks.test的结果怪怪的，是不是参数没设定好？(难道没办法让他自己抓常态分配的参数吗= =) (用SAS和SPSS都不用调参数....是不是用R作KS检定一定要指定参数?) [程式范例]: #请记得先安装TSA package library(TSA) #X~N(0,1) set.seed(1) par(mfrow=c(1,2)) popnorm<-rnorm(n=100000,mean=0, sd=1) mean(popnorm) var(popnorm) plot(density(popnorm)) meannorm<-array(0,dim=c(1000)) for (i in 1:1000) { meannorm[i]=mean(sample(popnorm,size=30,replace=TRUE)) } mean(meannorm) var(meannorm) plot(density(meannorm)) ks.test(meannorm,"pnorm") jarque.bera.test(meannorm) 结果： > ks.test(meannorm,"pnorm") One-sample Kolmogorov-Smirnov test data: meannorm D = 0.3406, p-value < 2.2e-16 alternative hypothesis: two-sided > jarque.bera.test(meannorm) Jarque Bera Test data: meannorm X-squared = 4.9045, df = 2, p-value = 0.0861 最後改用这个： mean<-mean(meannorm) sd<-sqrt(var(meannorm)) ks.test(meannorm,"pnorm",mean,sd) 结果： One-sample Kolmogorov-Smirnov test data: meannorm D = 0.029, p-value = 0.3692 alternative hypothesis: two-sided 没有违背预期的结果。 [关键字]: 电脑辅助教学, 机率, 中央极限定理附注：以下试验其他机率分布 http://ideone.com/UkhaJO 为节省版面，贴到网路上供参考。 --

※ 发信站: 批踢踢实业坊(ptt.cc) ◆ From: 111.255.26.132

1^F：→ clickhere:忘了 sqrt(30) ? 12/15 08:05

2^F：→ clickhere:再把 CLT 看一遍. 12/15 08:07

3^F：→ andrew43:ks.test()若要检验分配是要指定参数没错。 12/15 11:50

4^F：→ andrew43:至於其它软体不用指定参数，我猜是自动用样本估的。 12/15 11:50

5^F：→ clickhere:no. sqrt(n) 才会给你正确的 CLT. n=30是猜不到的. 12/15 12:02

mean<-mean(meannorm) sd<-sqrt(var(meannorm)) ↑这里的SD=sqrt(样本变异数)=sqrt(母体变异数/30)=母体标准差/sqrt(30) ks.test(meannorm,"pnorm",mean,sd) 结果： One-sample Kolmogorov-Smirnov test data: meannorm D = 0.029, p-value = 0.3692 alternative hypothesis: two-sided 这个部份的sd应该不用再除sqrt(30) 因为样本的变异数已经是母体变异数除以30了。 > mean(popnorm) [1] -0.002244083 > var(popnorm) [1] 1.007059 > mean(meannorm) [1] 0.002210398 > var(meannorm) [1] 0.03560481 其中抽出样本的变异数约为母体变异数/30 > var(popnorm)/30 [1] 0.03356863 所以使用CLT的时候，标准差不用再除以sqrt(30)罗... 主要的问题是，不晓得有没有用KS检定却不需指定参数的方法？类似SAS和SPSS的那种... 虽然用JB检定是满方便的，可是初等统计教科书大多没写这方法... 另外，抽样的时候设定replace=TRUE应该没问题吧？我想说这样比较能确保每个样本都iid於母体分配。 --- 程式又大幅度改写： #X~N(0,1) set.seed(1) par(mfrow=c(1,2)) popnorm<-rnorm(n=100000,mean=0, sd=1) meanpopnorm<-mean(popnorm) varpopnorm<-var(popnorm) print(paste("The mean of the population is",round(meanpopnorm,5))) print(paste("The variance of the population is", round(varpopnorm,5))) plot(density(popnorm)) cltnorm<-function(n){ meannorm<-array(0,dim=c(1000)) for (i in 1:1000) { meannorm[i]=mean(sample(popnorm,size=n,replace=TRUE)) } print("----------------------------------------------------------") print(paste("The mean of samples is",round(mean(meannorm),5))) print(paste(", and the variance of samples is",round(var(meannorm),5))) print(paste("According to CLT, the theoretical mean is", round(meanpopnorm,5))) print(paste(", and the theoretical variance of samples is",round(varpopnorm/n,5))) print("----------------------------------------------------------") plot(density(meannorm)) jarque.bera.test(meannorm) } 接下来重覆五次试验(n=30)： > cltnorm(30) [1] "----------------------------------------------------------" [1] "The mean of samples is 0.00355" [1] ", and the variance of samples is 0.03627" [1] "According to CLT, the theoretical mean is -0.00224" [1] ", and the theoretical variance of samples is 0.03357" [1] "----------------------------------------------------------" Jarque Bera Test data: meannorm X-squared = 4.4989, df = 2, p-value = 0.1055 > cltnorm(30) [1] "----------------------------------------------------------" [1] "The mean of samples is -0.00778" [1] ", and the variance of samples is 0.03286" [1] "According to CLT, the theoretical mean is -0.00224" [1] ", and the theoretical variance of samples is 0.03357" [1] "----------------------------------------------------------" Jarque Bera Test data: meannorm X-squared = 0.4475, df = 2, p-value = 0.7995 > > cltnorm(30) [1] "----------------------------------------------------------" [1] "The mean of samples is 0.00405" [1] ", and the variance of samples is 0.03348" [1] "According to CLT, the theoretical mean is -0.00224" [1] ", and the theoretical variance of samples is 0.03357" [1] "----------------------------------------------------------" Jarque Bera Test data: meannorm X-squared = 3.2785, df = 2, p-value = 0.1941 > cltnorm(30) [1] "----------------------------------------------------------" [1] "The mean of samples is -0.0048" [1] ", and the variance of samples is 0.03026" [1] "According to CLT, the theoretical mean is -0.00224" [1] ", and the theoretical variance of samples is 0.03357" [1] "----------------------------------------------------------" Jarque Bera Test data: meannorm X-squared = 3.2825, df = 2, p-value = 0.1937 > cltnorm(30) [1] "----------------------------------------------------------" [1] "The mean of samples is 5e-05" [1] ", and the variance of samples is 0.03214" [1] "According to CLT, the theoretical mean is -0.00224" [1] ", and the theoretical variance of samples is 0.03357" [1] "----------------------------------------------------------" Jarque Bera Test data: meannorm X-squared = 10.1117, df = 2, p-value = 0.006372 最後一次试验错误的拒绝虚无假设... 我该怎麽解释呢= =" --- 刚这样写： testing<-function(n, m){ set.seed(1) popnorm<-rnorm(n=100000,mean=0, sd=1) error<-array(0,dim=c(m)) for (i in 1:m){ meannorm<-array(0,dim=c(1000)) for (i in 1:1000) { meannorm[i]=mean(sample(popnorm,size=n,replace=TRUE)) } test<-jarque.bera.test(meannorm) print(test) p<-as.numeric(test$p.value) print(p) error[i]<-ifelse(p<0.05,1,0) print(error[i]) } return(list(matrix=error)) } 使用testing(30,50)测试，可是传回的矩阵很怪... 变成有1000个元素的一维阵列，而且第五笔资料应该要存入1，却反而是0。 ※ 编辑: anovachen 来自: 111.255.24.52 (12/15 23:48)

6^F：→ andrew43:可再抽出没错，不过你有大母体了也没什麽影响。 12/16 08:49

set.seed(1) popnorm<-rnorm(n=100000,mean=0, sd=1) error<-array(0,dim=c(30)) testing<-function(n){ meannorm<-array(0,dim=c(1000)) for (i in 1:1000) { meannorm[i]=mean(sample(popnorm,size=n,replace=TRUE)) } test<-jarque.bera.test(meannorm) print(test) p<-as.numeric(test$p.value) print(p) return(ifelse(p<0.05,1,0)) } for (i in 1:30){ error[i]<-testing(30) } 这样写就OK了。不晓得上面写法问题出在哪?? >"< ---- 以下是某不愿具名者提供的程式码： n <- 30 ret <- NULL for(i in 1:100){ x <- rnorm(n) x.hat <- mean(x) * sqrt(n) # CLT stat. goes to N(0, 1) in d. ret[i] <- x.hat } ks.test(ret, "pnorm") # Test 100 x.hat's against N(0, 1) 他指出我原本的写法是类似bootstrap，在有限的样本(N=100000)自助重抽样，以模拟从真正的母体里抽样的过程。而他的写法则比较贴近我想表达的。依据他的写法，我们可以如此改写： n <- 100 ret <- NULL for(i in 1:100){ x <- runif(n,min=0,max=1) x.hat <- mean(x) ret[i] <- x.hat } hist(ret) ks.test(ret, "pnorm", mean=1/2, sd = sqrt(1/12/n)) 如此可验证从U(0,1)中抽样，n=100，则X-bar的分布是否会渐近N(1/2, 1/1200)。 ※ 编辑: anovachen 来自: 111.255.13.49 (12/19 00:47)

7^F：推 adgjlsfhk123:http://ccckmit.wikidot.com/st:centraltheorem 12/22 03:35

8^F：→ adgjlsfhk123:希望对你有帮助 12/22 03:36

9^F：→ anovachen:谢谢! 12/29 16:25

	[问题/行为] 猫晚上进房间会不会有憋尿问题
	Re: [闲聊] 选了错误的女孩成为魔法少女 XDDDDDDDDDD
	[正妹] 瑞典一张
	[心得] EMS高领长版毛衣.墨小楼MC1002
	[分享] 丹龙隔热纸GE55+33+22
	[问题] 清洗洗衣机
	[寻物] 窗台下的空间
	[闲聊] 双极の女神1 木魔爵
	[售车] 新竹 1997 march 1297cc 白色四门
	[讨论] 能从照片感受到摄影者心情吗
	[狂贺] 贺贺贺贺贺！岛村卯月！总选举NO.1
	[难过] 羡慕白皮肤的女生
	阅读文章
	[黑特]
	[问题] SBK S1安装於安全帽位置
	[分享] 旧woo100绝版开箱!!
	Re: [无言] 关於小包卫生纸
	[开箱] E5-2683V3 RX480Strix 快睿C1 简单测试
	[心得] 苍の海贼龙地狱执行者16PT
	[售车] 1999年Virage iO 1.8EXi
	[心得] 挑战33 LV10 狮子座pt solo
	[闲聊] 手把手教你不被桶之新手主购教学
	[分享] Civic Type R 量产版官方照无预警流出
	[售车] Golf 4 2.0 银色自排
	[出售] Graco提篮汽座（有底座）2000元诚可议
	[问题] 请问补牙材质掉了还能再补吗?(台中半年内
	[问题] 44th 单曲生写竟然都给重复的啊啊！
	[心得] 华南红卡/icash 核卡
	[问题] 拔牙矫正这样正常吗
	[赠送] 老莫高业初业 102年版
	[情报] 三大行动支付本季掀战火
	[宝宝] 博客来Amos水蜡笔5/1特价五折
	Re: [心得] 新鲜人一些面试分享
	[心得] 苍の海贼龙地狱麒麟25PT
	Re: [闲聊] (君の名は。雷慎入) 君名二创漫画翻译
	Re: [闲聊] OGN中场影片：失踪人口局 (英文字幕)
	[问题] 台湾大哥大4G讯号差
	[出售] [全国]全新千寻侘草LED灯, 水草

WEB批踢踢(PTT)

R_Language 板

[问题] 使用R辅助基础机率论教学

热门看板

赞助商连结