Author: dragon0139 (豆子)
Board: Statistics
Title: [Question] How to decide the optimal number of clusters
Time: Fri Jun 25 12:36:19 2021
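About cluster analysis: I need to split my data into several clusters based on scores on 6 variables.
For now I am running a two-stage approach in SPSS.
I have already produced a dendrogram using Ward's method with Euclidean distance, and it suggests that 4~5 clusters would be appropriate.
But before running k-means, how do I decide whether 4 or 5 clusters is best?
(The reference paper tried 4~9 clusters and concluded that 6 was best, but I don't understand where the 6 came from!)
The reference paper's approach was to compute Cohen's kappa.
Does that mean computing it between the two clusterings obtained from the initial cluster centers and from the final cluster centers? (Or am I misreading the paper?)
I'm confused: why is it valid to do it this way?
In general, how does one decide the optimal number of clusters?
Addendum --
The paper describes it like this: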
This procedure begins by randomly assigning the sample into two groups.
The cluster centers of each group from the first step are used as
initial cluster centers for a series of k-means analyses that
assign participants to clusters ranging from four to nine.
Then, another set of k-means analyses are computed for each group,
but in this case, the cluster centers from the opposite group are used to
assign participants to the clusters.
The two sets of k-means yield two sets of cluster assignments per group.
The sets within a group are then compared via Cohen’s kappa (Cohen, 1960)
to determine the reliability of cluster assignment,
or in other words, the degree to which participants in each group
are assigned to the same cluster given different initial cluster centers.
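
To make the quoted procedure concrete, here is a minimal Python sketch of how I read it, using only the standard library. Several parts are assumptions and not from the paper: the function names (`split_half_kappa`, `farthest_first`, `align`, `make_demo_data`) are hypothetical, farthest-first seeding is only a stand-in for the first-step cluster centers, "using the opposite group's centers" is read as plain nearest-center assignment, and the label-matching step is added because the passage does not say how cluster labels from the two solutions are paired before kappa is computed.

```python
import random
from itertools import permutations

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]

def farthest_first(points, k):
    """Assumed stand-in for the first-step centers: pick k spread-out points."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans(points, init_centers, n_iter=100):
    """Plain Lloyd's k-means started from the given initial centers."""
    centers = [list(c) for c in init_centers]
    for _ in range(n_iter):
        labels = assign(points, centers)
        new_centers = []
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster ends up empty.
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)

def contingency(a, b, k):
    """k x k table of counts: rows are labels in a, columns are labels in b."""
    table = [[0] * k for _ in range(k)]
    for i, j in zip(a, b):
        table[i][j] += 1
    return table

def align(a, b, k):
    """Relabel b with the permutation that maximizes agreement with a.
    Brute force over k! permutations, so only practical for small k."""
    table = contingency(a, b, k)
    best = max(permutations(range(k)),
               key=lambda perm: sum(table[perm[j]][j] for j in range(k)))
    return [best[j] for j in b]

def cohen_kappa(a, b, k):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    table = contingency(a, b, k)
    n = len(a)
    p_o = sum(table[i][i] for i in range(k)) / n
    p_e = sum(sum(table[i]) * sum(row[i] for row in table) for i in range(k)) / n ** 2
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)

def split_half_kappa(data, k, seed=0):
    """Average kappa between own-center and opposite-center assignments."""
    rows = [list(r) for r in data]
    random.Random(seed).shuffle(rows)  # random split into two halves
    half_a, half_b = rows[:len(rows) // 2], rows[len(rows) // 2:]
    # k-means within each half, started from that half's own centers.
    centers_a, own_a = kmeans(half_a, farthest_first(half_a, k))
    centers_b, own_b = kmeans(half_b, farthest_first(half_b, k))
    # Assign each half again, this time with the opposite half's centers.
    cross_a = assign(half_a, centers_b)
    cross_b = assign(half_b, centers_a)
    kappa_a = cohen_kappa(own_a, align(own_a, cross_a, k), k)
    kappa_b = cohen_kappa(own_b, align(own_b, cross_b, k), k)
    return (kappa_a + kappa_b) / 2

def make_demo_data(seed=1, per_blob=30):
    """Synthetic data for illustration only: 3 separated blobs, 6 variables."""
    rng = random.Random(seed)
    blob_means = [[0] * 6, [10] * 6, [-10, 10] * 3]
    return [[m + rng.gauss(0, 0.5) for m in mean]
            for mean in blob_means for _ in range(per_blob)]

if __name__ == "__main__":
    data = make_demo_data()
    for k in range(2, 6):
        print(k, round(split_half_kappa(data, k), 3))
```

Under this reading, one would run `split_half_kappa` for each candidate number of clusters (4 to 9 in the paper) and prefer the one with the highest kappa, since that is where the assignments depend least on which half supplied the centers.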
Thanks for any pointers!
--
※ Origin: 批踢踢实业坊 (ptt.cc), from: 118.232.16.73 (Taiwan)
※ Article URL: https://webptt.com/cn.aspx?n=bbs/Statistics/M.1624595784.A.B2D.html
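1F:→ andrew43: There is no absolute answer. There are many different indices for this alone. 06/25 13:06
2F:→ andrew43: As for the method in this paper, it presumably splits the sample randomly into two groups and runs k-means on each, 06/25 13:12
3F:→ andrew43: then checks which number of clusters gives the more consistent results between the two k-means runs. 06/25 13:12
4F:→ andrew43: For a given number of clusters, the centers from the k-means run first are deliberately forced to be the centers of the one run second. 06/25 13:16
5F:→ andrew43: "First" and "second" correspond to the two randomly split halves of the sample. 06/25 13:18
6F:→ dragon0139: OK, thank you for your help! 06/25 17:39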