Author: ardodo (米虫)
Board: R_Language
Title: [Question] Text mining cannot remove \n from Chinese text
Time: Thu Aug 6 15:25:59 2015
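[Question type]:
Programming help (I want to do something in R, but I don't know how to write it in R)
[Software familiarity]:
User (I have already built quite a few things with R)
[Problem description]:
Text mining cannot remove \n
I followed the approach in 陈嘉葳's article, but I cannot remove the \n from the Chinese text.
Even after adding \n to the stopwords, it is still not stripped out. How can I solve this?
The sample file is here ↓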
https://drive.google.com/open?id=0Bz0IlJks1nIiUVktajFlc29PODA
[Code example]:
http://pastebin.com/gaeXZX6w
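For the \n problem described above, one possible workaround is to strip the newlines with base R's `gsub` before any stopword step, rather than relying on the stopword list. This is only a minimal sketch: the sample strings and the helper name `strip_newlines` are hypothetical, and it assumes the \n is either a real newline character or a literal backslash followed by n.

```r
# Hypothetical sample text; the real data is in the linked sample file.
docs <- c("第一行\n第二行", "文字\\n探勘")

# Hypothetical helper: replace newlines with a single space.
strip_newlines <- function(x) {
  x <- gsub("[\r\n]+", " ", x)            # real CR/LF characters
  x <- gsub("\\n", " ", x, fixed = TRUE)  # literal two-character "\n"
  x
}

cleaned <- strip_newlines(docs)
print(cleaned)
```

With tm, the same helper could presumably be applied to a corpus with `tm_map(corpus, content_transformer(strip_newlines))`, where `corpus` stands for whatever corpus object the script builds.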
[sessionInfo]
R version 3.2.1 (2015-06-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=Chinese (Traditional)_Taiwan.950  LC_CTYPE=Chinese (Traditional)_Taiwan.950
[3] LC_MONETARY=Chinese (Traditional)_Taiwan.950 LC_NUMERIC=C
[5] LC_TIME=Chinese (Traditional)_Taiwan.950
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] cluster_2.0.1 fpc_2.1-9 wordcloud_2.5 RColorBrewer_1.1-2 Rwordseg_0.2-1
[6] rJava_0.9-6 tmcn_0.1-4 tm_0.6-2 NLP_0.1-8
loaded via a namespace (and not attached):
[1] flexmix_2.3-13 Rcpp_0.11.6 MASS_7.3-40 mclust_5.0.2 lattice_0.20-31
[6] prabclus_2.2-6 tools_3.2.1 nnet_7.3-9 parallel_3.2.1 grid_3.2.1
[11] modeltools_0.2-21 class_7.3-12 trimcluster_0.1-2 kernlab_0.9-20 robustbase_0.92-5
[16] slam_0.1-32 DEoptimR_1.0-3 diptest_0.75-7 stats4_3.2.1 mvtnorm_1.0-3
--
※ Posted from: 批踢踢实业坊 (ptt.cc), from: 61.222.207.246
※ Article URL: https://webptt.com/cn.aspx?n=bbs/R_Language/M.1438845962.A.DFF.html
1F:→ psinqoo: Which version of R? 08/06 16:00
I have attached my sessionInfo.
※ Edited: ardodo (61.222.207.246), 08/06/2015 16:48:45
2F:推 psinqoo: This did not occur before 3.0X 08/07 08:22
3F:→ ardodo: Really? I'll try an older version later. 08/07 11:54