Author: celestialgod (天)
Board: R_Language
Title: Re: [Question] Data wrangling question (revised)
Time: Wed Sep 13 23:22:05 2017
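※ Quoting tony1331 (BLUE):
: [Question type]:
: I want to do something in R, but I don't know how to write it in R
: [Software familiarity]:
: I have programmed before, but this is my first time using R
: [Problem description]:
: http://i.imgur.com/yKt85T3.jpg
: Right now I only know how to apply unique to a single row; I want to apply unique to every row~
: Any guidance would be appreciated, thanks~
: Sorry, posting from my phone seems to have messed up the formatting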
: -----
: Sent from JPTT on my Asus ASUS_Z012DA.
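Doing this without packages is a bit of a hassle and I'm too lazy to write it XD, so here is the simplest approach:
# Easy-to-read version: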
https://pastebin.com/67j92ptN
library(data.table)
library(pipeR)
# data generation
numRows <- 3e4
numCols <- 8
s <- matrix(NA_integer_, numRows, numCols)
idx <- sample(numCols, numRows, TRUE)
for (i in 1:numCols)
  s[idx == i, 1:i] <- sample(42, i * sum(idx == i), TRUE)
time <- seq(ISOdate(2017, 9, 12) - numRows*3600, ISOdate(2017, 9, 12),
            by = "hour")
DT <- data.table(time = head(time, numRows), s)
# time V1 V2 V3 V4 V5 V6 V7 V8
# 1: 2014-04-11 12:00:00 31 26 20 19 7 NA NA NA
# 2: 2014-04-11 13:00:00 4 5 4 4 7 NA NA NA
# 3: 2014-04-11 14:00:00 17 17 32 36 NA NA NA NA
# 4: 2014-04-11 15:00:00 2 23 25 28 41 14 32 10
# 5: 2014-04-11 16:00:00 40 33 25 27 29 NA NA NA
# ---
# 29996: 2017-09-12 07:00:00 36 13 7 38 21 34 36 NA
# 29997: 2017-09-12 08:00:00 18 NA NA NA NA NA NA NA
# 29998: 2017-09-12 09:00:00 8 9 35 32 12 18 3 NA
# 29999: 2017-09-12 10:00:00 41 10 22 32 28 39 17 31
# 30000: 2017-09-12 11:00:00 21 21 NA NA NA NA NA NA
# Main program
# Question 1: unique values within each row
st <- proc.time()
melt(DT, 1, 2:9) %>>% na.omit("value") %>>%
  `[`(j = .(value = unique(value), idx = 1:uniqueN(value)), by = .(time)) %>>%
  dcast(time ~ idx, value.var = "value")
proc.time() - st
# user system elapsed
# 0.68 0.00 0.71
# time 1 2 3 4 5 6 7 8
# 1: 2014-04-11 12:00:00 31 26 20 19 7 NA NA NA
# 2: 2014-04-11 13:00:00 4 5 7 NA NA NA NA NA
# 3: 2014-04-11 14:00:00 17 32 36 NA NA NA NA NA
# 4: 2014-04-11 15:00:00 2 23 25 28 41 14 32 10
# 5: 2014-04-11 16:00:00 40 33 25 27 29 NA NA NA
# ---
# 29996: 2017-09-12 07:00:00 36 13 7 38 21 34 NA NA
# 29997: 2017-09-12 08:00:00 18 NA NA NA NA NA NA NA
# 29998: 2017-09-12 09:00:00 8 9 35 32 12 18 3 NA
# 29999: 2017-09-12 10:00:00 41 10 22 32 28 39 17 31
# 30000: 2017-09-12 11:00:00 21 NA NA NA NA NA NA NA
# Question 2: count of each value within each row
st <- proc.time()
melt(DT, 1, 2:9) %>>% na.omit("value") %>>%
  `[`(j = .(cnt = .N), by = .(time, value))
proc.time() - st
# user system elapsed
# 0.14 0.03 0.17
# time value cnt
# 1: 2014-04-11 12:00:00 31 1
# 2: 2014-04-11 13:00:00 4 3
# 3: 2014-04-11 14:00:00 17 2
# 4: 2014-04-11 15:00:00 2 1
# 5: 2014-04-11 16:00:00 40 1
# ---
# 128084: 2017-09-11 03:00:00 29 1
# 128085: 2017-09-11 16:00:00 6 1
# 128086: 2017-09-11 22:00:00 24 1
# 128087: 2017-09-12 05:00:00 4 1
# 128088: 2017-09-12 10:00:00 31 1
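
The post says the package-free route is a bit of a hassle. For reference, here is a minimal base R sketch of both questions. The toy data frame `df` and the names `uniqRows`, `res1`, `long`, and `res2` are made up for illustration and are not from the original post; on 3e4 rows this would likely be slower than the data.table version above.

```r
# Toy data in the same shape as DT (hypothetical values, not from the post)
df <- data.frame(
  time = c("t1", "t2", "t3"),
  V1 = c(4L, 17L, 21L),
  V2 = c(5L, 17L, 21L),
  V3 = c(4L, 32L, NA),
  V4 = c(7L, NA, NA),
  stringsAsFactors = FALSE
)
valueCols <- setdiff(names(df), "time")
m <- as.matrix(df[valueCols])

# Question 1: keep the unique non-NA values of each row, in order of first
# appearance, and pad on the right with NA so every row has the same length
uniqRows <- t(apply(m, 1, function(x) {
  u <- unique(x[!is.na(x)])
  c(u, rep(NA_integer_, length(x) - length(u)))
}))
res1 <- data.frame(time = df$time, uniqRows)

# Question 2: count how often each value occurs within each row.
# as.vector(m) is column-major, so time is repeated once per value column.
long <- data.frame(time = rep(df$time, times = length(valueCols)),
                   value = as.vector(m))
long <- long[!is.na(long$value), ]
long$cnt <- 1L
res2 <- aggregate(cnt ~ time + value, data = long, FUN = sum)
```

Unlike the dcast result, `res1` keeps all the original columns even when the trailing ones are entirely NA.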
--
※ Posted from: 批踢踢实业坊 (ptt.cc), from: 118.170.62.243
※ Article URL: https://webptt.com/cn.aspx?n=bbs/R_Language/M.1505316129.A.087.html
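1F:Push tony1331: Thanks. One more question: if the time were stored as row names, would that be simpler? 09/14 11:35
2F:→ clansoda: Not simpler. With the time in the row names you still have to pull it out into a column before you can melt 09/14 11:51
3F:→ clansoda: and data.table has no concept of row names anyway 09/14 11:51
I have never found row names to be of much use... I basically don't use them anymore.
Besides, row names are in a sense just another variable, so there is no need to store them as row names.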
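4F:Push tony1331: Thanks, I'll look into it further~ 09/14 19:38
5F:→ clansoda: If you like row names, you can also do this with a data.frame 09/14 19:41
6F:→ clansoda: Combined with dplyr + tidyr it should give the same result, and the timing shouldn't differ much 09/14 19:42
melt and dplyr + tidyr give the same result. As for speed, I believe data.table will still be somewhat faster.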
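
For comparison, here is a rough sketch of what the dplyr + tidyr route mentioned above could look like. It is not from the original post: the toy data frame `df` and the result names are hypothetical, and it assumes tidyr >= 1.0.0 (for `pivot_longer` / `pivot_wider`) and a dplyr version where `count()` accepts `name`.

```r
library(dplyr)
library(tidyr)

# Toy data (hypothetical values), time kept as an ordinary column
df <- data.frame(
  time = c("t1", "t2", "t3"),
  V1 = c(4L, 17L, 21L),
  V2 = c(5L, 17L, 21L),
  V3 = c(4L, 32L, NA),
  V4 = c(7L, NA, NA),
  stringsAsFactors = FALSE
)

# Long format, dropping NA values (the counterpart of melt + na.omit)
long <- df %>%
  pivot_longer(-time, names_to = "variable", values_to = "value",
               values_drop_na = TRUE)

# Question 1: unique values per time, then back to wide format
res1 <- long %>%
  distinct(time, value) %>%
  group_by(time) %>%
  mutate(idx = row_number()) %>%
  ungroup() %>%
  pivot_wider(names_from = idx, values_from = value)

# Question 2: how often each value occurs per time
res2 <- long %>%
  count(time, value, name = "cnt")
```

No timing is claimed here; it would have to be measured on the same 3e4-row data to compare against the data.table numbers in the post.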
8F:→ tony1331: .jpg 09/14 21:14
You left out a dot....
※ Edited: celestialgod (118.170.62.243), 09/14/2017 21:28:35
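9F:→ tony1331: Thanks for the guidance, c~~ 09/14 22:05
10F:→ clansoda: I also believe it's faster, but he likes row names. Still, the row names have to 09/14 22:16
11F:→ clansoda: become a column before you can melt, so it is essentially the same thing 09/14 22:16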
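
A small sketch of the point made in the comments: if the timestamps sit in the row names of a data.frame, they have to be moved into a regular column first, and after that the melt step is unchanged. The data frame `df` and its values are hypothetical; `keep.rownames = TRUE` puts the row names into a column called `rn`, which is then renamed.

```r
library(data.table)

# Hypothetical data.frame whose timestamps live in the row names
df <- data.frame(
  V1 = c(4L, 17L),
  V2 = c(4L, 32L),
  row.names = c("2017-09-12 10:00:00", "2017-09-12 11:00:00")
)

# Move the row names into an ordinary column ("rn"), then rename it to "time"
DT <- as.data.table(df, keep.rownames = TRUE)
setnames(DT, "rn", "time")

# From here on, melt works the same as when time was a column all along
long <- melt(DT, id.vars = "time", measure.vars = c("V1", "V2"))
```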