Author: j2225653 (水中鱼)
Board: R_Language
Title: [Question] Scraping a stock web page with rvest
Time: Sun Dec 1 00:17:02 2019
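[Question type]:
Asking for experience (I want to scrape web page data with R and would like to hear how others have done it)
[Software familiarity]:
Beginner (I have never programmed before; R is my first language)
[Problem description]:
Running the example below produces the following error message: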
Error in open.connection(x, "rb") : HTTP error 503.
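Scraping this page used to work without problems. I am not sure whether the site has started filtering crawlers; the page loads normally in Chrome.
[Code example]: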
library(rvest)
url <- "https://www.wantgoo.com/stock/astock/agentstat2?stockno=1722"
DATA = read_html(url)
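One way to narrow down a 503 like this is to make the request with httr (already attached in the session listed below) instead of passing the URL straight to read_html(), since httr returns the response object even on an HTTP error and lets the status be inspected. The following is only a minimal sketch: fetch_page is a hypothetical helper name, and the optional browser-like User-Agent rests on the assumption that the server treats R's default agent differently from a browser. Whether that changes the response from this particular site is not known.

```r
library(httr)
library(xml2)

# Hypothetical helper (not part of rvest or httr): fetch a page without
# raising an error on HTTP failures, so the status code can be inspected.
fetch_page <- function(url, ua = NULL) {
  # Assumption: the server may respond differently to a browser-like
  # User-Agent than to the default one sent by R. `ua` is optional.
  resp <- if (is.null(ua)) {
    GET(url, timeout(10))
  } else {
    GET(url, user_agent(ua), timeout(10))
  }

  failed <- http_error(resp)  # TRUE for status codes >= 400

  list(
    status = status_code(resp),
    ok     = !failed,
    # Parse the body only when the request succeeded.
    html   = if (failed) NULL else read_html(content(resp, as = "text", encoding = "UTF-8"))
  )
}

# Usage (requires network access):
# res <- fetch_page("https://www.wantgoo.com/stock/astock/agentstat2?stockno=1722")
# res$status
# res <- fetch_page("https://www.wantgoo.com/stock/astock/agentstat2?stockno=1722",
#                   ua = "Mozilla/5.0")  # illustrative User-Agent string
```

If the status stays at 503 regardless of the User-Agent, the block is likely happening at another layer, and the site's terms of service should be checked before going further.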
[Environment]:
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)
Matrix products: default
locale:
[1] LC_COLLATE=Chinese (Traditional)_Taiwan.950  LC_CTYPE=Chinese (Traditional)_Taiwan.950
[3] LC_MONETARY=Chinese (Traditional)_Taiwan.950 LC_NUMERIC=C
[5] LC_TIME=Chinese (Traditional)_Taiwan.950
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] jsonlite_1.6 httr_1.4.1 rvest_0.3.4 xml2_1.2.2
loaded via a namespace (and not attached):
[1] compiler_3.6.1 magrittr_1.5 R6_2.4.0 tools_3.6.1 curl_4.2 Rcpp_1.0.2
[Keywords]:
rvest, web crawler
--
※ Posted from: 批踢踢实业坊 (ptt.cc), from: 1.175.67.86 (Taiwan)
※ Article URL: https://webptt.com/cn.aspx?n=bbs/R_Language/M.1575130624.A.295.html
1F: push nyannyannyan: The site's terms of service appear to prohibit scraping, so it is probably blocking crawlers 12/01 11:06