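Author: alanfengjkl (Alan)
Board: R_Language
Title: Re: [Question] Reading an XML file
Time: Thu May 24 10:08:50 2018
※ Quoting alanfengjkl (Alan):
: [Question type]:
: Programming help
: [Software familiarity]:
: User
: [Problem description]:
: When I read an XML file in R,
: I get the errors below.
: How can I get rid of them?
: Also, the data is encoded in BIG5.
: If the text comes out garbled after reading, how should I handle it?
: [Code example]: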
: library(XML)
: xml.doc <- xmlParse(file,encoding = "BIG5")
: xmlParseEntityRef: no name
: xmlParseEntityRef: no name
: xmlParseEntityRef: no name
: xmlParseEntityRef: no name
: xmlParseEntityRef: no name
: xmlParseEntityRef: no name
: xmlParseEntityRef: no name
: xmlParseEntityRef: no name
: xmlParseEntityRef: no name
: xmlParseEntityRef: no name
: input conversion failed due to input error, bytes 0xF9 0xDC 0xBC 0x7A
: input conversion failed due to input error, bytes 0xF9 0xDC 0xBC 0x7A
: encoder errorPremature end of data in tag ?冽饿??line 168391
: Premature end of data in tag 掸喳腙?怿敦 line 168384
: Premature end of data in tag INVOICE line 2
: Error: 1: xmlParseEntityRef: no name
: 2: xmlParseEntityRef: no name
: 3: xmlParseEntityRef: no name
: 4: xmlParseEntityRef: no name
: 5: xmlParseEntityRef: no name
: 6: xmlParseEntityRef: no name
: 7: xmlParseEntityRef: no name
: 8: xmlParseEntityRef: no name
: 9: xmlParseEntityRef: no name
: 10: xmlParseEntityRef: no name
: 11: input conversion failed due to input error, bytes 0xF9 0xDC 0xBC 0x7A
: 12: input conversion failed due to input error, bytes 0xF9 0xDC 0xBC 0x7A
: 13: encoder error14: Premature end of data in tag ?冽饿??line 168391
: 15: Premature end of data in tag 掸喳腙?怿敦 line 168384
: 16: Premature end of data in tag INVOICE line 2
: [Environment]:
: Latest version of R
:
But I also get errors with xml2, as shown here:
> rm(list=ls())
> file="C0001BILL9_EBill_20180505_20180505114335.xml"
> setwd("D:/")
> library(xml2)
> d=read_xml(file,encoding="BIG5")
Error in doc_parse_file(con, encoding = encoding, as_html = as_html, options = options) :
  xmlParseEntityRef: no name [68]
> d=read_html(file,encoding="BIG5")
Error in doc_parse_file(con, encoding = encoding, as_html = as_html, options = options) :
  input conversion failed due to input error, bytes 0xF9 0xDC 0xBC 0x7A [6003]
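
The two messages point at two separate problems. "xmlParseEntityRef: no name" is what libxml2 reports when it meets an "&" that is not followed by an entity name, which usually means a raw "&" in the data. "input conversion failed" means the bytes 0xF9 0xDC could not be decoded under the encoding that was requested; that may be because they belong to a vendor extension of Big5 (such as CP950) rather than to the plain BIG5 table, but that is a guess. A rough base-R sketch for locating the offending lines could look like this. The function names are made up for illustration, and whether a given encoding name is accepted depends on the iconv build on your platform.

```r
# Diagnostic sketch, base R only. Function names are hypothetical.

# An "&" that does not start something shaped like "&name;", "&#38;" or "&#x26;".
bare_amp <- "&(?!(?:[A-Za-z_][A-Za-z0-9._-]*|#[0-9]+|#x[0-9A-Fa-f]+);)"

# Line numbers that contain at least one bare "&".
find_bare_ampersand_lines <- function(path) {
  lines <- readLines(path, warn = FALSE)
  # useBytes = TRUE: match on raw bytes, so lines that are not valid
  # in the current locale's encoding can still be searched.
  grep(bare_amp, lines, perl = TRUE, useBytes = TRUE)
}

# Line numbers that cannot be converted from `from` to UTF-8.
find_unconvertible_lines <- function(path, from = "BIG5") {
  lines <- readLines(path, warn = FALSE)
  # With the default sub = NA, iconv() returns NA for any element
  # it cannot convert.
  which(is.na(iconv(lines, from = from, to = "UTF-8")))
}

# Example calls (not run here):
# find_bare_ampersand_lines("C0001BILL9_EBill_20180505_20180505114335.xml")
# find_unconvertible_lines("C0001BILL9_EBill_20180505_20180505114335.xml")
```

If lines are flagged with from = "BIG5" but not with from = "CP950", that would suggest the file uses extension characters; if they are flagged either way, the file probably contains damaged bytes.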
--
※ Origin: PTT (ptt.cc), From: 110.50.148.91
※ Article URL: https://webptt.com/cn.aspx?n=bbs/R_Language/M.1527127732.A.CCC.html
1F:Push Wush978: But you haven't given people a way to reproduce the error, so I can't help either 05/24 11:33
2F:→ obarisk: Get rid of Windows first and that will probably solve it 05/25 08:55
3F:→ yulunchu: encoding="UTF-8" 05/31 05:57
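My XML is encoded in BIG5, not UTF-8. In the end I took a brute-force approach: I read the file in as text with readLines and wrote my own loop to parse it.
※ Edited: alanfengjkl (220.130.135.230), 05/31/2018 08:22:15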
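
For anyone who lands here with the same errors, a variant of the readLines route that avoids hand-parsing might look like the sketch below. It is not the loop I actually wrote. It reads the lines, converts them to UTF-8, escapes stray "&" characters, and returns one string that a parser can then take. The function names are hypothetical, "CP950" is only a guess at the right source encoding, and its availability depends on the iconv build on your platform.

```r
# Workaround sketch, base R only. Function names are hypothetical.

# Escape every "&" that does not already look like "&name;", "&#38;" or "&#x26;".
escape_bare_ampersands <- function(x) {
  gsub("&(?!(?:[A-Za-z_][A-Za-z0-9._-]*|#[0-9]+|#x[0-9A-Fa-f]+);)",
       "&amp;", x, perl = TRUE)
}

# Read a Big5-family file and return it as a single UTF-8 XML string.
read_big5_as_utf8 <- function(path, from = "CP950") {
  lines <- readLines(path, warn = FALSE)
  # sub = "byte": bytes that cannot be converted become "<xx>" hex markers
  # instead of turning the whole line into NA.
  utf8 <- iconv(lines, from = from, to = "UTF-8", sub = "byte")
  if (length(utf8) > 0) {
    # The text is UTF-8 now, so make the XML declaration (if any) agree.
    utf8[1] <- sub("encoding=[\"'][^\"']*[\"']", "encoding=\"UTF-8\"", utf8[1])
  }
  paste(escape_bare_ampersands(utf8), collapse = "\n")
}

# Example call (not run here; requires the xml2 package):
# txt <- read_big5_as_utf8("C0001BILL9_EBill_20180505_20180505114335.xml")
# d <- xml2::read_xml(txt)
```

Any "<xx>" markers left in the result show where bytes could not be converted, so those records are worth checking by hand.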