Author: starlichin (白星羽)
Board: Python
Title: Re: [Question] Beginner web-scraping question
Time: Sun Nov 4 20:04:27 2018
I have an XML page (URL: http://py4e-data.dr-chuck.net/comments_42.xml)
and I want to extract the attribute under the count tag.
The page source looks roughly like this:
<comments>
  <comment>
    <name>Romina</name>
    <count>97</count>
  </comment>
  <comment>
    <name>Laurie</name>
    <count>97</count>
  </comment>
  <comment>
    <name>Bayli</name>
    <count>90</count>
  </comment>
  <comment>
    <name>Siyona</name>
    <count>90</count>
  </comment>
  <comment>
    <name>Taisha</name>
    <count>88</count>
  </comment>
  ...
</comments>
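To see what ElementTree actually finds on each count element without hitting the network, here is a minimal sketch that parses an inline copy of the five entries shown above (assumption: the snippet is wrapped in a single <comments> root, with the rest of the real file omitted). It prints each element's attribute dict, the result of get("count"), and its text.

```python
import xml.etree.ElementTree as ET

# Inline copy of the sample above (assumption: only these five entries,
# wrapped in <comments>), so the parse can be tried offline.
SAMPLE = """<comments>
  <comment><name>Romina</name><count>97</count></comment>
  <comment><name>Laurie</name><count>97</count></comment>
  <comment><name>Bayli</name><count>90</count></comment>
  <comment><name>Siyona</name><count>90</count></comment>
  <comment><name>Taisha</name><count>88</count></comment>
</comments>"""

root = ET.fromstring(SAMPLE)
counts = root.findall(".//count")

for item in counts:
    # .attrib holds only name="value" pairs written inside the tag itself.
    # <count> has none, so it is {} and .get("count") returns None.
    print("attrib:", item.attrib, "get:", item.get("count"), "text:", item.text)
```

Every line should show an empty attrib and None for get, while the number shows up under text.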
Here is what I wrote, but it can't get the attribute (it prints None). Could someone explain why?
import urllib.request, urllib.parse, urllib.error
import xml.etree.ElementTree as ET
import ssl
# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
url = 'http://py4e-data.dr-chuck.net/comments_42.xml'
html = urllib.request.urlopen(url, context=ctx).read().decode('utf-8')
tree = ET.fromstring(html)
# Find every <count> element anywhere below the root
counts = tree.findall('.//count')
print('counts:', len(counts))
for item in counts:
    # item.get() looks up an XML attribute named "count" on the element
    print('Attribute:', item.get("count"))
--
※ Posted on: 批踢踢实业坊 (ptt.cc), from: 111.250.154.48
※ Article URL: https://webptt.com/cn.aspx?n=bbs/Python/M.1541333072.A.3E8.html
1F:→ InfinityGate: Because it simply has no attribute 11/04 23:15
2F:→ InfinityGate: If you want that number, it's the element's text 11/04 23:17
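Following the replies, a minimal sketch of the difference: get() reads an attribute written inside the tag, while the number between <count> and </count> is the element's .text. The unit="votes" attribute below is invented purely for illustration; the real file's <count> tags carry no attributes. In the original script only the loop would change, e.g. print('Count:', item.text).

```python
import xml.etree.ElementTree as ET

# Hypothetical mini document: the first <count> carries an attribute
# (unit="votes", invented for illustration), the second does not.
DOC = """<comments>
  <comment><name>Romina</name><count unit="votes">97</count></comment>
  <comment><name>Bayli</name><count>90</count></comment>
</comments>"""

root = ET.fromstring(DOC)
counts = root.findall(".//count")

# Attribute lookup: only finds values written as name="value" inside the tag,
# so the second element (no attribute) yields None.
units = [item.get("unit") for item in counts]

# Text lookup: the number between the tags is a string in .text,
# so convert it with int() before doing arithmetic.
values = [int(item.text) for item in counts]

print("units:", units)
print("values:", values, "sum:", sum(values))
```

Converting with int() matters if you go on to total the counts, since .text is always a string.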