小言_互联网's blog

What to do when xpath returns a bunch of Element objects? (Python)

Calling print() on the result produced output like this:
[<Element p at 0x10263c300>, <Element p at 0x101562940>, <Element p at 0x1014d2fc0>, <Element p at 0x102669e40>, <Element p at 0x102669e80>, <Element p at 0x1025abc40>, <Element p at 0x101d83a80>, <Element p at 0x102684580>, <Element p at 0x1026845c0>, <Element p at 0x101d86fc0>, <Element p at 0x102684540>, <Element p at 0x102684640>, <Element p at 0x102684680>, <Element p at 0x1026846c0>, <Element p at 0x102684700>, <Element p at 0x102684780>, <Element p at 0x1026847c0>, <Element p at 0x102684800>, <Element p at 0x102684840>, <Element p at 0x102684880>, <Element p at 0x1026848c0>, <Element p at 0x102684600>, <Element p at 0x102684900>, <Element p at 0x102684940>, <Element p at 0x102684980>, <Element p at 0x1026849c0>, <Element p at 0x102684a00>, <Element p at 0x102684a40>, <Element p at 0x102684a80>, <Element p at 0x102684ac0>, <Element p at 0x102684b00>, <Element p at 0x102684740>, <Element p at 0x102684b80>, <Element p at 0x102684bc0>]

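This evening, while cleaning data with xpath, I kept getting output like this. It looked as if the data had not been decoded, but adding .text or decode() did not help.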

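import requests
from lxml import etree

# Parse the fetched page (url and headers are assumed to be defined earlier)
resq = requests.get(url, headers=headers).text
html = etree.HTML(resq)
result = html.xpath('//div//p')
print(result)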
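A minimal sketch of why .text and decode() do not help here, assuming lxml is installed. The inline `sample` string is a made-up stand-in for the real page. The xpath call returns a plain Python list of element objects, so the list itself has no .text attribute; only the individual elements do.

```python
from lxml import etree

# Hypothetical inline HTML standing in for the downloaded page
sample = "<div><p>first</p><p>second</p></div>"
html = etree.HTML(sample)

result = html.xpath('//div//p')

# The result is an ordinary list, not a string that needs decoding
print(type(result))
print(hasattr(result, 'text'))  # the list has no .text attribute

# Each item in the list is an element with its own tag and text
texts = [p.text for p in result]
print(texts)
```

Printing the list shows each element's default repr, which is where the `<Element p at 0x...>` strings come from.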

In the end I asked someone for help and got the solution:

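Append '/text()' to the expression in xpath('//div//p'). Without it, xpath returns the element objects themselves; with it, xpath returns the text nodes inside each p as strings.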

Changing the code to this fixes it:

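import requests
from lxml import etree

# Parse the fetched page (url and headers are assumed to be defined earlier)
resq = requests.get(url, headers=headers).text
html = etree.HTML(resq)
result = html.xpath('//div//p/text()')
print(result)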
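One caveat, sketched below under the assumption that some paragraphs contain nested tags: '/text()' selects only the text nodes that are direct children of each p, so a paragraph with inline markup comes back in pieces. The `sample` string is invented for illustration. Evaluating the XPath function string(.) on each element is one way to get the full text of a paragraph as a single string.

```python
from lxml import etree

# Hypothetical inline HTML; the second paragraph contains a nested <b> tag
sample = "<div><p>plain</p><p>has <b>bold</b> inside</p></div>"
html = etree.HTML(sample)

# Direct text nodes only: the text inside <b> is skipped and the
# second paragraph is split into two separate strings
direct = html.xpath('//div//p/text()')
print(direct)  # ['plain', 'has ', ' inside']

# string(.) concatenates all descendant text of each paragraph
full = [p.xpath('string(.)') for p in html.xpath('//div//p')]
print(full)  # ['plain', 'has bold inside']
```

Which form fits better depends on the page: '/text()' is enough when the paragraphs hold plain text, as in the post above.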

Reposted from: https://blog.csdn.net/dsfgdgsdf/article/details/104544835