A Roundup of XPath and CSS Selector Syntax

Web Crawlers

2019-8-25

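I recently had to write a crawler with Scrapy for a project and ended up using a lot of XPath and CSS selector syntax,
so I am collecting it here. I will skip the basics and only record the more complex expressions I ran into, for future reference.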

XPath syntax:

<div class="tabs-panel is-active" id="size_en">
    <div class="item-connection text-center ">
        <a href="javascript:void(0);" class="value size-value"
           data-url="https://uae.souq.com/ae-en/leggett-platt-home-textiles-cool-shield-mattress-protector-full-xl-red-qd0455-26651987/i/"
           data-enabled="false">Full XL</a></div>
    <div class="item-connection text-center active">
        <a href="javascript:void(0);" class="value size-value"
           data-url="https://uae.souq.com/ae-en/leggett-platt-home-textiles-cool-shield-mattress-protector-king-brown-qd0457-31656950/i/"
           data-enabled="true">King</a></div>
    <div class="item-connection text-center ">
        <a href="javascript:void(0);" class="value size-value"
           data-url="https://uae.souq.com/ae-en/leggett-platt-home-textiles-cool-shield-mattress-protector-twin-white-qd0452-32710651/i/"
           data-enabled="false">Twin</a></div>
    <div class="item-connection text-center ">
        <a href="javascript:void(0);" class="value size-value"
           data-url="https://uae.souq.com/ae-en/leggett-platt-home-textiles-cool-shield-mattress-protector-twin-xl-black-qd0453-31237117/i/"
           data-enabled="false">Twin XL</a></div>
</div>

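1. Select nodes whose attribute does not contain a given value

# Select the child div nodes of #size_en whose class does not contain "active"
//*[@id='size_en']/div[not(contains(@class,"active"))]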

2. Select div nodes whose id contains REVIEWS and that either have aria-hidden="false" or have no aria-hidden attribute at all

//div[contains(@id,"REVIEWS") and (@aria-hidden="false" or not(@aria-hidden))]

CSS selector syntax:

1. Get the value of an attribute
# Get the value of the style attribute of the inner i tag
li>header>div>span>i>i::attr(style)

Author: 疏花