How to Locate Elements in Python

2023-06-15 17:06:23, 226 views, by 八月长安

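This article shows how to locate elements in Python when parsing a web page, using four approaches: BeautifulSoup's find methods, CSS selectors, XPath, and regular expressions.

First, check whether the site has anti-scraping measures in place, that is, whether a plain request returns the content we want to parse:

    import requests

    url = 'http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-24hours-0-0-1-1'
    response = requests.get(url).text
    print(response)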

A close look shows that all the data we need is present in the response, so no special anti-scraping handling is required.
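The manual check can also be scripted. The following is a minimal sketch, assuming that the strings `bang_list` and `class="name"` (taken from the markup discussed in this article) are reliable signs of real content. The helper name `has_expected_markup` and both sample pages are hypothetical stand-ins for `requests.get(url).text`.

```python
def has_expected_markup(page_text, markers=('bang_list', 'class="name"')):
    """Return True if every marker substring occurs in the fetched page text.

    The default markers are assumptions based on the markup described in
    this article; adjust them for any other page.
    """
    return all(marker in page_text for marker in markers)


# Hypothetical stand-ins for the text returned by requests.get(url).text
normal_page = (
    '<ul class="bang_list clearfix bang_list_mode">'
    '<li><div class="name"></div></li></ul>'
)
blocked_page = '<html><body>Please verify that you are human</body></html>'

print(has_expected_markup(normal_page))   # True
print(has_expected_markup(blocked_page))  # False
```

A check like this only reports that the expected markers are missing; it cannot tell whether the cause is anti-scraping, a redesign of the page, or a network error.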

Inspecting the page elements shows that each book entry sits in an li element, and that these li elements belong to a ul whose class is bang_list clearfix bang_list_mode.

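Further inspection locates the book title itself: it is the title attribute of the a element inside the div with class name. Every parsing method below relies on this structure.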

1. Traditional BeautifulSoup usage

The classic BeautifulSoup approach imports the class with from bs4 import BeautifulSoup, converts the text into a parse tree with soup = BeautifulSoup(html, "lxml"), and then uses the find family of methods to extract data. The code is as follows:

    import requests
    from bs4 import BeautifulSoup

    url = 'http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-24hours-0-0-1-1'
    response = requests.get(url).text

    def bs_for_parse(response):
        soup = BeautifulSoup(response, "lxml")
        # Locate the ul, then collect its 20 li elements
        li_list = soup.find('ul', class_='bang_list clearfix bang_list_mode').find_all('li')
        for li in li_list:
            # Parse each li in turn to get the book title
            title = li.find('div', class_='name').find('a')['title']
            print(title)

    if __name__ == '__main__':
        bs_for_parse(response)

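This prints all 20 book titles. Some titles are rather long; they can be trimmed with regular expressions or other string methods, which this article does not cover in detail.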
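As a brief illustration only, here is one way such trimming could look. It is a sketch under the assumption that the extra length comes from bracketed promotional text appended to the title. The function name `shorten_title` and the sample titles are made up for this example and are not taken from the actual page.

```python
import re

# One bracketed segment in full-width, ASCII, or lenticular brackets.
# Nested brackets are not handled; only the innermost pair would match.
BRACKETED = re.compile(r'（[^（）]*）|\([^()]*\)|【[^【】]*】')


def shorten_title(title):
    """Remove bracketed segments and collapse the remaining whitespace."""
    cleaned = BRACKETED.sub(' ', title)
    return ' '.join(cleaned.split())


# Made-up sample titles
print(shorten_title('Sample Title（revised edition, bookmark included）'))  # Sample Title
print(shorten_title('Sample Book (2nd edition) 【boxed set】'))            # Sample Book
```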

2. CSS selectors with BeautifulSoup

This method essentially carries PyQuery's CSS selectors over to another module, and the usage is similar. For the detailed CSS selector syntax, see http://www.w3school.com.cn/cssref/css_selectors.asp. Because it is built on BeautifulSoup, the imports and the text-to-tree conversion are the same as before:

    import requests
    from bs4 import BeautifulSoup

    url = 'http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-24hours-0-0-1-1'
    response = requests.get(url).text

    def css_for_parse(response):
        soup = BeautifulSoup(response, "lxml")
        print(soup)

    if __name__ == '__main__':
        css_for_parse(response)

Next, call soup.select with the appropriate CSS syntax to get the target content. As before, this depends on careful inspection of the page elements:

    import requests
    from bs4 import BeautifulSoup

    url = 'http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-24hours-0-0-1-1'
    response = requests.get(url).text

    def css_for_parse(response):
        soup = BeautifulSoup(response, "lxml")
        # Direct li children of the ul that carries all three classes
        li_list = soup.select('ul.bang_list.clearfix.bang_list_mode > li')
        for li in li_list:
            # select returns a list, so take the first match
            title = li.select('div.name > a')[0]['title']
            print(title)

    if __name__ == '__main__':
        css_for_parse(response)

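3. XPath

XPath, the XML Path Language, is a language for addressing parts of an XML document. If you use Chrome, consider installing the XPath Helper extension, which makes writing XPath expressions much faster.

Earlier crawler articles were mostly based on XPath, so readers should be fairly familiar with it. The code is given directly:

    import requests
    from lxml import html

    url = 'http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-24hours-0-0-1-1'
    response = requests.get(url).text

    def xpath_for_parse(response):
        selector = html.fromstring(response)
        # All li children of the ul with this exact class string
        books = selector.xpath("//ul[@class='bang_list clearfix bang_list_mode']/li")
        for book in books:
            # Path relative to the current li; /@title selects the attribute value
            title = book.xpath('div[@class="name"]/a/@title')[0]
            print(title)

    if __name__ == '__main__':
        xpath_for_parse(response)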
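Path expressions like these can be tried offline on a small sample. The sketch below swaps lxml for the standard library's `xml.etree.ElementTree`, which supports only a limited subset of XPath and requires well-formed XML, so it is not a drop-in replacement for the lxml code. The `SAMPLE` markup and the function name `titles_from_sample` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the bestseller list markup
SAMPLE = """
<html><body>
<ul class="bang_list clearfix bang_list_mode">
  <li><div class="name"><a href="#" title="Sample Book A">Sample Book A</a></div></li>
  <li><div class="name"><a href="#" title="Sample Book B">Sample Book B</a></div></li>
</ul>
</body></html>
"""


def titles_from_sample(markup):
    root = ET.fromstring(markup.strip())
    # The predicate compares the whole class attribute string, so the class
    # names must appear in exactly this order and with this spacing.
    books = root.findall(".//ul[@class='bang_list clearfix bang_list_mode']/li")
    titles = []
    for book in books:
        link = book.find("div[@class='name']/a")
        if link is not None:
            # ElementTree has no "/@title" step, so read the attribute instead.
            titles.append(link.get('title'))
    return titles


print(titles_from_sample(SAMPLE))  # ['Sample Book A', 'Sample Book B']
```

The exact-string comparison in `[@class='...']` applies to the lxml version as well: if the site reorders or adds a class, the expression stops matching, whereas the CSS selector `ul.bang_list.clearfix.bang_list_mode` does not depend on class order.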

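4. Regular expressions

If you are not familiar with HTML, the parsing methods above can be hard going. Regular expressions offer a general-purpose alternative: you only need to look at what distinctive pattern surrounds the text itself, then write a rule that captures it. The only module required is re.

First, look again at the raw response to see what is distinctive about the text before and after the content we need:

    import requests
    import re

    url = 'http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-24hours-0-0-1-1'
    response = requests.get(url).text
    print(response)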

After looking at a few book entries the pattern becomes clear: <div class="name"><a href="http://product.dangdang.com/xxxxxxxx.html" target="_blank" title="xxxxxxx">

The book title is hidden in the string above; the digits at the end of the embedded URL change from book to book.

With this analysis the regular expression can be written:

    import requests
    import re

    url = 'http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-24hours-0-0-1-1'
    response = requests.get(url).text

    def re_for_parse(response):
        # Raw string, with literal dots escaped; the group captures the title
        reg = r'<div class="name"><a href="http://product\.dangdang\.com/\d+\.html" target="_blank" title="(.*?)">'
        for title in re.findall(reg, response):
            print(title)

    if __name__ == '__main__':
        re_for_parse(response)
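Two details of such a pattern are easy to get wrong: an unescaped `.` matches any character rather than a literal dot, and `\d` is best written inside a raw string so Python does not treat the backslash as a string escape. The sketch below shows both points on a made-up sample shaped like the markup quoted above; the name `TITLE_RE` and the sample content are hypothetical.

```python
import re

# Raw string: "\." matches a literal dot, "\d+" the numeric product id.
TITLE_RE = re.compile(
    r'<div class="name"><a href="http://product\.dangdang\.com/\d+\.html" '
    r'target="_blank" title="(.*?)">'
)

# Made-up sample shaped like the markup described in the article
sample = (
    '<div class="name"><a href="http://product.dangdang.com/12345678.html" '
    'target="_blank" title="Sample Book A">Sample Book A</a></div>\n'
    '<div class="name"><a href="http://product.dangdang.com/87654321.html" '
    'target="_blank" title="Sample Book B">Sample Book B</a></div>\n'
)

print(TITLE_RE.findall(sample))  # ['Sample Book A', 'Sample Book B']

# An unescaped dot also accepts characters that are not dots.
print(bool(re.fullmatch(r'\d+.html', '123xhtml')))   # True
print(bool(re.fullmatch(r'\d+\.html', '123xhtml')))  # False
```

The non-greedy group `(.*?)` stops at the first `">` after `title="`, which is what keeps each match limited to a single title.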

The regex version is the shortest of the four, but it requires a solid command of regular expression syntax.

Of course, each method has scenarios it suits best. In practice we need to analyze the page structure to decide how to locate elements efficiently. The complete code for the four methods covered in this article is given below; try running it yourself to get a better feel for each.

    import requests
    from bs4 import BeautifulSoup
    from lxml import html
    import re

    url = 'http://bang.dangdang.com/books/bestsellers/01.00.00.00.00.00-24hours-0-0-1-1'
    response = requests.get(url).text

    def bs_for_parse(response):
        soup = BeautifulSoup(response, "lxml")
        li_list = soup.find('ul', class_='bang_list clearfix bang_list_mode').find_all('li')
        for li in li_list:
            title = li.find('div', class_='name').find('a')['title']
            print(title)

    def css_for_parse(response):
        soup = BeautifulSoup(response, "lxml")
        li_list = soup.select('ul.bang_list.clearfix.bang_list_mode > li')
        for li in li_list:
            title = li.select('div.name > a')[0]['title']
            print(title)

    def xpath_for_parse(response):
        selector = html.fromstring(response)
        books = selector.xpath("//ul[@class='bang_list clearfix bang_list_mode']/li")
        for book in books:
            title = book.xpath('div[@class="name"]/a/@title')[0]
            print(title)

    def re_for_parse(response):
        reg = r'<div class="name"><a href="http://product\.dangdang\.com/\d+\.html" target="_blank" title="(.*?)">'
        for title in re.findall(reg, response):
            print(title)

    if __name__ == '__main__':
        # Uncomment one call at a time to try each method
        # bs_for_parse(response)
        # css_for_parse(response)
        # xpath_for_parse(response)
        re_for_parse(response)

Article link: https://lsjlt.com/news/281028.html (please cite this link when reposting)