Feidao's Blog

Python + Selenium: Scraping Violation Data from 交管12123 by Simulating Web Page Clicks


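In the previous article, "Python Tutorial: Scraping a Positioning System by Simulating Web Page Clicks", I explained how to scrape vehicle location data by simulating clicks. This article shows how to use the same approach to log in to 交管12123 (the Traffic Management 12123 platform) and scrape vehicle violation data. It goes straight through the process; the commands used are explained in the previous article. Like that one, this is a scraper used in a real company, so it is worth a look if you plan to work in the automotive industry.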

Tools: Spyder, the selenium library, Google Chrome, and the chromedriver.exe that matches your Chrome version.


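Note: I am sharing this case to help fellow practitioners automate manual work and manage company assets better. The URL, account, and password have been removed from the program. One annoyance with this site is that extra pop-ups may appear when you start clicking through the login. They change the page structure, so the program can no longer find an element at its expected XPath and fails with an error. If that happens, simply run the program again. Besides simulating the login clicks, you can also log in directly with cookies, which bypasses the tedious login steps.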
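The cookie route mentioned above can be sketched as two helpers. After logging in by hand once, dump the browser's cookies to a file. On later runs, open the site and replay them. This is a minimal sketch, not the author's code. The function names and the JSON file are hypothetical, and whether 12123 accepts a replayed session (and for how long) is an assumption. The helpers only call `get_cookies`, `add_cookie`, `get` and `refresh`, which exist on a Selenium WebDriver. Because they take the driver as a parameter, any object with those methods works.

```python
import json
from pathlib import Path


def save_cookies(driver, path):
    """Dump the cookies of a logged-in session to a JSON file; returns how many were saved."""
    cookies = driver.get_cookies()  # list of dicts: name, value, domain, path, ...
    Path(path).write_text(json.dumps(cookies, ensure_ascii=False), encoding="utf-8")
    return len(cookies)


def load_cookies(driver, path, url):
    """Replay saved cookies into a fresh browser; returns how many were added.

    The page is opened first because Selenium only accepts cookies for the
    domain that is currently loaded. It is then refreshed so the site sees
    the restored session (assumption: the site accepts it).
    """
    cookies = json.loads(Path(path).read_text(encoding="utf-8"))
    driver.get(url)
    for cookie in cookies:
        driver.add_cookie(cookie)
    driver.refresh()
    return len(cookies)
```

For example, call `save_cookies(driver, 'cookies.json')` right after a successful login. On the next run, call `load_cookies(driver, 'cookies.json', url)` instead of the login steps, where `url` is the site address that was removed from this article.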

Imports


  
```python
from selenium import webdriver
import time
import csv
import datetime
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
import math
import xlrd  # xlrd 2.x only reads .xls; install xlrd<2.0 to open .xlsx files
```

Read the license plates to query

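```python
data = xlrd.open_workbook('cheliang.xlsx')  # needs xlrd<2.0 for .xlsx
table = data.sheets()[0]  # first worksheet
nrows = table.nrows  # number of rows, one plate per row
```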
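Tidying the raw cell values before typing them into the query box can avoid wasted or failed queries caused by blank rows, stray spaces, lower-case letters, and repeated plates. This is a minimal sketch that assumes one plate per cell. `clean_plates` is a hypothetical helper. With xlrd, its input could be `table.col_values(0)`, the first column of the sheet.

```python
def clean_plates(raw_values):
    """Turn raw spreadsheet cells into unique, normalised plate strings.

    Skips empty cells, removes spaces, upper-cases Latin letters, and keeps
    the first occurrence of each plate in its original order.
    """
    plates = []
    seen = set()
    for value in raw_values:
        if value is None:
            continue
        plate = str(value).replace(" ", "").strip().upper()
        if not plate or plate in seen:
            continue  # blank cell or duplicate plate
        seen.add(plate)
        plates.append(plate)
    return plates
```

The query loop could then iterate `for plate in clean_plates(table.col_values(0)):` instead of over row indices.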

Create the browser and open the page


  
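```python
opt = webdriver.ChromeOptions()  # browser options
# opt.add_argument('--headless')  # headless mode (set_headless() is gone in Selenium 4)
driver = webdriver.Chrome(options=opt)  # create the browser
driver.maximize_window()  # maximize the window
print("Opening the page")
driver.get('')  # open the page (URL removed by the author)
```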

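In order: click 单位登录 (unit login), fill in the account and password, click the captcha field to bring up the captcha image, tick the checkbox, enter the captcha, and click Log in.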


  
```python
time.sleep(3)  # wait for the page to load
print("Clicking unit login")
time.sleep(3)
driver.find_element(By.XPATH, "/html/body/div[1]/div[2]/div/div[2]/div[2]/button").click()  # unit login
time.sleep(3)
print("Filling in the account")
elem = driver.find_element(By.XPATH, "/html/body/div[4]/div/div[1]/div/div[2]/form[1]/div[1]/div/input")
elem.clear()  # clear any existing content
elem.send_keys("")  # account (removed by the author)
time.sleep(1)
print("Filling in the password")
elem = driver.find_element(By.XPATH, "/html/body/div[4]/div/div[1]/div/div[2]/form[1]/div[2]/div/input")
elem.clear()
elem.send_keys("")  # password (removed by the author)
time.sleep(1)
print("Showing the captcha")
driver.find_element(By.XPATH, "/html/body/div[4]/div/div[1]/div/div[2]/form[1]/div[3]/div/input").click()  # focusing the field shows the captcha image
yanzhengma = input("Enter the captcha: ")  # read the captcha from the console
time.sleep(1)
driver.find_element(By.XPATH, "/html/body/div[4]/div/div[1]/div/div[2]/form[1]/div[4]/div/label/input").click()  # tick the checkbox
time.sleep(1)
elem = driver.find_element(By.XPATH, "/html/body/div[4]/div/div[1]/div/div[2]/form[1]/div[3]/div/input")
elem.clear()
elem.send_keys(str(yanzhengma))  # fill in the captcha
time.sleep(1)
print("Logging in")
driver.find_element(By.XPATH, "/html/body/div[4]/div/div[1]/div/div[2]/form[1]/div[5]/button").click()  # log in
```
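The note at the top explains that pop-ups can shift the layout and make an element lookup fail. Instead of restarting the whole program by hand, one option is to retry the login step a few times. This is a minimal sketch: `retry` and the wrapped `do_login` function are hypothetical names. Which exceptions to catch is left to the caller. With Selenium this would typically be `NoSuchElementException` and `TimeoutException` from `selenium.common.exceptions`.

```python
import time


def retry(step, attempts=3, delay=2.0, exceptions=(Exception,)):
    """Call step() until it succeeds, at most `attempts` times.

    step: zero-argument callable, e.g. a function wrapping the login clicks.
    exceptions: tuple of exception types that mean "try again"; any other
    error propagates immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except exceptions as err:
            print(f"attempt {attempt}/{attempts} failed: {err!r}")
            if attempt == attempts:
                raise  # out of attempts: re-raise the last error
            time.sleep(delay)  # give the page a moment before retrying
```

To use it, put the login steps above into a function such as `do_login()`. Start that function with `driver.refresh()` so every attempt begins from a clean page, then run `retry(do_login, exceptions=(NoSuchElementException, TimeoutException))`. Each attempt runs `input()` again, so it prompts for the captcha again.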

Click 违法查询 (violation query) and set the query date


  
```python
time.sleep(3)  # wait after logging in
driver.find_element(By.XPATH, "/html/body/div[4]/div/div[1]/ul/li[5]/a").click()  # violation query
time.sleep(1)
driver.find_element(By.XPATH, "/html/body/div[3]/div/div[2]/div[1]/div[2]/form/div[1]/div/div[1]/span/i").click()  # open the date picker
for i in range(3):  # assumed loop body: click the picker's first header arrow three times
    time.sleep(0.5)
    driver.find_element(By.XPATH, "/html/body/div[6]/div[4]/table/thead/tr/th[1]/i").click()
time.sleep(0.5)
driver.find_element(By.XPATH, "/html/body/div[6]/div[4]/table/tbody/tr/td/span[1]").click()  # first entry in the picker's grid
time.sleep(0.5)
driver.find_element(By.XPATH, "/html/body/div[6]/div[3]/table/tbody/tr[2]/td[1]").click()  # first cell of the second row
```
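The steps above pace themselves with fixed `time.sleep` calls. These are too long on a fast connection and too short on a slow one. `WebDriverWait`, already imported at the top, polls a condition instead. The snippet below shows that polling idea in plain Python, without a browser. It illustrates the technique and is not Selenium's implementation; `wait_until` is a hypothetical name.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.2):
    """Poll condition() until it returns something truthy and return that value.

    Raises TimeoutError if that does not happen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)
```

The script itself would use the real API to get the same effect. For an XPath string `xpath`, `WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, xpath))).click()` replaces a `time.sleep(0.5)` followed by `find_element(...).click()`. By default, `WebDriverWait` polls every 0.5 s and ignores `NoSuchElementException` while waiting.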

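Query each plate in turn. For every plate, clear the previous input, type the current plate, and read the total number of records. The list shows at most 10 records per page, so this total gives the number of pages and the number of records on the last page.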


  
```python
for ii in range(0, nrows):
    plate = str(table.cell_value(ii, 0)).strip()  # assumes the plate is in column A of row ii
    print('Reading vehicle ' + str(ii + 1))
    time.sleep(0.5)
    elem = driver.find_element(By.XPATH, "/html/body/div[3]/div/div[2]/div[1]/div[2]/form/div[3]/div/input")
    elem.clear()  # clear the previous plate
    elem.send_keys(plate)  # type this plate
    time.sleep(0.1)
    driver.find_element(By.XPATH, "/html/body/div[3]/div/div[2]/div[1]/div[2]/form/div[4]/button").click()  # query
    time.sleep(0.5)
    result = driver.find_element(By.XPATH, "/html/body/div[3]/div/div[2]/div[2]/div[1]/div/p/span").text  # total violations
    result = int(result)
    a = math.ceil(result / 10)  # number of pages
    b = result % 10  # records on the last page (0 means the last page is full)
```
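The page arithmetic above can be checked in isolation. Here `a = math.ceil(result / 10)` is the number of pages and `b = result % 10` is the number of rows on the last page, where `b == 0` means the last page is full. The sketch below returns the number of rows on each page; `page_row_counts` is a hypothetical helper. It also makes the zero-violation case explicit: with `result == 0` there are no pages at all. The `b == 0` branch of the full code, by contrast, would wait for 10 rows that do not exist and fail with a timeout.

```python
import math


def page_row_counts(total, per_page=10):
    """Rows shown on each results page when `total` records are paginated
    `per_page` at a time: every page is full except possibly the last."""
    if total <= 0:
        return []  # no violations: nothing to read
    pages = math.ceil(total / per_page)  # the script's `a`
    remainder = total % per_page  # the script's `b`
    last = remainder if remainder else per_page
    return [per_page] * (pages - 1) + [last]
```

Each entry is the number of rows the inner loop reads on that page. Every page except the last is followed by a click on 下一页 (next page).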

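Read the data in the list. The points deducted and the fine only appear after clicking "查看详情" (view details), so they are read from the pop-up dialog.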


  
```python
def read_row(j):
    """Read row j (1-10) of the current results page, plus two fields from its details dialog."""
    wait = WebDriverWait(driver, 10)
    row = "//table[@id='my-msg-list']/tbody/tr[" + str(j) + "]"
    # columns 1-7 are read directly from the list
    fields = [wait.until(EC.visibility_of_element_located((By.XPATH, row + "/td[" + str(k) + "]"))).text
              for k in range(1, 8)]
    # column 8 holds the "查看详情" link, which opens the details dialog
    wait.until(EC.element_to_be_clickable((By.XPATH, row + "/td[8]/a"))).click()
    time.sleep(1)
    # dialog rows 7 and 8: the points deducted and the fine
    for div in (7, 8):
        xpath = "//form[@class='form-horizontal']/div[" + str(div) + "]/span[2]"
        fields.append(wait.until(EC.visibility_of_element_located((By.XPATH, xpath))).text)
    # close the dialog before moving on to the next row
    wait.until(EC.element_to_be_clickable((By.XPATH, "//div[@class='modal-footer ui_modal']/button"))).click()
    time.sleep(0.5)
    return fields


# for each row j on the current page:
# R.append(read_row(j))
```

Write the collected data to the CSV file after each vehicle


  
```python
# wenjian: output file name; R: every row read so far
with open(wenjian, 'w', encoding='utf-8', newline='') as fp:
    writer = csv.writer(fp)
    writer.writerows(R)  # write the data
```
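The script keeps every row in `R` and rewrites the whole file with `'w'` after each vehicle, so the file always holds everything read so far. An alternative is to append only the current vehicle's rows. The sketch below does that. It also writes a UTF-8 byte-order mark when it creates the file, because Excel on Windows generally detects UTF-8 in a CSV only when the BOM is present. Without the BOM, Chinese text such as plate numbers tends to appear garbled. `append_rows` is a hypothetical helper, not part of the original program.

```python
import csv
import os


def append_rows(path, rows):
    """Append CSV rows to `path`, adding a UTF-8 BOM only when the file is new or empty."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    # "utf-8-sig" writes the BOM at the start of the file; plain "utf-8" does not
    encoding = "utf-8-sig" if is_new else "utf-8"
    with open(path, "a", encoding=encoding, newline="") as fp:
        csv.writer(fp).writerows(rows)
```

In the per-vehicle loop, collect that vehicle's rows in a local list and call `append_rows(wenjian, vehicle_rows)` instead of rewriting `R`.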

Full code


  
```python
from selenium import webdriver
import time
import csv
import datetime
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
import math
import xlrd  # xlrd 2.x only reads .xls; install xlrd<2.0 to open .xlsx files

data = xlrd.open_workbook('cheliang.xlsx')
table = data.sheets()[0]
nrows = table.nrows  # number of rows, one plate per row (column A)

opt = webdriver.ChromeOptions()  # browser options
# opt.add_argument('--headless')  # headless mode
driver = webdriver.Chrome(options=opt)  # create the browser
driver.maximize_window()  # maximize the window
print("Opening the page")
driver.get('')  # open the page (URL removed by the author)
time.sleep(3)
print("Clicking unit login")
time.sleep(3)
driver.find_element(By.XPATH, "/html/body/div[1]/div[2]/div/div[2]/div[2]/button").click()  # unit login
time.sleep(3)
print("Filling in the account")
elem = driver.find_element(By.XPATH, "/html/body/div[4]/div/div[1]/div/div[2]/form[1]/div[1]/div/input")
elem.clear()  # clear any existing content
elem.send_keys("")  # account (removed by the author)
time.sleep(1)
print("Filling in the password")
elem = driver.find_element(By.XPATH, "/html/body/div[4]/div/div[1]/div/div[2]/form[1]/div[2]/div/input")
elem.clear()
elem.send_keys("")  # password (removed by the author)
time.sleep(1)
print("Showing the captcha")
driver.find_element(By.XPATH, "/html/body/div[4]/div/div[1]/div/div[2]/form[1]/div[3]/div/input").click()  # shows the captcha image
yanzhengma = input("Enter the captcha: ")
time.sleep(1)
driver.find_element(By.XPATH, "/html/body/div[4]/div/div[1]/div/div[2]/form[1]/div[4]/div/label/input").click()  # tick the checkbox
time.sleep(1)
elem = driver.find_element(By.XPATH, "/html/body/div[4]/div/div[1]/div/div[2]/form[1]/div[3]/div/input")
elem.clear()
elem.send_keys(str(yanzhengma))  # fill in the captcha
time.sleep(1)
print("Logging in")
driver.find_element(By.XPATH, "/html/body/div[4]/div/div[1]/div/div[2]/form[1]/div[5]/button").click()  # log in
time.sleep(3)
driver.find_element(By.XPATH, "/html/body/div[4]/div/div[1]/ul/li[5]/a").click()  # violation query
time.sleep(1)
driver.find_element(By.XPATH, "/html/body/div[3]/div/div[2]/div[1]/div[2]/form/div[1]/div/div[1]/span/i").click()  # open the date picker
for i in range(3):  # assumed loop body: click the picker's first header arrow three times
    time.sleep(0.5)
    driver.find_element(By.XPATH, "/html/body/div[6]/div[4]/table/thead/tr/th[1]/i").click()
time.sleep(0.5)
driver.find_element(By.XPATH, "/html/body/div[6]/div[4]/table/tbody/tr/td/span[1]").click()
time.sleep(0.5)
driver.find_element(By.XPATH, "/html/body/div[6]/div[3]/table/tbody/tr[2]/td[1]").click()


def read_row(j):
    """Read row j (1-10) of the current results page, plus two fields from its details dialog."""
    wait = WebDriverWait(driver, 10)
    row = "//table[@id='my-msg-list']/tbody/tr[" + str(j) + "]"
    fields = [wait.until(EC.visibility_of_element_located((By.XPATH, row + "/td[" + str(k) + "]"))).text
              for k in range(1, 8)]  # list columns 1-7
    wait.until(EC.element_to_be_clickable((By.XPATH, row + "/td[8]/a"))).click()  # "查看详情": open the dialog
    time.sleep(1)
    for div in (7, 8):  # points deducted and fine
        xpath = "//form[@class='form-horizontal']/div[" + str(div) + "]/span[2]"
        fields.append(wait.until(EC.visibility_of_element_located((By.XPATH, xpath))).text)
    wait.until(EC.element_to_be_clickable((By.XPATH, "//div[@class='modal-footer ui_modal']/button"))).click()  # close the dialog
    time.sleep(0.5)
    return fields


wenjian = datetime.datetime.now().strftime('%Y-%m-%d-%H%M%S') + '.csv'  # start time as the output file name
R = []
for ii in range(0, nrows):
    plate = str(table.cell_value(ii, 0)).strip()  # assumes the plate is in column A
    print('Reading vehicle ' + str(ii + 1))
    time.sleep(0.5)
    elem = driver.find_element(By.XPATH, "/html/body/div[3]/div/div[2]/div[1]/div[2]/form/div[3]/div/input")
    elem.clear()  # clear the previous plate
    elem.send_keys(plate)
    time.sleep(0.1)
    driver.find_element(By.XPATH, "/html/body/div[3]/div/div[2]/div[1]/div[2]/form/div[4]/button").click()  # query
    time.sleep(0.5)
    result = int(driver.find_element(By.XPATH, "/html/body/div[3]/div/div[2]/div[2]/div[1]/div/p/span").text)  # total violations
    if result == 0:
        continue  # no violations: the list is empty
    a = math.ceil(result / 10)  # number of pages
    b = result % 10  # records on the last page (0 means the last page is full)
    for i in range(1, a):  # every page except the last holds 10 records
        for j in range(1, 11):
            R.append(read_row(j))
        driver.find_element(By.LINK_TEXT, "下一页").click()  # next page
        time.sleep(0.5)
    for j in range(1, (b if b > 0 else 10) + 1):  # last page
        R.append(read_row(j))
    time.sleep(0.5)
    with open(wenjian, 'w', encoding='utf-8', newline='') as fp:
        writer = csv.writer(fp)
        writer.writerows(R)  # rewrite the file with every row read so far
```

 


Reposted from: https://blog.csdn.net/qq_39899679/article/details/117267056