飞道's Blog

Getting Started with Python requests: Essential Operations

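1 Introduction

This article aims to get readers up and running quickly with the basic methods of the requests module. It does not explain what HTTP is, so some networking background is assumed. After working through it, together with 知识追寻者's earlier web-scraping series, you should be able to scrape data from public websites on your own. Please be careful, though: do not scrape citizens' personal information or non-public private data. That is illegal! If you found this useful, a like is appreciated.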

2 Installing the requests module

pip install requests

3 Sending requests with requests

Simple examples of sending HTTP requests with the requests module:

Example                                          Meaning
requests.get('http://httpbin.org/get')           Send a GET request
requests.post('http://httpbin.org/post')         Send a POST request
requests.put('http://httpbin.org/put')           Send a PUT request
requests.delete('http://httpbin.org/delete')     Send a DELETE request
requests.options('http://httpbin.org/delete')    Send an OPTIONS request
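The table lists POST, while the worked examples in this article use GET. As a minimal sketch, the snippet below builds two POST requests without sending them, using requests.Request(...).prepare(), to show how the data= and json= arguments change the request body and the Content-Type header. The payload {"name": "zszxz"} is an assumed placeholder, not something from the original examples.

```python
import json
import requests

url = "http://httpbin.org/post"
payload = {"name": "zszxz"}  # assumed placeholder payload

# data= sends a form-encoded body
form_req = requests.Request("POST", url, data=payload).prepare()
print(form_req.headers["Content-Type"])  # application/x-www-form-urlencoded
print(form_req.body)                     # name=zszxz

# json= serializes the payload as a JSON body
json_req = requests.Request("POST", url, json=payload).prepare()
print(json_req.headers["Content-Type"])  # application/json
print(json.loads(json_req.body))         # {'name': 'zszxz'}
```

To actually send them, requests.post(url, data=payload) or requests.post(url, json=payload) performs the same preparation and then makes the request.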

The following public URLs can be used for testing:

GitHub public timeline: https://api.github.com/events

httpbin: http://httpbin.org

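4 Common GET request examples

4.1 Sending a GET request and reading the response

  1. Send a GET request
  2. Get the response
  3. Print the status code
  4. Convert the response body to text
# -*- coding: utf-8 -*-
import requests

url = 'http://httpbin.org/get'
response = requests.get(url)
# Prints 200 on success
print("status_code:", response.status_code)
# The response body decoded as text
message_text = response.text
print(message_text)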

Output:

status_code: 200
{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.22.0", 
    "X-Amzn-Trace-Id": "Root=1-5e3a6de9-1c79a72a76f77a449ca0ee40"
  }, 
  "origin": "110.90.178.3", 
  "url": "http://httpbin.org/get"
}
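Printing the status code is fine for a demo, but a script usually needs to react to failures. A minimal sketch, assuming a simple True/False answer is enough: raise_for_status() raises requests.exceptions.HTTPError for 4xx and 5xx responses. check_response is a hypothetical helper name, and the Response objects are built by hand purely so the snippet runs without network access; real code would get them from requests.get().

```python
import requests

def check_response(resp):
    """Return False if the status is 4xx/5xx, True otherwise."""
    try:
        resp.raise_for_status()
    except requests.exceptions.HTTPError as exc:
        print("request failed:", exc)
        return False
    return True

# Hand-built responses, for illustration only
ok = requests.Response()
ok.status_code = 200
missing = requests.Response()
missing.status_code = 404

print(check_response(ok))       # True
print(check_response(missing))  # False
```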

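4.2 Passing parameters in a GET request

  1. Define the URL
  2. Put the parameters in a dict
  3. Send the GET request
  4. Convert the response to a dict
# -*- coding: utf-8 -*-
import requests

url = "http://httpbin.org/get"
param = {'id': 5}
# The dict is encoded into the query string: ?id=5
response = requests.get(url, params=param)
# json() parses the JSON body into a Python dict
print(response.json())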

Result:

{'args': {'id': '5'}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.22.0', 'X-Amzn-Trace-Id': 'Root=1-5e3a6ec5-feb92bca457fb95c502d5e68'}, 'origin': '110.90.178.3', 'url': 'http://httpbin.org/get?id=5'}
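To see how requests turns a parameter dict into a query string without sending anything, you can prepare the request and inspect its URL. This is a minimal sketch; the "tag" and "skip" parameters are assumptions added for illustration. A list value becomes a repeated key, and a key whose value is None is left out.

```python
import requests

url = "http://httpbin.org/get"
# "tag" and "skip" are assumed extra parameters for illustration
params = {"id": 5, "tag": ["a", "b"], "skip": None}

prepared = requests.Request("GET", url, params=params).prepare()
print(prepared.url)  # http://httpbin.org/get?id=5&tag=a&tag=b
```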

4.3 Adding request headers to a GET request

  1. Set the URL
  2. Set the headers
  3. Send the GET request
  4. Convert the response to a dict
# -*- coding: utf-8 -*-
import requests

url = 'http://httpbin.org/get'
# Headers are passed as a plain dict
headers = {
    'Referer': 'http://httpbin.org/get',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'}
response = requests.get(url, headers=headers)
print(response.json())

Result:

{'args': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org', 'Referer': 'http://httpbin.org/get', 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36', 'X-Amzn-Trace-Id': 'Root=1-5e3a6fcd-03b63f0a54e3864be6c844ce'}, 'origin': '110.90.178.3', 'url': 'http://httpbin.org/get'}
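If every request should carry the same headers, repeating the dict gets tedious. One option, sketched here under the assumption that a requests.Session fits your use case, is to set default headers on the session and add per-request headers on top. The User-Agent string "my-crawler/0.1" is a made-up value. The request is only prepared, not sent, so the merged headers can be inspected offline.

```python
import requests

session = requests.Session()
# Headers set on the session are sent with every request made through it
session.headers.update({"User-Agent": "my-crawler/0.1"})  # assumed UA string

request = requests.Request(
    "GET",
    "http://httpbin.org/get",
    headers={"Referer": "http://httpbin.org/get"},  # per-request header
)
prepared = session.prepare_request(request)

print(prepared.headers["User-Agent"])  # my-crawler/0.1
print(prepared.headers["referer"])     # header lookup is case-insensitive
```

Calling session.get(url, headers=...) performs the same merge and then sends the request.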

4.4 Fetching a binary stream with GET

This example downloads an image file.

  1. Set the URL
  2. Define the request headers
  3. Send the GET request
  4. Print the status code
  5. Take the response body as bytes and save it with open()
# -*- coding: utf-8 -*-
import requests

url = 'https://mtku.cdn.bcebos.com/wp-content/uploads/2019/11/2019113011584845-270x370.jpg?v=1575115129'
headers = {
    'Referer': 'https://www.mtku.net/',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'}
response = requests.get(url, headers=headers)
print(response.status_code)
# response.content is the raw body as bytes; the ../dirs directory must already exist
with open('../dirs/mv.jpg', 'wb') as f:
    f.write(response.content)
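Reading the whole body into memory works for a small image, but for large files it may be better to stream the body to disk in chunks. A minimal sketch, assuming stream=True and iter_content() suit your case; save_stream and download are hypothetical helper names, and the 8192-byte chunk size and 10-second timeout are arbitrary choices.

```python
import requests

def save_stream(resp, path, chunk_size=8192):
    """Write a response body to `path` chunk by chunk; return bytes written."""
    written = 0
    with open(path, "wb") as f:
        for chunk in resp.iter_content(chunk_size=chunk_size):
            f.write(chunk)
            written += len(chunk)
    return written

def download(url, path, headers=None):
    """Download `url` to `path` without holding the whole body in memory."""
    # stream=True defers reading the body until iter_content() is called
    with requests.get(url, headers=headers, stream=True, timeout=10) as resp:
        resp.raise_for_status()
        return save_stream(resp, path)
```

Usage would look like download(url, '../dirs/mv.jpg', headers=headers), with url and headers defined as in the example above.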

4.5 Sending cookies

  1. Set the URL
  2. Set the cookies
  3. Send the request and get the response
# -*- coding: utf-8 -*-
import requests

url = 'http://httpbin.org/cookies'
# Cookies are passed as a plain dict
cookies = dict(cookies_are='zszxz')
response = requests.get(url, cookies=cookies)
print(response.text)

Output:

{
  "cookies": {
    "cookies_are": "zszxz"
  }
}
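Passing cookies= on every call works, but a requests.Session keeps cookies between requests, which is what a login flow typically needs. A minimal sketch, assuming a session is acceptable for your use case: the cookie is stored on the session, and the request is only prepared (not sent) so the resulting Cookie header can be inspected offline.

```python
import requests

session = requests.Session()
# Stored once, then attached to every request made through this session
session.cookies.set("cookies_are", "zszxz")

prepared = session.prepare_request(
    requests.Request("GET", "http://httpbin.org/cookies")
)
print(prepared.headers["Cookie"])  # cookies_are=zszxz
```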

4.6 Setting cookies with RequestsCookieJar

  1. Set the URL
  2. Create a RequestsCookieJar
  3. Set a key-value pair in the RequestsCookieJar
  4. Send the request and get the response
# -*- coding: utf-8 -*-
import requests

url = 'http://httpbin.org/cookies'
jar = requests.cookies.RequestsCookieJar()
# The jar is empty at this point
print(jar)
jar.set('set_cookies', 'zszxz')
response = requests.get(url, cookies=jar)
print(response.text)

Output:

<RequestsCookieJar[]>
{
  "cookies": {
    "set_cookies": "zszxz"
  }
}
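What a RequestsCookieJar adds over a plain dict is that each cookie can be scoped to a domain and path. The sketch below prepares a request without sending it and shows that only the cookie whose path matches ends up in the Cookie header. The cookie names and values are illustrative.

```python
import requests

jar = requests.cookies.RequestsCookieJar()
# Illustrative cookies, scoped to different paths on the same domain
jar.set("tasty_cookie", "yum", domain="httpbin.org", path="/cookies")
jar.set("gross_cookie", "blech", domain="httpbin.org", path="/elsewhere")

prepared = requests.Request(
    "GET", "http://httpbin.org/cookies", cookies=jar
).prepare()
print(prepared.headers["Cookie"])  # tasty_cookie=yum
```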

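4.7 Authentication

  1. Set the URL
  2. Pass auth as a (username, password) tuple
# -*- coding: utf-8 -*-
import requests

# httpbin's /basic-auth/{user}/{passwd} endpoint accepts exactly these credentials
url = 'http://httpbin.org/basic-auth/zszxz/zszxz'
response = requests.get(url, auth=('zszxz', 'zszxz'))
print(response.status_code)
print(response.text)

Credentials that do not match produce a 401 status. Readers can also test against any address they are able to log in to.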
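It can help to see what the auth tuple actually does. A (username, password) tuple is shorthand for HTTP Basic authentication, which adds an Authorization header containing the Base64-encoded "username:password" pair. The sketch below prepares a request without sending it and decodes that header. The httpbin /basic-auth URL is used here as an assumed test endpoint.

```python
import base64
import requests
from requests.auth import HTTPBasicAuth

url = "http://httpbin.org/basic-auth/zszxz/zszxz"  # assumed test endpoint
# Equivalent to auth=('zszxz', 'zszxz')
prepared = requests.Request(
    "GET", url, auth=HTTPBasicAuth("zszxz", "zszxz")
).prepare()

header = prepared.headers["Authorization"]
scheme, token = header.split(" ", 1)
print(scheme)                            # Basic
print(base64.b64decode(token).decode())  # zszxz:zszxz
```

Because Base64 is an encoding, not encryption, Basic authentication should only be used over HTTPS in practice.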

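4.8 Proxies and timeouts

When scraping web pages, a proxy is essential. 知识追寻者 once regretted not using one and nearly got into trouble. Readers who can afford it may want to buy a proxy service, which tends to be more stable.

  1. Set proxies for both http and https
  2. Set a 4-second timeout
  3. Send the request
# -*- coding: utf-8 -*-
import requests

# Replace ip:port with the address of a real proxy
proxies = {"http": "http://ip:port", "https": "http://ip:port"}
url = 'http://httpbin.org/get'
# Give up if the server has not responded within 4 seconds
response = requests.get(url, proxies=proxies, timeout=4)
print(response.text)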

Output:

{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.22.0", 
    "X-Amzn-Trace-Id": "Root=1-5e3a7f91-8b3c1a06035f04b53b847295"
  }, 
  "origin": "ip", 
  "url": "http://httpbin.org/get"
}
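Proxies fail and servers stall, so a scraper usually wraps the request in error handling. A minimal sketch, assuming that returning None on failure is acceptable: fetch is a hypothetical helper name, and the default timeout of (3, 4) means 3 seconds to connect and 4 seconds to read, both arbitrary values.

```python
import requests

def fetch(url, proxies=None, timeout=(3, 4)):
    """Return the response text, or None if the request fails.

    timeout is a (connect seconds, read seconds) tuple.
    """
    try:
        response = requests.get(url, proxies=proxies, timeout=timeout)
        response.raise_for_status()
    except requests.exceptions.Timeout:
        print("timed out:", url)
    except requests.exceptions.ConnectionError as exc:
        # Also covers ProxyError, for example an unreachable proxy
        print("connection failed:", exc)
    except requests.exceptions.RequestException as exc:
        # Any other requests error, including HTTPError from raise_for_status()
        print("request failed:", exc)
    else:
        return response.text
    return None
```

A call such as fetch('http://httpbin.org/get', proxies=proxies) then either returns the body or prints why the request failed.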

Reposted from: https://blog.csdn.net/youku1327/article/details/104185139