1 Preface
This article aims to get readers up to speed quickly with the basic methods of the requests module. It does not explain what HTTP itself is, so some networking background is assumed. After working through this article, combined with 知识追寻者's earlier crawler series, you will be able to scrape data from public websites yourself. Please proceed carefully: do not scrape citizens' personal information or non-public private data, which is illegal! A like on your way out is appreciated.
2 Installing the requests module
pip install requests
3 Sending requests with the requests module
The basic calls for sending HTTP requests with the requests module are:
Example | Meaning |
---|---|
requests.get('http://httpbin.org/get') | send a GET request |
requests.post('http://httpbin.org/post') | send a POST request |
requests.put('http://httpbin.org/put') | send a PUT request |
requests.delete('http://httpbin.org/delete') | send a DELETE request |
requests.options('http://httpbin.org/get') | send an OPTIONS request |
This article uses the following public URLs for testing:
GitHub public timeline: https://api.github.com/events
httpbin: http://httpbin.org
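Each call in the table above is a thin wrapper around the same machinery; as a sketch that needs no network access, preparing a `requests.Request` shows the HTTP verb and URL that would be sent (the `/anything` endpoint is one of httpbin's echo URLs):

```python
import requests

# Build (but do not send) a request for each HTTP verb from the table above.
for verb in ("GET", "POST", "PUT", "DELETE", "OPTIONS"):
    prepared = requests.Request(verb, "http://httpbin.org/anything").prepare()
    print(prepared.method, prepared.url)
```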
4 Common GET request examples
4.1 Sending a GET request and reading the response
- send a GET request
- receive the response
- print the status code
- read the response body as text
# -*- coding: utf-8 -*-
import requests
url = 'http://httpbin.org/get'
req = requests.get(url)
# prints 200 on success
print("status_code:", req.status_code)
# read the response body as text
message_text = req.text
print(message_text)
Output:
status_code: 200
{
"args": {},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.22.0",
"X-Amzn-Trace-Id": "Root=1-5e3a6de9-1c79a72a76f77a449ca0ee40"
},
"origin": "110.90.178.3",
"url": "http://httpbin.org/get"
}
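Besides printing `status_code`, `raise_for_status()` converts 4xx/5xx responses into exceptions, which saves manual status checks in crawler loops; a minimal sketch using httpbin's status test endpoint:

```python
import requests

response = requests.get('http://httpbin.org/status/404')
try:
    # raises requests.exceptions.HTTPError for 4xx/5xx status codes
    response.raise_for_status()
except requests.exceptions.HTTPError as err:
    print('request failed:', err)
```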
4.2 Passing query parameters with GET
- define the url
- build a dict of parameters
- send the GET request
- parse the response as a dict
# -*- coding: utf-8 -*-
import requests
url = "http://httpbin.org/get"
param = {'id': 5}
# pass the dict via the params keyword; it is encoded into the query string
req = requests.get(url, params=param)
print(req.json())
Result:
{'args': {'id': '5'}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.22.0', 'X-Amzn-Trace-Id': 'Root=1-5e3a6ec5-feb92bca457fb95c502d5e68'}, 'origin': '110.90.178.3', 'url': 'http://httpbin.org/get?id=5'}
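The dict is URL-encoded into the query string, which is why the result above shows `?id=5`. As a sketch that needs no network, `requests.Request(...).prepare()` exposes the final URL, and a list value expands into repeated parameters:

```python
import requests

prepared = requests.Request(
    'GET', 'http://httpbin.org/get',
    params={'id': 5, 'tag': ['a', 'b']},
).prepare()
# a list value becomes repeated query parameters
print(prepared.url)  # http://httpbin.org/get?id=5&tag=a&tag=b
```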
4.3 Adding request headers to a GET request
- set the url
- define the headers
- send the GET request
- parse the response as a dict
# -*- coding: utf-8 -*-
import requests
url = 'http://httpbin.org/get'
headers = {
'Referer': 'http://httpbin.org/get',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'}
req = requests.get(url,headers=headers)
print(req.json())
Result:
{'args': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org', 'Referer': 'http://httpbin.org/get', 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36', 'X-Amzn-Trace-Id': 'Root=1-5e3a6fcd-03b63f0a54e3864be6c844ce'}, 'origin': '110.90.178.3', 'url': 'http://httpbin.org/get'}
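Custom headers are merged into the outgoing request; without hitting the network, preparing the request shows the headers that would actually be sent. The `my-crawler/1.0` User-Agent value below is just an illustration, not a real browser string:

```python
import requests

headers = {'Referer': 'http://httpbin.org/get', 'User-Agent': 'my-crawler/1.0'}
prepared = requests.Request('GET', 'http://httpbin.org/get', headers=headers).prepare()
# header lookup on a prepared request is case-insensitive
print(prepared.headers['user-agent'])  # my-crawler/1.0
```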
4.4 Fetching binary content with GET
The example below downloads an image from a photo site — keep it low-key.
- set the url
- define the request headers
- send the GET request
- print the status code
- write the binary response body to a file with open()
# -*- coding: utf-8 -*-
import requests
url = 'https://mtku.cdn.bcebos.com/wp-content/uploads/2019/11/2019113011584845-270x370.jpg?v=1575115129'
headers = {
    'Referer': 'https://www.mtku.net/',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'}
req = requests.get(url, headers=headers)
print(req.status_code)
# req.content is the raw response body as bytes
with open('../dirs/mv.jpg', 'wb') as f:
    f.write(req.content)
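For large files, `req.content` holds the whole body in memory at once; a hedged alternative is to stream the download in chunks with `stream=True` (httpbin's `/bytes` endpoint used here simply returns random bytes):

```python
import requests

url = 'http://httpbin.org/bytes/1024'
# stream=True defers the body download until iter_content() is consumed
with requests.get(url, stream=True) as response:
    with open('random.bin', 'wb') as f:
        for chunk in response.iter_content(chunk_size=256):
            f.write(chunk)
```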
4.5 Sending cookies
- set the url
- build the cookies
- send the request and read the response
# -*- coding: utf-8 -*-
import requests
url = 'http://httpbin.org/cookies'
cookies = dict(cookies_are='zszxz')
response = requests.get(url,cookies=cookies)
print(response.text)
Output:
{
"cookies": {
"cookies_are": "zszxz"
}
}
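Under the hood, the plain dict is converted into a cookie jar before sending; requests exposes helpers for that round trip, sketched here without any network call:

```python
from requests.utils import cookiejar_from_dict, dict_from_cookiejar

# convert a plain dict to a cookie jar and back
jar = cookiejar_from_dict({'cookies_are': 'zszxz'})
print(dict_from_cookiejar(jar))  # {'cookies_are': 'zszxz'}
```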
4.6 Setting cookies via RequestsCookieJar
- set the url
- create a RequestsCookieJar
- set a key-value pair on the jar
- send the request and read the response
# -*- coding: utf-8 -*-
import requests
url = 'http://httpbin.org/cookies'
jar = requests.cookies.RequestsCookieJar()
print(jar)
jar.set('set_cookies','zszxz')
response = requests.get(url, cookies=jar)
print(response.text)
Output:
<RequestsCookieJar[]>
{
"cookies": {
"set_cookies": "zszxz"
}
}
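Unlike a plain dict, a RequestsCookieJar can scope a cookie to a domain and path, so it is only sent to matching hosts; a small offline sketch:

```python
import requests

jar = requests.cookies.RequestsCookieJar()
# scope the cookie so it only applies to httpbin.org under /cookies
jar.set('token', 'zszxz', domain='httpbin.org', path='/cookies')
print(jar.get('token', domain='httpbin.org', path='/cookies'))  # zszxz
print(jar.get('token', domain='example.com'))  # None
```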
4.7 Authentication
- set the url
- pass auth as a (username, password) tuple
# -*- coding: utf-8 -*-
import requests
# httpbin's basic-auth endpoint returns 200 only when the credentials match
url = 'http://httpbin.org/basic-auth/zszxz/zszxz'
response = requests.get(url, auth=('zszxz', 'zszxz'))
print(response.text)
With matching credentials the response reports "authenticated": true; readers can also test against any site they can log in to.
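The `('user', 'pass')` tuple is shorthand for `requests.auth.HTTPBasicAuth`; preparing the request shows the `Authorization` header it produces, with no network needed:

```python
import requests
from requests.auth import HTTPBasicAuth

req = requests.Request('GET', 'http://httpbin.org/basic-auth/zszxz/zszxz',
                       auth=HTTPBasicAuth('zszxz', 'zszxz'))
prepared = req.prepare()
# the credentials are base64-encoded into a Basic auth header
print(prepared.headers['Authorization'])
```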
4.8 Proxies and timeouts
When crawling, a proxy is indispensable; the author once regretted going without one and nearly got into trouble. Readers who can afford it may buy paid proxies, which are more stable.
- configure a proxy for each scheme
- set a 4-second timeout
- send the request
# -*- coding: utf-8 -*-
import requests
# replace ip:port with a real proxy address before running
proxies = {"http": "http://ip:port", "https": "http://ip:port"}
url = 'http://httpbin.org/get'
response = requests.get(url, proxies=proxies, timeout=4)
print(response.text)
Output:
{
"args": {},
"headers": {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Host": "httpbin.org",
"User-Agent": "python-requests/2.22.0",
"X-Amzn-Trace-Id": "Root=1-5e3a7f91-8b3c1a06035f04b53b847295"
},
"origin": "ip",
"url": "http://httpbin.org/get"
}
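Timeouts and unreachable proxies surface as exceptions rather than responses, so a crawler should catch them; a hedged sketch using httpbin's slow-response test endpoint:

```python
import requests

try:
    # timeout accepts a single value, or a (connect, read) tuple
    response = requests.get('http://httpbin.org/delay/10', timeout=(3, 5))
    print(response.status_code)
except requests.exceptions.Timeout:
    print('the request timed out')
except requests.exceptions.ProxyError:
    print('the proxy could not be reached')
```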
Reprinted from: https://blog.csdn.net/youku1327/article/details/104185139