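NetEase Cloud Music comment crawler: reverse-engineering params and encSecKey
Link: https://music.163.com/#/song?id=29004400 (song: 烟火里的尘埃, id: 29004400)
Packet capture shows that the comments are loaded dynamically by JavaScript.
Endpoint: https://music.163.com/weapi/comment/resource/comments/get?csrf_token=
Request method: POST
Parameters: params, encSecKey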
params: m4/nQ+hYmsHIvmReRvcwkAscNlKk3+CSFyPh0NO6IA2INd3JSGzobgOIPCLk0LNb6TqbOa4BJEgKixtD2RaOeFapbGE5I7WP9KXWBfu7R6sDkortRuJGWwRYH4VkpQoSc7lEc4RwcpaKoBBIdwuWGXNFxHQcinrN/JM5EhWQPqOrEoOp3GmKPp+lOWq29T1zaOou03oES+ZJwQu5rAGjMnwzrd1zQD4B5NHAlVntzPjfCvze//TmiVDy/qr6EGfFhmBQtVc0f8Yj56RkKLWt4A==
encSecKey: 5c9c178e4f21e1fdca83b3d093f8ce51d899ba80a65a888c929663b59a5853ec744149d05cc219464cb6e61703097af1e68c7252f47a2727dd7171fb0d2e6b56d3ef716b5730cb1262bc7f80a25397858c427efb96a438a6da7e8dbf92d94585a429401afaa7a976c1348cbd7283c68e05d9a625526377dfe64e9ac8d3102915
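Both parameters are obviously encrypted. To see how, debug the page in Chrome: open the relevant JS file and search for params.
The search turns up params: bVZ9Q.encText and encSecKey: bVZ9Q.encSecKey
So we need to find out how bVZ9Q is generated. It is right there on the previous line: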
var bVZ9Q = window.asrsea(JSON.stringify(i0x), bqN1x(["流泪", "强"]), bqN1x(Wx4B.md), bqN1x(["爱心", "女孩", "惊恐", "大笑"]));
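Once this call is worked out we are done. It breaks down into five parts.
1. JSON.stringify(i0x): set a breakpoint to see what i0x contains
i0x turns out to be a dictionary (a JS object) with this content: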
{"csrf_token": "", "cursor": "1596866209514", "offset": "20", "orderType": "1", "pageNo": "3", "pageSize": "20", "rid": "R_SO_4_29004400", "threadId": "R_SO_4_29004400"}
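Analysis shows:
csrf_token: empty string
cursor: current timestamp (milliseconds)
offset: number of rows to skip when paging
orderType: fixed at 1; presumably the sort order
pageNo: page number
pageSize: number of comments returned per page
rid: fixed, "R_SO_4_" + song_id
threadId: fixed, "R_SO_4_" + song_id
JSON.stringify(i0x) is the equivalent of json.dumps() in Python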
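As a small sketch of this first part, the payload could be assembled in Python roughly as follows. The name build_payload and its arguments are my own, not something from the site's JS. The offset formula follows the full script at the end of this post (page index times page size); note that the captured sample shows offset 20 on page 3, so the site itself may compute it differently.

```python
import json
import time


def build_payload(song_id, page_no, page_size=20):
    """Build the JSON string that stands in for JSON.stringify(i0x).

    build_payload is a hypothetical helper; the fields mirror the
    dictionary captured at the breakpoint.
    """
    thread_id = "R_SO_4_%s" % song_id
    payload = {
        "csrf_token": "",
        "cursor": str(int(time.time() * 1000)),    # current time in ms
        "offset": str((page_no - 1) * page_size),  # assumed paging formula
        "orderType": "1",                          # fixed; presumably sort order
        "pageNo": str(page_no),
        "pageSize": str(page_size),
        "rid": thread_id,
        "threadId": thread_id,
    }
    return json.dumps(payload)


print(build_payload(29004400, 1))
```

Every value is sent as a string, matching the captured dictionary.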
2. bqN1x(["流泪", "强"])
Here we need to know what the function bqN1x does:
var bqN1x = function(cyC5H) {
    var m0x = [];
    j0x.bf1x(cyC5H, function(cyB5G) {
        m0x.push(Wx4B.emj[cyB5G])
    });
    return m0x.join("")
};
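Translated into Python:
def get_bqN1x(md):
    # Look up each key in emj and concatenate the results, as m0x.join("") does
    m0x = []
    for key in md:
        m0x.append(emj[key])
    return "".join(m0x)
emj is a fixed dictionary: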
Wx4B.emj = {"色": "00e0b","流感": "509f6","这边": "259df","弱": "8642d","嘴唇": "bc356","亲": "62901","开心": "477df","呲牙": "22677","憨笑": "ec152","猫": "b5ff6", "皱眉": "8ace6","幽灵": "15bb7","蛋糕": "b7251","发怒": "52b3a","大哭": "b17a8","兔子": "76aea","星星": "8a5aa","钟情": "76d2e","牵手": "41762","公鸡": "9ec4e","爱意": "e341f","禁止": "56135","狗": "fccf6","亲亲": "95280","叉": "104e0","礼物": "312ec","晕": "bda92","呆": "557c9","生病": "38701","钻石": "14af6","拜": "c9d05","怒": "c4f7f","示爱": "0c368","汗": "5b7a4","小鸡": "6bee2","痛苦": "55932", "撇嘴": "575cc","惶恐": "e10b4","口罩": "24d81","吐舌": "3cfe4","心碎": "875d3","生气": "e8204","可爱": "7b97d","鬼脸": "def52","跳舞": "741d5","男孩": "46b8e","奸笑": "289dc","猪": "6935b","圈": "3ece0","便便": "462db","外星": "0a22b","圣诞": "8e7","流泪": "01000","强": "1","爱心": "0CoJU","女孩": "m6Qyw", "惊恐": "8W8ju", "大笑": "d"};
So bqN1x(["流泪", "强"]) is a constant:
010001
3、bqN1x(Wx4B.md)
As above, this is also a constant:
00e0b509f6259df8642dbc35662901477df22677ec152b5ff68ace615bb7b725152b3ab17a876aea8a5aa76d2e417629ec4ee341f56135fccf695280104e0312ecbda92557c93870114af6c9d05c4f7f0c3685b7a46bee255932575cce10b424d813cfe4875d3e82047b97ddef52741d546b8e289dc6935b3ece0462db0a22b8e7
4. bqN1x(["爱心", "女孩", "惊恐", "大笑"])
As above, this is also a constant:
0CoJUm6Qyw8W8jud
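To double-check the two short constants, here is a minimal Python sketch of the lookup-and-join. The helper name bq_n1x and the EMJ subset are my own; only the six entries needed for these two calls are copied from Wx4B.emj.

```python
# Only the entries needed for these two calls, copied from Wx4B.emj.
EMJ = {
    "流泪": "01000",
    "强": "1",
    "爱心": "0CoJU",
    "女孩": "m6Qyw",
    "惊恐": "8W8ju",
    "大笑": "d",
}


def bq_n1x(keys, emj=EMJ):
    """Look up each key in emj and concatenate the values, like bqN1x."""
    return "".join(emj[key] for key in keys)


print(bq_n1x(["流泪", "强"]))                    # 010001
print(bq_n1x(["爱心", "女孩", "惊恐", "大笑"]))  # 0CoJUm6Qyw8W8jud
```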
5. window.asrsea(): find out what this function does
First, pull out the JS:
function a(a) {
    var d, e, b = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789", c = "";
    for (d = 0; a > d; d += 1)
        e = Math.random() * b.length,
        e = Math.floor(e),
        c += b.charAt(e);
    return c
}
function b(a, b) {
    var c = CryptoJS.enc.Utf8.parse(b)
      , d = CryptoJS.enc.Utf8.parse("0102030405060708")
      , e = CryptoJS.enc.Utf8.parse(a)
      , f = CryptoJS.AES.encrypt(e, c, {
        iv: d,
        mode: CryptoJS.mode.CBC
    });
    return f.toString()
}
function c(a, b, c) {
    var d, e;
    return setMaxDigits(131),
    d = new RSAKeyPair(b,"",c),
    e = encryptedString(d, a)
}
function d(d, e, f, g) {
    var h = {}
      , i = a(16);
    return h.encText = b(d, g),
    h.encText = b(h.encText, i),
    h.encSecKey = c(i, e, f),
    h
}
window.asrsea = d
Since window.asrsea = d, the function we need to run is d. It does roughly three things:
1. Generate a random 16-character string
import random
rand = "".join([random.choice("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789") for i in range(16)])
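If this 16-character random string is hard-coded, step 3 can be skipped, because encSecKey depends only on it.
2. Run AES encryption twice to get h.encText, which is params
For background on AES, see my blog post "AES加密算法介绍" (an introduction to the AES algorithm).
The detailed code is at the end, so it is not repeated here.
3. Generate h.encSecKey, which is encSecKey; with that, both parameters are covered. Function c does this with the bit shifts and big-number arithmetic of the JS RSA helpers (RSAKeyPair, encryptedString): it encrypts the random string i using e (010001) and f (the long hex string from part 3).
The random string from step 1 can be dropped here: export one fixed random string and its matching encSecKey from the debugger and reuse them indefinitely, since the pair never changes. Alternatively, use a man-in-the-middle proxy to rewrite the JS so the random string is fixed, then capture the corresponding encSecKey.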
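One detail worth spelling out: function b passes no padding option, and CryptoJS defaults to PKCS#7 padding in that case, so a Python port has to pad by hand before calling AES in CBC mode. A minimal standard-library sketch of that padding (the helper names pkcs7_pad and pkcs7_unpad are my own):

```python
def pkcs7_pad(data: bytes, block_size: int = 16) -> bytes:
    """Append N bytes of value N so the length is a multiple of block_size."""
    pad_len = block_size - len(data) % block_size  # always 1..block_size
    return data + bytes([pad_len]) * pad_len


def pkcs7_unpad(data: bytes) -> bytes:
    """Strip the padding added by pkcs7_pad."""
    pad_len = data[-1]
    return data[:-pad_len]


sample = b'{"csrf_token": ""}'
padded = pkcs7_pad(sample)
print(len(sample), len(padded))
print(pkcs7_unpad(padded) == sample)
```

Input that is already a multiple of 16 bytes still gets a full extra block of padding, which is why the ciphertext is always longer than the plaintext.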
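If you would rather compute encSecKey than hard-code it, function c looks like plain RSA without padding, which Python's built-in pow can do. The sketch below is an assumption on my part, not something verified in this post: in particular it assumes the JS library treats the first character as the least significant byte (equivalent to reversing the string first) and that the result is left-padded to 256 hex digits. The name rsa_encrypt is hypothetical.

```python
PUB_EXPONENT = "010001"  # bqN1x(["流泪", "强"])
MODULUS = (              # bqN1x(Wx4B.md)
    "00e0b509f6259df8642dbc35662901477df22677ec152b5ff68ace615bb7b725152b3ab17a876aea8a5aa76d2e417629ec4ee341f56135fccf695280104e0312ecbda92557c93870114af6c9d05c4f7f0c3685b7a46bee255932575cce10b424d813cfe4875d3e82047b97ddef52741d546b8e289dc6935b3ece0462db0a22b8e7"
)


def rsa_encrypt(text, pub_exponent=PUB_EXPONENT, modulus=MODULUS):
    """Unpadded RSA: c = m^e mod n, returned as 256 hex digits."""
    # Assumption: first character is the least significant byte,
    # so reverse the string before reading it as a big-endian integer.
    m = int.from_bytes(text[::-1].encode("utf-8"), "big")
    c = pow(m, int(pub_exponent, 16), int(modulus, 16))
    return format(c, "x").zfill(256)


print(rsa_encrypt("aq9d7cvBOJ1tzj1o"))
```

A quick way to test the assumption is to compare the printed value with the fixed encSecKey that the full code pairs with the random string "aq9d7cvBOJ1tzj1o"; if they differ, the byte order or padding in this sketch is wrong.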
Full code
import requests
import json
import base64
import random
import time
from Crypto.Cipher import AES
param2 = "010001"
param3 = "00e0b509f6259df8642dbc35662901477df22677ec152b5ff68ace615bb7b725152b3ab17a876aea8a5aa76d2e417629ec4ee341f56135fccf695280104e0312ecbda92557c93870114af6c9d05c4f7f0c3685b7a46bee255932575cce10b424d813cfe4875d3e82047b97ddef52741d546b8e289dc6935b3ece0462db0a22b8e7"
param4 = "0CoJUm6Qyw8W8jud"
def AES_encrypt(text, key, iv):
    # Encode first so the PKCS#7 padding is computed on bytes, not characters
    data = text.encode("utf-8")
    pad = 16 - len(data) % 16
    data = data + bytes([pad]) * pad
    encryptor = AES.new(key.encode("utf-8"), AES.MODE_CBC, iv)
    encrypt_text = base64.b64encode(encryptor.encrypt(data))
    return encrypt_text.decode("utf-8")


def asrsea(p1, p2, p3, p4):
    # p2 and p3 are unused because encSecKey is hard-coded for the fixed rand_num
    res = {}
    rand_num = "aq9d7cvBOJ1tzj1o"
    vi = b"0102030405060708"
    # First pass uses the fixed key p4, second pass uses the "random" string
    h_encText = AES_encrypt(p1, p4, vi)
    h_encText = AES_encrypt(h_encText, rand_num, vi)
    res["encText"] = h_encText
    res["encSecKey"] = "5dec9ded1d7223302cc7db8d7e0428b04139743ab7e3d451ae47837f34e66f9a86f63e45ef20d147c33d88530a6c3c9d9d88e38586b42ee30ce43fbf3283a2b10e3118b76e11d6561d80e33ae38deb96832b1a358665c0579b1576b21f995829d45fc43612eede2ac243c6ebb6c2d16127742f3ac913d3ac7d6026b44cee424e"
    return res


for i in range(11):
    curr_time = int(time.time() * 1000)
    param1 = json.dumps({"csrf_token": "", "cursor": "%s" % curr_time, "offset": str(i*20), "orderType": "2", "pageNo": str(i+1),
                         "pageSize": "20", "rid": "R_SO_4_29004400", "threadId": "R_SO_4_29004400"})
    asrsea_res = asrsea(param1, param2, param3, param4)
    url = "https://music.163.com/weapi/comment/resource/comments/get?csrf_token="
    headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36",
               "Referer": "https://music.163.com/song?id=29004400",
               "Content-Type": "application/x-www-form-urlencoded",
               "Origin": "http://music.163.com",
               "Host": "music.163.com"
               }
    param_data = {"params": asrsea_res["encText"],
                  "encSecKey": asrsea_res["encSecKey"]}
    r = requests.post(url, headers=headers, data=param_data, verify=False)
    for comment in json.loads(r.text)["data"]["comments"]:
        print(comment["content"])
    # Stop after the first page; remove this to fetch all 11 pages
    break
"""
Output:
花花h
下午好,小烟
那是什么?
埋了
好听
花花高音不是修的哦。
我是2020年的新晋粉,也是70后,错过花花7年了,最近在狂补花花所有的综艺,每天听着花花的歌特喜欢花花的视频边听边欣赏他每一个表情传递的歌的灵魂
jj和周董也差不多吧
好喜欢,好好听
听着花花的《烟里的尘埃》,看着触动我心的评论…眼泪却止不住的流了出来,我是怎么了?
你敢骂他吗?
hhhh尤其文科生,一本书要背的比初中三年的都多
嗯嗯嗯还有那首!我都忘了
你会喜欢我吗
好听!听林俊杰的歌还要VIP
知道我的人知道我有48面,但懂我的人知道我的48面都是真实(花花说的,大概是这样)
一
华晨宇是原唱哈,谢谢
[星星]
这还差不多嘛
"""
And that's it, all done.
Reposted from: https://blog.csdn.net/weixin_40352715/article/details/107879915