Python web scraping in practice: batch-downloading video data from the Kuaishou platform

Key topics

requests
json
re
pprint

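Development environment:

Version: Anaconda 5.2.0 (Python 3.6.5)
Editor: PyCharm

Implementation steps:

I. Analyzing the data source

(You can only write the scraper once you have found where the data actually comes from.)

1. Define the requirement (what do we want to scrape?)

Scrape the videos that match a given keyword and save them as mp4 files.

2. Use the browser's developer tools to capture network traffic and work out where the data comes from (find the real data source).

Statically loaded pages (Biquge, for example)
Dynamically loaded pages (capture the data packets with the developer tools)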
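
One quick way to tell the two cases apart is to check whether text you can see in the browser also appears in the raw HTML the server returns. The following is a minimal sketch, not part of the original tutorial: the function name `looks_static` and both sample page sources are made up for illustration.

```python
def looks_static(html: str, visible_text: str) -> bool:
    """Return True if text seen in the browser is already in the raw HTML.

    If it is, the page is probably rendered on the server (static loading).
    If it is not, the content is probably fetched later by JavaScript, and
    the real data source has to be found in the developer tools.
    """
    return visible_text in html


# Hypothetical page sources, for illustration only.
static_page = "<html><body><h1>Chapter 1</h1><p>Once upon a time</p></body></html>"
dynamic_page = "<html><body><div id='app'></div><script src='app.js'></script></body></html>"

print(looks_static(static_page, "Once upon a time"))   # True
print(looks_static(dynamic_page, "Once upon a time"))  # False
```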

II. Implementation process

Find the target URL
Send the request (GET or POST)
Parse the data (get each video's URL and title)
Send a request to each video URL
Save the video

Today's target: Kuaishou (https://www.kuaishou.com)

III. Downloading a single video

Import the required modules

import json
import requests
import re

Send the request

data = {
 'operationName': "visionSearchPhoto",
 'query': "query visionSearchPhoto($keyword: String, $pcursor: String, $searchSessionId: String, $page: String, $webPageArea: String) {\n  visionSearchPhoto(keyword: $keyword, pcursor: $pcursor, searchSessionId: $searchSessionId, page: $page, webPageArea: $webPageArea) {\n result\n llsid\n webPageArea\n feeds {\ntype\nauthor {\n  id\n  name\n  following\n  headerUrl\n  headerUrls {\n cdn\n url\n __typename\n  }\n  __typename\n}\ntags {\n  type\n  name\n  __typename\n}\nphoto {\n  id\n  duration\n  caption\n  likeCount\n  realLikeCount\n  coverUrl\n  photoUrl\n  liked\n  timestamp\n  expTag\n  coverUrls {\n cdn\n url\n __typename\n  }\n  photoUrls {\n cdn\n url\n __typename\n  }\n  animatedCoverUrl\n  stereoType\n  videoRatio\n  __typename\n}\ncanAddComment\ncurrentPcursor\nllsid\nstatus\n__typename\n }\n searchSessionId\n pcursor\n aladdinBanner {\nimgUrl\nlink\n__typename\n }\n __typename\n  }\n}\n",
 'variables': {
  'keyword': '張三',  # search keyword
  'pcursor': ' ',
  'page': "search",
  'searchSessionId': "MTRfMjcwOTMyMTQ2XzE2Mjk5ODcyODQ2NTJf5oWi5pGHXzQzMQ"
 }
}
response = requests.post('https://www.kuaishou.com/graphql', data=data)

Add request headers

headers = {
    # Content-Type describes how the request body (data) is encoded. Common values include:
    # text/xml: sends XML as the request body
    # multipart/form-data: used for file uploads
    # application/json: a JSON body (used here)
    'content-type': 'application/json',
    # Identifies the user
    'Cookie': 'kpf=PC_WEB; kpn=KUAISHOU_VISION; clientid=3; did=web_721a784b472981d650bcb8bbc5e9c9c2',
    # Browser information (makes the request look like it comes from a browser)
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
}

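JSON serialization

# JSON is a data-interchange format; before JSON, XML was the usual way to pass data around.
# Almost every language supports JSON, and JSON covers the common data types, so it is widely used for everyday HTTP exchanges and data storage.
# Encode the Python object as a JSON string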
data = json.dumps(data)
json_data = requests.post('https://www.kuaishou.com/graphql', headers=headers, data=data).json()
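
To see why the payload is serialized with `json.dumps`, it helps to compare a form-encoded body with a JSON body for a nested dictionary. The sketch below uses only the standard library. `urllib.parse.urlencode` is a stand-in for form encoding (an assumption for illustration; `requests` encodes nested dictionaries differently in detail, but the result is not JSON either), and the payload is a shortened, hypothetical version of the one above.

```python
import json
from urllib.parse import urlencode

# Shortened, hypothetical payload with the same shape as the real one.
payload = {
    "operationName": "visionSearchPhoto",
    "variables": {"keyword": "demo", "page": "search"},
}

form_body = urlencode(payload)   # form encoding: the nested dict is flattened to text
json_body = json.dumps(payload)  # JSON text that a JSON endpoint can parse

print(form_body)
print(json_body)

# Only the JSON body round-trips back to the original structure.
assert json.loads(json_body) == payload

try:
    json.loads(form_body)
    form_is_json = True
except ValueError:
    form_is_json = False
print("form body parses as JSON:", form_is_json)
```

Because the `content-type` header announces `application/json`, the body has to be real JSON text, which is what `json.dumps(data)` produces.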

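Extract values from the dictionary

feeds = json_data['data']['visionSearchPhoto']['feeds']
for feed in feeds:
    caption = feed['photo']['caption']
    photoUrl = feed['photo']['photoUrl']
    # Replace characters that are not allowed in Windows file names
    title = re.sub(r'[/\\:*?"<>|\n]', '-', caption)

Send a second request

    # Still inside the for loop: download the raw bytes of the video
    resp = requests.get(photoUrl).content

Save the data

    # Still inside the for loop; the video\ directory must already exist
    with open('video\\' + title + '.mp4', mode='wb') as f:
        f.write(resp)
    print(title, 'downloaded successfully!')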

IV. Scraping multiple pages

Import modules

import concurrent.futures
import time

Send the request

def get_json(url, data):
    """POST the JSON payload and return the decoded JSON response."""
    response = requests.post(url, headers=headers, data=data).json()
    return response

Clean up the title

def change_title(title):
    # Windows file names cannot contain certain special characters
    # Windows also limits file name length (about 255 characters), so long titles are shortened
    new_title = re.sub(r'[/\\|:?<>"*\n]', '_', title)
    if len(new_title) > 50:
        new_title = new_title[:10]
    return new_title
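
The effect of this kind of clean-up is easiest to see on a few sample strings. The sketch below is a variation rather than the tutorial's own function: the name `clean_title`, the `max_len` parameter, and the `'untitled'` fallback are assumptions added for illustration, and it truncates to `max_len` instead of to 10 characters.

```python
import re


def clean_title(title: str, max_len: int = 50) -> str:
    """Replace characters Windows forbids in file names and cap the length."""
    cleaned = re.sub(r'[/\\|:?<>"*\n]', '_', title).strip()
    # Fall back to a placeholder so the file name is never empty.
    return cleaned[:max_len] or 'untitled'


print(clean_title('a/b:c?'))        # a_b_c_
print(clean_title('line1\nline2'))  # line1_line2
print(clean_title(''))              # untitled
print(len(clean_title('x' * 200)))  # 50
```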

Extract the data

def parse(json_data):
    """Return a list of [title, video URL] pairs from one page of search results."""
    data_list = json_data['data']['visionSearchPhoto']['feeds']
    info_list = []
    for data in data_list:
        # Extract the title and make it safe to use as a file name
        title = data['photo']['caption']
        new_title = change_title(title)
        url_1 = data['photo']['photoUrl']
        info_list.append([new_title, url_1])
    return info_list
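
Direct indexing like this raises `KeyError` or `TypeError` as soon as one field is missing from the response. A more defensive variant might look like the sketch below. The function name `parse_feeds` and the sample response are hypothetical; only the key names (`data`, `visionSearchPhoto`, `feeds`, `photo`, `caption`, `photoUrl`) come from the code above.

```python
def parse_feeds(json_data: dict) -> list:
    """Collect (caption, photoUrl) pairs, skipping feeds that lack either field."""
    search = (json_data.get('data') or {}).get('visionSearchPhoto') or {}
    pairs = []
    for feed in search.get('feeds') or []:
        photo = feed.get('photo') or {}
        caption, video_url = photo.get('caption'), photo.get('photoUrl')
        if caption and video_url:
            pairs.append((caption, video_url))
    return pairs


# Hypothetical response, shaped like the structure the tutorial indexes into.
sample = {
    'data': {'visionSearchPhoto': {'feeds': [
        {'photo': {'caption': 'first clip', 'photoUrl': 'https://example.com/1.mp4'}},
        {'photo': {'caption': 'no url here', 'photoUrl': None}},
        {'photo': None},
    ]}}
}

print(parse_feeds(sample))          # [('first clip', 'https://example.com/1.mp4')]
print(parse_feeds({'data': None}))  # []
```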

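Save the data

def save(title, url_1):
    # Download the video bytes and write them to video\<title>.mp4 (the video directory must already exist)
    resp = requests.get(url_1).content
    with open('video\\' + title + '.mp4', mode='wb') as f:
        f.write(resp)
    print(title, 'downloaded successfully!')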
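
The hard-coded `'video\\'` prefix only works on Windows and fails if the directory does not exist. One possible alternative, sketched here with `pathlib`, builds the path portably and creates the directory on demand. The helper name `save_bytes` and its `directory` parameter are assumptions for illustration, and the demo writes placeholder bytes into a temporary directory instead of downloading anything.

```python
import tempfile
from pathlib import Path


def save_bytes(content: bytes, title: str, directory: str = 'video') -> Path:
    """Write content to <directory>/<title>.mp4, creating the directory if needed."""
    folder = Path(directory)
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / f'{title}.mp4'
    target.write_bytes(content)
    return target


# Demo with placeholder bytes; a real call would pass requests.get(url).content.
with tempfile.TemporaryDirectory() as tmp:
    path = save_bytes(b'not a real video', 'demo', directory=str(Path(tmp) / 'video'))
    print(path.name, path.stat().st_size)  # demo.mp4 16
```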

Main function: calls all the other functions

def run(url, data):
    """Main function: fetch one page of results, parse it, and save every video."""
    json_data = get_json(url, data)
    info_list = parse(json_data)
    for title, url_1 in info_list:
        save(title, url_1)


if __name__ == '__main__':
    start_time = time.time()
    url = 'https://www.kuaishou.com/graphql'
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
        for page in range(1, 5):
            data = {
                'operationName': "visionSearchPhoto",
                'query': "query visionSearchPhoto($keyword: String, $pcursor: String, $searchSessionId: String, $page: String, $webPageArea: String) {\n  visionSearchPhoto(keyword: $keyword, pcursor: $pcursor, searchSessionId: $searchSessionId, page: $page, webPageArea: $webPageArea) {\n result\n llsid\n webPageArea\n feeds {\ntype\nauthor {\n  id\n  name\n  following\n  headerUrl\n  headerUrls {\n cdn\n url\n __typename\n  }\n  __typename\n}\ntags {\n  type\n  name\n  __typename\n}\nphoto {\n  id\n  duration\n  caption\n  likeCount\n  realLikeCount\n  coverUrl\n  photoUrl\n  liked\n  timestamp\n  expTag\n  coverUrls {\n cdn\n url\n __typename\n  }\n  photoUrls {\n cdn\n url\n __typename\n  }\n  animatedCoverUrl\n  stereoType\n  videoRatio\n  __typename\n}\ncanAddComment\ncurrentPcursor\nllsid\nstatus\n__typename\n }\n searchSessionId\n pcursor\n aladdinBanner {\nimgUrl\nlink\n__typename\n }\n __typename\n  }\n}\n",
                'variables': {
                    'keyword': '曹芬',
                    # 'keyword': keyword,
                    'pcursor': str(page),
                    'page': "search",
                    'searchSessionId': "MTRfMjcwOTMyMTQ2XzE2Mjk5ODcyODQ2NTJf5oWi5pGHXzQzMQ"
                }
            }
            # Serialize the payload to JSON and hand this page to a worker thread
            data = json.dumps(data)
            executor.submit(run, url, data)
    # Leaving the with block waits for every submitted task, so the timer covers all pages
    print('Total time taken:', time.time() - start_time)

The run took 57.7 seconds.
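
How much a thread pool helps depends on how much of the time is spent waiting on the network. The sketch below illustrates the effect without touching any website: `fake_download` is a hypothetical stand-in that sleeps for 0.2 seconds instead of downloading a page, and `timed` is a helper invented for this example. The actual timings will vary from machine to machine.

```python
import concurrent.futures
import time


def fake_download(page: int) -> int:
    """Stand-in for one page of work: sleeps instead of doing network I/O."""
    time.sleep(0.2)
    return page


def timed(max_workers: int) -> float:
    """Run four fake downloads with the given pool size and return the elapsed time."""
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        results = list(executor.map(fake_download, range(1, 5)))
    assert results == [1, 2, 3, 4]  # map() returns results in input order
    return time.perf_counter() - start


sequential = timed(max_workers=1)
parallel = timed(max_workers=4)
print(f'1 worker: {sequential:.2f}s, 4 workers: {parallel:.2f}s')
```

Note that `executor.submit` returns a `Future`, and an exception raised inside the worker only surfaces when `.result()` is called on it, so a failed page can go unnoticed if the futures are discarded.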

That concludes this article on batch-downloading video data from the Kuaishou platform with a Python scraper.
