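When writing a web crawler, we often run into image files, and sometimes we need to download them while preserving their original paths. How can we do that in Python?
Below, we use Python to download images and save them under the same path they have on the server.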
To request the images, we use requests.
First, upgrade pip:
python -m pip install --upgrade pip
Next, install requests:
pip install requests
We call requests.get() to fetch the file into a response object:
r = requests.get(url)
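Note that requests.get() does not raise an exception for an HTTP error status such as 404; the response object simply carries the error body. A minimal sketch of a more defensive fetch is shown below. The helper name `fetch_image` and the 10-second `timeout` default are assumptions made for illustration and are not part of the original code; `session` is assumed to be a `requests.Session` (or any object with a compatible `get` method).

```python
def fetch_image(url, session, timeout=10):
    """Return the response body of `url` as bytes.

    `fetch_image` is a hypothetical helper; `session` is assumed to be a
    requests.Session or an object with a compatible get() method.
    """
    response = session.get(url, timeout=timeout)
    # requests does not raise on 4xx/5xx by default; raise_for_status() does.
    response.raise_for_status()
    return response.content
```

With requests installed, this could be called as `fetch_image(url, requests.Session())`. Passing the session in keeps the helper easy to exercise without network access.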
Next, a regular expression parses the original image URL into its domain, file path, and file name:
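def _path_name(self, url):
    # Returns (relative file path, directory, file name) for the given URL.
    name = url.split("/")[-1]
    # Group 1: scheme and domain; group 3: path (stops at '?' or '#').
    reobj1 = re.compile(r'''(?xi)\A
    ([a-z][a-zA-Z0-9+\-.]*:(//[^/?#]+)?)?
    ([a-zA-Z0-9\-._~%!$&'()*+,;=:@/]*)''')
    match = reobj1.search(url)
    if match:
        path_name = match.group(3).strip('/')
        # dirname() is used instead of rstrip(name): rstrip() removes a set of
        # characters, not a suffix, and breaks when the query string contains '/'.
        path = os.path.dirname(path_name)
        return path_name, path, name
    else:
        # No match: fall back to saving under the bare file name.
        return name, '', name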
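As an alternative to the hand-written regular expression, the same three values can be derived with the standard library. The sketch below assumes a hypothetical helper named `split_url_path` (not part of the original code) and relies on `urllib.parse.urlsplit`, which separates the path from the query string and fragment.

```python
import posixpath
from urllib.parse import urlsplit


def split_url_path(url):
    """Return (path_name, path, name) for `url`.

    path_name: URL path without leading/trailing slashes
    path:      directory part of path_name ('' if the file is at the root)
    name:      file name part of path_name
    """
    # urlsplit() drops the scheme, domain, query string and fragment for us.
    path_name = urlsplit(url).path.strip("/")
    # URL paths always use '/', so posixpath is used rather than os.path.
    path, name = posixpath.split(path_name)
    return path_name, path, name


print(split_url_path(
    "https://fengkui.net/uploads/article/20171112/1510484160334722.jpg"))
# ('uploads/article/20171112/1510484160334722.jpg', 'uploads/article/20171112', '1510484160334722.jpg')
```

Unlike `url.split("/")[-1]`, this keeps the file name free of any query string, so a URL such as `.../a.jpg?v=1` still yields `a.jpg`.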
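Check whether the directory exists and create it if it does not, then open the target file path and write the content to it:
path_name, path, name = self._path_name(url)
if path and not os.path.exists(path):  # skip when the file sits in the current directory
    os.makedirs(path, mode=0o755)  # create the directory if it is missing
# os.chdir(path)  # switch into the directory (not needed, path_name is relative)
with open(path_name, 'wb') as f:
    f.write(r.content)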
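The directory check and the write can be packaged into one small function. The following is a minimal sketch under stated assumptions: the helper name `save_bytes` and the `root` parameter (a base directory to save under) are hypothetical and do not appear in the original code. It uses `os.makedirs(..., exist_ok=True)`, which avoids the separate existence check.

```python
import os
import tempfile


def save_bytes(root, path_name, data):
    """Write `data` to root/path_name, creating parent directories as needed."""
    # path_name uses '/' separators (it comes from a URL); join it portably.
    target = os.path.join(root, *path_name.split("/"))
    parent = os.path.dirname(target)
    if parent:  # empty when saving directly into the current directory
        os.makedirs(parent, mode=0o755, exist_ok=True)
    with open(target, "wb") as f:
        f.write(data)
    return target


# Demonstration in a throwaway directory, using three placeholder bytes.
with tempfile.TemporaryDirectory() as root:
    saved = save_bytes(root, "uploads/article/20171112/demo.jpg", b"\xff\xd8\xff")
    print(os.path.getsize(saved))  # 3
```

Saving under an explicit `root` rather than the current working directory keeps downloaded files in one place regardless of where the script is started.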
The complete image downloader class:
# coding:utf-8
import os
import re

import requests


class ImgDownloader(object):
    # File download (works for a single URL or several)
    def downloader(self, urls):
        if not urls:
            return
        if isinstance(urls, str):
            self.img_downloader(urls)
        else:
            for url in urls:
                self.img_downloader(url)

    # File download (single image); returns -1 on failure
    def img_downloader(self, url):
        try:
            r = requests.get(url=url, timeout=10)
            r.raise_for_status()  # treat 4xx/5xx responses as failures
        except requests.RequestException as e:
            print("request failed:", url, e)
            return -1
        try:
            path_name, path, name = self._path_name(url)
            if path and not os.path.exists(path):  # check whether the directory exists
                os.makedirs(path, mode=0o755)  # create it if it is missing
            # os.chdir(path)  # switch into the directory (not needed)
            with open(path_name, 'wb') as f:
                f.write(r.content)
        except OSError as e:
            print("save failed:", url, e)
            return -1

    # File download (multiple images)
    def imgs_downloader(self, urls):
        if urls is None or len(urls) == 0:
            return
        for url in urls:
            self.img_downloader(url)

    # Returns (relative file path, directory, file name) for the given URL
    def _path_name(self, url):
        name = url.split("/")[-1]
        reobj1 = re.compile(r'''(?xi)\A
        ([a-z][a-zA-Z0-9+\-.]*:(//[^/?#]+)?)?
        ([a-zA-Z0-9\-._~%!$&'()*+,;=:@/]*)''')
        match = reobj1.search(url)
        if match:
            path_name = match.group(3).strip('/')
            # dirname() rather than rstrip(name): rstrip() strips a character set, not a suffix
            path = os.path.dirname(path_name)
            return path_name, path, name
        else:
            return name, '', name


if __name__ == "__main__":
    root_url = (
        "https://fengkui.net/uploads/article/20171112/1510484160334722.jpg",
        "https://fengkui.net/uploads/article/20190321/5cbc21485f1a1.jpg",
    )
    obj_spider = ImgDownloader()
    obj_spider.downloader(root_url)
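The class above reads the whole response into memory through `r.content`, which is fine for typical images. For larger files, one possible variation is to stream the body to disk in chunks. The sketch below is an assumption-laden illustration rather than part of the original class: the helper name `stream_to_file`, the 8192-byte chunk size, and the 10-second timeout are all hypothetical choices, and `session` is assumed to be a `requests.Session`. It relies on the `stream=True` argument and the `iter_content()` method that requests provides.

```python
import os


def stream_to_file(url, target, session, chunk_size=8192, timeout=10):
    """Download `url` to `target` in chunks; return the number of bytes written."""
    parent = os.path.dirname(target)
    if parent:
        os.makedirs(parent, exist_ok=True)
    written = 0
    # stream=True defers downloading the body until iter_content() is consumed.
    with session.get(url, stream=True, timeout=timeout) as response:
        response.raise_for_status()
        with open(target, "wb") as f:
            for chunk in response.iter_content(chunk_size=chunk_size):
                f.write(chunk)
                written += len(chunk)
    return written
```

With requests installed, it could be called as `stream_to_file(url, "uploads/demo.jpg", requests.Session())`.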
This is an original article by 冯奎. You may repost it without contacting me, but please credit the 冯奎 blog, fengkui.net.