Downloading Images While Preserving the Original Path (Python)

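When writing a crawler, we often run into image files.
Sometimes these images need to be saved under the same path they had on the original site.
How can that be done in Python?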

Below we use Python to download an image while keeping its original path.
The image is requested with requests.
First, upgrade pip:

python -m pip install --upgrade pip

Next, install requests:

pip install requests

We call requests.get() to fetch the file into a response object:

r = requests.get(url)
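
A slightly more defensive version of this request can be sketched as follows. The helper name `fetch_image` and the 10-second timeout are assumptions made for illustration and are not part of the original script; `client` is expected to be the `requests` module itself or a `requests.Session()`.

```python
def fetch_image(client, url, timeout=10):
    """Return the raw bytes at url, raising on HTTP errors.

    client is assumed to be the requests module or a requests.Session();
    both expose get(url, timeout=...).
    """
    resp = client.get(url, timeout=timeout)
    # raise_for_status() turns 4xx/5xx responses into exceptions, so an
    # error page is never mistaken for image data.
    resp.raise_for_status()
    return resp.content

# Usage (requires network access):
# import requests
# data = fetch_image(requests, "https://fengkui.net/uploads/article/20171112/1510484160334722.jpg")
```

Passing the client in as a parameter keeps the helper easy to exercise without touching the network.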

Then we analyze the original image URL with a regular expression,
separating the scheme and domain from the file path and file name:

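def _path_name(self, url):
    # Split a URL into (relative file path, directory, file name).
    reobj1 = re.compile(r'''(?xi)\A
    ([a-z][a-zA-Z0-9+\-.]*:(//[^/?#]+)?)?
    ([a-zA-Z0-9\-._~%!$&'()*+,;=:@/]*)''')
    match = reobj1.search(url)
    if match:
        path_name = match.group(3).strip('/')
        # os.path.split removes the file name as a whole; rstrip(name) would
        # strip any trailing characters that happen to occur in name.
        path, name = os.path.split(path_name)
        return path_name, path, name
    else:
        # Fall back to the last URL segment, saved in the current directory.
        name = url.split("/")[-1]
        return name, '', name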
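
The same split can also be done with the standard library's URL parser instead of a hand-written regular expression. This is a minimal sketch of that alternative, not the method used in this post; the function name `split_url_path` is hypothetical.

```python
import posixpath
from urllib.parse import urlsplit

def split_url_path(url):
    """Return (relative_path, directory, file_name) for a URL."""
    # urlsplit separates scheme, host, path, query and fragment, so a
    # query string never leaks into the file name.
    raw_path = urlsplit(url).path
    rel_path = raw_path.strip("/")
    # URL paths always use "/", so posixpath is used regardless of the OS.
    directory, file_name = posixpath.split(rel_path)
    return rel_path, directory, file_name

print(split_url_path("https://fengkui.net/uploads/article/20171112/1510484160334722.jpg"))
# ('uploads/article/20171112/1510484160334722.jpg', 'uploads/article/20171112', '1510484160334722.jpg')
```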

Check whether the directory exists and create it if it does not,
then open the target file path and write the file:

path_name, path, name = self._path_name(url)
if path:  # path is empty when the image sits at the site root
    os.makedirs(path, exist_ok=True)  # create the directory if it is missing
# os.chdir(path) # switch into the directory
with open(path_name, 'wb') as f:
    f.write(r.content)
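
The create-then-write step can also be expressed with pathlib. The sketch below assumes a hypothetical helper `save_bytes` and an optional `root` directory under which the original path is recreated; neither appears in the original script.

```python
from pathlib import Path

def save_bytes(rel_path, data, root="."):
    """Write data to root/rel_path, creating parent directories as needed."""
    target = Path(root) / rel_path
    # parents=True builds the whole directory chain; exist_ok=True makes the
    # call a no-op when the directory is already there.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target

# Usage: save_bytes("uploads/article/20171112/1510484160334722.jpg", r.content)
```

Keeping a separate `root` makes it possible to mirror several sites into different folders without changing the working directory.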

The complete image download class:

# coding:utf-8
import requests
import os
import re

class ImgDownloader(object):

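    def img_dowloader(self, url):
        print(url)
        try:
            r = requests.get(url=url, timeout=10)
            r.raise_for_status()  # treat 4xx/5xx responses as failures
        except requests.RequestException as e:
            print("request failed:", e)
            return -1
        try:
            path_name, path, name = self._path_name(url)
            if path:  # path is empty when the image sits at the site root
                os.makedirs(path, exist_ok=True)  # create the directory if it is missing
            # os.chdir(path) # switch into the directory
            with open(path_name, 'wb') as f:
                f.write(r.content)
        except OSError as e:
            print("save failed:", e)
            return -1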

    def imgs_dowloader(self, urls):
        if urls is None or len(urls) == 0:
            return
        for url in urls:
            self.img_dowloader(url)

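    def _path_name(self, url):
        # Split a URL into (relative file path, directory, file name).
        reobj1 = re.compile(r'''(?xi)\A
        ([a-z][a-zA-Z0-9+\-.]*:(//[^/?#]+)?)?
        ([a-zA-Z0-9\-._~%!$&'()*+,;=:@/]*)''')
        match = reobj1.search(url)
        if match:
            path_name = match.group(3).strip('/')
            # os.path.split removes the file name as a whole; rstrip(name) would
            # strip any trailing characters that happen to occur in name.
            path, name = os.path.split(path_name)
            return path_name, path, name
        else:
            # Fall back to the last URL segment, saved in the current directory.
            name = url.split("/")[-1]
            return name, '', name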


if __name__=="__main__":
    root_url = "https://fengkui.net/uploads/article/20171112/1510484160334722.jpg"
    obj_spider = ImgDownloader()
    obj_spider.img_dowloader(root_url)

冯奎博客 (Feng Kui's Blog)