# request

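request.urlopen(url) opens a URL and returns a Response object
res.geturl() returns the URL that was actually retrieved
res.getcode() returns the status code
- 200: success
- 3xx: a redirect occurred
- 4xx: there is a problem accessing the resource
- 5xx: internal server error
res.info() returns the response headers
res.read() returns the content as bytes (stored here as text)
- text.decode("utf-8"): an encoding must be specified for the content to display correctly
res.json() decodes the body directly when it is JSON (available on requests responses, not on urllib ones)
res.encoding = "utf-8" makes the Response text be decoded with this encoding (likewise a requests attribute)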
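The urllib calls above can be tried without touching the network. The following is a minimal sketch, assuming a throwaway local server so that urlopen has something to fetch; the HelloHandler class and its reply text are made up for illustration, and the empty ProxyHandler is only there so that a system proxy does not intercept the local request.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import request


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a short UTF-8 body."""

    def do_GET(self):
        body = "你好".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


# Port 0 lets the OS pick a free port
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

# Ignore any proxy configured in the environment for this local request
request.install_opener(request.build_opener(request.ProxyHandler({})))

with request.urlopen(url) as res:
    final_url = res.geturl()                    # URL that was retrieved
    status = res.getcode()                      # HTTP status code
    content_type = res.info()["Content-Type"]   # one response header
    raw = res.read()                            # body as bytes

text = raw.decode("utf-8")                      # bytes -> str

server.shutdown()
server.server_close()

print(final_url, status, content_type)
print(type(raw).__name__, len(raw))
print(type(text).__name__, len(text))
```

The last two lines print the type and length of the body before and after decoding, which makes the difference between bytes and text visible.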

# Sending requests

requests.get(url) sends a GET request

import requests

# Pass values in the URL
requests.get('http://httpbin.org/get?name=gemey&age=22')

# Pass values as a dict
data = {
    'name': 'tom',
    'age': 20
}

response = requests.get('http://httpbin.org/get', params=data)
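To see how the params dict ends up in the URL, the request can be built without being sent. This is a small sketch that uses requests.Request and its prepare() method purely for inspection; no network access is involved.

```python
import requests

data = {"name": "tom", "age": 20}

# Build the request but do not send it, so the final URL can be inspected
prepared = requests.Request("GET", "http://httpbin.org/get", params=data).prepare()

print(prepared.url)
```

The dict is encoded into a query string and appended to the URL, which matches writing the values into the address by hand.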

requests.post(url) sends a POST request

# A POST request passes its parameters as a dict or a JSON string
data = {'name':'tom','age':'22'}
response = requests.post('http://httpbin.org/post', data=data)
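The difference between a form-encoded body and a JSON body can be checked offline. The sketch below prepares two POST requests without sending them; the json= variant is shown as one way to send the dict as JSON, and is an addition to the example above rather than something the original code uses.

```python
import requests

data = {"name": "tom", "age": "22"}
url = "http://httpbin.org/post"

# data= sends the dict as a form-encoded body
form = requests.Request("POST", url, data=data).prepare()

# json= serializes the dict to JSON and sets the matching Content-Type
as_json = requests.Request("POST", url, json=data).prepare()

print(form.headers["Content-Type"], form.body)
print(as_json.headers["Content-Type"], as_json.body)
```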

# Proxies

As with headers, the proxy settings are passed as a dict, through the proxies parameter.

proxy = {
    'http': '120.25.253.234:812',
    'https': '163.125.222.244:8123'
}
req = requests.get(url, proxies=proxy)
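When many requests should share the same proxies, one option is to set them once on a Session instead of passing proxies= every time. This is a minimal sketch under assumptions: the addresses are the ones from the example above, written with an explicit http:// scheme as in the requests documentation examples, and they are not assumed to be reachable, so nothing is actually sent.

```python
import requests

proxy = {
    "http": "http://120.25.253.234:812",
    "https": "http://163.125.222.244:8123",
}

session = requests.Session()
session.proxies.update(proxy)

# Requests made through this session would be routed via these proxies,
# e.g. session.get(url); no request is sent in this sketch.
print(session.proxies)

session.close()
```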

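# Custom request headers

Adding header information is the most basic countermeasure against anti-scraping. Some sites use anti-scraping techniques, so the request has to carry the headers of a real browser.

To add HTTP headers to a request, simply pass a dict to the headers parameter.

request.Request(url, headers=header)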

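from urllib import request

# Add header information; this is the most basic countermeasure against anti-scraping
url = "http://www.dianping.com/"  # Some sites use anti-scraping techniques, so we imitate a real browser's headers
header = {
   "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36"
}   # The headers are stored in a dict
req = request.Request(url, headers=header)  # Request takes a URL and the headers
res = request.urlopen(req)

print(res.geturl())  # URL that was retrieved
print(res.getcode())  # Status code: 200 success, 3xx redirect, 4xx problem accessing the resource, 5xx internal error
print(res.info())  # Response headers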
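Before sending anything, it can be useful to confirm that the header really was attached to the Request object. The sketch below only inspects the object and makes no network call; the shortened User-Agent string is a placeholder for illustration.

```python
from urllib import request

url = "http://www.dianping.com/"
header = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}  # shortened placeholder

req = request.Request(url, headers=header)

# urllib stores header names via str.capitalize(), so the key becomes "User-agent"
print(req.has_header("User-agent"))
print(req.get_header("User-agent"))
print(req.get_method())
```

Note the lookup key: asking for "User-Agent" with a capital A would not match the name as urllib stores it.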

Getting response headers

r.headers['Content-Type']
r.headers.get('content-type')  # Look up a response header by key
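Both spellings of the key work because requests keeps response headers in a case-insensitive mapping. The following sketch builds such a mapping by hand with requests.structures.CaseInsensitiveDict; the header value is made up for illustration.

```python
from requests.structures import CaseInsensitiveDict

# Stand-in for r.headers, with an illustrative value
headers = CaseInsensitiveDict({"Content-Type": "application/json"})

print(headers["content-type"])      # lookup ignores case
print(headers.get("CONTENT-TYPE"))  # so does .get()
print(headers.get("X-Missing"))     # a missing key yields None with .get()
```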

# Cookie

If a response contains cookies, you can access them quickly.

res.cookies["key"] returns the cookie with that name

url = 'http://example.com/some/cookie/setting/url'
r = requests.get(url)
r.cookies['example_cookie_name']  # Get the cookie with the given key

To send cookies, pass them through the cookies parameter of the request.

url = 'http://httpbin.org/cookies'
cookies = dict(cookies_are='working')
r = requests.get(url, cookies=cookies)
r.text
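What the cookies parameter does on the wire can be seen by preparing the request without sending it. This is a small offline sketch that reuses the values from the example above.

```python
import requests

url = "http://httpbin.org/cookies"
cookies = dict(cookies_are="working")

# Prepare only; the dict is turned into a Cookie request header
prepared = requests.Request("GET", url, cookies=cookies).prepare()

print(prepared.headers["Cookie"])
```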

# Scheduling the crawler with Crontab

Install Chrome on Linux

cd /etc/yum.repos.d/
touch google-chrome.repo

Add the Chrome repository

[google-chrome]
name=google-chrome
baseurl=http://dl.google.com/linux/chrome/rpm/stable/$basearch
enabled=1
gpgcheck=1
gpgkey=https://dl-ssl.google.com/linux/linux_signing_key.pub

yum -y install google-chrome-stable --nogpgcheck

Run crontab -e and add an entry that executes the Python script.
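As a rough illustration, the script that cron runs might look like the sketch below. Everything in it is hypothetical: the file name crawler_job.py, the paths in the sample crontab line, the 30-minute interval, and the crawl() placeholder, which stands in for the real crawling logic.

```python
# crawler_job.py (hypothetical file name)
#
# Sample crontab entry, added via `crontab -e`, running every 30 minutes.
# The interpreter and file paths are placeholders:
#   */30 * * * * /usr/bin/python3 /path/to/crawler_job.py >> /path/to/crawler.log 2>&1
from datetime import datetime


def crawl():
    """Placeholder for the real crawling logic."""
    return "ok"


def main():
    started = datetime.now().isoformat(timespec="seconds")
    result = crawl()
    line = f"[{started}] crawl finished: {result}"
    print(line)  # cron redirects stdout into the log file
    return line


if __name__ == "__main__":
    main()
```

Using absolute paths in the crontab entry is a deliberate choice, since cron runs jobs with a minimal environment and a different working directory than an interactive shell.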

Last updated: 2023/12/06, 01:31:48