While using a crawler to monitor how a URL and the resources under it respond, I ran into a problem.
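My exception handling used except urllib2.URLError as e,
written following the official documentation (https://docs.python.org/2/library/urllib2.html):
- urllib2.urlopen(url[, data][, timeout])
- Open the URL url, which can be either a string or a Request object.
......
But when a timeout occurs, the following traceback appears: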
File "F:\pc\working\test_selenium\sp-python\aya-web\checktime.py", line 15, in check
self._getvale=self._fun(check_arg)
File "D:\Python27\lib\urllib2.py", line 127, in urlopen
return _opener.open(url, data, timeout)
File "D:\Python27\lib\urllib2.py", line 404, in open
response = self._open(req, data)
File "D:\Python27\lib\urllib2.py", line 422, in _open
'_open', req)
File "D:\Python27\lib\urllib2.py", line 382, in _call_chain
result = func(*args)
File "D:\Python27\lib\urllib2.py", line 1214, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "D:\Python27\lib\urllib2.py", line 1187, in do_open
r = h.getresponse(buffering=True)
File "D:\Python27\lib\httplib.py", line 1067, in getresponse
response.begin()
File "D:\Python27\lib\httplib.py", line 409, in begin
version, status, reason = self._read_status()
File "D:\Python27\lib\httplib.py", line 365, in _read_status
line = self.fp.readline(_MAXLINE + 1)
File "D:\Python27\lib\socket.py", line 476, in readline
data = self._sock.recv(self._rbufsize)
socket.error: [Errno 10053]
I googled it and found that this is a urllib2 bug, as described here: http://heyman.info/2010/apr/22/python-urllib2-timeout-issue/
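To illustrate why the except clause did not help: the error comes from the socket layer, and the socket exception classes and URLError do not derive from one another. The sketch below checks this in Python 3, where urllib2 was split into urllib.request and urllib.error and socket.error is an alias of OSError. It is only an illustration under those assumptions, not code from the monitoring script, and the helper name caught_by is made up for this example.

```python
import socket
import urllib.error


def caught_by(exc_type, handler_type):
    """Return True if `except handler_type` would catch an exc_type instance."""
    return issubclass(exc_type, handler_type)


# Errors that the socket layer can raise while a response is being read.
socket_level = [socket.timeout, ConnectionAbortedError]

# Neither derives from URLError, so `except URLError` lets them through.
missed_by_urlerror = [t.__name__ for t in socket_level
                      if not caught_by(t, urllib.error.URLError)]

# All of them, and URLError itself, derive from OSError (socket.error).
caught_by_oserror = all(caught_by(t, OSError)
                        for t in socket_level + [urllib.error.URLError])

print(missed_by_urlerror)
print(caught_by_oserror)
```

Errno 10053 is the Windows code WSAECONNABORTED (connection aborted), so this traceback is a socket-level failure rather than a URLError.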
------------------Original article----------------------------
I use urllib2 from Python's standard library in quite a few projects. It's quite nice, but the documentation isn't very comprehensive, and it always makes me feel like I'm programming Java once I want to do something more complicated than just opening a URL and reading the response (e.g. handling redirect responses, reading response headers, etc.).
Anyway, the other day I found, if not a bug, then at least an undocumented issue. Since Python 2.6, urllib2 provides a way to set a timeout, as in the following code, where the timeout is set to 2.5 seconds:
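import urllib2

try:
    # The third positional argument is the timeout in seconds.
    response = urllib2.urlopen("http://google.com", None, 2.5)
except urllib2.URLError as e:
    # URLError lives in the urllib2 module, so it has to be qualified.
    print "Oops, timed out?"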
If no timeout is specified, the global socket timeout value will be used, which by default is infinite.
The above code will catch almost every timeout, but the problem is that you might still get a timeout raised as a totally different exception:
File "/usr/lib/python2.4/socket.py", line 285, in read
data = self._sock.recv(recv_size)
File "/usr/lib/python2.4/httplib.py", line 460, in read
return self._read_chunked(amt)
File "/usr/lib/python2.4/httplib.py", line 495, in _read_chunked
line = self.fp.readline()
File "/usr/lib/python2.4/socket.py", line 325, in readline
data = recv(1)
socket.timeout: timed out
The solution is to catch this other exception, raised by Python's socket library, as well:
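import socket
import urllib2

try:
    response = urllib2.urlopen("http://google.com", None, 2.5)
except urllib2.URLError as e:
    # Timeouts that urllib2 wraps itself arrive as URLError.
    print "Oops, timed out?"
except socket.timeout:
    # Timeouts raised directly by the socket layer.
    print "Timed out!"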
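For readers on Python 3, the same idea might look roughly like the sketch below. It assumes urllib.request.urlopen as the Python 3 counterpart of urllib2.urlopen. The function name classify_fetch, its result labels, and the opener parameter (there only so the function can be exercised without a network) are hypothetical and not part of any library. It also catches plain OSError last, so socket errors other than timeouts, such as an aborted connection, are reported instead of escaping.

```python
import socket
import urllib.error
import urllib.request


def classify_fetch(url, timeout=2.5, opener=urllib.request.urlopen):
    """Fetch url and return a (status, detail) tuple instead of raising."""
    try:
        response = opener(url, None, timeout)
        try:
            # Reading the body can still time out after the headers arrived.
            body = response.read()
        finally:
            response.close()
        return ("ok", body)
    except urllib.error.URLError as e:
        # Errors wrapped by urllib; the wrapped reason may be a timeout.
        if isinstance(e.reason, socket.timeout):
            return ("timeout", str(e.reason))
        return ("url_error", str(e.reason))
    except socket.timeout as e:
        # Timeout raised directly by the socket layer.
        return ("timeout", str(e))
    except OSError as e:
        # Any other socket-level failure, e.g. an aborted connection.
        return ("socket_error", str(e))
```

A monitoring loop could call classify_fetch(url) for each URL and log every status other than "ok". The order of the except clauses matters: URLError and socket.timeout are both subclasses of OSError, so the OSError clause has to come last.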
Hopefully this will save someone else some headache :).