Python grequests takes a long time to finish


I am trying to unshorten a lot of URLs that I have in a urlset. The following code works most of the time, but sometimes it takes a very long time to finish. For example, I have 2950 URLs in urlset. stderr tells me that 2900 are done, but geturlmapping does not finish.

    import sys

    import grequests

    def geturlmapping(urlset):
        # build the url mapping: original (short) url -> final url
        urlmapping = {}
        #rs = (grequests.get(u) for u in urlset)
        rs = (grequests.head(u) for u in urlset)
        res = grequests.imap(rs, size=100)
        counter = 0
        for x in res:
            counter += 1
            if counter % 50 == 0:
                sys.stderr.write('doing %d url_mapping length %d \n' % (counter, len(urlmapping)))
            urlmapping[getoriginalurl(x)] = getgoalurl(x)
        return urlmapping

    def getgoalurl(resp):
        # final url after any redirects
        url = ''
        try:
            url = resp.url
        except:
            url = 'null'
        return url

    def getoriginalurl(resp):
        # first url in the redirect chain, or the response url if there was no redirect
        url = ''
        try:
            url = resp.history[0].url
        except IndexError:
            url = resp.url
        except:
            url = 'null'
        return url

It probably won't finish at all, since a long time has passed and it is still running.

I was having issues with requests similar to the ones you are having. For me the problem was that requests took ages to download some pages, while with any other software (browsers, curl, wget, Python's urllib) everything worked fine...

After a lot of time wasted, I noticed that the server was sending some invalid headers. For example, in one of the "slow" pages, after Content-Type: text/html it began to send headers in the form Header-Name : header-value (notice the space before the colon). This somehow breaks Python's email.header functionality, which requests uses to parse HTTP headers, so the Transfer-Encoding: chunked header wasn't being parsed.
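To see the effect in isolation, here is a minimal sketch that uses only the standard library's email.parser, which Python's HTTP header parsing is built on. The header names and values are made up for illustration, and this only approximates what happens inside requests; it is not a trace of the actual slow server.

```python
from email.parser import Parser

# Hypothetical response headers. The second line has a space before the
# colon, like the malformed headers described above.
raw_headers = (
    "Content-Type: text/html\r\n"
    "X-Powered-By : demo\r\n"
    "Transfer-Encoding: chunked\r\n"
    "\r\n"
)

msg = Parser().parsestr(raw_headers)

# Headers that come before the malformed line are parsed normally.
print(msg["Content-Type"])

# Header parsing stops at the malformed line, so this lookup finds nothing.
print(msg["Transfer-Encoding"])

# The unparsed lines end up in the message body instead.
print(repr(msg.get_payload()))

# Any defects the parser recorded while reading the headers.
print([type(d).__name__ for d in msg.defects])
```

If the chunked header is lost this way, the client has no way to know how the body is framed, which would be consistent with the long waits I was seeing.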

Long story short: manually setting the chunked property to True on the response objects before asking for the content solved the issue. For example:

    response = requests.get('http://my-slow-url')
    print(response.text)

took ages, but

    response = requests.get('http://my-slow-url')
    response.raw.chunked = True
    print(response.text)

worked great!

