I followed the advice from these two posts while trying to create a generic Scrapy spider:
How to pass a user-defined argument in a Scrapy spider
Creating a generic Scrapy spider
But I'm getting an error saying that the variable I am supposed to be passing as an argument is not defined. Am I missing something in my __init__ method?
Code:
    from scrapy.spider import BaseSpider
    from scrapy.selector import HtmlXPathSelector
    from data.items import dataitem

    class companyspider(BaseSpider):
        name = "woz"

        def __init__(self, domains=""):
            '''
            domains is a string
            '''
            self.domains = domains

        deny_domains = [""]
        start_urls = [domains]

        def parse(self, response):
            hxs = HtmlXPathSelector(response)
            sites = hxs.select('/html')
            items = []
            for site in sites:
                item = dataitem()
                item['text'] = site.select('text()').extract()
                items.append(item)
            return items

Here is my command line:

    scrapy crawl woz -a domains="http://www.dmoz.org/computers/programming/languages/python/books/"

And here is the error:

    NameError: name 'domains' is not defined
You should call super(companyspider, self).__init__(*args, **kwargs) at the beginning of your __init__.

    def __init__(self, domains="", *args, **kwargs):
        super(companyspider, self).__init__(*args, **kwargs)
        self.domains = domains

If your first requests depend on a spider argument, you can override only the start_requests() method, without overriding __init__(). The parameter passed on the command line is already available as an attribute of the spider:

    from scrapy.http import Request

    class companyspider(BaseSpider):
        name = "woz"
        deny_domains = [""]

        def start_requests(self):
            # for example, if domains is a single URL
            yield Request(self.domains)

        def parse(self, response):
            ...
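To see why the original code fails, and why reading the argument inside start_requests() works, here is a rough sketch that runs without Scrapy installed. FakeBaseSpider, BrokenSpider, WorkingSpider and define_broken_spider are hypothetical names made up for this illustration. The stand-in base class only mimics one behaviour, which as far as I know matches what Scrapy's base spider does: it copies keyword arguments onto the instance. Everything else about a real spider is left out.

```python
# Hypothetical stand-in for Scrapy's base spider (illustration only).
# It copies keyword arguments onto the instance, which is how values
# passed with -a end up as spider attributes.
class FakeBaseSpider(object):
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


def define_broken_spider():
    # A class body runs once, when the class is defined. At that point
    # `domains` is only a parameter name of __init__, not a variable that
    # the class body can see, so the lookup fails.
    class BrokenSpider(FakeBaseSpider):
        def __init__(self, domains=""):
            self.domains = domains

        start_urls = [domains]  # NameError is raised here

    return BrokenSpider


class WorkingSpider(FakeBaseSpider):
    name = "woz"

    def start_requests(self):
        # self.domains was set by the base __init__ from the keyword argument.
        # A real spider would yield a Request; this sketch yields the URL.
        yield self.domains


try:
    define_broken_spider()
    error_name = None
except NameError as exc:
    error_name = type(exc).__name__

spider = WorkingSpider(
    domains="http://www.dmoz.org/computers/programming/languages/python/books/"
)
first_urls = list(spider.start_requests())
```

The point of the sketch is the timing: the class-level start_urls line is evaluated before any instance exists, while start_requests() runs on an instance that already carries the argument.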