Sunday, May 23, 2021

Scrapy rotating user agents in spider

I added a method to my spider that picks a random user agent from a text file. I call this method from start_requests:

def start_requests(self):
    url = 'someurl'

    head = self.loadUserAgents()

    headers = {
        'Accept-Encoding': 'gzip, deflate, br',
        'Accept-Language': 'en-US,en;q=0.5',
        'User-Agent': head,
    }

    yield scrapy.http.Request(url, headers=headers)

My parse method follows the link to the next page. I think that, written this way, the spider only generates the random user agent once, in start_requests. How can I force the spider to pick a new user agent for each following page?

Thanks.
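
One possible approach, offered as a sketch rather than a definitive answer: the headers above are built only in start_requests, so any request yielded from parse would need its own headers dict (for example by calling self.loadUserAgents() again there). A more central option is a downloader middleware, which Scrapy calls for every outgoing request. The class name RandomUserAgentMiddleware, the USER_AGENT_FILE setting and the useragents.txt default below are assumptions made up for illustration; they are not part of Scrapy or of the spider in the question.

```python
import random


class RandomUserAgentMiddleware:
    """Sketch of a downloader middleware that sets a random User-Agent
    header on every outgoing request (class name is hypothetical)."""

    def __init__(self, user_agents):
        self.user_agents = [ua for ua in user_agents if ua]
        if not self.user_agents:
            raise ValueError("at least one user agent is required")

    @classmethod
    def from_crawler(cls, crawler):
        # USER_AGENT_FILE is a hypothetical custom setting: the path to a
        # text file with one user agent string per line.
        path = crawler.settings.get("USER_AGENT_FILE", "useragents.txt")
        with open(path, encoding="utf-8") as f:
            agents = [line.strip() for line in f]
        return cls(agents)

    def process_request(self, request, spider):
        # Runs for every request, including those yielded from parse().
        request.headers["User-Agent"] = random.choice(self.user_agents)
        # Returning None tells Scrapy to continue handling the request.
        return None
```

To try it, the class would be registered in the project's DOWNLOADER_MIDDLEWARES setting, e.g. {"myproject.middlewares.RandomUserAgentMiddleware": 400}, where myproject.middlewares is a placeholder module path. With this in place, the 'User-Agent' entry in the start_requests headers is overwritten on each request, so it could be dropped from that dict.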



