I'm trying to scrape some information from the French car advert website "LaCentral" using BeautifulSoup. For data analysis, I'm interested in details about sports cars such as the price, mileage and colour.
I'm quite comfortable with BeautifulSoup and have done this successfully on other websites, but this one has something peculiar: when I try to get the price or the mileage of a car, the value in the HTML is:
<span class="cbm__priceWrapper">185 000 €</span>
So I use this Python command:
price = soup.find('span',class_='cbm__priceWrapper').get_text()
But it gives me 212 740 € rather than 185 000 €. The same thing happens with the mileage, on every car advert, and this span is the only place in the HTML where the price or the mileage can be accessed.
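To compare scraped and displayed values at all, the text returned by `get_text()` has to become a number first. The sketch below is a hypothetical helper (`parse_int_amount` is not part of BeautifulSoup), assuming the price or mileage is a whole number whose only non-digit characters are thousands separators (regular, no-break or narrow no-break spaces) and a unit suffix such as "€":

```python
import re


def parse_int_amount(text: str) -> int:
    """Turn a displayed amount such as '185 000 €' into an int.

    Hypothetical helper. Assumes a whole number: every non-digit character
    (spaces, U+00A0, U+202F, '€', 'km', ...) is simply dropped, so a decimal
    value like '1,5' would be mangled.
    """
    digits = re.sub(r"\D", "", text)  # keep only the digits
    if not digits:
        raise ValueError(f"no digits found in {text!r}")
    return int(digits)


# French pages often use U+202F or U+00A0 as separators instead of a plain space.
print(parse_int_amount("185\u202f000\u00a0€"))  # 185000
```

Applied to the string from `get_text()`, this yields integers that can be stored and compared directly, whichever kind of space the page uses.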
I wonder whether the website adds a random coefficient to the price and mileage to prevent scraping. All the other information (colour, number of options, date, etc.) is correct; only the price and the mileage are wrong (the two most important fields, of course).
I didn't find any linear coefficient between the real price and the one returned by soup.find; it looks really random.
Have you ever faced this issue while scraping?