I am writing a web scraper in Python that runs a loop n times, but I get an IndexError. I know what the error means; what confuses me is that it occurs at seemingly random points.
date_added = []
stat_name_list = []
stat_values_list = []

def add_to_list(name, values):
    stat_name_list.append(name)
    stat_values_list.append(values)

for n in pbar(range(1,21438)):
    url = 'http://www.futbin.com/18/player/' + str(n)
    res = requests.get(url)
    res.raise_for_status()
    player = bs4.BeautifulSoup(res.text, "html.parser")
I have left out the middle part, where I grab some values without any problems. What I am struggling with is the operations on the date (which happen after I grab it at i == 13).
Inside the loop, I iterate over 15 cells to get the data (14 stats plus the date), and then iterate over the matching 15 header cells to get the labels for the data.
    for i in range(15):
        info_stats = player.select('#info_content .table-row-text')[i]
        info_stats_values = info_stats.getText(strip=True)
        if i == 8:
            info_stats_values = info_stats_values[:3]
        if i in [4, 5, 6, 8, 9, 16]:
            info_stats_values = float(info_stats_values)
        if i == 13:
            date_added.append(info_stats_values)
        if i != 13:
            stat_values_list.append(info_stats_values)

    for i in range(15):
        info_stats = player.select('#info_content th')[i]
        info_stats_names = info_stats.getText(strip=True)
        if i != 13:
            stat_name_list.append(info_stats_names)
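The two loops above could also be written as a single pass that pairs each label with its value. This is only a minimal sketch: split_stats is a hypothetical helper name, and it assumes the label and value texts have already been collected into plain lists, for example with [el.getText(strip=True) for el in player.select('#info_content th')]. Because zip stops at the shorter list, a page with fewer cells than expected yields a missing date (None) rather than an IndexError.

```python
def split_stats(names, values, date_index=13):
    """Pair each label with its value and pull the date out separately.

    Returns (pairs, date_text). date_text is None when the inputs are
    too short to contain an entry at date_index.
    """
    pairs = []
    date_text = None
    # zip stops at the shorter list, so a short page cannot overrun.
    for i, (name, value) in enumerate(zip(names, values)):
        if i == date_index:
            date_text = value
        else:
            pairs.append((name, value))
    return pairs, date_text


# Made-up sample data with the date in position 2 instead of 13.
sample_names = ["Name", "Club", "Added on"]
sample_values = ["Some Player", "Some Club", "2018-01-05"]
sample_pairs, sample_date = split_stats(sample_names, sample_values, date_index=2)
```

Checking whether the returned date is None for a given n would show which pages, if any, are missing the date cell.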
Then I try to convert the date string I stored in date_added into a datetime object, so that I can work out the number of days:
    date_added_f = datetime.strptime(date_added[n-1], '%Y-%m-%d')
    date_today = datetime.strptime('2018-09-12', '%Y-%m-%d')
    days_in_game = float((date_today - date_added_f).days)
    add_to_list("Days in game", days_in_game)
The process stops at date_added_f = datetime.strptime(date_added[n-1], '%Y-%m-%d'), giving me IndexError: list index out of range. However, sometimes it happens after 9 iterations, sometimes after 78, and one time it got to 400-something, which seems weird.
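For reference, date_added[n-1] is only valid if exactly one date has been appended for every n so far, so any iteration that appends nothing shifts the list out of step with n. A sketch that avoids the index entirely by converting the date string as soon as it is grabbed is shown here. The name days_in_game_from is hypothetical, and returning None for a missing or malformed date is an assumption about the desired behaviour, not something the scraper currently does.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"


def days_in_game_from(date_text, today_text="2018-09-12"):
    """Return the days between date_text and today_text as a float.

    Returns None when date_text is missing or not in YYYY-MM-DD form,
    so the caller can decide how to handle that player.
    """
    if not date_text:
        return None
    try:
        added = datetime.strptime(date_text, DATE_FORMAT)
    except ValueError:
        return None
    today = datetime.strptime(today_text, DATE_FORMAT)
    return float((today - added).days)


example_days = days_in_game_from("2018-09-02")  # 10.0
```

Logging n together with len(date_added) just before the failing line would also show whether the two drift apart on particular pages.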