Thursday, 13 September 2018

Why does IndexError occur at random points?

I am writing a web scraper in Python that runs a loop n times, but I get an IndexError. I know what the error means; what confuses me is that it occurs at seemingly random iterations.

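import requests
import bs4
from datetime import datetime

# Module-level lists that the loop below appends to
date_added = []
stat_name_list = []
stat_values_list = []
def add_to_list(name, values):
    stat_name_list.append(name)
    stat_values_list.append(values)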

for n in pbar(range(1,21438)):
    url = 'http://www.futbin.com/18/player/' + str(n)
    res = requests.get(url)
    res.raise_for_status()
    player = bs4.BeautifulSoup(res.text, "html.parser")

I have left out the middle part, where I grab some values without any problems. What I am struggling with is the operations on the date, which come after I grab it at i == 13.

I loop over the first 15 rows to get the data (14 stats plus the date), and then over the first 15 header cells to get the labels for the data.

    for i in range(15):
        info_stats = player.select('#info_content .table-row-text')[i]
        info_stats_values = info_stats.getText(strip=True)
        if i == 8:
            # keep only the first three characters of this row
            info_stats_values = info_stats_values[:3]
        if i in [4, 5, 6, 8, 9, 16]:  # note: 16 is never reached by range(15)
            info_stats_values = float(info_stats_values)
        if i == 13:
            # row 13 holds the date, which goes into its own list
            date_added.append(info_stats_values)
        else:
            stat_values_list.append(info_stats_values)


    for i in range(15):
        info_stats = player.select('#info_content th')[i]
        info_stats_names = info_stats.getText(strip=True)
        if i != 13: 
            stat_name_list.append(info_stats_names)
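
The two loops above can be tried on plain lists, without the network or BeautifulSoup. This is only a sketch under assumptions: the helper name collect_stats and its parameters are hypothetical, and it assumes the labels and values have already been extracted as strings. It checks the list lengths first, so a short page fails with a clear message instead of a bare IndexError.

```python
def collect_stats(names, values, date_index=13, limit=15):
    """Split the first `limit` rows into stat names, stat values and the date.

    Hypothetical helper: `names` and `values` are plain lists of strings.
    """
    if len(names) < limit or len(values) < limit:
        raise ValueError(
            f"expected at least {limit} rows, "
            f"got {len(names)} names and {len(values)} values"
        )
    stat_names, stat_values, date = [], [], None
    for i, (name, value) in enumerate(zip(names[:limit], values[:limit])):
        if i == date_index:
            date = value  # the date is kept apart from the other stats
        else:
            stat_names.append(name)
            stat_values.append(value)
    return stat_names, stat_values, date
```

Called with 15 labels and 15 values, it returns 14 names, 14 values and the string found at position 13.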

Then I try to convert the date stored in date_added into a datetime and compute how many days the player has been in the game:

    date_added_f = datetime.strptime(date_added[n-1], '%Y-%m-%d')
    date_today = datetime.strptime('2018-09-12', '%Y-%m-%d') 
    days_in_game = float((date_today - date_added_f).days)
    add_to_list("Days in game", days_in_game)
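
The date arithmetic can be checked on its own, away from the scraping loop. A minimal sketch, assuming the date string is always in YYYY-MM-DD form; the function name days_between and the DATE_FORMAT constant are hypothetical and not part of the scraper:

```python
from datetime import datetime

DATE_FORMAT = '%Y-%m-%d'

def days_between(date_added_str, today_str='2018-09-12'):
    """Return the number of days from date_added_str to today_str as a float."""
    added = datetime.strptime(date_added_str, DATE_FORMAT)
    today = datetime.strptime(today_str, DATE_FORMAT)
    return float((today - added).days)

print(days_between('2018-09-01'))  # 11.0
```

Because it takes the date string directly, this version does not depend on the length of date_added or on the loop counter n.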

The process stops at date_added_f = datetime.strptime(date_added[n-1], '%Y-%m-%d') with IndexError: list index out of range. Sometimes it happens after 9 iterations, sometimes after 78, and once it got to 400-something, which seems odd.
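
An IndexError on date_added[n-1] means that len(date_added) is smaller than n at that moment, so at least one earlier iteration did not append a date. The sketch below reproduces that condition with made-up data. It assumes, without confirmation from the real pages, that some player page has fewer rows than expected; the function name first_failing_iteration is hypothetical.

```python
def first_failing_iteration(rows_per_page, date_index=13):
    """Return the 1-based iteration where date_added[n - 1] fails, or None."""
    date_added = []
    for n, rows in enumerate(rows_per_page, start=1):
        # Append a date only when the page actually has a row at date_index
        if len(rows) > date_index:
            date_added.append(rows[date_index])
        try:
            date_added[n - 1]
        except IndexError:
            return n
    return None

full_page = ['2018-01-01'] * 14   # made-up page with a row at index 13
short_page = ['x'] * 5            # made-up page that is missing rows
print(first_failing_iteration([full_page] * 3 + [short_page]))  # 4
```

In this sketch the failing iteration depends only on where the first short page sits in the sequence.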



