Wednesday, July 7, 2021

Sampling files from multiple folders, getting TypeError

I'm trying to sample audio files (a sample size of 50+) from across multiple folders, to eventually use for training a model. Doing this by hand would be very tedious, so I'm trying to write a script for it:

import os
import sys
from pathlib import Path
import random

wav_pathlist = Path(src_dir).glob('**/*.wav')
lab_pathlist = Path(src_dir).glob('**/*.lab')

random_wav_list = []

for i in range(1, int(sample_size)):
    random_wav_list.append(random.choice(wav_pathlist))

print(random_wav_list)

My current approach is to use reservoir sampling via pathlib to get random samples. I can get the file paths, but they come back as PosixPath objects, which I have been able to convert to strings by casting. I am stuck on one error, though: I expected random.choice() to pick a random file path on each pass through the loop, but it raises TypeError: object of type 'generator' has no len(), and I'm not sure how to fix it.
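For reference, a minimal sketch of one way to avoid this TypeError. Path.glob() returns a one-shot generator, while random.choice() needs a sequence with a length, so the sketch collects the paths into a list once and then draws from that list. The function name sample_paths and its parameters are hypothetical, and it swaps random.choice() in a loop for random.sample(), which draws without replacement and so cannot return the same file twice. Note also that range(1, int(sample_size)) runs only sample_size - 1 times.

```python
import random
from pathlib import Path


def sample_paths(src_dir, pattern="**/*.wav", sample_size=50, seed=None):
    """Return up to sample_size distinct paths under src_dir matching pattern.

    Hypothetical helper for illustration; not part of pathlib or random.
    """
    # Path.glob() yields paths lazily, so materialize it into a list once.
    # Sorting makes the result reproducible when a seed is given.
    paths = sorted(Path(src_dir).glob(pattern))
    rng = random.Random(seed)
    # random.sample() raises ValueError if k exceeds the population size,
    # so cap k at the number of files actually found.
    k = min(int(sample_size), len(paths))
    return rng.sample(paths, k)
```

Under these assumptions, random_wav_list = sample_paths(src_dir, sample_size=50) would replace the loop, and str(p) converts each PosixPath to a plain string where one is needed.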

Edit: in response to an answer posted below, I tried casting to a list, as shown below:

random_wav_list = []

for i in range(1, int(sample_size)):
    random_wav_list.append(random.choice(list(wav_pathlist)))

This gives me IndexError: Cannot choose from an empty sequence
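A likely cause, sketched here under the assumption that wav_pathlist is still the generator returned by Path.glob(): a generator can be consumed only once. The first list(wav_pathlist) call drains it, so every later call returns an empty list, and random.choice() then fails on the empty sequence. The helper name count_twice and the temporary test files are hypothetical, used only to make the effect visible.

```python
import tempfile
from pathlib import Path


def count_twice(src_dir, pattern="**/*.wav"):
    """Convert the same glob generator to a list twice and report both lengths."""
    gen = Path(src_dir).glob(pattern)
    first = list(gen)   # consumes every path the generator will ever yield
    second = list(gen)  # the generator is now exhausted, so this is []
    return len(first), len(second)


# Build a throwaway folder with three empty .wav files to run the check on.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.wav", "b.wav", "c.wav"):
        (Path(tmp) / name).touch()
    counts = count_twice(tmp)

print(counts)  # (3, 0)
```

Calling list() once, before the loop, and choosing from that stored list avoids the problem. If the error appears even on the first pass, the generator may already have been consumed earlier in the script, or the pattern may match no files under src_dir.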



