I'm trying to sample a set of audio files (a sample size of 50 or more) from across several folders, to eventually use for training a model. Doing this by hand would be very tedious, so I'm trying to write a script to do it:
import os
import sys
from pathlib import Path
import random
wav_pathlist = Path(src_dir).glob('**/*.wav')
lab_pathlist = Path(src_dir).glob('**/*.lab')
random_wav_list = []
for i in range(1, int(sample_size)):
    random_wav_list.append(random.choice(wav_pathlist))
print(random_wav_list)
My current approach is to collect the file paths with pathlib and then draw random samples from them, along the lines of reservoir sampling. I am able to get the file paths, but they come back as PosixPath objects; I have been able to extract the path strings through casting. Where I am stuck is the sampling step: I expected random.choice() to give me a set number of random file paths, but it raises TypeError: object of type 'generator' has no len(), and I'm not sure how to fix this error.
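For context, Path.glob() returns a generator, while random.choice() needs a sequence that supports len() and indexing, which is where the TypeError comes from. If the goal really is reservoir sampling over the generator without building a full list, a minimal sketch might look like the following. The function name reservoir_sample and its parameters are made up for illustration, and src_dir and sample_size are assumed to be defined elsewhere, as in the snippet above.

```python
import random
from pathlib import Path


def reservoir_sample(iterable, k, rng=random):
    """Return up to k items chosen uniformly at random from an iterable
    of unknown length (hypothetical helper, classic "Algorithm R")."""
    reservoir = []
    for index, item in enumerate(iterable):
        if index < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (index + 1),
            # overwriting a uniformly chosen slot.
            j = rng.randint(0, index)
            if j < k:
                reservoir[j] = item
    return reservoir


# Assumed usage; src_dir and sample_size are not defined in this sketch.
# wav_sample = reservoir_sample(Path(src_dir).glob('**/*.wav'), int(sample_size))
```

This walks the generator once and never needs its length, so it works directly on the result of glob(). If fewer than k files exist, it simply returns all of them.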
Edit: in response to an answer posted below, I tried casting to a list, as shown here:
random_wav_list = []
for i in range(1, int(sample_size)):
    random_wav_list.append(random.choice(list(wav_pathlist)))
This gives me IndexError: Cannot choose from an empty sequence
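A likely explanation for the IndexError is that a generator can only be consumed once: the first list(wav_pathlist) call inside the loop drains it, so every later call produces an empty list. Separately, range(1, int(sample_size)) runs only sample_size - 1 times, and repeated random.choice() calls can pick the same file more than once. A minimal sketch that avoids all three issues, assuming the matching files fit comfortably in memory, is to build the list once and use random.sample(). The function name sample_paths is hypothetical.

```python
import random
from pathlib import Path


def sample_paths(src_dir, pattern, sample_size):
    """Pick distinct paths matching pattern under src_dir (hypothetical helper)."""
    # Materialize the generator exactly once; calling list() on the same
    # generator a second time would give an empty list.
    paths = sorted(Path(src_dir).glob(pattern))
    # Cap the request at the number of files found, since random.sample
    # raises ValueError when asked for more items than the list holds.
    count = min(int(sample_size), len(paths))
    # random.sample draws without replacement, so no path repeats.
    return random.sample(paths, count)


# Assumed usage; src_dir and sample_size are not defined in this sketch.
# random_wav_list = sample_paths(src_dir, '**/*.wav', sample_size)
# print([str(p) for p in random_wav_list])
```

Converting each result with str(), as in the commented usage line, gives plain path strings instead of PosixPath objects.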