Tuesday, December 20, 2022

Select a random percentage of the dataset for each group of epochs

I have a model that should train on 25,000 samples for 50,000 epochs. I want to train on a percentage of the dataset for a percentage of the epochs: for example, train on 1,000 random samples for the first 10 epochs, then on another 1,000 random samples for the next 10 epochs, and so on. The dataloader part of my source code is as follows:

import pytorch_lightning as pl
from torch.utils.data import DataLoader

# collate_fn is defined elsewhere in my code (not shown here)

class DataModule(pl.LightningDataModule):

    def __init__(self, train_dataset, val_dataset, batch_size=2):
        super().__init__()
        self.train_dataset = train_dataset
        self.val_dataset = val_dataset
        self.batch_size = batch_size

    def train_dataloader(self):
        # shuffle=True reshuffles the whole training set every epoch
        return DataLoader(self.train_dataset, batch_size=self.batch_size,
                          collate_fn=collate_fn, shuffle=True, num_workers=2, pin_memory=True)

    def val_dataloader(self):
        return DataLoader(self.val_dataset, batch_size=self.batch_size,
                          collate_fn=collate_fn, shuffle=False, num_workers=2, pin_memory=True)
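For the numbers in the question (25,000 samples, 1,000-sample chunks, 10 epochs per chunk, 50,000 epochs), a quick calculation shows what this schedule implies, assuming the chunks are disjoint so that every sample is used once before any sample comes back. The helper `schedule_summary` is hypothetical and only does arithmetic:

```python
def schedule_summary(n_samples, chunk_size, epochs_per_chunk, total_epochs, batch_size):
    """Hypothetical helper: arithmetic for a 'new random chunk every N epochs' schedule,
    assuming disjoint chunks that together cover the whole dataset."""
    chunks_per_pass = -(-n_samples // chunk_size)         # ceil: chunks needed to see every sample once
    epochs_per_pass = chunks_per_pass * epochs_per_chunk  # epochs before a sample can be reused
    passes = total_epochs / epochs_per_pass               # full passes over the data
    return {
        "chunks_per_pass": chunks_per_pass,
        "epochs_per_pass": epochs_per_pass,
        "passes": passes,
        "epochs_per_sample": passes * epochs_per_chunk,   # epochs in which one given sample is used
        "steps_per_epoch": -(-chunk_size // batch_size),  # batches (optimizer steps) per epoch
    }


summary = schedule_summary(25000, 1000, 10, 50000, 2)
```

With these inputs there are 25 chunks per pass and 250 epochs per pass, so 50,000 epochs make 200 passes; each sample is used in 2,000 of the 50,000 epochs, and each epoch has 500 steps of `batch_size=2`.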

I understand that the code below can select a random subset of the dataset, but I also want the rest of the data to be used in the following epochs.

df_fraction = df_mydataset.sample(frac=0.04)  # 4% of 25,000 rows = 1,000 random rows
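Calling `sample(frac=0.04)` again for every block draws each subset independently, so consecutive subsets can overlap and some rows may never be picked. A minimal pandas sketch of one alternative, assuming a toy stand-in for `df_mydataset`: shuffle all rows once with `sample(frac=1)`, cut the shuffled frame into consecutive, non-overlapping 1,000-row chunks, and move to the next chunk every 10 epochs. `chunk_for_epoch` is a hypothetical helper, not a pandas function.

```python
import pandas as pd

# Hypothetical stand-in for df_mydataset: 25,000 rows with a single id column
df_mydataset = pd.DataFrame({"sample_id": range(25000)})

chunk_size = 1000  # same size as sample(frac=0.04) on 25,000 rows

# Shuffle every row once (frac=1 returns all rows in random order) ...
shuffled = df_mydataset.sample(frac=1, random_state=42).reset_index(drop=True)

# ... then cut the shuffled frame into consecutive, non-overlapping chunks
chunks = [shuffled.iloc[i:i + chunk_size] for i in range(0, len(shuffled), chunk_size)]


def chunk_for_epoch(epoch, epochs_per_chunk=10):
    """Return the chunk for a 0-based epoch, switching chunks every `epochs_per_chunk` epochs."""
    return chunks[(epoch // epochs_per_chunk) % len(chunks)]
```

After all 25 chunks have been used, this version starts over with the same chunks in the same order; reshuffle with a different `random_state` if each pass should use a new order.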

I also understand that the code below can select a random subset of the dataset, but I don't know how it works, and I need to change the data every 10 epochs.

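import torch
from torch.utils.data import SubsetRandomSampler

# train_indices: the list of dataset indices to draw from (defined elsewhere)
train_sampler = SubsetRandomSampler(train_indices)
train_loader = torch.utils.data.DataLoader(dataset, batch_size=2, sampler=train_sampler)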
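As for how `SubsetRandomSampler` works: it only ever yields the indices it was given, and it draws a fresh random permutation of them each time the `DataLoader` is iterated, i.e. once per epoch. `batch_size=2` then simply groups the yielded indices in pairs, so 5 indices give 3 batches (the last one holds a single sample unless `drop_last=True`). A small sketch on a toy `TensorDataset`; the 20-element dataset and the index list are made up for illustration:

```python
import torch
from torch.utils.data import DataLoader, SubsetRandomSampler, TensorDataset

# Toy dataset: sample i is just the number i, so batches show which indices were drawn
dataset = TensorDataset(torch.arange(20))

# Only these indices are ever drawn; the sampler permutes them on every iteration
train_indices = [0, 1, 2, 3, 4]
train_sampler = SubsetRandomSampler(train_indices)

# sampler= replaces shuffle=True (DataLoader rejects using both)
train_loader = DataLoader(dataset, batch_size=2, sampler=train_sampler)

epoch_orders = []
for epoch in range(2):
    # Iterating the loader again triggers a new random permutation of train_indices
    order = [int(x) for (batch,) in train_loader for x in batch]
    epoch_orders.append(order)
```

Because the sampler's index list is fixed, changing the data every 10 epochs means giving the loader a new set of indices: either build a new `SubsetRandomSampler` (and `DataLoader`) from the next chunk, or use a sampler that knows the current epoch.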

How can I do that with batch_size=2?
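One way to get "1,000 random samples for 10 epochs, then another 1,000" with `batch_size=2` is a custom sampler that picks its chunk from the current epoch. The sketch below is a hypothetical `EpochChunkSampler` (not part of PyTorch), assuming a single-device run: it shuffles all indices once per full pass, slices out disjoint chunks of `chunk_size`, switches chunk every `epochs_per_chunk` epochs, and reshuffles the current chunk each epoch. It relies on `set_epoch` being called before each epoch, the same convention `DistributedSampler` uses.

```python
import random

import torch
from torch.utils.data import DataLoader, Sampler, TensorDataset


class EpochChunkSampler(Sampler):
    """Hypothetical sampler: uses one chunk of `chunk_size` random indices for
    `epochs_per_chunk` epochs, then moves on to the next, disjoint chunk."""

    def __init__(self, n_samples, chunk_size=1000, epochs_per_chunk=10, seed=0):
        self.n_samples = n_samples
        self.chunk_size = chunk_size
        self.epochs_per_chunk = epochs_per_chunk
        self.seed = seed
        self.epoch = 0
        # ceil division: number of chunks needed to cover every sample once
        self.chunks_per_pass = -(-n_samples // chunk_size)

    def set_epoch(self, epoch):
        # Must be called with the current (0-based) epoch before each epoch
        self.epoch = epoch

    def _position(self):
        block = self.epoch // self.epochs_per_chunk  # which group of epochs we are in
        return divmod(block, self.chunks_per_pass)   # (pass number, chunk within the pass)

    def _current_chunk(self):
        pass_idx, chunk_pos = self._position()
        perm = list(range(self.n_samples))
        # Same permutation for the whole pass, so chunks within a pass are disjoint
        random.Random(self.seed + pass_idx).shuffle(perm)
        start = chunk_pos * self.chunk_size
        return perm[start:start + self.chunk_size]

    def __iter__(self):
        chunk = self._current_chunk()
        # New order inside the chunk on every epoch
        random.Random(f"{self.seed}-{self.epoch}").shuffle(chunk)
        return iter(chunk)

    def __len__(self):
        _, chunk_pos = self._position()
        return min(self.chunk_size, self.n_samples - chunk_pos * self.chunk_size)


# Toy stand-in for the real training set: sample i is just the number i
dataset = TensorDataset(torch.arange(25000))
sampler = EpochChunkSampler(len(dataset), chunk_size=1000, epochs_per_chunk=10)
# sampler= replaces shuffle=True; batch_size=2 just pairs up the yielded indices
loader = DataLoader(dataset, batch_size=2, sampler=sampler)

for epoch in range(3):
    sampler.set_epoch(epoch)
    for (batch,) in loader:
        pass  # the training step on `batch` would go here
```

In the `DataModule`, one option is to create the sampler once (for example in `__init__`) and pass `sampler=self.train_sampler` to the training `DataLoader` instead of `shuffle=True`, since the two options are mutually exclusive. To my knowledge, recent PyTorch Lightning versions call `set_epoch` on the training dataloader's sampler at the start of every epoch when that method exists, but check this for your version; otherwise call it yourself from an epoch-start hook. Another route is `Trainer(reload_dataloaders_every_n_epochs=10)`, which makes Lightning call `train_dataloader()` again every 10 epochs, so it can return a loader built on a `SubsetRandomSampler` over the next chunk.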



