Saturday, 15 October 2022

Odd behaviour of sample when subsetting Seurat objects

I am seeing peculiar behaviour from the R package Seurat when I try to subset objects to specific sets of cells.

Say I generate three sets of random cell names from a Seurat object using sample():

library(Seurat)

set.seed(12345)

ten_cells_id <- sample(Cells(pbmc_small), 10)
other_ten_ids <- sample(Cells(pbmc_small), 10)
and_other_ten <- sample(Cells(pbmc_small), 10)

I can now subset the object with [] and print the cell names:

Cells(pbmc_small[, ten_cells_id])
Cells(pbmc_small[, other_ten_ids])
Cells(pbmc_small[, and_other_ten])

No surprises here: it yields three different sets of cells, as expected.

> Cells(pbmc_small[, ten_cells_id])
 [1] "CATGAGACACGGGA" "CGTAGCCTGTATGC" "ACTCGCACGAAAGT" "CTAGGTGATGGTTG" "TTACGTACGTTCAG" "CATGGCCTGTGCAT"
 [7] "ACAGGTACTGGTGT" "AATGTTGACAGTCA" "GATAGAGAAGGGTG" "CATTACACCAACTG"
> Cells(pbmc_small[, other_ten_ids])
 [1] "GGCATATGCTTATC" "ACAGGTACTGGTGT" "CATCAGGATGCACA" "ATGCCAGAACGACT" "GAGTTGTGGTAGCT" "GGCATATGGGGAGT"
 [7] "AGAGATGATCTCGC" "GAACCTGATGAACC" "GATATAACACGCAT" "CATGAGACACGGGA"
> Cells(pbmc_small[, and_other_ten])
 [1] "GGGTAACTCTAGTG" "TTTAGCTGTACTCT" "TACATCACGCTAAC" "CTAAACCTGTGCAT" "ATACCACTCTAAGC" "CATGCGCTAGTCAC"
 [7] "GATAGAGAAGGGTG" "ATTACCTGCCTTAT" "GCGCATCTTGCTCC" "ACAGGTACTGGTGT"

However, if I call sample() directly inside the brackets:

cells1 <- pbmc_small[, sample(Cells(pbmc_small), 10)]
cells2 <- pbmc_small[, sample(Cells(pbmc_small), 10)]
cells3 <- pbmc_small[, sample(Cells(pbmc_small), 10)]

Cells(cells1)
Cells(cells2)
Cells(cells3)

I get the same ten cells all three times:

> Cells(cells1)
 [1] "GATAGAGATCACGA" "GGCATATGCTTATC" "ATGCCAGAACGACT" "AGATATACCCGTAA" "TACAATGATGCTAG" "CATGAGACACGGGA"
 [7] "GCACTAGACCTTTA" "CGTAGCCTGTATGC" "TTACCATGAATCGC" "ATAAGTTGGTACGT"
> Cells(cells2)
 [1] "GATAGAGATCACGA" "GGCATATGCTTATC" "ATGCCAGAACGACT" "AGATATACCCGTAA" "TACAATGATGCTAG" "CATGAGACACGGGA"
 [7] "GCACTAGACCTTTA" "CGTAGCCTGTATGC" "TTACCATGAATCGC" "ATAAGTTGGTACGT"
> Cells(cells3)
 [1] "GATAGAGATCACGA" "GGCATATGCTTATC" "ATGCCAGAACGACT" "AGATATACCCGTAA" "TACAATGATGCTAG" "CATGAGACACGGGA"
 [7] "GCACTAGACCTTTA" "CGTAGCCTGTATGC" "TTACCATGAATCGC" "ATAAGTTGGTACGT"

The values are always the same, regardless of the seed I set! My guess is that the seed is somehow being reset on each call. This is not a general issue with [], because:

a <- 1:100
a[sample(1:100, 10)]
a[sample(1:100, 10)]
a[sample(1:100, 10)]

returns three different samples.
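The seed-reset guess can be tested without Seurat. R passes function arguments as lazy promises, so a sample() call written inside the brackets only runs when the method first uses that argument. If the method calls set.seed() before that point, every call draws from the same RNG state. The sketch below uses a hypothetical function, subset_with_seed(), as a stand-in for the overloaded []; it is not Seurat's code. The assumed link to Seurat is that its WhichCells() function has a seed argument defaulting to 1 and that subsetting may route through it; that routing is an unverified assumption here.

```r
# Hypothetical stand-in for an overloaded subsetting method. This is NOT
# Seurat's code; it only encodes the assumption being tested.
# It resets the RNG before it first touches its `cells` argument.
subset_with_seed <- function(x, cells, seed = 1) {
  if (!is.null(seed)) set.seed(seed)  # RNG reset happens first...
  x[cells]                            # ...and the `cells` promise is forced here
}

a <- 1:100
set.seed(12345)

# sample() inside the call is evaluated lazily, i.e. after set.seed(1),
# so every call sees the same RNG state and returns the same elements.
r1 <- subset_with_seed(a, sample(1:100, 10))
r2 <- subset_with_seed(a, sample(1:100, 10))
r3 <- subset_with_seed(a, sample(1:100, 10))
print(identical(r1, r2) && identical(r2, r3))  # TRUE

# With the reset disabled, the same calls draw different samples.
s1 <- subset_with_seed(a, sample(1:100, 10), seed = NULL)
s2 <- subset_with_seed(a, sample(1:100, 10), seed = NULL)
print(identical(s1, s2))
```

If this model is right, the first example works only because the three sample() calls are evaluated into variables before any subsetting happens.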

The only explanation I can think of is that something odd happens because Seurat overloads [] for its objects. Any ideas?
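A sketch of a workaround, assuming the seed-reset explanation holds: draw all random cell sets before any subsetting, as the first example does. The second half checks a side effect worth knowing about: a method that calls set.seed() also resets the global RNG stream, so sample() calls made after separate subsetting calls repeat. subset_with_seed() is a hypothetical stand-in, not a Seurat function.

```r
# Hypothetical stand-in for a subsetting method that resets the RNG
# before using its `cells` argument (an assumption, not Seurat's code).
subset_with_seed <- function(x, cells, seed = 1) {
  if (!is.null(seed)) set.seed(seed)
  x[cells]
}

a <- 1:100
set.seed(12345)

# Workaround: draw every random subset first, so no reset can happen
# between draws, then subset with the precomputed indices.
id_sets <- replicate(3, sample(1:100, 10), simplify = FALSE)
subsets <- lapply(id_sets, function(ids) subset_with_seed(a, ids))

# Side effect: each call resets the global RNG, so draws made right
# after two separate calls are identical.
invisible(subset_with_seed(a, 1:10))
after_first <- sample(1:100, 10)
invisible(subset_with_seed(a, 1:10))
after_second <- sample(1:100, 10)
print(identical(after_first, after_second))  # TRUE
```

Translated to the question's objects (untested here), the workaround would be id_sets <- replicate(3, sample(Cells(pbmc_small), 10), simplify = FALSE) followed by lapply(id_sets, function(ids) pbmc_small[, ids]).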



