Monday, July 5, 2021

How to trim a percentage of data points within a specific range in R

I have a text file containing millions of p-values (column P, ranging from 1 down to 5e-09). My goal is to generate a Manhattan plot in R from these p-values. However, since the vast majority of them fall in the 0.01-1 range, I would like to randomly remove, say, 95% of the p-values in that range before generating the plot, to reduce the output file size. Until now, I have been using:

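# header = TRUE so that the P column can be referenced by name
data <- read.table(<path_to_my_p-value_file>, header = TRUE)
# keep only the rows with P at or below 0.01
data <- subset(data, P <= 0.01)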

but this command removes all p-values greater than 0.01, which leaves an unsightly gap between the x-axis and the remaining p-values in the Manhattan plot. Is there a way to trim most, rather than all, of the p-values within a specified range?
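
One possible approach, as a minimal sketch rather than a definitive answer: pick out the row indices that fall in the dense range, randomly sample 5% of them, and keep those together with every row at or below the cutoff. The simulated data frame and the names threshold, keep_frac and trimmed are assumptions made for illustration; only the P column comes from the question.

```r
set.seed(42)  # fix the seed so the same rows are dropped on every run

# Simulated stand-in for the real file (assumption): one P column in (0, 1)
data <- data.frame(P = runif(100000))

threshold <- 0.01  # p-values above this cutoff are thinned
keep_frac <- 0.05  # fraction of those rows to keep, i.e. trim 95%

# Row indices in the dense range (P > threshold)
in_range <- which(data$P > threshold)

# Randomly choose which of those rows survive; sample.int() picks
# positions within in_range, so it behaves correctly for any length
n_keep <- round(keep_frac * length(in_range))
kept_in_range <- in_range[sample.int(length(in_range), size = n_keep)]

# Keep every row at or below the cutoff plus the sampled rows above it,
# sorted so the original row order is preserved
keep <- sort(c(which(data$P <= threshold), kept_in_range))
trimmed <- data[keep, , drop = FALSE]
```

Because the surviving points are drawn uniformly at random from the dense range, the band near the x-axis should still look filled, just with far fewer points. Indexing by row keeps any other columns (chromosome, position, and so on) aligned with their p-values.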



