Friday, 9 April 2021

High fluctuation data optimisation and prediction

Hello, I have a dataframe that looks like the one below. The data is capacity information for a cluster system; this is a small sample of it. You may notice that the days_to_full column fluctuates a lot within a span of 20 days. The Threshold_date column is the date when the disk will be full, obtained by adding days_to_full to the Datetime column.

I would like to build a model that can see past the large fluctuations in Threshold_date and notify me only when the growth is fairly linear, that is, when the growth is real. How would I do that? I would like to know at least 90 to 100 days in advance that the cluster is genuinely going to fill up. Fluctuations like these could give me wrong predictions and cause panic for no reason.

I've been reading a lot of articles but have been unable to identify which (mathematical) model, implemented in Python, would help me make better estimates.

I would appreciate it if someone could explain why they suggest a particular model, and which parameters I should look at to make the predictions as accurate as possible. Thanks in advance!

cluster_name   region   Datetime    total_data_capacity used_data_capacity  available_data_capacity days_to_full    Threshold_date
cluster01      lon      2021-03-02  29990.745869        23540.127364        6450.618505             219.000000      2021-11-14
cluster01      lon      2021-03-03  29990.745869        23555.783363        6434.962505             219.000000      2021-11-14
cluster01      lon      2021-03-04  29990.745869        23572.610517        6418.135352             219.833333      2021-11-14
cluster01      lon      2021-03-05  29990.745869        23589.994672        6400.751197             220.000000      2021-11-15
cluster01      lon      2021-03-06  29990.745869        23608.169950        6382.575918             220.000000      2021-11-15
cluster01      lon      2021-03-07  29990.745869        23612.727373        6378.018496             220.000000      2021-11-15
cluster01      lon      2021-03-08  29990.745869        23621.996424        6368.749444             220.000000      2021-11-15
cluster01      lon      2021-03-09  29990.745869        23642.840187        6347.905682             926.285714      2023-10-22
cluster01      lon      2021-03-10  29990.745869        23663.032472        6327.713397             1044.000000     2024-02-17
cluster01      lon      2021-03-11  29990.745869        23682.244640        6308.501229             1004.833333     2024-01-08
cluster01      lon      2021-03-12  29990.745869        23703.716183        6287.029686             997.000000      2024-01-01
cluster01      lon      2021-03-13  29990.745869        23723.670334        6267.075534             997.000000      2024-01-01
cluster01      lon      2021-03-14  29990.745869        23726.441732        6264.304136             997.000000      2024-01-01
cluster01      lon      2021-03-15  29990.745869        23638.685020        6352.060849             997.000000      2024-01-01
cluster01      lon      2021-03-16  29990.745869        23607.307080        6383.438789             1022.000000     2024-01-26
cluster01      lon      2021-03-17  29990.745869        23649.954446        6340.791423             1027.000000     2024-01-31
cluster01      lon      2021-03-18  29990.745869        23694.870332        6295.875536             991.545455      2023-12-26
cluster01      lon      2021-03-19  29990.745869        23739.976639        6250.769230             988.000000      2023-12-23
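
To make "fairly linear growth" concrete, here is a minimal sketch of the kind of check I have in mind, not a recommended model. It fits a straight line to used_data_capacity over a window of days and only produces a days-to-full estimate when the slope is positive and the fit is good. The function names (linear_fit, days_to_full_estimate), the 7-day window and the min_r2=0.9 cut-off are all assumptions made for illustration; they are not how the days_to_full column above was computed.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept; also returns R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r2


def days_to_full_estimate(used, total, min_r2=0.9):
    """Return (days, r2). days is None unless growth is positive and close to linear.

    min_r2 is a hypothetical threshold for "fairly linear"; tune it on real data.
    """
    xs = list(range(len(used)))  # one sample per day, so x is the day index
    slope, intercept, r2 = linear_fit(xs, used)
    if slope <= 0 or r2 < min_r2:
        return None, r2
    fitted_now = slope * xs[-1] + intercept  # smoothed usage on the last day
    return (total - fitted_now) / slope, r2


TOTAL = 29990.745869

# used_data_capacity for 2021-03-02 .. 2021-03-08 (steady growth in the sample)
early = [23540.127364, 23555.783363, 23572.610517, 23589.994672,
         23608.169950, 23612.727373, 23621.996424]

# used_data_capacity for 2021-03-13 .. 2021-03-19 (contains the dip on 03-15/03-16)
late = [23723.670334, 23726.441732, 23638.685020, 23607.307080,
        23649.954446, 23694.870332, 23739.976639]

early_days, early_r2 = days_to_full_estimate(early, TOTAL)
late_days, late_r2 = days_to_full_estimate(late, TOTAL)

print("early window:", early_days, early_r2)
print("late window: ", late_days, late_r2)
```

With this check, a window that contains a dip yields no estimate at all instead of a wildly different date, and an alert would be raised only when an estimate exists and falls under the 90 to 100 day horizon. Whether a 7-day window and an R^2 cut-off are the right choices is exactly what I am unsure about.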


