vendredi 22 novembre 2019

Randomly generating huge data which is unique

I want to generate huge data with 16 variables.

The data should be generated randomly with the ranges:

(The range may change)

v01: 20 ~ 30 
v02: 15 ~ 25
v03: 3 ~ 9
v04: 20 ~ 30 
v05: 3 ~ 9
v06: 30 ~ 40
v07: 600 ~ 700
v08: 50 ~ 60
v09: 5 ~ 55
v10: 4 ~ 10
v11: 25 ~ 35
v12: 10 ~ 16
v13: 9 ~ 15
v14: 2 ~ 8
v15: 450 ~ 550
v16: 5 ~ 35

For example:

+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+
| v01  | v02  | v03  | v04  | v05  | v06  | v07  | v08  | v09  | v10  | v11  | v12  | v13  | v14  | v15  | v16  |
+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+
|   24 |   21 |    3 |   29 |    3 |   30 |  635 |   58 |   10 |    8 |   32 |   10 |   12 |    4 |  540 |   32 |
|   20 |   18 |    7 |   22 |    7 |   34 |  675 |   50 |   45 |    4 |   29 |   12 |    9 |    3 |  475 |   34 |
|   28 |   24 |    7 |   28 |    3 |   39 |  685 |   53 |    5 |    6 |   34 |   12 |   10 |    2 |  495 |   31 |
|   23 |   24 |    3 |   28 |    5 |   32 |  645 |   53 |    5 |    4 |   32 |   12 |   14 |    5 |  545 |   25 |
|   21 |   20 |    7 |   28 |    7 |   37 |  635 |   57 |   50 |    4 |   32 |   12 |   12 |    3 |  455 |   34 |
+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+

Every row should be unique, so I created a MySQL table:

CREATE TABLE `test_table` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `v01` smallint(6) DEFAULT NULL,
  `v02` smallint(6) DEFAULT NULL,
  `v03` smallint(6) DEFAULT NULL,
  `v04` smallint(6) DEFAULT NULL,
  `v05` smallint(6) DEFAULT NULL,
  `v06` smallint(6) DEFAULT NULL,
  `v07` smallint(6) DEFAULT NULL,
  `v08` smallint(6) DEFAULT NULL,
  `v09` smallint(6) DEFAULT NULL,
  `v10` smallint(6) DEFAULT NULL,
  `v11` smallint(6) DEFAULT NULL,
  `v12` smallint(6) DEFAULT NULL,
  `v13` smallint(6) DEFAULT NULL,
  `v14` smallint(6) DEFAULT NULL,
  `v15` smallint(6) DEFAULT NULL,
  `v16` smallint(6) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `unique_index` (`v01`,`v02`,`v03`,`v04`,`v05`,`v06`,`v07`,`v08`,`v09`,`v10`,`v11`,`v12`,`v13`,`v14`,`v15`,`v16`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

And I wrote a (python) script to make random values and insert into it.

In the beginning, it could generate a million rows within 10 minutes,

but now, it costs about a whole day to generate a million rows when there are a billion rows existed.

It becomes slower and slower.

I want 10 billion rows, so I'm considering to buy a faster computer.

Before I buy a new computer, I just want to ask a question,

is there any more efficient way to do that?

Thank you




Aucun commentaire:

Enregistrer un commentaire