Friday, September 25, 2020

Hive: randomly select N values from distinct values of one column

Suppose I have a dataset like this:

|-----------------|----------------|
|    ID           |    Values      |
|-----------------|----------------|
|    123          |    aaaa        |
|-----------------|----------------|
|    234          |    bbb         |
|-----------------|----------------|
|    123          |    ab3d        |
|-----------------|----------------|
|    264          |    34g3ff      |
|-----------------|----------------|
|    783          |    341g5h      |
|-----------------|----------------|
|    921          |    7jdfew      |
|-----------------|----------------|
|    264          |    53fj        |
|-----------------|----------------|

I would like to randomly select, say, 3 of the distinct ID values and keep every row whose ID was selected. One possible result is this table:

|-----------------|----------------|
|    ID           |    Values      |
|-----------------|----------------|
|    123          |    aaaa        |
|-----------------|----------------|
|    123          |    ab3d        |
|-----------------|----------------|
|    783          |    341g5h      |
|-----------------|----------------|
|    921          |    7jdfew      |
|-----------------|----------------|

How can I do that in Hive?
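One common approach can be sketched in two steps. First, shuffle the distinct IDs with rand() and keep the first 3. Then join those IDs back to the table so that every row for a selected ID is returned. This is a minimal sketch rather than a tested answer. It assumes the data lives in a Hive table named my_table (a hypothetical name) with columns id and values. The values column is quoted with backticks because VALUES is a reserved keyword in Hive.

```sql
-- Assumption: my_table is a hypothetical table with columns id and `values`.
WITH sampled_ids AS (
  -- Step 1: take the distinct IDs, shuffle them randomly, keep 3.
  SELECT d.id
  FROM (SELECT DISTINCT id FROM my_table) d
  ORDER BY rand()
  LIMIT 3
)
-- Step 2: return every row whose id is among the sampled IDs.
-- sampled_ids holds distinct IDs, so the join does not duplicate rows.
SELECT t.id, t.`values`
FROM my_table t
JOIN sampled_ids s
  ON t.id = s.id;
```

Because rand() is unseeded, each run can return a different set of IDs. Passing a seed, for example rand(42), should make the ordering repeatable. The ORDER BY is executed by a single reducer, but that is usually acceptable here because it sorts only the distinct IDs, not the whole table.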



