mardi 26 novembre 2019

Selecting random IDs from another table....confused about LATERAL JOIN

I'm trying to get a handle on generating random data in Postgres, and find I'm misunderstanding something about LATERAL JOIN. Building on some help I got earlier, I've got some code that's trying to:

-- Generate a series of numbers
-- Generate timestamps on the hour matching the number sequence
-- Generate a random(ish) score in a roughly normal distribution for each row
-- Pick an ID randomly from a table in the database.

This last bit is not working. When I run the script shown below, I get a random value for the facility_id, but every row has the same random value. I'd like the random ID assigned on each row, not once globally for the whole run. In procedural thinking, the facility_id is being assigned before the loop, and I want it assigned in the loop. I thought that LATERAL JOIN would help me out here, but

WITH facilities_count AS
(SELECT count(*) from facility)

 SELECT hour_number AS id, -- Get numbers in sequence.
       '2019-01-01 00:00'::timestamp + interval '1 HOUR' * hour_number AS stamp, -- Get hours in sequence
        ABS(TRUNC(normal_rand(1, 0, 1) * 100)) AS score, -- Create a random score in a ~normal distribution.
       random_facility.id

 FROM (SELECT * FROM generate_series(1,8760,1) AS hour_number) generated_numbers 

 LEFT JOIN LATERAL
      (SELECT id
          FROM facility 
        OFFSET floor(random() * (select count from facilities_count))
         LIMIT 1) random_facility
            ON true;

I thought that a subquery might work, but I also get a single value across all rows for the facility_id with this code:

WITH facilities_counter AS
(SELECT count(*) from facility)

 SELECT hour_number AS id, -- Get numbers in sequence.
       '2019-01-01 00:00'::timestamp + interval '1 HOUR' * hour_number AS stamp, -- Get hours in sequence
        ABS(TRUNC(normal_rand(1, 0, 1) * 100)) AS score, -- Create a random score in a ~normal distribution.
         (SELECT id FROM facility OFFSET floor(random() * (select count from facilities_counter)) LIMIT 1)

 FROM (SELECT * FROM generate_series(1,8760,1) AS hour_number) generated_numbers;

I haven't listed the facility table definition, but the only field that matters above is id, so any table would work the same way.

In case it makes any difference to the answer, once I can figure out the solution to this problem, I'd like to then use the random facility_id on each row as an input to select something else out of another table.

Thanks for any help. I'm working on this not just to get the solution, but to try and get a better mental model of how the various tools work. I'm (obviously) not at the point where I can read code like the above and predict in my head how it will behave. Getting to that understanding is kind of fundamental to figuring out how to solve problems on my own. Put another way, I'm not only trying to solve this problem, I'm trying to reduce my mental gaps generally.




Aucun commentaire:

Enregistrer un commentaire