jeudi 9 juin 2022

Too frequent Random Byte Collision

I've written a program to simulate collisions in accordance to the typical "Birthday Problem".

However, even taking that into account I'm seeing collisions happen far too frequently in the random data, to the point where my generator is reporting collisions in random 1024 byte data which should be infeasible if my code is correct.

I suspect my problem is in my implementation of my EqualityComparer

void Main()
{
    var gen = System.Security.Cryptography.RandomNumberGenerator.Create();
    
    foreach (int i in new int[]{1,2,3,4,8,16,64,256,1024})
    {
        (var dupe, var counter) = BirthdayByte(gen, i);
        Console.WriteLine($"{i:00}. Found duplicate byte after {counter:000000} tries: {BitConverter.ToString(dupe)} ");    
    }
}

public (byte[], long) BirthdayByte(RandomNumberGenerator rng, int bufferLength)
{
    var found = false;
    HashSet<byte[]> members = new HashSet<byte[]>(new ByteArrayComparer());
    byte[] randomBytes = new byte[bufferLength];
    long counter = 0;
    while (!found)
    {
        counter++;
        rng.GetBytes(randomBytes);
        found = members.Contains(randomBytes);
        if (!found)
        {
            members.Add(randomBytes);
        }
    }
    return (randomBytes, counter);
}

public class ByteArrayComparer : IEqualityComparer<byte[]>
{
    bool IEqualityComparer<byte[]>.Equals(byte[] x, byte[] y)
    {
        if(x.Length != y.Length) return false;
        
        for (int i = 0; i < x.Length; i++)
        {
            if(!x[i].Equals(y[i])) return false;
        }
        return true;
    }

    int IEqualityComparer<byte[]>.GetHashCode(byte[] obj)
    {
        unchecked
        {
            if (obj == null)
            {
                return 0;
            }
            int hash = 17;
            foreach (byte element in obj)
            {
                hash = hash * 31 + element.GetHashCode();
            }
            return hash;
        }
    }
}

One typical output is:

01. Found duplicate byte after 000017 tries: D8 
02. Found duplicate byte after 000044 tries: 9D-59 
03. Found duplicate byte after 000536 tries: 49-DF-8D 
04. Found duplicate byte after 004234 tries: 09-02-ED-18 
08. Found duplicate byte after 095210 tries: 46-1D-86-37-00-ED-AD-6D 
16. Found duplicate byte after 040305 tries: 79-EF-C4-C8-49-97-DD-8A-28-45-CD-D8-E8-EA-58-3E 
64. Found duplicate byte after 068927 tries: 57-12-3E-6D-EF-A1-90-F0-F2-F9-3F-E8-C8-76-0E-15-E9-AF-F0-D4-14-C4-AA-85-D6-D0-56-D6-CB-66-1C-7D-F0-5B-73-CC-67-90-F7-73-0D-DB-7B-64-FC-D8-D3-86-6D-38-DD-6E-9F-1E-18-A3-C1-96-86-C2-5A-BD-47-76 
256. Found duplicate byte after 073932 tries: 11-4D-5E-B7-92-0D-84-58-79-53-60-14-B9-80-3A-7B-00-CC-A4-33-77-E8-8F-21-C5-AB-38-7F-F1-61-3B-9D-F7-C6-3D-F1-39-9A-9E-57-7B-E7-E7-DA-A3-6E-69-D0-5E-8B-9A-94-0D-74-1E-83-F0-02-A4-FD-BF-76-C8-DC-09-CD-4E-55-39-7C-AA-58-0D-5F-FF-6B-1B-A6-51-1B-13-38-FD-FD-DE-C7-35-03-85-5B-AF-3D-F6-21-BA-9F-94-15-05-E8-4F-D0-A6-AA-F2-0D-0D-B7-41-B7-E0-DE-A8-BE-F1-D5-CB-DC-FD-A6-B2-82-72-AC-C5-49-63-5F-55-5B-0D-E5-4A-AD-B9-2D-3D-F5-B5-61-A8-0F-31-91-8F-3C-7D-67-F8-A6-14-58-C6-AA-C5-97-6F-66-19-AC-66-07-BB-BF-FC-FB-E0-C6-41-9F-24-36-5E-51-4D-5B-6D-1C-C6-96-43-09-24-A7-38-86-89-BC-C1-1F-5F-44-48-0C-F8-58-CB-76-01-46-F9-84-EE-CE-77-E7-9C-B6-35-D9-2F-ED-A7-7B-F4-C6-94-6E-79-63-68-29-1C-3A-52-BA-04-DD-80-4E-A2-B0-FB-BF-09-FC-C6-B0-1F-BA-79-2F-F5-DE-3A-D5-00-B0-20-82-65-ED-15-C8-DA-4F 
1024. Found duplicate byte after 026518 tries: F5-54-6A-81-92-27-FB-75-3D-EA-CC-D2-2D-26-90-1D-B8-B5-8F-52-BB-60-49-47-E0-8D-AC-E4-02-1C-76-EA-64-FD-A5-AC-9A-9D-A2-52-69-6E-28-D8-EF-8C-6D-1D-CA-24-3D-85-8B-F5-64-47-81-72-97-24-B9-DA-5E-A9-CB-6E-97-65-2E-96-7C-71-A8-12-79-27-43-31-72-36-0E-F4-C1-D8-E1-17-D9-EA-88-FF-9B-E6-E7-59-10-95-74-7F-6A-D6-6C-95-3A-5B-22-CD-99-6E-C7-D2-09-68-47-B6-F5-14-F0-BB-9B-D0-04-67-E0-85-3E-03-85-AB-68-8D-DB-93-97-8E-B2-65-55-03-47-5E-DC-8B-05-45-44-F8-5C-FA-6D-0B-08-C9-2B-16-6E-35-20-AD-56-BA-14-6B-F1-0B-98-69-52-23-5A-86-6E-CF-08-2B-6A-8D-0E-B1-0F-36-F8-BF-70-33-FB-EB-50-94-D9-12-AB-C9-4E-0B-8C-1C-53-B2-63-96-E9-D1-AF-44-8C-79-23-54-0E-3E-AA-C2-1D-D6-DF-0A-95-01-38-D4-A8-68-0E-FF-5E-6B-EC-20-70-5D-7F-43-FD-43-32-AF-E4-DC-67-6F-8A-0A-33-67-9A-9E-5A-7B-29-07-FA-70-36-0B-96-E4-B9-28-86-FF-FA-0B-9B-38-46-C1-7B-9E-92-F5-E1-F1-44-44-98-D0-DE-0B-9B-88-24-B6-8D-E4-0A-D7-93-5D-65-07-BD-84-E1-54-BC-90-D0-67-38-96-62-4F-4A-C4-04-BC-A7-68-43-F2-77-C7-83-39-CD-D9-44-0D-A7-A7-6D-DF-01-CE-50-ED-D7-36-8B-50-71-FC-AE-16-ED-FD-A9-DD-C3-92-B2-5E-16-6C-67-D0-F6-E5-84-5B-25-AD-34-BA-4D-EB-92-5E-2F-3B-F5-AE-7E-6F-E4-8D-D2-8D-70-DE-D6-38-0C-89-43-26-B6-91-9E-FC-C4-24-43-FC-CB-56-2D-12-84-88-F4-B2-76-01-82-BA-68-6A-4A-07-E2-78-C1-F5-F1-9A-73-35-D2-A9-D7-9A-9A-B5-10-54-1C-D7-BA-63-7E-A4-B6-CA-4C-76-86-B5-FE-03-6C-6C-93-B2-60-14-47-EE-D5-55-36-2F-6B-BA-57-73-53-F2-0F-81-97-2B-A5-5E-79-DF-E0-DF-50-55-B6-1D-BE-EC-40-7A-14-2B-99-BA-31-C8-4E-BC-CE-89-50-87-63-B1-26-70-4D-76-AB-5A-DD-AB-3A-EF-D1-06-17-8E-CD-BE-14-0B-8D-3C-91-05-20-07-23-31-6B-8B-17-76-5E-8B-A7-EC-57-6F-53-56-6A-33-45-0F-85-F0-D8-1F-35-ED-B8-9B-CF-1D-28-19-F4-C9-9E-CA-EF-E3-C9-3A-AE-87-03-20-F2-8D-03-0B-C0-2E-22-C8-0B-71-8D-9E-50-C2-4B-D2-8C-B1-5B-75-2B-D2-AC-2F-0B-D8-CB-3B-41-E0-F1-04-0F-3B-06-F0-85-BE-B8-9A-41-ED-57-EF-C3-A8-85-15-8A-48-96-0F-6C-37-24-C2-54-0E-F3-07-8E-4D-47-0A-FC-68-FC-5E-72-3B-37-49-2A-29-2F-F0-E3-8B-D5-BF-83-40-4C-08-65-ED-2F-25-AC-CA-DF-07-6F-3D-08-32-6F-AB-75-35-67-44-2C-2B-75-69-23-B0-56-1B-7D-9C-BF-A3-E9-EB-37-9A-05-D7-18-55-A4-4D-A1-BF-79-5D-A4-59-93-BF-83-99-70-27-C5-CA-31-BD-4F-58-81-FC-E0-A5-DE-2C-2A-DA-48-DC-8C-4C-21-95-15-F9-DF-23-E8-1D-45-77-A9-3E-55-13-1F-77-DC-C4-A8-8F-9F-02-76-73-72-A5-0E-3D-F6-03-DF-B8-5E-91-DF-83-7C-AB-08-C0-99-54-58-32-A0-7C-41-4D-87-61-BF-32-D0-FD-13-27-4A-70-F7-5A-9D-C5-B8-03-C7-A7-22-5A-AF-E9-EF-93-5E-96-66-D4-91-09-60-B7-AA-21-77-AD-A1-A0-9D-82-5E-C0-28-E4-24-68-DA-32-D8-BE-8F-97-27-7C-FE-47-90-A9-FE-A5-0E-0C-84-BF-47-50-D0-74-B9-AF-77-A8-5F-55-D6-6F-00-79-C8-79-42-16-D9-70-7A-98-72-28-6B-B9-9E-89-18-4E-2A-7B-BA-CB-0C-09-A3-25-0C-D1-E8-4A-51-2D-E8-CA-95-F4-AE-56-3B-86-30-09-D2-92-F9-6B-40-CB-0F-40-F8-92-26-71-CE-AA-11-26-F9-E7-F2-EF-C8-4F-D1-10-7B-54-11-41-F4-8C-13-C2-19-30-1E-8F-E2-2F-EC-69-5E-9B-3C-5A-CE-9E-B3-49-EE-74-53-2D-FE-02-AD-9E-C1-04-88-68-40-91-06-6E-38-AC-DE-15-8F-75-9C-72-E0-49-95-75-1A-11-70-AD-33-4E-A2-90-99-D6-9E-B3-84-17-56-7D-54-86-B2-E3-53-61-D1-92-8F-B1-C4-20-3D-CE-89-38-CE-B2-6C-77-9E-39-02-2A-9E-A8-1F-1C-97-DC-7D-A6-9B-03-A6-B7-0E-95-B8-46-1A-9E-38-26-2E-B2-F3-B7-72-E6-69-1C-42-BC-DC-8C-39-A2-DE-20-3F-66-7A-52-3C-5D 

The first result with a single byte feels about right, but I'd expect that doubling the buffer length ought to make the search take significantly longer and that doesn't seem to be the case.

I've tried printing out the bytes and while they appear to be "genuine" collisions I struggle to believe that I'm really generating random collisions so easily.

Is the problem in the way I'm filling the randomBytes buffer, or a problem in my ByteArrayComparer?




Aucun commentaire:

Enregistrer un commentaire