Wednesday, June 5, 2019

Need some help shuffling/randomizing a Dictionary

I'm tasked with shuffling or randomizing a Dictionary that contains 30 million records from a text file.

This is the code:

protected void btnRandomize_Click(object sender, EventArgs e)
        {
            var numberOfEntries = 1;
            Dictionary<int, string> entries = null;
            Dictionary<int, string> entriesShuffled = null;

            try
            {
                if (fuNames.HasFile)
                {
                    var filename = @"C:\temp\" + fuNames.FileName;
                    fuNames.SaveAs(filename);
                    string[] entry = null; //new string[rawEntries.Length];
                    entries = new Dictionary<int, string>(30000000);

                    var s = string.Empty;
                    using (StreamReader sr = new StreamReader(filename))
                    {
                        // Read line by line; ReadLine() returns null at end of file,
                        // so an empty file no longer causes a NullReferenceException.
                        while ((s = sr.ReadLine()) != null)
                        {
                            if (s.Contains("."))
                            {
                                // Keep the second '.'-separated field of the line.
                                entry = s.Split(new char[] { '.' });
                                entries.Add(numberOfEntries, entry[1]);
                            }
                            else
                            {
                                entries.Add(numberOfEntries, s);
                            }

                            numberOfEntries++;
                        }
                    }


                    entriesShuffled = new Dictionary<int, string>(entries.Count);
                    entriesShuffled = entries.Shuffle();

                    // Build the output
                    var dt = BuildGrid(entriesShuffled);

                    var ms = new MemoryStream(2147483647);
                    using (TextWriter tw = new StreamWriter(ms))
                    {
                        for (int i = 0; i < dt.Rows.Count; i++)
                        {
                            tw.WriteLine(dt.Rows[i]["Order"].ToString() + "   " + dt.Rows[i]["Entry Number"].ToString() + "   " + dt.Rows[i]["Name"].ToString());
                        }
                        tw.WriteLine("Timestamp:  " + DateTime.Now.ToString("MM/dd/yyyy HH:mm:ss tt"));

                        tw.Flush();
                        var bytes = ms.ToArray();
                        ms.Close();

                        File.WriteAllBytes(@"C:\temp\" + txtName.Text.Trim() + " - " + txtGroup.Text.Trim() + " - " + DateTime.Now.ToString("MMddyyyy") + ".txt", bytes);
                        Response.Clear();
                        Response.ClearContent();
                        Response.ClearHeaders();
                        Response.ContentType = "application/force-download";
                        Response.AddHeader("content-disposition", "attachment; filename=" + txtName.Text.Trim() + " - " + txtGroup.Text.Trim() + " - " + DateTime.Now.ToString("MMddyyyy") + ".txt");
                        Response.BinaryWrite(bytes);
                    }
                    Response.End();
                }
                else
                {
                    lblMsg.Text = "Please select your list of names and try again.";
                }
            }
            catch (Exception ex)
            {
                // entries is still null if the failure happened before it was created.
                lblMsg.Text = (entries != null ? entries.Count : 0).ToString() + " An error occurred while processing the randomization.  Error:  " + ex.Message;
            }
        }

Everything works fine until this line: entriesShuffled = entries.Shuffle();

The .Shuffle() method comes from the following class:

public static class DictionaryExtensions
    {
        public static Dictionary<TKey, TValue> Shuffle<TKey, TValue>(
           this Dictionary<TKey, TValue> source)
        {
            var r = new Random();
            return source.OrderBy(x => r.Next())
               .ToDictionary(item => item.Key, item => item.Value);
        }
    }

It throws a System.OutOfMemoryException at the .ToDictionary(item => item.Key, item => item.Value) call in the LINQ chain. The data does not come from a database, which would make this a lot easier; all they can provide me with is a text file. With smaller inputs, say 5-10 million entries, it works fine, but with 30 million it blows up even though the performance monitor shows only about half of my memory in use. This is on a 64-bit i7 with 16 GB of RAM running Windows 10 Professional. I'm running it under the Visual Studio debugger, and the exception is thrown every time it tries to convert back to the Dictionary.

I'm using a Dictionary so that I can pass two values, an int and a string, that must remain mapped to each other during the shuffle; a separate counter is used when building the DataTable.
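To make the requirement concrete, here is a minimal sketch of one possible alternative to the Shuffle() extension. It is an assumption about what might help, not a confirmed fix for the exception. It keeps each entry number and name together as a KeyValuePair<int, string> in a plain array and shuffles that array in place with a Fisher-Yates shuffle, so no second 30-million-entry collection is built. Dictionary<TKey, TValue> does not guarantee any enumeration order, so an array or list seems a more natural place to hold a shuffled order. The names PairShuffler and ShuffleInPlace and the sample data are hypothetical and do not appear in the code above.

```csharp
using System;
using System.Collections.Generic;

public static class PairShuffler
{
    // Hypothetical helper (not part of the original code): shuffles the
    // array in place with the Fisher-Yates algorithm, so no additional
    // collection of the same size is allocated.
    public static void ShuffleInPlace<T>(T[] items, Random rng)
    {
        for (int i = items.Length - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1); // random index in the range [0, i]
            T tmp = items[i];
            items[i] = items[j];
            items[j] = tmp;
        }
    }

    public static void Main()
    {
        // Illustrative sample data; each pair keeps the entry number
        // and the name mapped to each other.
        var entries = new[]
        {
            new KeyValuePair<int, string>(1, "Alice"),
            new KeyValuePair<int, string>(2, "Bob"),
            new KeyValuePair<int, string>(3, "Carol"),
        };

        ShuffleInPlace(entries, new Random());

        // The position in the shuffled array plays the role of the "Order" column.
        for (int order = 0; order < entries.Length; order++)
        {
            Console.WriteLine((order + 1) + "   " + entries[order].Key + "   " + entries[order].Value);
        }
    }
}
```

In the click handler, the same idea would mean filling a KeyValuePair<int, string>[] (or a List of pairs) while reading the file and passing it to the shuffle, instead of converting back to a Dictionary afterwards.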

Will anyone out there help me figure this out? Please provide any changes to the code or any new code examples, not just links to other posts as I've searched and searched for this.

Thank you kindly,

Nathan



