Sparing the details, I am working on a Java program that performs pairwise classification for a Ranking SVM.
To give some background: I have read the contents of a CSV file into a 2D ArrayList of strings, performed all the necessary calculations on it, and now need to write it back to a CSV file.
However, before the data is written to a blank CSV file, there is one final step. The last cell of each row contains one of two values: either "1" or "-1". My objective is to write the data so that the number of rows ending in "1" and the number of rows ending in "-1" are equal, or differ by at most 1.
To provide some examples, here are two acceptable results to be written back to a file:
[10 20 30 -1]
[12 13 14 1]
[12 13 14 -1]
[34 35 36 1]
and
[10 20 30 -1]
[12 13 14 1]
[12 13 14 -1]
[34 35 36 1]
[20 34 35 -1]
As you can see, the first example has an equal number of rows ending in "1" and "-1" (two of each), and in the second the two counts differ by only 1 (two rows ending in "1", three ending in "-1"). Both are acceptable.
So, could someone give me some ideas on how to go about this? My current thinking is to use a random number generator as an index: keep picking rows at that random index, track how many of the picked rows end in "1" and how many end in "-1", and stop once enough random rows have been included that the equal (or almost equal) distribution is met. I'm not quite sure how to set up the code for this, however.
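For illustration, here is a minimal sketch of one way the selection could be set up. It is not the repeated random-index loop described above: it swaps in a simpler equivalent, splitting the rows by label, shuffling each group, and keeping the same number from both (plus one extra row from the larger group, since a difference of 1 is acceptable). The class name `BalancedSampler` and the method `balance` are hypothetical, and the sketch assumes every row is non-empty and its last cell is exactly "1" or "-1".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class BalancedSampler {

    // Returns a random subset of rows in which the number of rows labelled
    // "1" and the number labelled "-1" (last cell) differ by at most one.
    static List<List<String>> balance(List<List<String>> rows, Random rng) {
        List<List<String>> positives = new ArrayList<>();
        List<List<String>> negatives = new ArrayList<>();

        // Split the rows by the label in the last cell.
        for (List<String> row : rows) {
            String label = row.get(row.size() - 1).trim();
            if (label.equals("1")) {
                positives.add(row);
            } else if (label.equals("-1")) {
                negatives.add(row);
            } else {
                throw new IllegalArgumentException("Unexpected label: " + label);
            }
        }

        // Shuffling each group makes the kept rows a random selection.
        Collections.shuffle(positives, rng);
        Collections.shuffle(negatives, rng);

        // Keep the same number of rows from both groups.
        int keep = Math.min(positives.size(), negatives.size());
        List<List<String>> result = new ArrayList<>(positives.subList(0, keep));
        result.addAll(negatives.subList(0, keep));

        // A difference of one is allowed, so take one extra row from the
        // larger group when it has any rows left over.
        List<List<String>> larger =
                positives.size() > negatives.size() ? positives : negatives;
        if (larger.size() > keep) {
            result.add(larger.get(keep));
        }

        // Mix the two labels together before writing the rows out.
        Collections.shuffle(result, rng);
        return result;
    }

    public static void main(String[] args) {
        // Sample rows in the same shape as the examples in the question.
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("10", "20", "30", "-1"),
                Arrays.asList("12", "13", "14", "1"),
                Arrays.asList("12", "13", "14", "-1"),
                Arrays.asList("34", "35", "36", "1"),
                Arrays.asList("20", "34", "35", "-1"),
                Arrays.asList("21", "22", "23", "-1"));

        for (List<String> row : balance(rows, new Random())) {
            System.out.println(String.join(",", row));
        }
    }
}
```

Shuffling and truncating avoids having to track which random indexes were already picked, and it always terminates, whereas a loop that keeps drawing random rows needs extra bookkeeping to avoid duplicates and to know when no more rows of the scarcer label are left.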
Thank you, and please let me know if I can provide any more details that would be of help.