dimanche 23 octobre 2022

Machine learning design for a simple problem

I'm a beginner in ML, have read some theory and tried some codes, but still a newbie and not yet confortable enough to have a clear mind on how to design a ML network to solve a specific problem.

Was hoping that some of you could give me some advices on this simple problem i would try to solve with ML.

I will try to simplify the problem and give you some reproducible code :

I have several matrices that i call "models" with some weights. Here is what a model could look like : enter image description here

9 letters, each letter has a weight.

I also have thousands of matrices that i call samples. they are based on a model. here is an example of 2 samples, based on the same model given earlier : enter image description hereenter image description here

Sample1 has 19 occurences, Sample2 has 26 occurences.

What i would my ML algorithm to do : I give a sample in input, and it will interpolate the original weights (weights of the model).

An important thing : on every model, weights are not independant. They are strongly inter-correlated. A and C could be strongly correlated in all the models, whereas A and I are weakly correlated. This is why i would try to solve this with a ML algorithm, the algorithm could learn to understand the correlations between letters, and use them to interpolate better the original weights.

To train the model, i would use hundreds of model matrices, for each model there would be hundreds of samples matrices. I give in inputs one sample (and eventually the number n of occurences that could be useful), and the outputs are the 9 weights.

Here is some code to generate one sample matrix, given a model matrix :

import random 

letters = ['A','B','C','D','E','F','G','H','I']

# 2 matrix models
all_model_weights = [[0.9, 0.7, 1, 0.6, 0.3, 0, 0, 0, 0.3],
                 [0, 0.2, 0.7, 1, 1, 1, 0.7, 1, 0.5]]

#Lets generate a sample based on the first model
model_weights = dict(zip(letters, all_model_weights[0]))

n = random.randrange(20,100)
random_letter = [random.randrange(0, 9) for i in range(n)]
probs = [random.uniform(0, 1) for i in range(n)]
occurences = [0] * 9

for i in range(0,n):
    if probs[i] < model_weights[letters[random_letter[i]]]:
        occurences[random_letter[i]]+=1

# This is the sample matrix generated
occurences    

I hope i made the problem clear and you guys can give me some advices on how to design an ML algorithm to try to do this task

Thanks !




Aucun commentaire:

Enregistrer un commentaire