I want to select a random set of columns from a DataFrame and then calculate a probability for that set. I think the expectation–maximization (EM) algorithm is the best fit for this, but it needs a modification so that the columns are selected randomly. The final output should look like this (on every run, the algorithm selects random columns):
Feature | likelihood
F1, F2, F3 | 56
F2, F4, F6 | 78
The code:

import random


def em(data, columns, num_iterations):
    # Initialize the model parameters
    model_params = initialize_model_params(data, columns)
    for i in range(num_iterations):
        # Select a random subset of columns
        # (list() so that a pandas Index is accepted by random.sample too)
        selected_columns = random.sample(list(columns), k=len(columns) // 2)
        # E step: calculate the expected value of the hidden variables using the selected columns
        expected_hidden_vars = calculate_expected_hidden_vars(data, selected_columns, model_params)
        # M step: update the estimates of the model parameters using the selected columns
        # and the expected hidden variables
        model_params = update_model_params(data, selected_columns, expected_hidden_vars)
    # Return the maximum likelihood estimates of the model parameters
    return model_params


def initialize_model_params(data, columns):
    # Initialize the model parameters using the data in the columns
    # Return the initialized parameters
    raise NotImplementedError


def calculate_expected_hidden_vars(data, columns, model_params):
    # Calculate the probabilities of the different values that the columns can take on
    # Use the current estimates of the model parameters and the probabilities
    # to calculate the expected value of the hidden variables
    # Return the expected hidden variables
    raise NotImplementedError


def update_model_params(data, columns, expected_hidden_vars):
    # Use the expected values of the hidden variables to update the estimates of the model parameters
    # Return the updated model parameters
    raise NotImplementedError

How can I define the functions initialize_model_params, calculate_expected_hidden_vars, and update_model_params?
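
One possible way to fill in the three functions, as a minimal sketch rather than a definitive answer. It assumes the DataFrame holds numeric columns and that the model is a Gaussian mixture with diagonal covariance, so the hidden variables are the component memberships of each row. None of that is stated in the question: N_COMPONENTS, the helpers _log_joint and log_likelihood, and the driver em_on_random_columns are all hypothetical names introduced here. The sketch also draws the random columns once per run instead of once per iteration, because parameters fitted to one subset of columns do not carry over to a different subset, and it reports a log-likelihood, which is usually negative.

```python
import random

import numpy as np
import pandas as pd

N_COMPONENTS = 2  # assumed number of mixture components (not given in the question)


def initialize_model_params(data, columns, seed=0):
    """Equal weights, randomly chosen rows as means, per-column variances."""
    x = data[list(columns)].to_numpy(dtype=float)
    rng = np.random.default_rng(seed)
    means = x[rng.choice(len(x), size=N_COMPONENTS, replace=False)]
    variances = np.tile(x.var(axis=0) + 1e-6, (N_COMPONENTS, 1))
    weights = np.full(N_COMPONENTS, 1.0 / N_COMPONENTS)
    return weights, means, variances


def _log_joint(x, model_params):
    """log(weight_k * N(x_i | mean_k, diag(var_k))) for each row i and component k."""
    weights, means, variances = model_params
    diff = x[:, None, :] - means[None, :, :]
    log_pdf = -0.5 * (np.log(2 * np.pi * variances)[None] + diff**2 / variances[None]).sum(axis=2)
    return np.log(weights)[None, :] + log_pdf


def calculate_expected_hidden_vars(data, columns, model_params):
    """E step: responsibilities, i.e. P(component k | row i)."""
    x = data[list(columns)].to_numpy(dtype=float)
    log_joint = _log_joint(x, model_params)
    log_joint -= log_joint.max(axis=1, keepdims=True)  # numerical stability
    resp = np.exp(log_joint)
    return resp / resp.sum(axis=1, keepdims=True)


def update_model_params(data, columns, expected_hidden_vars):
    """M step: responsibility-weighted weights, means and variances."""
    x = data[list(columns)].to_numpy(dtype=float)
    nk = expected_hidden_vars.sum(axis=0) + 1e-12  # guard against empty components
    weights = nk / nk.sum()
    means = (expected_hidden_vars.T @ x) / nk[:, None]
    diff = x[:, None, :] - means[None, :, :]
    variances = (expected_hidden_vars[:, :, None] * diff**2).sum(axis=0) / nk[:, None] + 1e-6
    return weights, means, variances


def log_likelihood(data, columns, model_params):
    """Total log-likelihood of the rows under the fitted mixture."""
    x = data[list(columns)].to_numpy(dtype=float)
    lj = _log_joint(x, model_params)
    m = lj.max(axis=1, keepdims=True)
    return float((m[:, 0] + np.log(np.exp(lj - m).sum(axis=1))).sum())


def em_on_random_columns(data, num_iterations=50, rng=random):
    """Pick half of the columns at random, run EM on them, return (columns, log-likelihood)."""
    columns = list(data.columns)
    selected = rng.sample(columns, k=max(1, len(columns) // 2))
    params = initialize_model_params(data, selected)
    for _ in range(num_iterations):
        resp = calculate_expected_hidden_vars(data, selected, params)
        params = update_model_params(data, selected, resp)
    return selected, log_likelihood(data, selected, params)


if __name__ == "__main__":
    # Synthetic data, only to show the shape of the output table.
    demo_rng = np.random.default_rng(0)
    df = pd.DataFrame(demo_rng.normal(size=(200, 6)), columns=[f"F{i}" for i in range(1, 7)])
    runs = [em_on_random_columns(df) for _ in range(3)]
    print(pd.DataFrame([{"Feature": ", ".join(c), "likelihood": ll} for c, ll in runs]))
```

Each call to em_on_random_columns gives one row of the Feature | likelihood table. Log-likelihoods of subsets with different numbers of columns are not directly comparable, so keeping the subset size fixed (here, half of the columns) makes the rows easier to compare.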