Ensemblem Package

Models

class ensemblem.model.KWEnsembler(k: int = 5, bias: bool = False, dist_metric=<function euclidean>)[source]

Bases: object

KWEnsembler class

This class implements the K-Weighted Ensembler model. It is an ensemble model that uses the k-nearest neighbors of a sample to predict its target value. The weights of the neighbors are calculated using a weight function. The bias of the neighbors can be added to the prediction.

Parameters:
  • k – number of neighbors to use

  • bias – whether to add the bias of the neighbors to the prediction

  • dist_metric – distance metric to use

Returns:

Predictions of the target values for the test set

Return type:

bytearray

fit(X_neighbors: DataFrame, y_neighbors: DataFrame, features: List, range_min: int = 0, range_max: int = 1) → None[source]

Fits the ensemble by creating the search space

Parameters:
  • X_neighbors – Neighbors search space

  • y_neighbors – Neighbors search space target values

predict(X_test: DataFrame, features: List, pred_columns: List, weight_function=<function w_inverse_LMAE>) → List[source]

Predicts the target values for the test set using the ensemble method

Parameters:
  • X_test – Test set

  • features – Features of the test set

  • pred_columns – Columns to predict

  • weight_function – Weight function to use

  • range_min – Minimum value of minmax scaling

  • range_max – Maximum value of minmax scaling

Returns:

Predictions of the target values for the test set
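
The description above leaves the exact combination rule open. The following is a minimal sketch of one plausible reading, in which each base model is weighted by the inverse of its mean absolute error on the k nearest neighbors of the test sample. The function name `weighted_ensemble_sketch`, its arguments, and the `eps` guard are assumptions made for illustration; this is not the package's `predict` implementation.

```python
def weighted_ensemble_sketch(neighbor_actual, neighbor_preds, test_preds, eps=1e-12):
    """Combine base-model predictions for one test sample (hypothetical helper).

    neighbor_actual: target values of the k nearest neighbors
    neighbor_preds:  dict of model name -> predictions for those neighbors
    test_preds:      dict of model name -> prediction for the test sample
    """
    weights = {}
    for name, preds in neighbor_preds.items():
        # Local MAE: error of this model on the neighbors only.
        mae = sum(abs(a - p) for a, p in zip(neighbor_actual, preds)) / len(neighbor_actual)
        # Inverse error: models that did well near the sample get more weight.
        weights[name] = 1.0 / (mae + eps)
    total = sum(weights.values())
    # Normalize the weights so they sum to 1, then take the weighted average.
    return sum(weights[name] / total * test_preds[name] for name in weights)


combined = weighted_ensemble_sketch(
    neighbor_actual=[10, 20],
    neighbor_preds={"model_a": [11, 21], "model_b": [13, 23]},
    test_preds={"model_a": 100, "model_b": 200},
)
```

In this example `model_a` has a local MAE of 1 and `model_b` a local MAE of 3, so the combined prediction sits closer to the output of `model_a`.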

Submodules

Metrics

ensemblem.metrics.cosine_v(x, y)[source]

Vector Cosine distance is 1 - cosine similarity

Parameters:
  • x – point to calculate the distance

  • y – data to calculate the distance
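
As a small illustration of the definition above, this sketch computes the cosine distance between two plain Python sequences. The name `cosine_distance_sketch` is hypothetical; the package's `cosine_v` may accept arrays and may treat zero vectors differently.

```python
import math


def cosine_distance_sketch(x, y):
    """Return 1 - cosine similarity of two equal-length sequences (hypothetical helper)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        # Cosine similarity is undefined when either vector has zero length.
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_x * norm_y)
```

Orthogonal vectors give a distance of 1 and parallel vectors a distance of 0 (up to floating-point rounding).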

ensemblem.metrics.euclidean(point, data)[source]

Euclidean distance is the square root of the sum of the squared differences of their coordinates

Parameters:
  • point – point to calculate the distance

  • data – data to calculate the distance

Returns:

distance
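
The formula above can be sketched as follows. The sketch assumes that `data` is a sequence of rows and that one distance per row is returned; the reference does not state this, and the name `euclidean_sketch` is hypothetical.

```python
import math


def euclidean_sketch(point, data):
    """Distance from `point` to every row of `data` (hypothetical helper)."""
    return [
        # Square root of the sum of squared coordinate differences.
        math.sqrt(sum((p - d) ** 2 for p, d in zip(point, row)))
        for row in data
    ]


distances = euclidean_sketch([0, 0], [[3, 4], [0, 0]])
```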

ensemblem.metrics.euclidean_v(x, y)[source]

Vector Euclidean distance is the square root of the sum of the squared differences of their coordinates

Parameters:
  • x – point to calculate the distance

  • y – data to calculate the distance

Returns:

distance

ensemblem.metrics.manhattan_v(x, y)[source]

Vector Manhattan distance is the sum of the absolute differences of their coordinates

Parameters:
  • x – point to calculate the distance

  • y – data to calculate the distance
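
For comparison with the Euclidean case, a minimal sketch of the Manhattan distance between two plain sequences might look like this. The name `manhattan_sketch` is hypothetical and the package's `manhattan_v` may operate on arrays instead.

```python
def manhattan_sketch(x, y):
    """Sum of absolute coordinate differences (hypothetical helper)."""
    return sum(abs(a - b) for a, b in zip(x, y))


total = manhattan_sketch([1, 2, 3], [4, 0, 3])  # |1-4| + |2-0| + |3-3|
```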

ensemblem.metrics.mean_absolute_error(actual, predicted)[source]

Mean Absolute Error (MAE)

Parameters:
  • actual – actual values

  • predicted – predicted values
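
The metric can be illustrated with a short, dependency-free sketch. The name `mae_sketch` is hypothetical and the input validation is an assumption; the package function may accept arrays or Series.

```python
def mae_sketch(actual, predicted):
    """Mean of the absolute differences between actual and predicted values (hypothetical helper)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


error = mae_sketch([1, 2, 3], [2, 2, 5])  # (1 + 0 + 2) / 3
```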

ensemblem.metrics.mean_absolute_percentage_error(actual, predicted)[source]

Local Mean Absolute Percentage Error (LMAPE)

Parameters:
  • actual – actual values

  • predicted – predicted values
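
A minimal sketch of the percentage error follows. It assumes that the result is returned as a fraction rather than multiplied by 100, and that zero actual values are rejected; neither is stated in the reference, and `mape_sketch` is a hypothetical name.

```python
def mape_sketch(actual, predicted):
    """Mean of |actual - predicted| / |actual|, as a fraction (hypothetical helper)."""
    if any(a == 0 for a in actual):
        # The relative error is undefined when the actual value is zero.
        raise ValueError("actual values must be non-zero")
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


error = mape_sketch([100, 200], [110, 180])  # both samples are off by 10 %
```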

ensemblem.metrics.mean_squared_error(actual, predicted) → float[source]

Mean Squared Error (MSE)

ensemblem.metrics.metrics_table(actual, predicted, model_name) → DataFrame[source]

Creates a pivot table with the results of multiple models and metrics

Parameters:
  • actual – actual values

  • predicted – predicted values

  • model_name – name of the model

Returns:

table with results

ensemblem.metrics.root_mean_squared_error(actual, predicted) → float[source]

Local Root Mean Squared Error (LRMSE)

ensemblem.metrics.root_mean_squared_log_error(actual, predicted) → float[source]

Local Root Mean Squared Log Error (LRMSLE)

Parameters:
  • actual – actual values

  • predicted – predicted values

Returns:

RMSLE
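
A sketch of the usual RMSLE definition is shown here for orientation. It assumes the common `log(1 + x)` form, which requires all values to be greater than -1; the reference does not say which variant the package uses, and `rmsle_sketch` is a hypothetical name.

```python
import math


def rmsle_sketch(actual, predicted):
    """Root of the mean squared difference of log(1 + value) (hypothetical helper)."""
    squared_log_errors = [
        (math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(actual, predicted)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))
```

Because the error is measured on a logarithmic scale, it penalizes relative rather than absolute differences.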

Utils

ensemblem.utils.divide_sets(df, train_size, val_size, test_size)[source]

Divide the data into train, validation and test sets

Parameters:
  • df – pandas.DataFrame to be divided

  • train_size – float, size of the train set

  • val_size – float, size of the validation set (the neighbors set)

  • test_size – float, size of the test set

Returns:

train, validation and test sets
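
The idea of a three-way split can be sketched without pandas. The version below works on a plain list of rows and splits sequentially by fraction; whether the package shuffles the data, and how it rounds set sizes, is not documented here, so both are assumptions. The name `divide_sets_sketch` is hypothetical.

```python
def divide_sets_sketch(rows, train_size, val_size, test_size):
    """Split `rows` sequentially into train, validation and test parts (hypothetical helper)."""
    if abs(train_size + val_size + test_size - 1.0) > 1e-9:
        raise ValueError("the three fractions must sum to 1")
    n = len(rows)
    train_end = round(n * train_size)
    val_end = train_end + round(n * val_size)
    # The test set receives whatever remains after the first two slices.
    return rows[:train_end], rows[train_end:val_end], rows[val_end:]


train, val, test = divide_sets_sketch(list(range(8)), 0.5, 0.25, 0.25)
```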

ensemblem.utils.split_sets(df, train_size, val_size, test_size, target)[source]

Split the data into train, validation and test sets with target and features

Parameters:
  • df – pandas.DataFrame to be divided

  • train_size – float, size of the train set

  • val_size – float, size of the validation set (the neighbors set)

  • test_size – float, size of the test set

Returns:

train, validation and test sets with target and features

Weights_functions

ensemblem.weights_functions.error_bias(data, k, metric)[source]

Calculate the bias of the error

Parameters:
  • data – data to calculate the distance

  • k – number of neighbors

  • metric – distance metric

Returns:

bias of the error

ensemblem.weights_functions.get_k_nearest_neighbors(point, data, k, metric)[source]

Get the k nearest neighbors of a point in a dataset

Parameters:
  • point – point to calculate the distance

  • data – data to calculate the distance

  • k – number of neighbors

  • metric – distance metric
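
A minimal sketch of a k-nearest-neighbor search is given below. It assumes that `metric` compares two individual points and that the indices of the nearest rows are returned; the package's own metrics and return value may differ. The names `k_nearest_sketch` and `pairwise_euclidean` are hypothetical.

```python
import math


def pairwise_euclidean(a, b):
    """Euclidean distance between two points (hypothetical helper)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest_sketch(point, data, k, metric):
    """Return the indices of the k rows of `data` closest to `point` (hypothetical helper)."""
    # Pair every distance with its row index, then sort by distance.
    distances = sorted((metric(point, row), i) for i, row in enumerate(data))
    return [i for _, i in distances[:k]]


data = [[0, 0], [5, 5], [1, 1], [10, 10]]
nearest = k_nearest_sketch([0, 0], data, k=2, metric=pairwise_euclidean)
```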

ensemblem.weights_functions.get_k_nearest_neighbors_weights(point, data, k, metric, weights)[source]

Get the k nearest neighbors of a point in a dataset, weighting the neighbors

Parameters:
  • point – point to calculate the distance

  • data – data to calculate the distance

  • k – number of neighbors

  • metric – distance metric

  • weights – weights of the neighbors

Returns:

k nearest neighbors

ensemblem.weights_functions.predict_inverse_LMAE(point, data, k, metric)[source]

Predict the target value of a point using the inverse LMAE

ensemblem.weights_functions.w_inverse_LMAE(actual, predicted)[source]

Inverse Local MAE

Parameters:
  • actual – actual values

  • predicted – predicted values
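
The reference does not give the formula for this weight. One plausible reading is the reciprocal of the MAE computed over a local set of samples, sketched below. The name `inverse_mae_weight_sketch` and the `eps` term that guards against division by zero are assumptions, not part of the package.

```python
def inverse_mae_weight_sketch(actual, predicted, eps=1e-12):
    """Weight that grows as the local MAE shrinks (hypothetical helper)."""
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
    return 1.0 / (mae + eps)


accurate = inverse_mae_weight_sketch([1, 2], [1.5, 2.5])  # MAE 0.5
inaccurate = inverse_mae_weight_sketch([1, 2], [2, 4])    # MAE 1.5
```

A model with a smaller local error receives a larger weight, which is the property an ensemble needs from such a function.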

ensemblem.weights_functions.w_inverse_log_LMAE(actual, predicted)[source]

Inverse Log Local MAE

Parameters:
  • actual – actual values

  • predicted – predicted values

Returns:

inverse log local MAE

Module contents