Ensemblem Package

Models

class ensemblem.model.KWEnsembler(k: int = 5, bias: bool = False, dist_metric=<function euclidean>)[source]

Bases: object

K-Weighted Ensembler model. This ensemble model uses the k-nearest neighbors of a sample to predict its target value. The weights of the neighbors are calculated using a weight function, and the bias of the neighbors can optionally be added to the prediction.

Parameters:

k – number of neighbors to use
bias – whether to add the bias of the neighbors to the prediction
dist_metric – distance metric to use

Returns:

Predictions of the target values for the test set

Return type:

bytearray

fit(X_neighbors: DataFrame, y_neighbors: DataFrame, features: List, range_min: int = 0, range_max: int = 1) → None[source]

Fits the ensemble by creating the search space

Parameters:

X_neighbors – Neighbors search space
y_neighbors – Target values of the neighbors search space

predict(X_test: DataFrame, features: List, pred_columns: List, weight_function=<function w_inverse_LMAE>) → List[source]

Predicts the target values for the test set using the ensemble method

Parameters:

X_test – Test set
features – Features of the test set
pred_columns – Columns to predict
weight_function – Weight function to use
range_min – Minimum value of minmax scaling
range_max – Maximum value of minmax scaling

Returns:

Predictions of the target values for the test set
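The idea described for this class can be illustrated with a small, self-contained sketch. It is not the library's implementation: the function name `local_weighted_prediction`, its arguments, and the choice of inverse local MAE as the weight are assumptions made for illustration, and the optional bias term is left out.

```python
import math


def euclid(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def local_weighted_prediction(x, test_preds, X_nb, y_nb, nb_preds, k=5):
    """Hypothetical sketch of a k-weighted ensemble for one test sample.

    x          : feature vector of the test sample
    test_preds : each base model's prediction for x
    X_nb, y_nb : neighbor feature vectors and their true targets
    nb_preds   : nb_preds[m][i] is model m's prediction for neighbor i
    """
    # Indices of the k neighbors closest to x.
    nearest = sorted(range(len(X_nb)), key=lambda i: euclid(x, X_nb[i]))[:k]

    # Weight each base model by the inverse of its MAE on those neighbors.
    weights = []
    for model_preds in nb_preds:
        mae = sum(abs(y_nb[i] - model_preds[i]) for i in nearest) / len(nearest)
        weights.append(1.0 / (mae + 1e-12))  # small constant avoids division by zero

    # Weighted average of the base models' predictions for x.
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, test_preds)) / total


X_nb = [[0.0], [1.0], [10.0]]
y_nb = [1.0, 2.0, 20.0]
nb_preds = [
    [1.0, 2.0, 0.0],   # model 0: exact on the two neighbors near x
    [3.0, 4.0, 20.0],  # model 1: off by 2 on the two neighbors near x
]
result = local_weighted_prediction([0.5], [1.5, 3.5], X_nb, y_nb, nb_preds, k=2)
print(result)  # very close to model 0's prediction of 1.5
```

In this toy setup the model that is more accurate in the neighborhood of the test sample dominates the combined prediction.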

Submodules

Metrics

ensemblem.metrics.cosine_v(x, y)[source]

Vector Cosine distance is 1 - cosine similarity

Parameters:

x – point to calculate the distance
y – data to calculate the distance
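As a rough illustration of the definition above, the following sketch computes 1 minus the cosine similarity for two plain Python sequences. The name `cosine_distance` is hypothetical; how `cosine_v` itself treats zero vectors or array inputs is not stated here.

```python
import math


def cosine_distance(x, y):
    """Hypothetical helper: 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_x * norm_y)


print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors: 1.0
print(cosine_distance([1.0, 2.0], [2.0, 4.0]))  # parallel vectors: approximately 0
```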

ensemblem.metrics.euclidean(point, data)[source]

Euclidean distance is the square root of the sum of the squared differences of their coordinates

Parameters:

point – point to calculate the distance
data – data to calculate the distance

Returns:

distance
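A minimal sketch of this formula, assuming `data` is a collection of rows and one distance per row is wanted. That reading of the `point`/`data` arguments is an assumption, and `euclidean_to_rows` is a hypothetical name.

```python
import math


def euclidean_to_rows(point, data):
    """Hypothetical helper: Euclidean distance from `point` to every row of `data`."""
    return [
        math.sqrt(sum((p - d) ** 2 for p, d in zip(point, row)))
        for row in data
    ]


distances = euclidean_to_rows([0.0, 0.0], [[3.0, 4.0], [0.0, 0.0]])
print(distances)  # [5.0, 0.0]
```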

ensemblem.metrics.euclidean_v(x, y)[source]

Vector Euclidean distance is the square root of the sum of the squared differences of their coordinates

Parameters:

x – point to calculate the distance
y – data to calculate the distance

Returns:

distance

ensemblem.metrics.manhattan_v(x, y)[source]

Vector Manhattan distance is the sum of the absolute differences of their coordinates

Parameters:

x – point to calculate the distance
y – data to calculate the distance
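The definition above amounts to a one-line sum. The sketch below uses a hypothetical name, `manhattan_distance`, and plain Python sequences rather than whatever array type `manhattan_v` accepts.

```python
def manhattan_distance(x, y):
    """Hypothetical helper: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


print(manhattan_distance([1, 2, 3], [4, 0, 3]))  # 3 + 2 + 0 = 5
```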

ensemblem.metrics.mean_absolute_error(actual, predicted)[source]

Mean Absolute Error (MAE)

Parameters:

actual – actual values
predicted – predicted values
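For reference, MAE is the mean of the absolute differences between actual and predicted values. A minimal sketch, with the hypothetical name `mae` and plain lists as input:

```python
def mae(actual, predicted):
    """Hypothetical helper: mean of |actual - predicted|."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


print(mae([3.0, 5.0, 2.0], [2.0, 5.0, 4.0]))  # (1 + 0 + 2) / 3 = 1.0
```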

ensemblem.metrics.mean_absolute_percentage_error(actual, predicted)[source]

Local Mean Absolute Percentage Error (LMAPE)

Parameters:

actual – actual values
predicted – predicted values
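A sketch of the usual MAPE definition, returned here as a fraction rather than multiplied by 100. Whether the library reports a fraction or a percentage, and how it handles actual values of zero, is not stated; the name `mape` is hypothetical.

```python
def mape(actual, predicted):
    """Hypothetical helper: mean of |actual - predicted| / |actual|, as a fraction."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


print(mape([100.0, 200.0], [110.0, 180.0]))  # (0.1 + 0.1) / 2, i.e. about 0.1
```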

ensemblem.metrics.mean_squared_error(actual, predicted) → float[source]

Mean Squared Error (MSE)

ensemblem.metrics.metrics_table(actual, predicted, model_name) → DataFrame[source]

Create a pivot table with the results of multiple models and metrics

Parameters:

actual – actual values
predicted – predicted values
model_name – name of the model

Returns:

table with results

ensemblem.metrics.root_mean_squared_error(actual, predicted) → float[source]

Local Root Mean Squared Error (LRMSE)

ensemblem.metrics.root_mean_squared_log_error(actual, predicted) → float[source]

Local Root Mean Squared Log Error (LRMSLE)

Parameters:

actual – actual values
predicted – predicted values

Returns:

RMSLE
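The squared-error metrics build on one another: RMSE is the square root of MSE, and RMSLE is RMSE applied to log-transformed values. The sketch below assumes the common `log(1 + x)` transform for RMSLE, which the source does not confirm; the names `mse`, `rmse` and `rmsle` are hypothetical.

```python
import math


def mse(actual, predicted):
    """Mean of the squared differences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Square root of the MSE."""
    return math.sqrt(mse(actual, predicted))


def rmsle(actual, predicted):
    """RMSE computed on log(1 + x) of both inputs (assumed convention)."""
    log_actual = [math.log1p(a) for a in actual]
    log_predicted = [math.log1p(p) for p in predicted]
    return rmse(log_actual, log_predicted)


actual = [1.0, 3.0]
predicted = [2.0, 5.0]
print(mse(actual, predicted))    # (1 + 4) / 2 = 2.5
print(rmse(actual, predicted))   # square root of 2.5
print(rmsle(actual, predicted))  # smaller, since the log compresses large values
```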

Utils

ensemblem.utils.divide_sets(df, train_size, val_size, test_size)[source]

Divide the data into train, validation and test sets

Parameters:

df – pandas.DataFrame to be divided
train_size – float, size of the train set
val_size – float, size of the validation (neighbours) set
test_size – float, size of the test set

Returns:

train, validation and test sets
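One way such a three-way split could look is sketched below. It slices the rows in order by fraction; whether `divide_sets` shuffles the rows or requires the sizes to sum to 1 is not stated, so both choices here are assumptions, and `divide_by_fraction` is a hypothetical name.

```python
import pandas as pd


def divide_by_fraction(df, train_size, val_size, test_size):
    """Hypothetical helper: split rows in order into train/validation/test."""
    if abs(train_size + val_size + test_size - 1.0) > 1e-9:
        raise ValueError("the three sizes are assumed to sum to 1")
    n = len(df)
    train_end = round(n * train_size)
    val_end = train_end + round(n * val_size)
    # The test set takes whatever rows remain after the first two slices.
    return df.iloc[:train_end], df.iloc[train_end:val_end], df.iloc[val_end:]


df = pd.DataFrame({"x": range(10), "y": range(10, 20)})
train, val, test = divide_by_fraction(df, 0.6, 0.2, 0.2)
print(len(train), len(val), len(test))  # 6 2 2
```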

ensemblem.utils.split_sets(df, train_size, val_size, test_size, target)[source]

Split the data into train, validation and test sets with target and features

Parameters:

df – pandas.DataFrame to be divided
train_size – float, size of the train set
val_size – float, size of the validation (neighbours) set
test_size – float, size of the test set

Returns:

train, validation and test sets with target and features

Weights_functions

ensemblem.weights_functions.error_bias(data, k, metric)[source]

Calculate the bias of the error

Parameters:

data – data to calculate the distance
k – number of neighbors
metric – distance metric

Returns:

bias of the error

ensemblem.weights_functions.get_k_nearest_neighbors(point, data, k, metric)[source]

Get the k nearest neighbors of a point in a dataset

Parameters:

point – point to calculate the distance
data – data to calculate the distance
k – number of neighbors
metric – distance metric
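A minimal sketch of a k-nearest-neighbor lookup with a pluggable metric. It returns row indices; whether the library function returns indices, rows, or distances is not stated, and the names `euclid` and `k_nearest` are hypothetical.

```python
import math


def euclid(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(point, data, k, metric=euclid):
    """Hypothetical helper: indices of the k rows of `data` closest to `point`."""
    order = sorted(range(len(data)), key=lambda i: metric(point, data[i]))
    return order[:k]


data = [[0.0, 0.0], [5.0, 5.0], [1.0, 1.0], [0.5, 0.0]]
print(k_nearest([0.0, 0.0], data, k=2))  # [0, 3]
```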

ensemblem.weights_functions.get_k_nearest_neighbors_weights(point, data, k, metric, weights)[source]

Get the k nearest neighbors of a point in a dataset, weighting the neighbors

Parameters:

point – point to calculate the distance
data – data to calculate the distance
k – number of neighbors
metric – distance metric
weights – weights of the neighbors

Returns:

k nearest neighbors

ensemblem.weights_functions.predict_inverse_LMAE(point, data, k, metric)[source]

Predict the target value of a point using the inverse LMAE

ensemblem.weights_functions.w_inverse_LMAE(actual, predicted)[source]

Inverse Local MAE

Parameters:

actual – actual values
predicted – predicted values
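The exact formula behind this weight is not given here. One plausible reading, sketched below as an assumption, is the reciprocal of the MAE measured on a local set of neighbors, so that models with a smaller local error receive a larger weight. The names `inverse_local_mae` and `normalize`, and the small constant `eps`, are hypothetical.

```python
def inverse_local_mae(actual, predicted, eps=1e-12):
    """Hypothetical weight: 1 / (MAE + eps) over a local set of neighbors."""
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
    return 1.0 / (mae + eps)  # eps guards against division by zero


def normalize(weights):
    """Scale weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


# Two hypothetical base models evaluated on the same three neighbors.
actual = [10.0, 12.0, 14.0]
model_a = [10.5, 12.5, 14.5]  # local MAE 0.5
model_b = [12.0, 14.0, 16.0]  # local MAE 2.0

weights = normalize([
    inverse_local_mae(actual, model_a),
    inverse_local_mae(actual, model_b),
])
print(weights)  # roughly [0.8, 0.2]
```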

ensemblem.weights_functions.w_inverse_log_LMAE(actual, predicted)[source]

Inverse Log Local MAE

Parameters:

actual – actual values
predicted – predicted values

Returns:

inverse log local MAE
