Ensemblem Package
Models
- class ensemblem.model.KWEnsembler(k: int = 5, bias: bool = False, dist_metric=<function euclidean>)
Bases: object
KWEnsembler class. This class implements the K-Weighted Ensembler model, an ensemble model that uses the k nearest neighbors of a sample to predict its target value. The weights of the neighbors are calculated using a weight function. The bias of the neighbors can optionally be added to the prediction.
- Parameters:
k – number of neighbors to use
bias – whether to add the bias of the neighbors to the prediction
dist_metric – distance metric to use
- Returns:
Predictions of the target values for the test set
- Return type:
bytearray
- fit(X_neighbors: DataFrame, y_neighbors: DataFrame, features: List, range_min: int = 0, range_max: int = 1) → None
Fits the ensemble by creating the search space
- Parameters:
X_neighbors – Neighbors search space
y_neighbors – Target values of the neighbors search space
- predict(X_test: DataFrame, features: List, pred_columns: List, weight_function=<function w_inverse_LMAE>) → List
Predicts the target values for the test set using the ensemble method
- Parameters:
X_test – Test set
features – Features of the test set
pred_columns – Columns to predict
weight_function – Weight function to use
range_min – Minimum value of minmax scaling
range_max – Maximum value of minmax scaling
- Returns:
Predictions of the target values for the test set
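The following is a minimal, self-contained sketch of the idea described above: find the k nearest neighbors of a sample, measure each base model's local error on them, and weight the base predictions inversely to that error. It does not use the ensemblem package. The function name knn_weighted_ensemble, its arguments, and the inverse-local-MAE weighting are assumptions suggested by the default weight_function name, and the package's actual implementation may differ.

```python
import numpy as np


def knn_weighted_ensemble(x, X_neighbors, y_neighbors, preds_neighbors, preds_x, k=5):
    """Combine base-model predictions for a single sample x (hypothetical helper).

    X_neighbors     : (n, d) features of the neighbor search space
    y_neighbors     : (n,)   true targets of the neighbor search space
    preds_neighbors : (n, m) predictions of m base models on the search space
    preds_x         : (m,)   predictions of the same m base models for x
    """
    # Euclidean distance from x to every row of the search space
    dists = np.sqrt(((X_neighbors - x) ** 2).sum(axis=1))
    # Indices of the k closest rows
    idx = np.argsort(dists)[:k]
    # Local mean absolute error of each base model on those k rows
    local_mae = np.abs(preds_neighbors[idx] - y_neighbors[idx, None]).mean(axis=0)
    # Inverse-error weights; the small constant avoids division by zero
    weights = 1.0 / (local_mae + 1e-12)
    weights /= weights.sum()
    # Weighted average of the base predictions for x
    return float(weights @ preds_x)


X_nb = np.array([[0.0], [1.0], [2.0], [10.0]])
y_nb = np.array([0.0, 1.0, 2.0, 10.0])
# Model 0 is exact on the search space, model 1 is always off by +1
preds_nb = np.column_stack([y_nb, y_nb + 1.0])
combined = knn_weighted_ensemble(np.array([1.0]), X_nb, y_nb, preds_nb,
                                 np.array([1.0, 2.0]), k=3)
```

In this toy setup the locally exact model receives almost all of the weight, so the combined prediction stays close to that model's output.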
Submodules
Metrics
- ensemblem.metrics.cosine_v(x, y)
Vector Cosine distance is 1 - cosine similarity
- Parameters:
x – point to calculate the distance
y – data to calculate the distance
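As an illustration of the definition above, a minimal NumPy sketch of cosine distance between two vectors follows. The name cosine_distance is hypothetical. The sketch assumes 1-D, non-zero inputs and is not the package's own code.

```python
import numpy as np


def cosine_distance(x, y):
    """1 - cosine similarity of two 1-D, non-zero vectors (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Cosine similarity: dot product divided by the product of the norms
    similarity = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return 1.0 - similarity


orthogonal = cosine_distance([1, 0], [0, 1])   # perpendicular vectors
parallel = cosine_distance([1, 2], [2, 4])     # same direction
```

Perpendicular vectors give a distance of 1, and vectors pointing in the same direction give a distance of 0 up to floating-point rounding.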
- ensemblem.metrics.euclidean(point, data)
Euclidean distance is the square root of the sum of the squared differences of their coordinates
- Parameters:
point – point to calculate the distance
data – data to calculate the distance
- Returns:
distance
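A minimal sketch of the formula above is given here, assuming that data is a 2-D array with one sample per row and that the result is one distance per row. The source does not state the expected shapes, so the name euclidean_to_rows and this layout are assumptions.

```python
import numpy as np


def euclidean_to_rows(point, data):
    """Euclidean distance from `point` to every row of `data` (hypothetical helper)."""
    point = np.asarray(point, dtype=float)
    data = np.asarray(data, dtype=float)
    # Square root of the sum of squared coordinate differences, per row
    return np.sqrt(((data - point) ** 2).sum(axis=1))


row_distances = euclidean_to_rows([0, 0], [[3, 4], [0, 0]])
```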
- ensemblem.metrics.euclidean_v(x, y)
Vector Euclidean distance is the square root of the sum of the squared differences of their coordinates
- Parameters:
x – point to calculate the distance
y – data to calculate the distance
- Returns:
distance
- ensemblem.metrics.manhattan_v(x, y)
Vector Manhattan distance is the sum of the absolute differences of their coordinates
- Parameters:
x – point to calculate the distance
y – data to calculate the distance
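The Manhattan definition above can be sketched for two vectors as follows. The name manhattan_distance is hypothetical, and the sketch assumes two 1-D inputs of equal length.

```python
import numpy as np


def manhattan_distance(x, y):
    """Sum of absolute coordinate differences of two 1-D vectors (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.abs(x - y).sum())


d = manhattan_distance([1, 2, 3], [4, 0, 3])  # |1-4| + |2-0| + |3-3|
```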
- ensemblem.metrics.mean_absolute_error(actual, predicted)
Mean Absolute Error (MAE)
- Parameters:
actual – actual values
predicted – predicted values
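For reference, MAE is the mean of the absolute differences between actual and predicted values. A minimal sketch follows, with the hypothetical name mae. It is an illustration of the standard formula, not the package's code.

```python
import numpy as np


def mae(actual, predicted):
    """Mean of |actual - predicted| over all samples (hypothetical helper)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.abs(actual - predicted).mean())


err = mae([1, 2, 3], [2, 2, 5])  # (1 + 0 + 2) / 3
```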
- ensemblem.metrics.mean_absolute_percentage_error(actual, predicted)
Local Mean Absolute Percentage Error (LMAPE)
- Parameters:
actual – actual values
predicted – predicted values
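A minimal sketch of the standard mean absolute percentage error follows. It assumes non-zero actual values and returns a fraction rather than a value scaled by 100. Whether the package scales the result, and how the "local" variant restricts the samples, is not stated here, so both points are assumptions. The name mape is hypothetical.

```python
import numpy as np


def mape(actual, predicted):
    """Mean of |actual - predicted| / |actual|, as a fraction (hypothetical helper)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # Assumes no actual value is zero; otherwise the ratio is undefined
    return float(np.mean(np.abs(actual - predicted) / np.abs(actual)))


pct_err = mape([100, 200], [110, 180])  # both samples are off by 10%
```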
- ensemblem.metrics.metrics_table(actual, predicted, model_name) → DataFrame
Create a pivot table with the results of multiple models and metrics
- Parameters:
actual – actual values
predicted – predicted values
model_name – name of the model
- Returns:
table with results
Utils
- ensemblem.utils.divide_sets(df, train_size, val_size, test_size)
Divide the data into train, validation and test sets
- Parameters:
df – pandas.DataFrame to be divided
train_size – float, size of the train set
val_size – float, size of the validation (neighbours) set
test_size – float, size of the test set
- Returns:
train, validation and test sets
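A three-way split by fractions can be sketched as below. This is an illustration only: it assumes the three sizes are fractions that sum to 1 and slices the rows in order without shuffling, which may not match what divide_sets does. The name divide_by_fractions is hypothetical.

```python
import pandas as pd


def divide_by_fractions(df, train_size, val_size, test_size):
    """Slice df into consecutive train / validation / test parts (hypothetical helper)."""
    if abs(train_size + val_size + test_size - 1.0) > 1e-9:
        raise ValueError("train_size, val_size and test_size must sum to 1")
    n = len(df)
    # round() guards against floating-point products such as 5.999999...
    train_end = round(n * train_size)
    val_end = train_end + round(n * val_size)
    # The test part takes whatever rows remain
    return df.iloc[:train_end], df.iloc[train_end:val_end], df.iloc[val_end:]


frame = pd.DataFrame({"a": range(10)})
train, val, test = divide_by_fractions(frame, 0.6, 0.2, 0.2)
```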
- ensemblem.utils.split_sets(df, train_size, val_size, test_size, target)
Split the data into train, validation and test sets with target and features
- Parameters:
df – pandas.DataFrame to be divided
train_size – float, size of the train set
val_size – float, size of the validation (neighbours) set
test_size – float, size of the test set
- Returns:
train, validation and test sets with target and features
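Separating a set into features and target can be sketched as follows. The sketch assumes that target is the name of a single column in the DataFrame, which the source does not confirm. The helper name separate_target is hypothetical.

```python
import pandas as pd


def separate_target(df, target):
    """Return (features, target) from df, assuming `target` is a column name."""
    X = df.drop(columns=[target])  # every column except the target
    y = df[target]                 # the target column as a Series
    return X, y


sample = pd.DataFrame({"a": [1, 2], "b": [3, 4], "y": [0, 1]})
X_part, y_part = separate_target(sample, "y")
```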
Weights_functions
- ensemblem.weights_functions.error_bias(data, k, metric)
Calculate the bias of the error
- Parameters:
data – data to calculate the distance
k – number of neighbors
metric – distance metric
- Returns:
bias of the error
- ensemblem.weights_functions.get_k_nearest_neighbors(point, data, k, metric)
Get the k nearest neighbors of a point in a dataset
- Parameters:
point – point to calculate the distance
data – data to calculate the distance
k – number of neighbors
metric – distance metric
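A k-nearest-neighbor lookup with a pluggable metric can be sketched as below. The return value of get_k_nearest_neighbors is not documented here, so returning row indices together with their distances is an assumption, as are the names k_nearest and euclid.

```python
import numpy as np


def k_nearest(point, data, k, metric):
    """Indices and distances of the k rows of `data` closest to `point` (hypothetical helper)."""
    point = np.asarray(point, dtype=float)
    data = np.asarray(data, dtype=float)
    # Apply the two-vector metric to every row
    distances = np.array([metric(point, row) for row in data])
    # Stable sort keeps the original row order among ties
    order = np.argsort(distances, kind="stable")[:k]
    return order, distances[order]


def euclid(x, y):
    # Plain Euclidean distance between two 1-D vectors
    return float(np.sqrt(((x - y) ** 2).sum()))


rows = [[0, 0], [1, 1], [5, 5], [0.5, 0.5]]
nearest_idx, nearest_dist = k_nearest([0, 0], rows, k=2, metric=euclid)
```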
- ensemblem.weights_functions.get_k_nearest_neighbors_weights(point, data, k, metric, weights)
Get the k nearest neighbors of a point in a dataset, weighting the neighbors
- Parameters:
point – point to calculate the distance
data – data to calculate the distance
k – number of neighbors
metric – distance metric
weights – weights of the neighbors
- Returns:
k nearest neighbors
- ensemblem.weights_functions.predict_inverse_LMAE(point, data, k, metric)
Predict the target value of a point using the inverse LMAE