Model

class skassist.library.Model(model_path)

Manages the root folder of a model.

The Model class handles the model file as well as the predictions and results, which are stored in separate files. The evaluate() function runs the model for each cross-validation split and saves the resulting predictions.

calc_result() can then be used to calculate an evaluation metric based on the predictions. The evaluation metric is defined by a function that is passed into calc_result().

Attributes:

meta (dict): A dictionary holding meta information about the model.

path (str): Path to the root directory.
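Example

A minimal creation sketch. The column names, paths and parameters below are hypothetical, and the signature above does not make clear whether estimator expects an estimator instance or a string identifier, so passing a scikit-learn instance here is an assumption:

from sklearn.ensemble import RandomForestClassifier

from skassist.library import Model

# Hypothetical feature and target columns and library layout.
features = ['age', 'income', 'n_purchases']
target = 'churned'

model = Model.New(
    estimator=RandomForestClassifier(),  # assumption: an estimator instance is accepted
    name='rf_baseline',
    experiment='churn_2024',
    target=target,
    features=features,
    experiment_path='library/churn_2024',
    modelParams={'n_estimators': 200},
)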

classmethod New(estimator, name, experiment, target, features, experiment_path, modelParams={})

Factory method for creating a new Model instance. The model is created in its sub-folder inside the experiment_path folder.

Args:
estimator (str):
The estimator to be managed by the model.
name (str):
A name for the model. Will be used together with the timestamp when storing the model.
experiment (str):
Name of the experiment the model belongs to. Useful for identifying the model folder if it ever gets lost and found.
target (str):
Name of the target variable. Must be a column in the dataset.
features (list):
A list of column names that are to be used as features during training.
experiment_path (str):
Path to the library in which the experiment is created.
modelParams (dict, optional):
A dictionary with tunable model parameters.
calc_result(scoring_function, name, df, skf, verbose=1, te_split_idx=1)

Updates or creates the results series name using the provided scoring_function().

Note

The cross-validation indices in skf must index into the DataFrame df.

Args:
scoring_function (function):
A Python function that calculates results given a model, its predictions and the true labels. See scoring_function().
name (str):
Name under which the result series is stored.
df (pandas.DataFrame):
The DataFrame on which to evaluate the model. Must contain all feature, “extra” feature and target columns that the model requires.
skf (numpy.ndarray):
An array containing arrays of splits, e.g. an array with 10 arrays, each containing 3 splits for a 10-fold cross-validation with training, test and validation sets.

verbose (int): Level of print output. 0 is no output.

te_split_idx (int): Index of split that the model is evaluated on.
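Example

A sketch of a possible scoring function and a calc_result() call. The exact signature that scoring_function() must have is documented separately; the argument order assumed here (model, predictions, true labels) follows the description above, and df and skf are the objects described in the arguments:

from sklearn.metrics import accuracy_score

def accuracy_scoring(model, predictions, y_true):
    # Assumed signature: the model, its predictions and the true labels.
    return accuracy_score(y_true, predictions)

# Store the metric under the result name 'accuracy'.
model.calc_result(accuracy_scoring, 'accuracy', df, skf, verbose=1, te_split_idx=1)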

evaluate(df, skf, split_list=None, verbose=1, te_split_idx=1)

Cross-evaluates the model on the test datasets given by te_split_idx. te_split_idx selects the split number to use for testing; all splits below the test index are used for training.

Note

The cross-validation indices in skf must index into the DataFrame df.

Args:
df (pandas.DataFrame):
The DataFrame on which to evaluate the model. Must contain all feature, “extra” feature and target columns that the model requires.
skf (numpy.ndarray):
An array containing arrays of splits. E.g. an array with 10 arrays, each containing 3 splits for a 10-fold cross-validation with training, test and validation set.
split_list (list):
A list of split indices to use for evaluation. This is useful when computation time is a limiting factor and a reduced evaluation is sufficient for model selection.

verbose (int): Level of print output. 0 is no output.

te_split_idx (int):
Index of split that the model is evaluated on.
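Example

A sketch of how the nested split structure might be built with scikit-learn. The per-fold ordering assumed here (training, test, validation, so that te_split_idx=1 selects the test split and split 0 is used for training) follows the description above but is an assumption, as is the helper itself:

import numpy as np
from sklearn.model_selection import KFold

def make_splits(df, n_folds=10, val_fraction=0.2, seed=42):
    # Hypothetical layout: each fold holds [train, test, validation] index arrays.
    rng = np.random.default_rng(seed)
    folds = []
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    for train_idx, test_idx in kf.split(df):
        train_idx = rng.permutation(train_idx)
        n_val = int(len(train_idx) * val_fraction)
        val_idx, train_idx = train_idx[:n_val], train_idx[n_val:]
        folds.append(np.array([train_idx, test_idx, val_idx], dtype=object))
    return np.array(folds, dtype=object)

skf = make_splits(df)
# Evaluate only the first three folds; split 1 of each fold is the test set.
model.evaluate(df, skf, split_list=[0, 1, 2], verbose=1, te_split_idx=1)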
fit(df)

Fits the model on the whole dataset and saves it.

Args:
df (pandas.DataFrame):
The DataFrame on which to train the model. Must contain all feature, “extra” feature and target columns that the model requires.
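Example

Once evaluation and model selection are done, the model can be trained on the complete dataset (df as above):

# Train on the full DataFrame and store the fitted model in the model folder.
model.fit(df)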
reset_predictions()

Resets all predictions for all test/validation splits.

reset_results(name, verbose=1, te_split_idx=1)

Resets the results with the given name.

Args:

name (str): Name of the result that should be deleted.

verbose (int, optional): Level of output. 0 is no output.

te_split_idx (int, optional):
Index of the evaluation split that should be reset.