Model

class skassist.library.Model(model_path)

Manages the root folder of a model.

The Model class handles the model file as well as the predictions and results, which are stored in separate files. The evaluate() function runs the model for each cross-validation split and saves the resulting predictions.

calc_result() can then be used to calculate an evaluation metric based on the predictions. The evaluation metric is defined by a function that is passed into calc_result().

Attributes:

meta (dict): A dictionary holding meta information about the model.

path (str): Path to the root directory.
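Example

A minimal creation sketch. The column names, paths and parameters below are hypothetical, and the signature above does not make clear whether estimator expects an estimator instance or a string identifier, so passing a scikit-learn instance here is an assumption:

from sklearn.ensemble import RandomForestClassifier

from skassist.library import Model

# Hypothetical feature and target columns and library layout.
features = ['age', 'income', 'n_purchases']
target = 'churned'

model = Model.New(
    estimator=RandomForestClassifier(),  # assumption: an estimator instance is accepted
    name='rf_baseline',
    experiment='churn_2024',
    target=target,
    features=features,
    experiment_path='library/churn_2024',
    modelParams={'n_estimators': 200},
)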

classmethod New(estimator, name, experiment, target, features, experiment_path, modelParams={})

Factory method for creating a new Model instance. The model is created in its sub-folder inside the experiment_path folder.

Args:
estimator (str):
The estimator to be managed by the model.
name (str):
A name for the model. Will be used together with the timestamp when storing the model.
experiment (str):
Name of the experiment the model belongs to. Useful for identifying the model folder if it ever gets lost and found.
target (str):
Name of the target variable. Must be a column in the dataset.
features (list):
A list of column names that are to be used as features during training.
experiment_path (str):
Path to the library in which the experiment is created.
modelParams (dict, optional):
A dictionary with tunable model parameters.
calc_result(scoring_function, name, df, skf, verbose=1, te_split_idx=1)

Updates or creates the results series name using the provided scoring_function().

Note

The cross-validation indices in skf must index into the DataFrame df.

Args:
scoring_function (function):
A Python function that calculates results given a model, its predictions and the true labels. See scoring_function().
name (str):
Name under which the result series is stored.
df (pandas.DataFrame):
The DataFrame on which to evaluate the model. Must contain all feature, “extra” feature and target columns that the model requires.
skf (numpy.ndarray):
An array containing arrays of splits, e.g. an array with 10 arrays, each containing 3 splits for a 10-fold cross-validation with training, test and validation sets.

verbose (int): Level of print output. 0 is no output.

te_split_idx (int): Index of split that the model is evaluated on.
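Example

A sketch of a possible scoring function and a calc_result() call. The exact signature that scoring_function() must have is documented separately; the argument order assumed here (model, predictions, true labels) follows the description above, and df and skf are the objects described in the arguments:

from sklearn.metrics import accuracy_score

def accuracy_scoring(model, predictions, y_true):
    # Assumed signature: the model, its predictions and the true labels.
    return accuracy_score(y_true, predictions)

# Store the metric under the result name 'accuracy'.
model.calc_result(accuracy_scoring, 'accuracy', df, skf, verbose=1, te_split_idx=1)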

evaluate(df, skf, split_list=None, verbose=1, te_split_idx=1)

Cross-evaluates the model on the test datasets given by te_split_idx. te_split_idx selects the split number to use for testing; all splits below the test index are used for training.

Note

The cross-validation indices in skf must index into the DataFrame df.

Args:
df (pandas.DataFrame):
The DataFrame on which to evaluate the model. Must contain all feature, “extra” feature and target columns that the model requires.
skf (numpy.ndarray):
An array containing arrays of splits. E.g. an array with 10 arrays, each containing 3 splits for a 10-fold cross-validation with training, test and validation set.
split_list (list):
A list of split indices to use for evaluation. This is useful when computation time is a limiting factor and a reduced evaluation is sufficient for model selection.

verbose (int): Level of print output. 0 is no output.

te_split_idx (int):
Index of split that the model is evaluated on.
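Example

A sketch of how the nested split structure might be built with scikit-learn. The per-fold ordering assumed here (training, test, validation, so that te_split_idx=1 selects the test split and split 0 is used for training) follows the description above but is an assumption, as is the helper itself:

import numpy as np
from sklearn.model_selection import KFold

def make_splits(df, n_folds=10, val_fraction=0.2, seed=42):
    # Hypothetical layout: each fold holds [train, test, validation] index arrays.
    rng = np.random.default_rng(seed)
    folds = []
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    for train_idx, test_idx in kf.split(df):
        train_idx = rng.permutation(train_idx)
        n_val = int(len(train_idx) * val_fraction)
        val_idx, train_idx = train_idx[:n_val], train_idx[n_val:]
        folds.append(np.array([train_idx, test_idx, val_idx], dtype=object))
    return np.array(folds, dtype=object)

skf = make_splits(df)
# Evaluate only the first three folds; split 1 of each fold is the test set.
model.evaluate(df, skf, split_list=[0, 1, 2], verbose=1, te_split_idx=1)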
fit(df)

Fits the model on the whole dataset and saves it.

Args:
df (pandas.DataFrame):
The DataFrame on which to train the model. Must contain all feature, “extra” feature and target columns that the model requires.
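Example

Once evaluation and model selection are done, the model can be trained on the complete dataset (df as above):

# Train on the full DataFrame and store the fitted model in the model folder.
model.fit(df)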
reset_predictions()

Resets all predictions for all test/validation splits.

reset_results(name, verbose=1, te_split_idx=1)

Resets the results with the given name.

Args:

name (str): Name of the result that should be deleted.

verbose (int, optional): Level of output. 0 is no output.

te_split_idx (int, optional):
Index of the evaluation split that should be reset.