Model
- class skassist.library.Model(model_path)
Manages the root folder of a model.
The Model class handles the model file as well as the predictions and results, which are stored in separate files. The evaluate() function runs the model for each cross-validation split, then makes and saves the predictions.
calc_result() can then be used to calculate an evaluation metric based on the predictions. The evaluation metric is defined by a function that is passed into calc_result().
- Attributes:
  - meta (dict): A dictionary holding meta information about the model.
  - path (str): Path to the root directory.
- classmethod New(estimator, name, experiment, target, features, experiment_path, modelParams={})
Factory method for creating a new Model instance. The model is created in its own sub-folder inside the experiment_path folder.
- Args:
  - estimator (str): A name for the experiment. Will be used together with the timestamp for storing the experiment.
  - name (str): A name for the experiment. Will be used together with the timestamp for storing the experiment.
  - experiment (str): Name of the experiment. Useful if the model folder is lost & found.
  - target (str): Name of the target variable. Must be a column in the dataset.
  - features (list): A list of column names that are to be used as features during training.
  - experiment_path (str): Path to the library in which the experiment is created.
  - modelParams (dict, optional): A dictionary with tunable model parameters.
- calc_result(scoring_function, name, df, skf, verbose=1, te_split_idx=1)
Updates or creates the results series name using the provided scoring_function().
Note
The cross-validation indices in skf must index into the DataFrame df.
- Args:
  - scoring_function (function()): A Python function that calculates results given a model, its predictions and the true labels. See scoring_function().
  - name (str): Name of the results series to update or create.
  - df (pandas.DataFrame): The DataFrame on which to evaluate the model. Must contain all feature, "extra" feature and target columns that the model requires.
  - skf (numpy.ndarray): An array containing arrays of splits, e.g. an array with 10 arrays, each containing 3 splits, for a 10-fold cross-validation with training, test and validation sets.
  - verbose (int): Level of print output. 0 is no output.
  - te_split_idx (int): Index of the split that the model is evaluated on.
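As a rough illustration of the kind of function that can be passed to calc_result(), the sketch below computes a plain accuracy. The name accuracy_scoring and the argument order (model, y_pred, y_true) are assumptions made for this example; the signature skassist actually expects is the one described under scoring_function(). The model argument is simply ignored here.

```python
def accuracy_scoring(model, y_pred, y_true):
    """Hypothetical scoring function: fraction of predictions that match the labels.

    The (model, predictions, true labels) argument order is an assumption;
    check scoring_function() for the signature skassist expects.
    """
    y_pred = list(y_pred)
    y_true = list(y_true)
    if len(y_pred) != len(y_true):
        raise ValueError("predictions and labels must have the same length")
    if not y_true:
        raise ValueError("cannot score an empty split")
    # Count the positions where prediction and label agree.
    correct = sum(1 for p, t in zip(y_pred, y_true) if p == t)
    return correct / len(y_true)


# The model is unused by this metric, so None stands in for it.
score = accuracy_scoring(None, [1, 0, 1, 1], [1, 0, 0, 1])
```

A metric that needs the fitted model (for example to read class probabilities) would use the first argument instead of ignoring it.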
- evaluate(df, skf, split_list=None, verbose=1, te_split_idx=1)
Cross-evaluates the model on the test datasets given by te_split_idx. te_split_idx indexes the split number to use for testing. All splits below the test index are used for training.
Note
The cross-validation indices in skf must index into the DataFrame df.
- Args:
  - df (pandas.DataFrame): The DataFrame on which to evaluate the model. Must contain all feature, "extra" feature and target columns that the model requires.
  - skf (numpy.ndarray): An array containing arrays of splits, e.g. an array with 10 arrays, each containing 3 splits, for a 10-fold cross-validation with training, test and validation sets.
  - split_list (list): A list of split indices to use for evaluation. This is useful when computation time is a limiting factor and a reduced evaluation for model selection is sufficient.
  - verbose (int): Level of print output. 0 is no output.
  - te_split_idx (int): Index of the split that the model is evaluated on.
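To make the layout of skf and the role of te_split_idx more concrete, here is a small sketch in plain Python. It is not skassist code: the helpers make_folds and train_test_indices are hypothetical, nested lists stand in for the numpy.ndarray the library expects, and the indices are assumed to be row positions in df. It builds folds of three splits each and picks training and test rows the way the evaluate() description suggests.

```python
def make_folds(n_rows, n_folds, n_splits=3):
    """Hypothetical helper: build n_folds folds, each a list of n_splits index lists.

    Each fold partitions the row positions 0..n_rows-1 into n_splits disjoint
    parts; the starting row is rotated so that the folds differ from each other.
    """
    bounds = [(s * n_rows) // n_splits for s in range(n_splits + 1)]
    folds = []
    for k in range(n_folds):
        offset = (k * n_rows) // n_folds
        rotated = [(offset + i) % n_rows for i in range(n_rows)]
        folds.append([rotated[bounds[s]:bounds[s + 1]] for s in range(n_splits)])
    return folds


def train_test_indices(fold, te_split_idx=1):
    """Splits below te_split_idx form the training set; split te_split_idx is the test set."""
    train = [i for split in fold[:te_split_idx] for i in split]
    test = list(fold[te_split_idx])
    return train, test


skf = make_folds(n_rows=12, n_folds=4)

# te_split_idx=1: train on split 0, test on split 1.
train_1, test_1 = train_test_indices(skf[0], te_split_idx=1)

# te_split_idx=2: train on splits 0 and 1, test on split 2.
train_2, test_2 = train_test_indices(skf[0], te_split_idx=2)
```

Under these assumptions, raising te_split_idx from 1 to 2 moves the former test split into the training data and evaluates on the last split instead.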
- fit(df)
Fits the model on the whole dataset and saves it.
- Args:
  - df (pandas.DataFrame): The DataFrame on which to train the model. Must contain all feature, "extra" feature and target columns that the model requires.
- reset_predictions()
Resets all predictions for all test/validation splits.
- reset_results(name, verbose=1, te_split_idx=1)
Resets the results with the given name.
- Args:
  - name (str): Name of the result that should be deleted.
  - verbose (int, optional): Level of output. 0 is no output.
  - te_split_idx (int, optional): Index of the evaluation split that should be reset.