Experiment

class skassist.library.Experiment(experiment_folder)

Manages the root folder of an experiment.

An experiment folder manages the dataset and the cross-validation splits associated with it. Evaluation and result calculation can be initiated for all models with the evaluate() method.

Attributes:

experiments (list): A list of experiments found in the library folder.

path (str): Path to the root directory.

Todo

Implement a LibEstimator base class that all models must inherit from. The base class ensures that the needed properties are implemented.
Consider offering optional arguments in Experiment.add() for the LibEstimator properties, for users who do not want to inherit from the base class.
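One way the planned LibEstimator base class could enforce its required properties is Python's abc module: a subclass that leaves an abstract property unimplemented cannot be instantiated. This is a hypothetical sketch; the class body and the property name `short_name` are assumptions for illustration, not part of skassist.

```python
from abc import ABC, abstractmethod


class LibEstimator(ABC):
    """Hypothetical base class; the required property below is illustrative."""

    @property
    @abstractmethod
    def short_name(self) -> str:
        """A short identifier for the model (assumed requirement)."""


class MeanPredictor(LibEstimator):
    """Toy model that implements the required property."""

    @property
    def short_name(self) -> str:
        return "mean"


class Incomplete(LibEstimator):
    """Does not implement short_name, so it cannot be instantiated."""


print(MeanPredictor().short_name)  # mean
try:
    Incomplete()
except TypeError as err:
    # abc refuses to instantiate a class with unimplemented abstract members.
    print("rejected:", err)
```

The check happens at instantiation time rather than when add() is called, which is why a base class can guarantee the properties before a model ever reaches the experiment.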

classmethod New(name, df, skf, features, lib_path, description='')

Factory method for creating a new Experiment instance given a name, dataset, cross-validation mask and a list of features. The path to the library in which the experiment is created must be given.

Args:

name (str): A name for the experiment. It is used together with a timestamp when storing the experiment.
df (pandas.DataFrame): The dataset.
skf (numpy.ndarray): An array of indices, each being one cross-validation split.
features (list): A list of column names to be used as features during training.
lib_path (str): Path to the library in which the experiment is created.
description (str): A description of the dataset, the experiment or the changes made, to make the experiment easier to find later.
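The docstring does not spell out the exact layout of `skf`. One reading consistent with "cross-validation mask" and with the `te_split_idx` argument of calc_results() is a per-row array holding the split index of each row. The sketch below builds such an array with NumPy; the layout, the helper name `make_split_mask` and the round-robin assignment are assumptions, so verify them against the library before relying on this.

```python
import numpy as np


def make_split_mask(n_rows, n_splits, seed=0):
    """Hypothetical helper: assign each row a split index in [0, n_splits)."""
    rng = np.random.default_rng(seed)
    # Round-robin labels give splits whose sizes differ by at most one row.
    labels = np.arange(n_rows) % n_splits
    # Shuffle in place so split membership does not follow row order.
    rng.shuffle(labels)
    return labels


skf = make_split_mask(n_rows=10, n_splits=3)
print(np.bincount(skf))  # [4 3 3]

# Rows belonging to split 1, e.g. the test rows when te_split_idx=1 (assumed meaning).
test_rows = np.flatnonzero(skf == 1)
```

Under this assumption, the resulting array would be passed as the `skf` argument of New() alongside the DataFrame it was built for, so its length must match the number of rows in `df`.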

add(estimator)

Adds a model to the experiment.

Args:

estimator (BaseEstimator): The model to add to the experiment.

calc_results(scoring_function, name, max_workers=1, verbose=1, te_split_idx=1)

Calculates results for all models in this experiment by calling calc_result() on each Model.

Args:

scoring_function (function): A Python function that calculates results given a model, its predictions and the true labels. See scoring_function().

name (str): A name for the result series. If a series with the given name already exists, only missing results are computed; existing results are not deleted.

max_workers (int): The number of models for which results are calculated concurrently. max_workers=1 is usually faster because the overhead of the ProcessPoolExecutor outweighs the gain from parallelism. A ThreadPoolExecutor could be tried instead.

verbose (int): Level of print output. 0 is no output.

te_split_idx (int): Index of the split on which the model is evaluated.
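The contract described for scoring_function can be illustrated in plain Python: a function receives a model, its predictions and the true labels, and returns a score. The sketch also computes scores for several models either sequentially or with a ThreadPoolExecutor, the alternative the max_workers note suggests. The argument order `(model, y_pred, y_true)`, the toy model dictionaries and the helper `score_all` are assumptions; see scoring_function() for the actual signature.

```python
from concurrent.futures import ThreadPoolExecutor


def accuracy(model, y_pred, y_true):
    """Hypothetical scoring function: fraction of correct predictions."""
    # `model` is unused here but is part of the assumed signature.
    correct = sum(p == t for p, t in zip(y_pred, y_true))
    return correct / len(y_true)


def score_all(models, y_true, scoring_function, max_workers=1):
    """Score every model, optionally in parallel threads (illustrative only)."""
    def score(model):
        return model["name"], scoring_function(model, model["pred"], y_true)

    if max_workers == 1:
        # Sequential path: no executor overhead at all.
        return dict(map(score, models))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the input order of models.
        return dict(pool.map(score, models))


y_true = [0, 1, 1, 0]
models = [
    {"name": "always_one", "pred": [1, 1, 1, 1]},
    {"name": "perfect", "pred": [0, 1, 1, 0]},
]
print(score_all(models, y_true, accuracy, max_workers=2))
# {'always_one': 0.5, 'perfect': 1.0}
```

Threads avoid the process start-up and pickling costs that make a ProcessPoolExecutor slow for small workloads, although CPU-bound scoring in pure Python will not run faster in threads.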

find(boolean_func)

Generator that yields all models for which boolean_func() returns True.

Args:

boolean_func (function): A function that takes a Model and returns a boolean indicating a match.
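A boolean_func is any callable that takes a Model and returns True or False. The sketch below shows the generator pattern that find() describes, using a stand-in Model dataclass whose fields `name` and `params` are hypothetical; real skassist Model objects have their own attributes.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    """Stand-in for a skassist Model; fields are hypothetical."""
    name: str
    params: dict = field(default_factory=dict)


def find(models, boolean_func):
    """Yield every model for which boolean_func returns True."""
    for model in models:
        if boolean_func(model):
            yield model


models = [
    Model("rf", {"n_estimators": 100}),
    Model("rf", {"n_estimators": 500}),
    Model("svm", {"C": 1.0}),
]
for m in find(models, lambda m: m.name == "rf"):
    print(m.params)
# {'n_estimators': 100}
# {'n_estimators': 500}
```

Because matches are yielded lazily, a caller can stop iterating early without the remaining models being tested.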

findone(boolean_func)

Return the first model matching boolean_func().

Args:

boolean_func (function): A function that takes a Model and returns a boolean indicating a match.
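Returning the first match amounts to taking the first item of the same filter. A minimal sketch with next(); what findone() does when no model matches is not stated above, so returning None here is an assumption, and plain dictionaries stand in for Model objects.

```python
def findone(models, boolean_func):
    """Return the first model matching boolean_func, or None (assumed)."""
    return next((m for m in models if boolean_func(m)), None)


# Plain dictionaries stand in for Model objects in this sketch.
models = [{"name": "rf"}, {"name": "svm"}, {"name": "rf"}]
print(findone(models, lambda m: m["name"] == "svm"))  # {'name': 'svm'}
print(findone(models, lambda m: m["name"] == "knn"))  # None
```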