Experiment
-
class
skassist.library.
Experiment
(experiment_folder) Manages the root folder of an experiment.
An experiment folder manages the dataset and the cross-validation splits associated with it. Evaluation and result calculation can be initiated for all models with evaluate().
- Attributes:
- experiments (
list
): - A list of experiments found in the library folder.
- path (
str
): - Path to the root directory.
Todo
- Implement
LibEstimator
base class that all models must inherit from. The base class ensures that the needed properties are implemented.
- Offer optional arguments in
Experiment.add()
for the LibEstimator properties when a user doesn’t want to inherit from the base class!?
-
classmethod
New
(name, df, skf, features, lib_path, description='') Factory method for creating a new Experiment instance given a name, a dataset, a cross-validation mask and a list of features. The path to the library in which the experiment is created must be given.
- Args:
- name (
str
): - A name for the experiment. Will be used together with the timestamp for storing the experiment.
- df (
pandas.DataFrame
): - The dataset as a Pandas DataFrame.
- skf (
numpy.ndarray
): - An array of indices, each being one cross-validation split.
- features (
list
): - A list of column names that are to be used as features during training.
- lib_path (
str
): - Path to the library in which the experiment is created.
- description (
str
): - A description of the dataset, the experiment or the changes made, so that the experiment is easier to find later.
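The following is a minimal sketch of how a cross-validation mask for the skf argument might be prepared. It assumes that the mask holds one split index per dataset row, which is only one possible reading of the description above. The helper make_split_mask is hypothetical and not part of skassist, and plain lists are used to keep the sketch dependency-free (numpy.asarray(mask) would convert the result to the documented numpy.ndarray type).

```python
import random


def make_split_mask(n_rows, n_splits, seed=0):
    """Assign each row to one of n_splits splits of near-equal size.

    Hypothetical helper, not part of skassist.
    """
    # Cycle through 0..n_splits-1 so split sizes differ by at most one row,
    # then shuffle so that split membership does not depend on row order.
    mask = [i % n_splits for i in range(n_rows)]
    random.Random(seed).shuffle(mask)
    return mask


mask = make_split_mask(n_rows=10, n_splits=3)

# Under the assumption above, the rows whose entry equals 1 form split 1.
rows_in_split_1 = [row for row, split in enumerate(mask) if split == 1]
```

Under this reading, te_split_idx in calc_results() would select the rows whose mask entry equals that index.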
-
add
(estimator) Adds a model to the experiment.
- Args:
- estimator (
BaseEstimator
): - The model (estimator) to add to the experiment.
-
calc_results
(scoring_function, name, max_workers=1, verbose=1, te_split_idx=1) Calculates results for all models in this experiment. Calls
calc_result()
of each
Model
.
- Args:
- scoring_function (
function()
): - A Python function that calculates results given a model, its predictions and the true labels. See
scoring_function()
.
- name (
str
): - A name for the result series. If a series with the given name already exists, only missing results are computed. Existing results are not deleted.
- max_workers (
int
): - The number of models for which results are calculated concurrently. max_workers=1 is usually faster because the overhead of the ProcessPoolExecutor is too large; a ThreadPoolExecutor might be worth trying.
- verbose (
int
): - Level of print output. 0 means no output.
- te_split_idx (
int
): - Index of the split on which the model is evaluated.
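As an illustration of the two ideas above, here is a minimal sketch of a scoring function and of the "only missing results are computed" behaviour. It does not use skassist itself. The names accuracy_scoring and fill_missing, the argument order (model, predictions, true labels), the dict return value and the model identifiers are all assumptions made for this example.

```python
def accuracy_scoring(model, y_pred, y_true):
    """Hypothetical scoring function: fraction of correct predictions."""
    # `model` is unused here; it is kept to mirror the documented inputs.
    correct = sum(1 for p, t in zip(y_pred, y_true) if p == t)
    return {"accuracy": correct / len(y_true)}


def fill_missing(series, model_ids, compute):
    """Compute results only for models that are not yet in `series`."""
    for model_id in model_ids:
        if model_id not in series:  # existing results are left untouched
            series[model_id] = compute(model_id)
    return series


y_true = [1, 0, 0, 1]
predictions = {"m1": [1, 0, 1, 1], "m2": [0, 0, 1, 1]}
computed = []  # records which models were actually scored


def compute(model_id):
    computed.append(model_id)
    return accuracy_scoring(None, predictions[model_id], y_true)


# A result for "m1" already exists in the series, so only "m2" is scored.
series = {"m1": {"accuracy": 0.75}}
fill_missing(series, ["m1", "m2"], compute)
```

Because existing entries are skipped rather than overwritten, an interrupted run can be repeated under the same series name without redoing finished work.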
-
find
(boolean_func) Iterator function that yields all models matching
boolean_func()
.
- Args:
- boolean_func (
boolean_func()
): - A function that takes a
Model
and returns a boolean indicating a match.
-
findone
(boolean_func) Returns the first model matching
boolean_func()
.
- Args:
- boolean_func (
boolean_func()
): - A function that takes a
Model
and returns a boolean indicating a match.
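The predicate pattern used by find() and findone() can be sketched as follows. This is a stand-alone illustration, not the skassist implementation: StubModel and its fields are hypothetical stand-ins for Model, and returning None when nothing matches is an assumption, since the behaviour for that case is not documented here.

```python
from dataclasses import dataclass


@dataclass
class StubModel:
    """Hypothetical stand-in for Model; both fields are invented."""
    name: str
    n_features: int


def find(models, boolean_func):
    """Yield every model for which boolean_func returns True."""
    for model in models:
        if boolean_func(model):
            yield model


def findone(models, boolean_func):
    """Return the first match, or None if nothing matches (an assumption)."""
    return next(find(models, boolean_func), None)


def is_wide(model):
    """Example predicate: match models trained on more than 5 features."""
    return model.n_features > 5


models = [StubModel("ridge", 3), StubModel("forest", 8), StubModel("svm", 8)]

matches = [m.name for m in find(models, is_wide)]
first = findone(models, is_wide)
```

Since find() is a generator in this sketch, findone() stops at the first match instead of scanning every model.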