When studying machine learning, you’ll often find yourself training multiple models, testing different features or tweaking parameters. While a spreadsheet might help you, it’s not great for keeping logs, results and code together. Believe me, I tried. An alternative is Sacred, a package that allows you to execute multiple experiments and record everything in an organized fashion.
I’ll cover the basics here, but it’s never a bad idea to go through the documentation.
Basics
It all begins with a script, the entry point to your experiment. Sacred requires you to wrap said script with a simple decorator. For example, suppose there’s a file named ex.py:
from sacred import Experiment

ex = Experiment('1-check-dataset')

@ex.config
def my_config():
    data_dir = '/datasets/imagenet/train/'
    classes = None
    batch_size = 32

@ex.automain
def main(data_dir, classes, batch_size):
    from matplotlib import pyplot
    from keras.applications.inception_v3 import InceptionV3, preprocess_input
    from keras.preprocessing.image import ImageDataGenerator

    g = ImageDataGenerator(preprocessing_function=preprocess_input)
    train = g.flow_from_directory(data_dir,
                                  classes=classes,
                                  batch_size=batch_size)

    model = InceptionV3(weights=None)
    model.compile(optimizer='adam', loss='categorical_crossentropy')
    model.fit_generator(train, epochs=1000, steps_per_epoch=100)
This experiment can be easily executed with python ex.py -F ./logs/.
Sacred will take over and:
- The my_config function is executed, capturing all compatible variables (i.e. str, int, float, dict, list) defined in it
- Info on the execution, the config defined and the code itself are copied to ./logs/1/{run.json|config.json|../_sources/_run.py}. The output buffer, in turn, is constantly copied to ./logs/1/cout.txt
- main is called with the parameters captured from my_config!
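For the first run, the log directory therefore ends up looking roughly like this (a sketch assembled from the files above; exact names may vary between Sacred versions):

./logs/
├── 1/
│   ├── config.json   # the captured configuration
│   ├── cout.txt      # everything the run printed
│   └── run.json      # metadata about the execution
└── _sources/
    └── _run.py       # a copy of the experiment's source code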
You can also pass new parameters on the fly by executing it with python ex.py with batch_size=256, but it’s always a good idea to confirm that everything is in order with python ex.py print_config.
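The two mechanisms compose; for instance, a hypothetical run overriding two parameters while still logging to disk (the validation directory here is made up for illustration):

python ex.py with data_dir='/datasets/imagenet/valid/' batch_size=64 -F ./logs/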
Reproducing an experiment becomes easy too: python ex.py with ./logs/1/config.json
Other persistence back-ends are also supported (e.g. MongoDB).
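Attaching one is a matter of registering an observer before the run starts. Below is a minimal sketch assuming a MongoDB instance is reachable at localhost:27017 (the database name is arbitrary):

from sacred import Experiment
from sacred.observers import MongoObserver

ex = Experiment('1-check-dataset')
# records runs in MongoDB instead of (or in addition to) the file system;
# on older Sacred versions, use MongoObserver.create(...) instead
ex.observers.append(MongoObserver(url='localhost:27017', db_name='sacred'))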
Hints
Importing inside the main script function.
Imports such as tensorflow take a while to complete, and sometimes you are just interested in checking the parameters with print_config. For this reason, I seldom import anything but sacred at the module level. Just do it inside your experiment’s main function, like I did in the previous example.
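As a minimal sketch of the pattern (the experiment name is arbitrary, and tensorflow stands in for any slow import):

from sacred import Experiment  # cheap import, fine at module level

ex = Experiment('lazy-imports')

@ex.config
def my_config():
    units = 128

@ex.automain
def main(units):
    # the heavy import only happens when the experiment actually runs,
    # so commands like print_config stay fast
    import tensorflow as tf
    print(tf.constant(units))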
Sacred and Progress Bars
I faced a problem of huge logs when combining Sacred with scripts that trained Keras models: hundreds of lines were nothing but intermediate states of the progress bars. This can be easily solved with the following:
from sacred import Experiment, utils
ex = Experiment('a-nice-experiment')
ex.captured_out_filter = utils.apply_backspaces_and_linefeeds
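With this filter in place, Sacred applies the backspaces and line feeds in the captured output before writing it to cout.txt, so each progress bar collapses to its final state instead of spanning hundreds of lines.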