.. _sb_bagging:

#########################
Sequentially Bootstrapped
#########################

In the sampling section we showed that sampling should be done by Sequential Bootstrapping.
The ``SequentiallyBootstrappedBaggingClassifier`` and ``SequentiallyBootstrappedBaggingRegressor`` classes extend
sklearn's ``BaggingClassifier`` and ``BaggingRegressor`` classes by using Sequential Bootstrapping instead of random sampling.

To build the indicator matrix, the estimators need the Triple Barrier Events (``samples_info_sets``) and the price bars
used to label the training data set. That is why ``samples_info_sets`` and ``price_bars`` are input parameters of the
classifier and the regressor.

To better understand the underlying method, see the Sampling/Sequential Bootstrapping section.

Implementation
==============

.. py:currentmodule:: mlfinpy.ensemble.sb_bagging

.. automodule:: mlfinpy.ensemble.sb_bagging
    :members: SequentiallyBootstrappedBaggingClassifier, SequentiallyBootstrappedBaggingRegressor

Example
=======

An example of using ``SequentiallyBootstrappedBaggingClassifier``:

.. code-block:: python

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    from mlfinpy.ensemble.sb_bagging import SequentiallyBootstrappedBaggingClassifier

    # Features, labels, triple-barrier events and the price bars used for labeling
    X = pd.read_csv('X_FILE_PATH', index_col=0, parse_dates=[0])
    y = pd.read_csv('y_FILE_PATH', index_col=0, parse_dates=[0])
    triple_barrier_events = pd.read_csv('BARRIER_FILE_PATH', index_col=0, parse_dates=[0, 2])
    price_bars = pd.read_csv('PRICE_BARS_FILE_PATH', index_col=0, parse_dates=[0, 2])

    # Keep only the events that belong to the training set
    triple_barrier_events = triple_barrier_events.loc[X.index, :]
    triple_barrier_events = triple_barrier_events[(triple_barrier_events.index >= X.index.min()) &
                                                  (triple_barrier_events.index <= X.index.max())]

    # A single-tree, non-bootstrapped forest serves as the base estimator;
    # the resampling is handled by the sequentially bootstrapped bagging wrapper
    base_est = RandomForestClassifier(n_estimators=1, criterion='entropy', bootstrap=False,
                                      class_weight='balanced_subsample')
    clf = SequentiallyBootstrappedBaggingClassifier(base_estimator=base_est,
                                                    samples_info_sets=triple_barrier_events.t1,
                                                    price_bars=price_bars, oob_score=True)
    clf.fit(X, y)
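
To make the role of ``samples_info_sets`` and ``price_bars`` more concrete, the sketch below builds a small indicator
matrix and draws samples sequentially, weighting each candidate by its average uniqueness. It is only an illustration
of the general technique written with ``numpy`` and ``pandas``, not the ``mlfinpy`` implementation: the function names
``build_indicator_matrix``, ``average_uniqueness`` and ``sequential_bootstrap``, as well as the toy data, are
hypothetical.

.. code-block:: python

    import numpy as np
    import pandas as pd


    def build_indicator_matrix(t1, bar_index):
        """Rows are bars, columns are samples; 1 marks bars spanned by a label."""
        ind = np.zeros((len(bar_index), len(t1)), dtype=int)
        for j, (start, end) in enumerate(t1.items()):
            ind[(bar_index >= start) & (bar_index <= end), j] = 1
        return ind


    def average_uniqueness(ind):
        """Average uniqueness of every column over the bars where it is active."""
        concurrency = ind.sum(axis=1)[:, None]  # labels active at each bar
        uniqueness = np.divide(ind, concurrency, out=np.zeros(ind.shape),
                               where=concurrency > 0)
        lifespan = np.maximum(ind.sum(axis=0), 1)  # avoid division by zero
        return uniqueness.sum(axis=0) / lifespan


    def sequential_bootstrap(ind, n_draws, rng):
        """Draw column indices one at a time, favouring less overlapping samples."""
        drawn = []
        for _ in range(n_draws):
            avg_u = np.empty(ind.shape[1])
            for j in range(ind.shape[1]):
                # Uniqueness of candidate j given the samples drawn so far
                trial = ind[:, drawn + [j]]
                avg_u[j] = average_uniqueness(trial)[-1]
            prob = avg_u / avg_u.sum()
            drawn.append(int(rng.choice(ind.shape[1], p=prob)))
        return drawn


    # Toy data: six daily bars and three labels given as (start -> end) timestamps,
    # playing the role of ``price_bars.index`` and ``triple_barrier_events.t1``
    bars = pd.date_range('2024-01-01', periods=6, freq='D')
    t1 = pd.Series(bars[[2, 3, 5]], index=bars[[0, 2, 4]])

    ind = build_indicator_matrix(t1, bars)
    print(average_uniqueness(ind))  # approximately [0.833, 0.75, 1.0]

    draws = sequential_bootstrap(ind, n_draws=3, rng=np.random.default_rng(0))
    print(draws)

After each draw the probabilities are recomputed, so a sample that overlaps heavily with the ones already selected
becomes less likely to be picked again. This is the property that distinguishes the approach from uniform random
sampling with replacement.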