.. _sb_bagging:

#########################
Sequentially Bootstrapped
#########################
In the sampling section we showed that samples should be drawn by Sequential Bootstrapping.
The ``SequentiallyBootstrappedBaggingClassifier`` and ``SequentiallyBootstrappedBaggingRegressor`` classes extend scikit-learn's
``BaggingClassifier`` and ``BaggingRegressor`` classes by drawing bootstrap samples with Sequential Bootstrapping instead of random sampling.

Building the indicator matrix requires the Triple Barrier Events (``samples_info_sets``) and the price bars used to label
the training data set. That is why ``samples_info_sets`` and ``price_bars`` are input parameters of both the classifier and the regressor.

For background on the underlying method, see :ref:`Sampling/Sequential Bootstrapping `.
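
The idea behind the indicator matrix and the sequential draw can be illustrated with a minimal sketch in plain Python. This is not the library's implementation: the function names (``indicator_matrix``, ``average_uniqueness``, ``draw_probabilities``, ``seq_bootstrap``), the integer bar indices and the three toy label spans are assumptions made purely for illustration, and every sample is assumed to cover at least one bar.

.. code-block:: python

    import random


    def indicator_matrix(bar_index, event_spans):
        """Rows are bars, columns are samples; 1 if the bar lies inside the sample's label span."""
        return [[1 if start <= t <= end else 0 for start, end in event_spans]
                for t in bar_index]


    def average_uniqueness(ind_mat, chosen):
        """Average uniqueness of the last sample in ``chosen``, given everything in ``chosen``."""
        last = chosen[-1]
        # On each bar the sample covers, uniqueness is 1 / (number of chosen samples on that bar).
        uniqueness = [1.0 / sum(row[j] for j in chosen) for row in ind_mat if row[last]]
        return sum(uniqueness) / len(uniqueness)


    def draw_probabilities(ind_mat, phi):
        """Probability of drawing each sample next, given the already drawn samples ``phi``."""
        n_samples = len(ind_mat[0])
        avg_u = [average_uniqueness(ind_mat, phi + [i]) for i in range(n_samples)]
        total = sum(avg_u)
        return [u / total for u in avg_u]


    def seq_bootstrap(ind_mat, sample_length=None, seed=None):
        """Draw sample indices one at a time, favouring samples that overlap less with prior draws."""
        rng = random.Random(seed)
        n_samples = len(ind_mat[0])
        sample_length = sample_length or n_samples
        phi = []
        while len(phi) < sample_length:
            probs = draw_probabilities(ind_mat, phi)
            phi.append(rng.choices(range(n_samples), weights=probs, k=1)[0])
        return phi


    # Toy data (hypothetical): three samples whose labels span bars 0-2, 2-3 and 4-5.
    spans = [(0, 2), (2, 3), (4, 5)]
    ind_mat = indicator_matrix(range(6), spans)

    # After the second sample (index 1) has been drawn, the overlapping first sample
    # becomes less likely and a repeat of the second sample least likely:
    probs_after_first_draw = draw_probabilities(ind_mat, [1])  # approx. [5/14, 3/14, 6/14]

    sample = seq_bootstrap(ind_mat, seed=0)

In this sketch an ordinary bootstrap would assign each sample probability 1/3 on every draw, while the sequential draw lowers the probability of samples that overlap with those already selected.
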

Implementation
==============

.. py:currentmodule:: mlfinpy.ensemble.sb_bagging

.. automodule:: mlfinpy.ensemble.sb_bagging
    :members: SequentiallyBootstrappedBaggingClassifier, SequentiallyBootstrappedBaggingRegressor


Example
=======

An example of using ``SequentiallyBootstrappedBaggingClassifier``:

.. code-block:: python

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from mlfinpy.ensemble.sb_bagging import SequentiallyBootstrappedBaggingClassifier

    # Load features, labels, triple-barrier events and price bars (placeholder paths)
    X = pd.read_csv('X_FILE_PATH', index_col=0, parse_dates=[0])
    y = pd.read_csv('y_FILE_PATH', index_col=0, parse_dates=[0])
    triple_barrier_events = pd.read_csv('BARRIER_FILE_PATH', index_col=0, parse_dates=[0, 2])
    price_bars = pd.read_csv('PRICE_BARS_FILE_PATH', index_col=0, parse_dates=[0, 2])

    # Keep only the events that belong to the training set
    triple_barrier_events = triple_barrier_events.loc[X.index, :]
    triple_barrier_events = triple_barrier_events[(triple_barrier_events.index >= X.index.min()) &
                                                  (triple_barrier_events.index <= X.index.max())]

    # Single-tree base estimator with its own bootstrapping switched off;
    # resampling is left to the sequentially bootstrapped bagging ensemble
    base_est = RandomForestClassifier(n_estimators=1, criterion='entropy', bootstrap=False,
                                      class_weight='balanced_subsample')

    # t1 holds the end time of each triple-barrier event
    clf = SequentiallyBootstrappedBaggingClassifier(base_estimator=base_est,
                                                    samples_info_sets=triple_barrier_events.t1,
                                                    price_bars=price_bars, oob_score=True)
    clf.fit(X, y)