Data Filtering¶
Filters select events based on some kind of trigger. For example, a structural break filter selects the points in time at which a structural break occurs. In triple-barrier labeling, each such event is the starting point from which the return is measured over some event horizon, say one day.
The core idea is that labeling every trading day is a fool's errand; researchers should instead focus on forecasting specific market anomalies or how the market moves after an event.
Tip
If you try to forecast the direction of the next day's move from daily OHLC data, for each and every day, you are very likely to fail.
Pay close attention to which features are informative, that is, which features contain information that helps the model forecast the target variable.
We have never seen price data alone, combined with technical indicators, work in forecasting the next day's direction.
CUSUM Filter¶
The CUSUM filter is a quality-control method, designed to detect a shift in the mean value of a measured quantity away from a target value. The filter is set up to identify a sequence of upside or downside divergences from any reset level zero. We sample a bar t if and only if \(S_{t} \geq \text{threshold}\), at which point \(S_t\) is reset to 0.
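For reference, the symmetric version of the filter described in the cited book tracks two running sums, one for upward and one for downward drift. Writing \(y_t\) for the observed value and \(\mathbb{E}_{t-1}[y_t]\) for its expected value (symbols that are not otherwise used on this page), the recursion can be written as:

\[
S_t^{+} = \max\{0,\; S_{t-1}^{+} + y_t - \mathbb{E}_{t-1}[y_t]\}, \qquad S_0^{+} = 0
\]
\[
S_t^{-} = \min\{0,\; S_{t-1}^{-} + y_t - \mathbb{E}_{t-1}[y_t]\}, \qquad S_0^{-} = 0
\]
\[
S_t = \max\{S_t^{+},\; -S_t^{-}\}
\]

With the common choice \(\mathbb{E}_{t-1}[y_t] = y_{t-1}\), the increment is simply the one-step change in the series.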
One practical aspect that makes CUSUM filters appealing is that multiple events are not triggered by raw_time_series hovering around a threshold level, which is a flaw suffered by popular market signals such as Bollinger Bands. It will require a full run of length threshold for raw_time_series to trigger an event.
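The reset behaviour described above can be illustrated with a minimal sketch. It is not the mlfinpy source: the function name `symmetric_cusum_sketch` is hypothetical, and the use of log returns as increments is an assumption made here for illustration.

```python
import numpy as np
import pandas as pd


def symmetric_cusum_sketch(series: pd.Series, threshold: float) -> pd.DatetimeIndex:
    """Illustrative symmetric CUSUM filter (hypothetical helper, not the library code)."""
    # Assumption: increments are log returns of a strictly positive series.
    increments = np.log(series).diff().dropna()

    events = []
    s_pos, s_neg = 0.0, 0.0
    for timestamp, change in increments.items():
        # Accumulate drift separately in each direction, floored/capped at zero.
        s_pos = max(0.0, s_pos + change)
        s_neg = min(0.0, s_neg + change)

        if s_pos >= threshold:
            s_pos = 0.0  # reset the upside sum after an event
            events.append(timestamp)
        elif s_neg <= -threshold:
            s_neg = 0.0  # reset the downside sum after an event
            events.append(timestamp)
    return pd.DatetimeIndex(events)


idx = pd.date_range("2024-01-01", periods=6, freq="D")
prices = pd.Series([100.0, 101.0, 103.0, 106.0, 100.0, 99.0], index=idx)
events = symmetric_cusum_sketch(prices, threshold=0.05)
```

In this toy series the three small upward moves accumulate to roughly 5.8% by 2024-01-04, which triggers one event and resets the upside sum; the single drop of the same size on 2024-01-05 triggers a second event. No individual small move fires an event on its own.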
Once we have obtained this subset of event-driven bars, we will let the ML algorithm determine whether the occurrence of such events constitutes actionable intelligence. Below is an implementation of the Symmetric CUSUM filter.
Implementation¶
- mlfinpy.filters.filters.cusum_filter(raw_time_series: Series, threshold: float, time_stamps: bool = True) → DatetimeIndex | list
The Symmetric Dynamic/Fixed CUSUM Filter.
The CUSUM filter is a quality-control method, designed to detect a shift in the mean value of a measured quantity away from a target value. The filter is set up to identify a sequence of upside or downside divergences from any reset level zero. We sample a bar t if and only if S_t >= threshold, at which point S_t is reset to 0.
One practical aspect that makes CUSUM filters appealing is that multiple events are not triggered by raw_time_series hovering around a threshold level, which is a flaw suffered by popular market signals such as Bollinger Bands. It will require a full run of length threshold for raw_time_series to trigger an event.
Once we have obtained this subset of event-driven bars, we will let the ML algorithm determine whether the occurrence of such events constitutes actionable intelligence. Below is an implementation of the Symmetric CUSUM filter.
Note: In the book, this filter is applied to closing prices; we extended it to also work on other time series, such as volatility.
Parameters¶
- raw_time_series : pd.Series
Close prices (or other time series, e.g. volatility).
- threshold : float
When the abs(change) is larger than the threshold, the function captures it as an event.
- time_stamps : bool
Defaults to True, which returns a DatetimeIndex; set to False to return a list.
Returns¶
- t_events : pd.DatetimeIndex or list
Vector of datetimes when the events occurred. This is used later to sample.
Notes¶
Reference: Advances in Financial Machine Learning, Snippet 2.4, page 39.
Example¶
An example showing how the CUSUM filter can be used to downsample a time series of close prices can be seen below:
from mlfinpy.filters import cusum_filter
# `data` is assumed to be a DataFrame with a 'close' column and a DatetimeIndex.
cusum_events = cusum_filter(data['close'], threshold=0.05)
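Once event timestamps are available, downsampling amounts to selecting the matching rows. The sketch below uses only pandas; the toy `data` frame and the hand-picked `t_events` are stand-ins chosen for illustration, not output computed by mlfinpy.

```python
import pandas as pd

idx = pd.date_range("2024-01-01", periods=6, freq="D")
data = pd.DataFrame(
    {"close": [100.0, 101.0, 103.0, 106.0, 100.0, 99.0]}, index=idx
)

# Stand-in for the filter output: hand-picked timestamps for illustration.
t_events = pd.DatetimeIndex(["2024-01-04", "2024-01-05"])

# Keep only the rows at which an event occurred.
sampled = data.loc[t_events]
```

The resulting `sampled` frame contains one row per event, which is the subset that would then be passed on to labeling.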
Z-Score Filter¶
The Z-Score filter is used to identify explosive/peak points in a time series.
It uses a rolling mean, a rolling standard deviation, and a z_score threshold. An event is triggered when the current value of the time series exceeds (rolling mean + z_score * rolling standard deviation).
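The trigger rule can be sketched directly with pandas rolling windows. This is an illustration rather than the mlfinpy source: the name `z_score_sketch` is hypothetical, and two details are assumptions made here, namely that each window includes the current observation and that the comparison is strict.

```python
import pandas as pd


def z_score_sketch(
    series: pd.Series, mean_window: int, std_window: int, z_score: float = 3.0
) -> pd.Index:
    """Illustrative z-score filter (hypothetical helper, not the library code)."""
    # Assumption: both windows end at, and include, the current observation.
    rolling_mean = series.rolling(window=mean_window).mean()
    rolling_std = series.rolling(window=std_window).std()
    upper_band = rolling_mean + z_score * rolling_std

    # Comparisons against NaN (incomplete windows) are False, so the
    # warm-up period never produces events.
    return series[series > upper_band].index


idx = pd.date_range("2024-01-01", periods=10, freq="D")
values = pd.Series([10, 11, 10, 11, 10, 11, 10, 11, 10, 20], index=idx, dtype=float)
events = z_score_sketch(values, mean_window=5, std_window=5, z_score=1.5)
```

Only the final jump to 20 clears the band in this toy series. Because the current value sits inside its own window under this assumption, a single point can never score more than (n - 1) / sqrt(n) standard deviations above the mean for a window of length n, so short windows call for a smaller z_score than the default of 3.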
Implementation¶
- mlfinpy.filters.filters.z_score_filter(raw_time_series: Series, mean_window: int, std_window: int, z_score: float = 3, time_stamps: bool = True) → DatetimeIndex | list
Implements the z-score filter.
Parameters¶
- raw_time_series : pd.Series
Close prices (or other time series, e.g. volatility).
- mean_window : int
Rolling mean window.
- std_window : int
Rolling standard deviation window.
- z_score : float
Number of standard deviations required to trigger an event.
- time_stamps : bool
Defaults to True, which returns a DatetimeIndex; set to False to return a list.
Returns¶
- t_events : pd.DatetimeIndex or list
Vector of datetimes when the events occurred. This is used later to sample.
Notes¶
Reference: Implements the z-score filter idea from this StackOverflow question: https://stackoverflow.com/questions/22583391/peak-signal-detection-in-realtime-timeseries-data
Example¶
An example of how the Z-score filter can be used to downsample a time series:
from mlfinpy.filters import z_score_filter
# `data` is assumed to be a DataFrame with a 'close' column and a DatetimeIndex.
z_score_events = z_score_filter(data['close'], mean_window=100, std_window=100, z_score=3)