So the task is about detection, not prediction?

Just noticed this competition.

As I understand it, we need to detect whether a specified point within a given time series is a break point. The problem comes down to comparing the characteristics of the two sections of the time series before and after the specified point. Do I understand it correctly?

I have read about a few example applications of this, but they all rely on delayed knowledge (a certain number of points after the specified point are required). While this may be fine for some applications (climate change, for instance), for others (especially finance, for example regime changes) this is old news that may carry little value. It would be great to predict the break point using no points, or only a very small number of points, after it.

I’m curious to hear any further thoughts from the organizers.

You understood correctly.

The point where the period switches from 0 to 1 is the point at which the structural break either happened or did not.

enzo, I think that what leoxj is asking is whether the prediction should be entirely previsible, that is, using information only up to the break point, and no information afterwards. This is what you should clarify with the folks in Abu Dhabi. For example, it could be just up to the break point, or a little bit afterwards is OK, or any amount afterwards is OK, but this depends on the sponsors' preference, and should be clear in their specification.

Here is the answer from the ADIA Lab team:


@leoxj

Q: “As I understand it, we need to detect whether a specified point within a given time series is a break point. The problem comes down to comparing the characteristics of the two sections of the time series before and after the specified point. Do I understand it correctly?”

A: Yes.


Comment: “I have read about a few example applications of this, but they all rely on delayed knowledge (a certain number of points after the specified point are required). While this may be fine for some applications (climate change, for instance), for others (especially finance, for example regime changes) this is old news that may carry little value. It would be great to predict the break point using no points, or only a very small number of points, after it.”

A: It is true that reducing the number of time points in the “second section” makes the problem more difficult. But “no” time points would make the problem impossible to address. In our experience, in all cases with at least some time points after the break point, it is an interesting and challenging problem with wide applications in many fields.


@selected-lars

Comment 1: “enzo, I think that what leoxj is asking is whether the prediction should be entirely previsible, that is, using information only up to the break point, and no information afterwards.”

A1: I don’t think this is what leoxj meant. However, to decide/predict whether a structural break occurred at the designated time point, you need to compare what happens before that time point with what happens after. That is why you need the time series values after the break point and not only “up to the breakpoint”.


Comment 2: “This is what you should clarify with the folks in Abu Dhabi. For example, it could be just up to the break point, or a little bit afterwards is OK, or any amount afterwards is OK, but this depends on the sponsors' preference, and should be clear in their specification.”

A2: This sentence is unclear to me, could you please reformulate?

Given a time series Y1, Y2, …, Yn, suppose there is a breakpoint at Ym, m <= n. The question is whether your breakpoint detection runs on a window with W1 points before Ym and W2 points afterwards, that is, on Y(m-W1), Y(m-W1+1), …, Ym, Y(m+1), Y(m+2), …, Y(m+W2), where W1 and W2 are fixed, or on the entire available time series, looking for breakpoints in the past or similar. By “previsible” I was suggesting that for a trading application you would only want to look at Y1, …, Y(m-1), or maybe you are happy to wait a little before calling a breakpoint and then reacting, which would be something like Y1, …, Y(m+10). If you are not trading and are just studying the past time series for some reason, then my question is not relevant.

The ADIA Lab team is a little bit confused by your wording; however, here is their answer:

Q: “Given a time series Y1, Y2, …, Yn, suppose there is a breakpoint at Ym, m <= n. The question is whether your breakpoint detection runs on a window with W1 points before Ym and W2 points afterwards, that is, on Y(m-W1), Y(m-W1+1), …, Ym, Y(m+1), Y(m+2), …, Y(m+W2), where W1 and W2 are fixed, or on the entire available time series, looking for breakpoints in the past or similar.”

A: The task is not “breakpoint detection”. The (potential) break point is given as input and the task is to decide whether what happens after that time point is different enough from what happened before that time point and, if so, declare a structural break. There are no windows W1 or W2 because all available data can be considered. For a definition of structural break in this competition, please refer to: ADIA Lab Structural Break Challenge | CrunchDAO Docs V3
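To make that framing concrete, here is a minimal sketch, not the organizers' method: it splits the series at the given index m and scores how different the two sides look with a hand-rolled two-sample Kolmogorov-Smirnov statistic. The helper names (`ks_statistic`, `break_score`) and the toy Gaussian data are assumptions made for illustration only.

```python
import bisect
import random

def ks_statistic(before, after):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a = sorted(before)
    b = sorted(after)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

def break_score(series, m):
    """Hypothetical helper: compare Y1..Ym with Y(m+1)..Yn, using all available data."""
    return ks_statistic(series[:m], series[m:])

# Toy data (an assumption, not competition data): one series without a break,
# one whose mean shifts by 3 after the first 200 points.
rng = random.Random(0)
no_break = [rng.gauss(0.0, 1.0) for _ in range(400)]
with_break = ([rng.gauss(0.0, 1.0) for _ in range(200)]
              + [rng.gauss(3.0, 1.0) for _ in range(200)])

score_no_break = break_score(no_break, 200)
score_with_break = break_score(with_break, 200)
```

The score lies between 0 and 1, and a larger value means the two sections look less alike. Turning it into a yes/no answer still requires a cut-off, and the statistic treats each section as an unordered sample, so it ignores any temporal structure.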

So, to put it simply: you have a time series Y1, Y2, …, Ym, Y(m+1), …, Yn, and you are given m and n. Suppose (Y1, …, Ym) most closely fits a statistical distribution Dm(Pm), where Dm is a distribution (Normal, Lognormal, Poisson, etc.) and Pm are its parameters, and (Y(m+1), …, Yn) fits a distribution Dn(Pn), where Dn may be the same as or different from Dm and Pn are its parameters. The question is then whether Dm != Dn, or Dm == Dn but Pn is materially different from Pm. Something like that?

A: “Yes, that could be one possible angle, even though it does not take into account the time series structure as well as many other components.”
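A small sketch of why the time series structure matters, under toy assumptions of my own (nothing here comes from the competition data): the two sections below are built to have the same mean and variance, so a comparison of marginal distributions would see little difference, yet the second section is an AR(1) process with coefficient 0.9 and the first is white noise. The lag-1 autocorrelation separates them; `lag1_autocorr` is a hypothetical helper name.

```python
import random

def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return cov / var

rng = random.Random(1)
phi = 0.9  # assumed AR(1) coefficient for the toy "after" section
# Standard deviation of the stationary AR(1) process with unit-variance noise.
sd = (1.0 / (1.0 - phi ** 2)) ** 0.5

# Before the break: white noise with the same variance as the AR(1) process.
before = [rng.gauss(0.0, sd) for _ in range(500)]

# After the break: AR(1), same marginal variance but strong serial dependence.
after = []
prev = rng.gauss(0.0, sd)
for _ in range(500):
    prev = phi * prev + rng.gauss(0.0, 1.0)
    after.append(prev)

acf_before = lag1_autocorr(before)
acf_after = lag1_autocorr(after)
```

Here `acf_before` should be close to 0 and `acf_after` close to 0.9, even though both sections share the same marginal distribution. Autocorrelation is only one such feature; the organizers' reply suggests there are other components as well.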

So the input is a synthetic time series in which “structural breaks” are encoded. We are given training data in which the breaks are labelled. The task, using that labelled data, is to train a function that inverts the unknown synthetic data generation function to infer the “put break here” input parameter, consistent with the other unknown input parameters that describe the “time series structure as well as many other components”.
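As a rough illustration of the "learn from labelled data" reading, here is a deliberately simple sketch: compute one feature per series (a scaled difference of means across the given point) and pick the cut-off that best separates labelled training series. Everything in it is an assumption for illustration: the toy generator `make_series`, the single mean-shift feature, and the threshold rule. The real generator and the useful features are unknown.

```python
import random

def mean_shift_feature(series, m):
    """Absolute difference of section means, scaled by the pooled standard deviation."""
    before, after = series[:m], series[m:]
    mu_b = sum(before) / len(before)
    mu_a = sum(after) / len(after)
    ss = sum((x - mu_b) ** 2 for x in before) + sum((x - mu_a) ** 2 for x in after)
    pooled_sd = (ss / (len(series) - 2)) ** 0.5
    return abs(mu_a - mu_b) / pooled_sd

def fit_threshold(features, labels):
    """Pick the cut-off maximizing training accuracy for 'feature >= cut-off means break'."""
    best_cut, best_acc = 0.0, -1.0
    for cut in sorted(features):
        acc = sum((f >= cut) == y for f, y in zip(features, labels)) / len(labels)
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return best_cut, best_acc

def make_series(rng, has_break, m=100, n=200):
    """Toy generator (an assumption): Gaussian noise with an optional mean shift of 1.5 after m."""
    shift = 1.5 if has_break else 0.0
    return ([rng.gauss(0.0, 1.0) for _ in range(m)]
            + [rng.gauss(shift, 1.0) for _ in range(n - m)])

rng = random.Random(2)
labels = [i % 2 == 1 for i in range(200)]  # alternate no-break / break
features = [mean_shift_feature(make_series(rng, y), 100) for y in labels]
cut, train_acc = fit_threshold(features, labels)
```

On this toy data a single threshold separates the classes almost perfectly, which is far easier than the actual challenge. In practice one would combine many features (distributional, autocorrelation-based, and so on) in a proper classifier and evaluate on held-out series rather than on training accuracy.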