Change Point Test

Basic Concepts

For a time series x1, …, xn, it is possible that at some point prior to time n, there is some change that shifts the median. This change may be a policy change, administration of some treatment, traumatic event, etc. In the one-tailed test you predict the direction of change, while in the two-tailed test, any change (up or down) is sufficient.

Test Description

If the time series takes numeric values from a continuous variable, then define ri to be the rank of xi and define

w_i and u_i

Now define m to be the value of i < n where ui is at its maximum. This is the potential change point. If more than one such i exists, select the first one. Define w to be wm.

We now need to test whether the value of ui is larger than we would expect by chance.

For n sufficiently large (when m > 10 or n–m > 10), under the null hypothesis that there is no change in direction, we can assume that w is normally distributed with the following mean and variance

mean and variance

Employing a continuity correction factor of .5, we can use the test statistic zN(0,1) where

z statistic

and

continuity correction h

In the case of ties, the variance needs to be adjusted as for the one-sample signed-ranks test.

Example

Example 1: For the time series shown in column B of Figure 1, determine whether there is a point-change, and if so when did it occur.

Change Point Test example

Figure 1 – Point-Change Test

The various Excel calculations are shown in Figure 1. E.g. cell C3 contains =RANK.AVG(B3,B$2:B$25,1), cell D3 contains =D2+C3, and cell E3 contains =ABS(2*D3-A3*(A$25+1)).

We only use one Real Statistic function, namely INDEX_MAX (in cell H4), although we can obtain this value by just observing where 80 occurs in column E. Alternatively, we can use the Excel formula =MATCH(E2:E24,MAX(E2:E24),0).

From cell H4, we see that a potential point change occurs at time = 6. Since n – m = 24-6 = 18 > 10, we use the normal approximation to determine whether this point change is significant.

We see that p-value = .008455 (cell H10), which is a highly significant result. We conclude that there is a point change and that it occurs at time = 6.

Worksheet Function

Real Statistics Function: The Real Statistics Resource Pack provides the following function for the time series data in R1.

CHANGEPT_TEST(R1, lab): returns a column array with the values change point, w-stat, z-stat, and p-value

R1 must be a column array or range with no missing data. If lab = TRUE (default FALSE) a column of labels is appended to the output.

We can use this function to get the results for Example 1, as shown in range G13:H16 of Figure 1.

Test with Binary Data

Click here for information about the change point test with binary data.

Examples Workbook

Click here to download the Excel workbook with the examples described on this webpage.

References

Siegel, S., Castellan, N. J. (1988) Nonparametric statistics for the behavioral sciences, 2nd ed.
https://psycnet.apa.org/record/1988-97307-000

Bruns, D. (2024) Position of max value in list
https://exceljet.net/formulas/position-of-max-value-in-list

6 thoughts on “Change Point Test”

  1. Hello, Mr. Charles

    A couple questions please

    1. cell C3 contains =RANK.AVG(B3,B$2:B$25,1), cell D3 contains =D2+C3, and cell E3 contains =ABS(2*D3-A3*(A$25+1)). — How did you populate D2 and C2?
    2. How do I find out if there are multiple changes in the data?

    Reply
    • Hello Tade,
      1. C2 contains =RANK.AVG(B2,B$2:B$25,1) and D2 contains =C2
      2. I don’t know how to test for multiple changes in the data. One approach might be to split the time series into two segments and see whether there is a change point in both segments. You probably need to be clever about how to do this splitting.
      Charles

      Reply
  2. Hi Charles,
    it seems that CHANGEPT_TEST(R1, lab) doesn’t work with the last Resource Pack
    Excel error #NAME?
    Other functions, like INDEX_MAX(), are ok

    Reply
    • sorry Mr Charles, i have some other questions:
      1) you wrote in cells h =IF(H10>H11,-0.5,-5) but i ve reade above that maybe if w<mean +0.5 is applied. maybe i misunderstood?
      2) why do you use =INDEX_MAX(E7:E29) instead of Excel native MATCH(E7:E29,'max u';0)?
      3) what the formula for p would be if m<10 or n-m<10?
      thank you

      Reply
      • Hello Eugenio,
        1) Yes, this formula should be =IF(H10>H11,-0.5,.5). Thank you for catching this error.
        2) You can use =MATCH(H3,E2:E24,0)
        3) I don’t have a formula in this case. Usually, you would use a table of values, but I don’t know of such a table.
        Charles

        Reply

Leave a Comment