Box-Cox Linear Transformation

Basic Approach

Given two samples x1, …, xn and y1, …, yn, we seek a transformation z1, …, zn of the y values so that the x and z values are as close to linear as possible; i.e. the points (x1, z1), …, (xn, zn) more or less form a straight line.

The approach we take is to find z1, …, zn using a Box-Cox transformation, where we choose λ so that the correlation between the x and z values is as close to 1 as possible. In particular, we use Excel’s Goal Seek to maximize the correlation coefficient.

Example

Example 1: Make a Box-Cox transformation z of the y data in Figure 1 so that the x and z data are as close to linear as possible.

Box-Cox scatter diagram

Figure 1 – Scatter plot

We see from the scatter diagram displayed on the right side of Figure 1 that the (xi, yi) values are far from linear.

We define the Box-Cox transformation z1, …, zn for any value of λ by

image296z

We implement this as shown in Figure 2.

Box-Cox Goal Seek

Figure 2 – Initialization of Goal Seek

Here, column P contains the transformed data, e.g. cell P5 contains the formula =IF(S$5<>0,(O5^S$5-1)/S$5,LN(O5)), cell S5 contains a guess for the λ value and cell S6 contains the formula =CORREL(N5:N14,P5:P14) to calculate the correlation coefficient between the y and z values.

Goal Seek

We now select Data > Data Tools|What-If Analysis and choose the Goal Seek tool. Next, we fill in the dialog box that appears with the values shown on the right side of Figure 2 and press the OK button. The result is that the value for λ converges to .002565.

Box-Cox linear transformation

Figure 3 – Goal Seek output

We see that this value of λ transforms the y values into z values which forms a near linear relationship with the x values; in fact, these are aligned with the straight line z = x.

It is common to round λ off to the nearest integer or reciprocal of an integer (especially -2, -1.5, -1, -.5, 0, .5, 1, 1.5, 2).

For Example 1, since λ is near 0, we can use the transformation z = ln y instead of

image297z

This is a consequence of the fact that

image298z

Alternative Approach

Another version of the Box-Cox transformation is given by

image299z

image300z

where ỹ is the geometric mean of the y + a values, i.e.

image301z

This has the advantage that the standard deviation of the z values is approximately equal to the standard deviation of the y values (for any value of λ).

Examples Workbook

Click here to download the Excel workbook with the examples described on this webpage.

Reference

Li, P. (2005) Box-Cox transformations: an overview
https://www.ime.usp.br/~abe/lista/pdfm9cJKUmFZp.pdf

 

2 thoughts on “Box-Cox Linear Transformation”

Leave a Comment