Iterative Proportional Fitting Procedure (IPFP)

Two-way contingency tables

We now define an algorithm that lets us compare two-way tables which have different row and/or column totals. We explain the algorithm using the following example.

Example 1: Revise the table in range A5:D7 of the following figure so that the sum of each of the rows is the indicated row total and the sum of each of the columns is the indicated column total.

Initializing two-way IPFP

Figure 1 – Start of iterative proportional fitting procedure

In Figure 1 the range B5:E7 contains the initial table values. The range F5:F7 contains the target row totals and the range B8:E8 contains the target column totals.

In order for the procedure to work the sum of the target row totals must equal the sum of the target column totals. This is indeed the case for Example 1. The grand total (cell F8) contains the value 60, which is the sum of the target row totals as well as the target column totals; i.e. SUM(F5:F7) = SUM(B8:E8) = 60. We also assume that all table entries, as well as target row/column totals, are positive.

Procedure explained

We begin the procedure (step 0) by putting the initial table values in the range H5:K7, but this time we include the real row and column totals (this is called the Seed in Figure 1).

Step 1 of the procedure consists of two parts. In the first part, the values in the table are adjusted proportionally so that the row totals equal the targeted values (Row Adjustment). E.g. the formula in cell B12 is =H5*F5/L5.

In the second part, the values in the table are adjusted proportionally so that the column totals equal the targeted values (Column Adjustment). E.g. the formula in cell H12 is =B12*B8/B15.

This same procedure is now repeated again and again until the sum and column totals are sufficiently close to the target values. For Example 1 this is achieved in 8 steps, as can be seen in Figure 2. In fact, after 8 steps the row and column totals are exactly equal to the target values and so any further iterations will produce exactly the same table values.

2-way IPFP iterations

Figure 2 – Iterative proportional fitting procedure for Example 1

The final result of the procedure is found in the lower right-hand corner of Figure 2.

Independence Testing

Example 2: Use the iterative proportional fitting procedure to find the expected values for the data in Example 1 of Independence Testing assuming that a person’s level of schooling is independent of their parents’ wealth.

It turns out that if the row and column totals of the observed data are used as the targets and 1’s are used as the seed, the IPF procedure converges in 1 step to the expected values. This is depicted in the following figure.

IPFP2 for independence testing

Figure 3 – Iterative proportional fitting procedure for Example 2

These results agree with those found in Figure 1 of Independence Testing.

Three-way contingency tables

We now turn our attention to three-way tables. The procedure is similar. The main challenge is how to represent a three-dimensional table in two-dimensional space. We will use the representation shown in Figure 4.

Three-way contingency table

Figure 4 – Three-way contingency table

Figure 4 contains a 3  × 2 × 4 table; i.e. 3 rows, 2 columns and 4 layers (we use the term layer for the third dimension). The area in blue (range C5:F10) contains the data for the table.

The areas in grey contain the marginal totals for any two dimensions: G5:G10 contains the totals for Rows × Columns, C13:F15 contains the totals for Rows × Layers and C17:F18 contains the totals for Columns × Layers. Thus, for example, cell E14 contains the total for the 2nd row and 3rd layer, and so has the value 10. It can be calculated in Excel using the formula SUMIF(A5:A10,A14,E5:E10).

The areas in pink contain the marginal totals for any single dimension: C11:F11 contains the totals for the layers, G13:G15 contains the totals for the rows and G17:G18 contains the totals for the columns. E.g. F11 contains the totals for the 4th layer and can be calculated via the formula SUM(F5:F10). Similarly, G14 contains the totals for the 2nd row and can be calculated via the formula SUM(C14:F14).

Finally, cell G11 (shaded in green) contains the grand total, i.e. the sum of all the values in the table. This can be specified in Excel using any one of the following formulas: SUM(C5:F10), SUM(G5:G10) or SUM(C11:F11).

Observations

With two-way tables, the only restriction on the target row and column totals is that the sum of the target row totals must equal the sum of the target column totals. The situation is more complicated with three-way totals. For these reasons, we will only consider targets that already come from a three-way table. This is not a real limitation since this is the only case we care about anyway. Once again we will demonstrate the key concepts by means of an example.

Example

Example 3: Revise the 3-way table in Figure 4 (repeated on the left side of Figure 5) so that its row, column and layer totals are as described in the 3-way table on the right side of Figure 5.

IPFP marginal totals Excel

Figure 5 – Specification of target marginal totals

Essentially we are attempting to insert the data from the table on the left into the table on the right while retaining all the totals for the table on the right. We now show the IPF procedure for accomplishing this (see Figure 6).

IPFP3 initial steps

Figure 6 – Steps 0 and 1 of IPF procedure for Example 3

Step 1 of the procedure consists of three parts. In the first part, the values in the table are adjusted proportionally so that the row × column totals meet the targeted values (Row/Column Adjustment). E.g. the formula in cell F25 is =V7*G7/W7.

In the second part the values in the table are adjusted proportionally so that the row × layer totals meet the targeted values (Row/Layer Adjustment). E.g. the formula in cell L28 is =D28*D16/D34.

In the third part the values in the table are adjusted proportionally so that the column × layer totals meet the targeted values (Column/Layer Adjustment). E.g. the formula in cell U29 is =M29*E19/M37.

This same procedure is now repeated again and again until all the marginal totals are sufficiently close to the target values. After 19 iterations we arrive at the solution shown in Figure 7.

3-way IPFP solution

Figure 7 – Solution to Example 3

Worksheet Functions

Real Statistics Functions: The Real Statistics Resource Pack supplies the following array functions for implementing the IPF procedures.

IPFP2(R1): outputs the results of the IPFP algorithm for two-way contingency tables. R1 contains the input data and the target row and column totals, formatted as in range A5:E8 of Figure 1. If R1 is an m+1 × n+1 range then the output is an m × n range.

IPFP3(R1, R2): outputs the results of the IPFP algorithm for three-way contingency tables. R1 contains the input data, formatted as in range A5:F10 of Figure 5. R2 contains the targets, formatted as in range K5:N10 of that figure. The format of the output is the same as that of range R1.

For Example 1, the output from the formula =IPFP2(B5:F8) is shown in range H5:K7 of Figure 8.

IPFP2 function Excel

Figure 8 – IPFP2 function

For Example 3, the output from the formula =IPFP3(A5:F10, K5:N10) is shown in range R5:W10 of Figure 9. None of the totals are included in the input or output ranges.

IPFP3 function Excel

Figure 9 – IPFP3 function

Examples Workbook

Click here to download the Excel workbook with the examples described on this webpage.

References

Wikipedia (2012) Iterative proportional fitting
https://en.wikipedia.org/wiki/Iterative_proportional_fitting

Howell, D. C. (2010) Statistical methods for psychology (7th ed.). Wadsworth, Cengage Learning.
https://labs.la.utexas.edu/gilden/files/2016/05/Statistics-Text.pdf

Hunsinger, E. (2008) Iterative proportional fitting for a three-dimensional table
https://u.demog.berkeley.edu/~eddieh/IPFDescription/AKDOLWDIPFTHREED.pdf

19 thoughts on “Iterative Proportional Fitting Procedure (IPFP)”

  1. For a 2 by 2 contigency table,how do you find a minimal sufficient statistics using iterative proportional fitting and using the estimates to fit a log linear models?

    Reply
  2. Hi Charles, thanks a lot for sharing this. What I do not understand well is, should I passing the output, e.g. in T5 = 7.19 to all those households/ ids , as a wight ?

    Reply
    • George,
      I haven’t implemented this on the Real Statistics website or software as of yet. Keep in mind that the situation is already pretty complicated for 3 dimensions. It might be difficult to interpret results beyond 3 dimensions.
      Charles

      Reply
  3. Hi,

    Is there any way to get rid of decimals? I would like to get an output with integers and still have totals to match.

    Thank you.

    Reply
  4. Hi,

    I’ve just downloaded Real Statistics Resource Pack, but, it doesn’t include Iterative Proportional Fitting Procedure. I thought it would be a part of it.

    Viktoria

    Reply
    • Sorry Viktoria, but the software doesn’t include the IPFP yet. You can download the Examples Workbook file to get an Excel version of the example on the referenced webpage.
      Charles

      Reply

Leave a Comment