Deming Regression Basic Concepts

Basic Approach

In ordinary linear regression, the y_i values are estimated from the x_i values with error ε_i. In Deming regression, it is assumed that the x_i values are also measured with error, which we denote δ_i.

We further assume that the ε_i and δ_i errors are independent of each other and that both are normally distributed with a mean of zero. The variance of the ε_i values is denoted σ² and the variance of the δ_i values is denoted τ².

The regression model now takes the form

$$x_i = \hat{x}_i + \delta_i, \qquad y_i = \hat{y}_i + \varepsilon_i, \qquad \hat{y}_i = \beta_0 + \beta_1 \hat{x}_i$$

where β_0 and β_1 are the intercept and slope coefficients and ŷ_i and x̂_i are the estimates of the true values of y_i and x_i, respectively.

Variances

If the values of σ² and τ² are known, then we use σ² and τ² as the variances of the ε_i and δ_i errors. If they are not known, then we must have multiple estimates of the x and y values. Assuming that for each i we have k_i estimates x_ij of x_i and m_i estimates y_ij of y_i, we can estimate x_i as the mean of its k_i estimates and y_i as the mean of its m_i estimates, i.e.

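$$x_i = \frac{1}{k_i}\sum_{j=1}^{k_i} x_{ij}, \qquad y_i = \frac{1}{m_i}\sum_{j=1}^{m_i} y_{ij}, \qquad i = 1, \dots, n$$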

where n is the sample size. Note that k_i > 1 and m_i > 1, but k_i can be different from m_i.

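We can also estimate the values of σ² and τ² by the pooled within-subject variances s² and t², respectively:

$$s^2 = \frac{\sum_{i=1}^{n}\sum_{j=1}^{m_i}(y_{ij}-y_i)^2}{\sum_{i=1}^{n} m_i - n}, \qquad t^2 = \frac{\sum_{i=1}^{n}\sum_{j=1}^{k_i}(x_{ij}-x_i)^2}{\sum_{i=1}^{n} k_i - n}$$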

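Note that if all the k_i are equal (say to k) and all the m_i are equal (say to m), then each estimate of the variance is equal to the average of the row variances, i.e.

$$s^2 = \frac{1}{n}\sum_{i=1}^{n}\frac{\sum_{j=1}^{m}(y_{ij}-y_i)^2}{m-1}, \qquad t^2 = \frac{1}{n}\sum_{i=1}^{n}\frac{\sum_{j=1}^{k}(x_{ij}-x_i)^2}{k-1}$$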
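The pooled variance estimate can be sketched in a few lines of Python. This is only an illustration: the function name `pooled_variance` and the data are hypothetical, and the calculation follows the worksheet approach used later on this page (sum of the per-subject squared deviations divided by the total number of measurements minus the number of subjects).

```python
from statistics import fmean, variance

def pooled_variance(rows):
    """Pooled within-subject variance of replicate measurements.

    rows: one list of replicate measurements per subject (at least 2 each).
    Returns sum of squared deviations from each row mean, divided by
    (total number of measurements - number of subjects).
    """
    if any(len(r) < 2 for r in rows):
        raise ValueError("each subject needs at least two measurements")
    n = len(rows)
    total = sum(len(r) for r in rows)
    devsq = sum(sum((v - fmean(r)) ** 2 for v in r) for r in rows)
    return devsq / (total - n)

# Hypothetical replicate data: 3 x measurements and 2 y measurements per subject
x_rows = [[10.0, 12.0, 11.0], [20.0, 19.0, 21.0], [30.0, 33.0, 30.0]]
y_rows = [[9.0, 11.0], [22.0, 20.0], [29.0, 31.0]]

x_var = pooled_variance(x_rows)
y_var = pooled_variance(y_rows)

# With equal replicate counts, the pooled value equals the mean row variance
x_var_alt = fmean(variance(r) for r in x_rows)
print(x_var, x_var_alt, y_var)
```

Because every subject here has the same number of replicates, the pooled value and the average of the row variances agree, as stated above.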

Property

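Property 1: Let λ = t²/s², the ratio of the variance of the x measurement errors to the variance of the y measurement errors (e.g. λ = .05/.02 = 2.5 in Example 1 below). Then the estimates b_0 and b_1 of the coefficients that minimize the Deming sum of squares

$$\sum_{i=1}^{n}\left[(x_i-\hat{x}_i)^2 + \lambda\,(y_i-\hat{y}_i)^2\right]$$

are given by

$$b_1 = \frac{\lambda v - u + \sqrt{(\lambda v - u)^2 + 4\lambda r^2}}{2\lambda r}$$

$$b_0 = \bar{y} - b_1\bar{x}$$

where x̄ and ȳ are the means of the x_i and y_i values respectively, and

$$u = \sum_{i=1}^{n}(x_i-\bar{x})^2, \qquad v = \sum_{i=1}^{n}(y_i-\bar{y})^2, \qquad r = \sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})$$

Note: In some references, λ is defined as the reciprocal of the value shown in Property 1.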
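As a rough illustration of Property 1, here is a minimal Python sketch of the closed-form coefficients. It assumes that λ is the x error variance divided by the y error variance, which is the convention the worksheet examples on this page appear to use (λ = .05/.02 = 2.5 in Example 1). The function name `deming_fit` and the sample data are hypothetical.

```python
from math import sqrt
from statistics import fmean

def deming_fit(x, y, lam):
    """Return (intercept, slope) minimizing sum((x - xhat)^2 + lam*(y - yhat)^2).

    lam is assumed to be (x error variance) / (y error variance).
    """
    xbar, ybar = fmean(x), fmean(y)
    u = sum((xi - xbar) ** 2 for xi in x)                      # sum of squares of x
    v = sum((yi - ybar) ** 2 for yi in y)                      # sum of squares of y
    r = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) # cross products
    if r == 0:
        raise ValueError("slope is undefined when the cross-product sum is zero")
    d = lam * v - u
    slope = (d + sqrt(d * d + 4 * lam * r * r)) / (2 * lam * r)
    intercept = ybar - slope * xbar
    return intercept, slope

# Hypothetical data lying exactly on y = 2 + 3x: the fit recovers the line
b0, b1 = deming_fit([1.0, 2.0, 3.0, 4.0], [5.0, 8.0, 11.0, 14.0], 2.5)
print(b0, b1)
```

No iterative minimization is needed: setting the derivative of the sum of squares to zero gives a quadratic in the slope, and the formula above is the root that corresponds to the minimum.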

Definitions

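Definition 1: The estimated true values for the x_i and y_i are then calculated by

$$\hat{x}_i = x_i + \frac{\lambda b_1 e_i}{\lambda b_1^2 + 1}, \qquad \hat{y}_i = y_i - \frac{e_i}{\lambda b_1^2 + 1}$$

where b_0 and b_1 are the estimated intercept and slope and the e_i are the (raw) residuals

$$e_i = y_i - (b_0 + b_1 x_i)$$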

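Definition 2: In addition to the raw residuals defined above, there are the following additional types of residuals:

$$\text{x residual}_i = x_i - \hat{x}_i$$

$$\text{y residual}_i = y_i - \hat{y}_i$$

$$\text{optimized residual}_i = \operatorname{sign}(e_i)\sqrt{(x_i-\hat{x}_i)^2 + \lambda\,(y_i-\hat{y}_i)^2}$$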
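The quantities in Definitions 1 and 2 can be computed together. The sketch below mirrors the worksheet formulas quoted for Figure 2 later on this page; the function name `deming_residuals`, the coefficient values, and the data are hypothetical and chosen only for illustration.

```python
from math import copysign, sqrt

def deming_residuals(x, y, b0, b1, lam):
    """Return one tuple per data pair:
    (pred y, x-hat, y-hat, raw residual, x residual, y residual, optimized residual).
    """
    scale = lam * b1 * b1 + 1
    rows = []
    for xi, yi in zip(x, y):
        pred = b0 + b1 * xi                    # predicted y from the observed x
        raw = yi - pred                        # raw residual e_i
        x_hat = xi + lam * b1 * raw / scale    # estimated true x
        y_hat = yi - raw / scale               # estimated true y
        x_res = xi - x_hat
        y_res = yi - y_hat
        # signed distance, carrying the sign of the raw residual
        opt = copysign(sqrt(x_res ** 2 + lam * y_res ** 2), raw)
        rows.append((pred, x_hat, y_hat, raw, x_res, y_res, opt))
    return rows

# Hypothetical coefficients and data
for row in deming_residuals([1.0, 2.0, 3.0], [1.5, 1.8, 3.4], 0.1, 1.0, 2.5):
    print(row)
```

A useful sanity check is that each estimated point (x-hat, y-hat) lies on the fitted line, i.e. y-hat equals b0 + b1 times x-hat.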

Example (known variances)

Example 1: Find the Deming regression equation for the data in columns A, B, and C of Figure 1. Here the variance of the measurements for the x values is known to be .05 and the variance for the y values is known to be .02.

Figure 1 – Calculation of regression coefficients

Using the formulas described above we see that the regression formula is

y = -.1708 + 1.018x

We can characterize the sample data and residuals as described in Figure 2 using the formulas in Definitions 1 and 2.

Figure 2 – Residuals Report

For example, the formula in cell M6 is =F$12+K6*F$13 (referring to Figure 1), the formula in cell P6 is =L6-M6, the formula in cell N6 is =K6+$F$11*$F$13*P6/($F$11*$F$13^2+1), and the formula in cell O6 is =L6-P6/($F$11*$F$13^2+1). The formula in cell Q6 is =K6-N6, the formula in cell R6 is =L6-O6, and, finally, the formula in cell S6 is =SIGN(P6)*SQRT(Q6^2+$F$11*R6^2).

Note, further, that the means of these residuals (as shown in row 16) are all close to zero, as expected.

Testing Residuals for Normality

One of the assumptions for Deming regression is that the residuals are normally distributed. We test the optimized residuals (range P6:P15) for normality using a QQ plot and the Shapiro-Wilk test, as shown in Figure 3. Both are consistent with normally distributed residuals.

Figure 3 – Testing optimized residuals for normality

Example (unknown variances)

Example 2: Find the Deming regression equation for the data in Figure 4.

Figure 4 – Deming Regression Data

This time, note that there are 3 measurements for each x value and 2 measurements for each y value. We therefore need to calculate the x and y variances for each subject in order to calculate lambda. To carry out the Deming regression, we also need the mean of the x and y measurements for each subject. This is shown on the left side of Figure 5.

Figure 5 – Deming regression

For example, cell H4 contains the formula =DEVSQ(B4:D4), cell I4 contains =DEVSQ(E4:F4), cell L4 contains =AVERAGE(B4:D4), and cell M4 contains the formula =AVERAGE(E4:F4).

Cells L15 and M15 contain the variances for x and y values, as calculated by the worksheet formulas =SUM(H4:H13)/(COUNT(B4:D13)-A13) and =SUM(I4:I13)/(COUNT(E4:F13)-A13). The value of lambda shown in cell P10 is calculated as 47.5/19.7 = 2.411168.

Using the data in columns K, L, and M we can calculate the regression coefficients exactly as we did in Example 1. The regression equation is y = -15.9117 + .772981x.
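The workflow of Example 2 (average the replicates, pool the variances to get lambda, then fit) can be sketched end to end in Python. This is an illustrative sketch only, not the Real Statistics implementation: the function names and the data are hypothetical, and lambda is taken as the pooled x variance divided by the pooled y variance, as in the worksheet calculation above.

```python
from math import sqrt
from statistics import fmean

def pooled_variance(rows):
    """Sum of per-subject squared deviations / (number of measurements - subjects)."""
    total = sum(len(r) for r in rows)
    devsq = sum(sum((v - fmean(r)) ** 2 for v in r) for r in rows)
    return devsq / (total - len(rows))

def deming_from_replicates(x_rows, y_rows):
    """Return (lambda, intercept, slope) from replicate x and y measurements."""
    x = [fmean(r) for r in x_rows]          # per-subject x means
    y = [fmean(r) for r in y_rows]          # per-subject y means
    lam = pooled_variance(x_rows) / pooled_variance(y_rows)
    xbar, ybar = fmean(x), fmean(y)
    u = sum((xi - xbar) ** 2 for xi in x)
    v = sum((yi - ybar) ** 2 for yi in y)
    r = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    d = lam * v - u
    b1 = (d + sqrt(d * d + 4 * lam * r * r)) / (2 * lam * r)
    b0 = ybar - b1 * xbar
    return lam, b0, b1

# Hypothetical data: the subject means fall exactly on y = 1 + 2x,
# which makes the fitted coefficients easy to check by hand.
x_rows = [[10.0, 12.0, 11.0], [20.0, 19.0, 21.0], [30.0, 33.0, 30.0]]
y_rows = [[22.0, 24.0], [40.0, 42.0], [62.0, 64.0]]
lam, b0, b1 = deming_from_replicates(x_rows, y_rows)
print(lam, b0, b1)
```

Note that the replicates affect the fit only through lambda and the subject means; once those are computed, the regression proceeds exactly as in the known-variance case.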

Worksheet Functions

Real Statistics Functions: For array or range R1 containing x values and R2 containing y values, we have two forms of each of the following two array functions. The first form corresponds to data as in Example 1, where lambda is known, and the second to data as in Example 2, where lambda is calculated from the data.

DRegCoeff(R1, R2, λ, lab) = 2 × 2 array consisting of the intercept and slope coefficients and standard errors for Deming regression on the data in R1 and R2 where lambda = λ.

DRegCoeff(R1, R2,, lab) = DRegCoeff(R3, R4, λ, lab) where R3 consists of the averages of the X data measurements in R1, R4 consists of the averages of the Y data measurements in R2 and the λ is calculated as in Example 2.

DRegResiduals(R1, R2, λ, lab) = n × 7 array consisting of pred y, x-hat, y-hat, raw residual, x-residual, y-residual and optimized residual for each pair of data elements in R1 and R2 based on the Deming regression on the data in R1 and R2 where lambda = λ and n = the number of elements in R1 (or R2). It is assumed that R1 and R2 are column arrays.

DRegResiduals(R1, R2,, lab) = DRegResiduals(R3, R4, λ, lab) where R3 consists of the averages of the X data measurements in R1, R4 consists of the averages of the Y data measurements in R2 and the λ is calculated as in Example 2.

If lab = TRUE (default FALSE), then an extra column is appended to the output from DRegCoeff containing the labels “intercept” and “slope”. Similarly, if lab = TRUE, then an extra row is appended to the output from DRegResiduals with the labels shown in range M5:S5 of Figure 2.

Non-array function

In addition, we have the following non-array function:

DRegLambda(R1, R2) = the lambda value calculated from R1 and R2 as described in Example 2.

Note that for Example 1, the array formula =DRegCoeff(B4:B13, C4:C13, 2.5) produces the coefficients shown in F12:F13 of Figure 1. For Example 2, =DRegCoeff(B4:D13, E4:F13) produces the coefficients shown in P11:P12 of Figure 5. Also for Example 2, =DRegLambda(B4:D13, E4:F13) produces the result shown in cell P10 of Figure 5.

For Example 1, =DRegResiduals(B4:B13, C4:C13, 2.5, TRUE) produces the output shown in range M5:S15 of Figure 2.

Conclusion

We have now shown how to calculate the regression coefficients in the case where the measurement variances are known (Example 1) and when they need to be estimated from the data (Example 2). See Jackknifing for how to calculate the standard error of these coefficients by using a technique called jackknifing.

43 thoughts on “Deming Regression Basic Concepts”

  1. Dear Charles, I hope you and your family are OK. I am running a Deming regression, but the results are:

    Deming Regression
    alpha 0,05
    coeff std err df t stat p-value lower upper
    #¡VALOR! #¡VALOR! #¡VALOR! 9 #¡VALOR! #¡VALOR! #¡VALOR! #¡VALOR!
    #¡VALOR! #¡VALOR! #¡VALOR! 9 #¡VALOR! #¡VALOR! #¡VALOR! #¡VALOR!

    Hypothesis Testing
    alpha 0,025
    test param std err df t stat p-value lower upper
    slope = 1 #¡VALOR! #¡VALOR! 9 #¡VALOR! #¡VALOR! #¡VALOR! #¡VALOR!
    identity #¡VALOR! #¡VALOR! 9 #¡VALOR! #¡VALOR! #¡VALOR! #¡VALOR!

    coeff std err df t stat p-value lower upper
    #¡VALOR! #¡VALOR! #¡VALOR! 9 #¡VALOR! #¡VALOR! #¡VALOR! #¡VALOR!
    #¡VALOR! #¡VALOR! #¡VALOR! 9 #¡VALOR! #¡VALOR! #¡VALOR! #¡VALOR!
    Please help me.
    Thanks

  2. I’m trying to understand how the variance is determined for the x and y variables in figure 5. I can’t seem to duplicate the values that are generated in the example.

  3. Dr. Zaiontz,
    thanks for the tutorial and detailed illustration. I am wondering how b0 and b1 can be calculated in Fig 1 without doing minimization (such as using the Solver in excel)?

    • Hi Tony,
      Probably so. I have used Solver for many types of regression, as illustrated elsewhere on this website, but haven’t tried it for Deming Regression.
      Charles

      • Dr. Zaiontz,
        sorry for not being clear in my question, my question was why the fig 1 calculation does not appear to require a minimization step? I don’t mean it has to be Solver. My question was why there was no minimization step in Fig 1 when b0 and b1 are calculated. shouldn’t a regression always look for a pair of b0 and b1 that minimizes the summed deviations?
        thanks

  4. Dear Sirs
    There is one more small error in line below this

    “We can also estimate the values of σ2 and τ2 by” line 25 approximately

    you are replacing σ2 by s2 and τ2 by t2, i have also sent a mail for the same.

    rgds
    abrar

      • ‘We can also estimate the values of σ2 and τ2 by’
        In this line you define σ2 and τ2

        In the line just below it, you write s-square (s2) instead of σ2. It is there, just concentrate a bit.

        rgds
        abrar

        • Sorry, but I don’t understand your comment. sigma-squared is the population parameter, while s-square is the sample statistic used as an estimate of sigma-squared.
          Charles

  5. Dear Sirs, thanks for such a clear explanation of Deming regression. However, there seem to be some typos, as in these two lines:

    Note, further, that the mean of these residuals are all close to zero (see row 30), as expected.
    test the optimized residuals (range P20:P29) for normality using a QQ plot and Shapiro-Wilk, as

    where is row 30
    where is range P20:P29

    Kindly clarify

    • Thank you for bringing these typos to my attention.
      I have just corrected the webpage
      row 30 => row 16
      range P20:P29 => P4:P15
      I appreciate your help in improving the quality of the Real Statistics website.
      Charles

  6. Dear Charles,

    I was struggling to understand how deming regression works, but when I found this page it took about an hour.

    Thank you very much for your wonderful webpage!

    -Elias

  7. Dear Charles, good morning!

    Have you ever thought about turning your website into a book? I would buy! I learned more about some Statistics topics from you than from other teachers! You are extremely didactic!

    A big hug from Brazil, Igor

    • Igor,
      Thank you for your very kind words and for your suggestion.
      I originally wrote the website as a book and planned to publish it, but ultimately I decided against it since I was constantly updating it and adding content, something I couldn’t do with a book.
      Charles

  8. Charles,
    Thank you very much for your webpage and your package, I found it very useful!
    I have one question, just to be sure. When I use the Deming regression tool in Excel, lambda is defined as the quotient of s2/t2 as in this page?
    I ask because in Saylor et al, 2006, they define the quotient as t2/s2.
    My results using your package seem more compatible with your definition, but I need to make sure. Which reference do you suggest to check the formulas (in addition of the obscure book by Deming itself)?
    Cheers,
    Julian

    Saylor, R., Edgerton, E., & Hartsell, B. (2006). Linear regression techniques for use in the EC tracer method of secondary organic aerosol estimation. Atmospheric Environment, 40(39), 7546–7556. https://doi.org/10.1016/j.atmosenv.2006.07.018

  9. Dear Charles,
    For comparison’s reason, I’m looking for some kind of “coefficient of determination” to serve as an analogy for the least squares R2.

    Which formula can be used to calculate R2 for the Deming regression?
    Thank you in advance!

    Kind regards,
    Andreas

    • Mehmet,
      It is not stated where these values come from. You can consider them to be assumptions. In reality, they probably came from previous testing.
      Charles

  10. Please correct: The formula shown in cell Q6 is =K20-N6, the formula in cell R6 is =L20-N6

    should be:
    The formula shown in cell Q6 is =K6-N6, the formula in cell R6 is =L6-O6

    Suggestion: Why not shift fig. 2 two lines up, so that it would align with fig. 1 (i.e., “Subject” would be in line 3 in both figures)?

    • Hello JMS,
      Thanks for identifying these two typos. I have now made the corrections that you suggested.
      I don’t quite understand what shifting Fig 2 up to lines does.
      Charles

      • “I don’t quite understand what shifting Fig 2 up to lines does.”

        Nothing! It just keeps corresponding things in the same line of the sheet. Ex.: Subject 1 is in line 4 of fig. 1 and in line 6 of fig. 2. If you put the information of both figures side by side in the same sheet, the alignment of both sections will make it more readable.

  11. Should the λ be τ2 in the fourth paragraph?

    “If the values of σ2 and λ are known then we use σ2 and τ2 as the variances of ɛi and δi errors. If they are not known then we must have multiple estimates of the x and y values. Assuming that for each i we have ki estimates for the xi and mi estimates for the yi, then for each i, we can estimate xi as the mean of these ki estimates xij and yi as the mean of these mi estimates yij, i.e.”

  12. Hello Dr. Zaiontz,

    I am trying to perform regression analysis for my data, where the x values have a known error that is not constant across the whole range. Can Deming regression still be applied in this case, or if not what analysis is more appropriate?
    Thank you.

    • Sabrina,
      Deming regression requires homogeneity of variances for each of the two types of errors.
      I don’t know how to modify Deming regression when this assumption is not met.
      Charles
