Regression Models

The goal of regression is to describe the relationship between one or more independent variables and a dependent variable, and to use observed data to predict the value of the dependent variable from the values of the independent variables.
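
As a minimal sketch of this idea, assuming a single independent variable and an ordinary least-squares fit, the following Python fits a line to observed data and uses it to predict a new value. The function names `fit_line` and `predict` and the sample data are hypothetical and serve only as illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predict the dependent variable for a new value of the independent variable."""
    return intercept + slope * x


# Hypothetical observed data: independent variable x, dependent variable y
x_obs = [1, 2, 3, 4, 5]
y_obs = [3, 5, 7, 9, 11]

slope, intercept = fit_line(x_obs, y_obs)
print(predict(6, slope, intercept))
```

With several independent variables the same principle applies, but the coefficients are usually obtained with matrix methods rather than the closed-form expressions used here.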

References

Howell, D. C. (2010) Statistical methods for psychology (7th ed.). Wadsworth, Cengage Learning.
https://labs.la.utexas.edu/gilden/files/2016/05/Statistics-Text.pdf

37 thoughts on “Regression Models”

  1. Hi Dr. Charles,
    I’m new to statistical analysis, yet I have the task of predicting spend for a particular category (e.g. travel) for my organization. I need to factor in the impact of COVID-19 and forecast how my spend may behave in the coming year. Where do I start? What regression method should I use? It looks like there are multiple independent variables, as COVID-19 is such a random event, like an “Act of God”. Can you please guide me?

    Regards,
    Abhishek

  2. Hi Charles,

    I have two questions:
    1) I would like to add a variable (the EPU index) to the Fama-French model in order to ‘enrich’ the model. I have already checked for multicollinearity between the variables and there isn’t any, which, if I understood correctly, is good news. Now, does it make sense to just plug the new variable into the equation and use the regression results to determine whether it makes the model better?
    I see a lot of studies that use the vector autoregression model, but I haven’t found an explanation in plain, simple English of what it’s for, either on your website or elsewhere.

    2) I would like to use the returns of 100 portfolios as dependent variables in my regressions. Do you know if there is any way of automating the regression process instead of manually running a regression for each portfolio?

    Also, thanks a lot for your website; it’s pretty clear and cool!
    Best,
    Chris

  3. Dr Charles,
    Your course on regression can best be described by an adage: Adi Shankara (the 8th-century saint) called someone Hastamalaka, meaning one who has a fruit placed on the palm. Your treatment of the subject brings that kind of clarity, beyond any doubt.

    Thanks a lot.

  4. There was a little mistake in my previous message. I repeat the question. Sorry

    Dear Charles,

    This is a question about Lasso regression. I am asking it here because I do not see an option to ask on the Lasso regression pages.
    First of all, thank you very much for this tool (Lasso Regression). It is very difficult to find such a good explanation of this type of regression.
    I have several questions:

    1) I obtain the coefficients, and with the trace graph I can select the variables. One of the variables has a slightly strange trace because it goes to 0 but then takes a higher value. The coefficients of this variable for lambda values of 0, 0.0017, 0.017, 0.17, 1.7, 17, 170 are -0.361158979, -0.359956549, -0.349134681, -0.240915994, 0, 0.088087844, 0.088087844 respectively. Note that after 0 it takes the value 0.088087844. What is happening?

    2) Which value of lambda should I select?

    3) How can I obtain the r^2 value? For ridge regression I use the tool, but for Lasso regression I do not know how to do it.

    Thanks very much

  5. Dear Charles,

    This is a question about Lasso regression. I am asking it here because I do not see an option to ask on the Lasso regression pages.
    First of all, thank you very much for this tool (Lasso Regression). It is very difficult to find such a good explanation of this type of regression.
    I have several questions:
    For ridge regression it is easy to compute with the tool, but for Lasso regression I do not know how to do it…
    1) I obtain the coefficients, and with the trace graph I can select the variables. One of the variables has a slightly strange trace because it goes to 0 but then takes a higher value. The coefficients of this variable for lambda values of 0, 0.0017, 0.017, 0.17, 1.7, 17, 170 are -0.361158979, -0.359956549, -0.349134681, -0.240915994, 0, 0.088087844, 0.088087844 respectively. Note that after 0 it takes the value 0.088087844. What is happening?

    2) Which value of lambda should I select?

    3) How can I obtain the r^2 value? For ridge regression I use the tool, but for Lasso regression I do not know how to do it.

    Thanks very much

  6. Dear Charles

    I tried to replicate the Poisson regression example that you use on the program website, using the Data Analysis Tool. I got a new sheet full of #¿VALUE! messages.

    I would appreciate any tip on this.

    Best regards

    Jorge

    • Justine,
      The simple answer is given on the referenced webpage. You need to read more of the other webpages about regression to get a more complete understanding.
      Charles

  7. Thank you Charles for the quick feedback.

    It should, but for one minor detail that I think is getting in the way.
    We cultured the parasitic organism in 100 different flasks and measured trait A in 10% of the flasks. This gave us a large range/variance. All 100 flasks were then fed to hosts, and trait B was then measured in 10% of the hosts. We did not keep track of which parasite was given to which host. A parasite exhibiting a magnitude of 89 for trait A could have gone to hosts with a magnitude of 20 or 200 for trait B.

    I could get the average magnitude of trait A and of trait B for one run, but the large variance/range makes me think this comparison may be unreliable. Is there a test that takes into account this large variance? I am looking at ANOVAs, but not sure if I am on the right path.

    Additionally, out of 10 runs, the averages for Trait A-Trait B gave us multiple R = 0.52, R^2 = 0.27, with significance F = 0.083.
    Trait A-Trait C: multiple R = 0.83, R^2 = 0.69, significance F = 0.0008
    Trait B-Trait C: multiple R = 0.82, R^2 = 0.68, significance F = 0.001

    I found it odd that A-C and B-C had moderate/strong correlations, but A-B looks horrible and/or failed the significance F.

    Anyways, please keep up the good work on this site! I will find my answers here somewhere =)

    Best,
    Andrew Liem

  8. This is an amazing reference source for amateur statisticians to gain a foothold. I have been browsing this site for the past week, learning as I go, to organize and make sense of my data. I hope to hear back from you with some suggestions.

    I have a large data set to organize and visualize, but my statistical skills are quite lacking. I have been reading a few introductory statistics textbooks and know that I am looking into regression models and correlation.
    This seems like a straightforward and easy plot to make, but there are a few complications. We have a parasite whose life cycle spans three disparate host vectors. We would like to see whether there is any correlation between the traits exhibited in each stage of the life cycle: for example, whether the presence of trait A in vector 1 can be used to quantitatively predict the presence of trait B in vector 2 and trait C in vector 3. We have been quantitatively measuring trait A, trait B, and trait C in their respective hosts, and I am now trying to connect these dots. Traits A, B, and C are not dichotomous variables (either occurring or not occurring); I believe the term is interval. We quantitatively measure how much of each trait is present.

    How each trait was measured in its host is, however, very different from the other two traits. We culture the parasite in vitro and take measurements from 10 flasks for trait A. We then feed these parasites to a colony of bugs and take <10% of the bugs to dissect and measure trait B. We mix the contents of the 10 flasks together, so we are not able to know which bugs ate from which flask. These bugs are then fed to another organism, and we dissect these organisms to collect data on trait C. (All measurements are quantitative.)
    We have done this process 20 times, and I hope to show whether there is any correlative power between traits A, B, and C.
    Please note that these experiments weren't intended to prove my hypothesis that there is a correlation. I joined the lab later and wanted to organize the dataset my team has. Unfortunately, none of us is a real statistician.
    Are there any specific kinds of regression models commonly used for culturing organisms with multiple life cycles? Any thoughts would be appreciated.

    • Andrew,

      I don’t know of any special type of regression that is used for culturing organisms with multiple life cycles, but it seems that the usual regression techniques should work.

      You should be able to do a regression with dependent variable B and independent variable A, and see whether this is useful in predicting trait B from trait A. The fact that the types of measurements are different shouldn’t a priori matter.

      You can then try doing a regression with dependent variable C and independent variables A and B. If in the first regression the R-square value is close to 1, then you shouldn’t use both A and B in the second regression since this will cause problems with collinearity.

      Charles
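
The check described in this reply could be sketched as follows in Python. This is only an illustration under stated assumptions: the function names, the sample data, the 0.9 cutoff for "close to 1", and the choice to keep A rather than B when the two are nearly collinear are all hypothetical and are not taken from the thread.

```python
def r_squared(xs, ys):
    """R-square of a simple linear regression of ys on xs (squared Pearson correlation)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("both variables must vary")
    return sxy ** 2 / (sxx * syy)


def predictors_for_c(trait_a, trait_b, threshold=0.9):
    """Decide which traits to use as independent variables when modeling trait C.

    If regressing B on A gives an R-square at or above the (assumed) threshold,
    A and B carry nearly the same information, so only one is kept.
    """
    r2 = r_squared(trait_a, trait_b)
    if r2 >= threshold:
        return ["A"], r2  # assumption: keep A; keeping B instead is equally defensible
    return ["A", "B"], r2


# Hypothetical per-run averages of trait A and trait B
trait_a = [1, 2, 3, 4]
trait_b = [2, 1, 4, 3]

chosen, r2 = predictors_for_c(trait_a, trait_b)
print(chosen, r2)
```

The second regression, C on the chosen predictors, would then be run with whatever regression tool is already in use.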

  9. Here’s a subjective question: when aiming to forecast / predict continuous variables for business objectives (e.g. predicting the quantity of customer orders on a monthly basis), which statistical method do you suggest is most suitable? With the goal of maximizing prediction accuracy, what are your thoughts? I’ve tried linear / multiple regression but feel that I can still do better at modeling customer behavior. Maybe time series forecasting will yield more precise results?

    • Ryan,
      There is no “one size fits all” answer to your question. This is why there are so many different methods (linear regression, logistic regression, etc., etc.).
      Charles

        • Ryan,
          I don’t know of any such version of logistic regression, although you may be referring to ordinal logistic regression. The dependent variable is ordinal, usually with a limited number of values but with a clear order.
          Charles

