The goal of regression is to describe the relationship between one or more independent variables and a dependent variable, and to use observed data to predict the value of the dependent variable from the values of the independent variables.
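As a rough illustration of this goal, the sketch below fits a straight line to a handful of observations by ordinary least squares and then uses the fitted line to predict a new value. The data values and the `predict` helper are made up for the example and are not taken from the site or from the Real Statistics add-in.

```python
# Hypothetical observations: x is the independent variable, y the dependent one.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares estimates for the line y = intercept + slope * x.
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x


def predict(new_x):
    """Predict the dependent variable for a new value of x (hypothetical helper)."""
    return intercept + slope * new_x


print(f"slope={slope:.3f}, intercept={intercept:.3f}, "
      f"prediction at x=6: {predict(6.0):.3f}")
```

The slope and intercept describe the relationship, and `predict` covers the prediction part. The other regression types listed below extend the same idea to more predictors or to other kinds of dependent variable.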
Topics
- Linear Regression
- Multiple Regression
- Logistic Regression
- Multinomial Regression
- Ordinal Regression
- Log-linear Regression
References
Howell, D. C. (2010) Statistical methods for psychology (7th ed.). Wadsworth, Cengage Learning.
https://labs.la.utexas.edu/gilden/files/2016/05/Statistics-Text.pdf
Hi Charles,
Will you add PLS in the add-in?
Thanks,
Ryan
Ryan,
Yes, I will likely add Partial Least Squares Regression, but I don’t have a specific date yet.
Charles
Hi Dr. Charles,
I’m new to statistical analysis, yet I have the task of predicting spend for a particular category (e.g. travel) for my organization. I need to factor in the COVID19 impact and forecast how my spend may behave in the coming year. Where do I start? What regression method should I use? It looks like there are multiple independent variables, as COVID19 is such a random event, like an “Act of God”. Can you please guide me?
Regards,
Abhishek
I would start with multiple linear regression.
https://real-statistics.com/multiple-regression/
Charles
Hi Charles,
I have two questions :
1) I would like to add a variable (EPU index) to the Fama-French model in order to ‘enrich’ the model. I already checked for multicollinearity between the variables and there isn’t any, which if I understood correctly is good news. Now, does it make sense to just plug in the new variable to the equation and based on the regression results to determine if it makes the model a better one ?
I see a lot of studies that test with the vector-autoregression model, but I haven’t found a page on your website, or on any other, that explains in plain, simple English what it’s for.
2) I would like to use the returns of 100 portfolios as dependent variables in my regressions. Do you know if there is any way of automating the regression process instead of manually doing a regression for each portfolio ?
Also, thanks a lot for your website, it’s pretty clear and cool !
Best,
Chris
Hello Chris,
1) Yes, you can add the extra variable. See
https://real-statistics.com/multiple-regression/testing-significance-extra-variables-regression-model/
I have not added vector-autoregression to Real Statistics yet.
2) You can use the Real Statistics regression functions. See
https://real-statistics.com/real-statistics-environment/real-statistics-regression-anova-functions/
Charles
Hi Charles,
Thanks for your swift response.
Not sure I understand which function I should use though. My issue is that I have 100 Y dependent variables, with 4 independent ones, and would like to regress each Y portfolio. This would take ages though, and in the regression function of Excel you cannot put more than 1 Y variable at a time…
Also, is it possible to aggregate AIC values ? Like if I regress the 100 Y portfolios with the Fama-French model, can I aggregate the 100 AIC values obtained to compare them with the AIC values obtained from the regression of the 100 Y portfolios with the Augmented Fama-French model ?
Thanks a million,
Chris
Chris,
It depends on what results you are looking for. E.g. to get the R-square value, you can use the Real Statistics RSquare function. You can place the 100 different combinations of variables in 100 rows and then copy down the results of the RSquare value.
Charles
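Outside Excel, the same "one row per dependent variable" idea can be sketched as a loop. The example below is only an illustration in Python, not the Real Statistics RSquare function. The portfolio names, the data, and the helpers `solve` and `r_squared` are all hypothetical, and only two independent variables are used to keep it short.

```python
def solve(a, b):
    """Solve the linear system a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(xs, y):
    """Regress y on the predictor rows xs (with an intercept) and return R-square."""
    design = [[1.0] + list(row) for row in xs]
    k = len(design[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: the same independent variables are shared by every portfolio.
factors = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0), (6.0, 5.0)]
portfolios = {
    "P1": [2.0, 4.5, 5.0, 7.5, 8.0, 10.5],  # built as exactly 1 + 2*x1 - 0.5*x2
    "P2": [1.0, 3.0, 2.5, 5.0, 4.0, 6.5],   # arbitrary made-up returns
}

# One regression per dependent variable, collected in a single pass.
results = {name: r_squared(factors, y) for name, y in portfolios.items()}
for name, value in results.items():
    print(name, round(value, 4))
```

With 100 portfolios the dictionary would simply have 100 entries. The loop stays the same, and other statistics could be returned alongside R-square in the same way.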
Dr Charles,
Your course in Regression can only be described by an adage from the time of Adi Shankara (the 8th-century saint), who called someone Hastamalaka, meaning “a fruit placed on the palm”. Your treatment of the subject brings that kind of clarity, beyond any doubt.
Thanks a lot.
Thank you very much. I am very honored and I hope to merit such a beautiful way of expressing things.
Charles
Dr Charles, good afternoon, can I develop a general linear model, using real statistics?
Many thanks
Hello Gerardo,
Real Statistics implements a general linear model in many situations (e.g. for ANOVA), but there is no general linear model data analysis tool.
Charles
Doc many thanks
Dear Charles,
This is a question about Lasso Regression. I ask it here because I do not see an option to ask on the Lasso regression pages.
First of all, thanks very much for this tool (Lasso Regression). It is very difficult to find such a good explanation of this type of regression.
I have several questions:
1) I obtain the coefficients, and with the trace graph I can select the variables. One of the variables has a slightly strange trace because it goes to 0 but afterwards takes a higher value. The coefficients of this variable for lambda values of 0, 0.0017, 0.017, 0.17, 1.7, 17, 170 are -0.361158979, -0.359956549, -0.349134681, -0.240915994, 0, 0.088087844, 0.088087844 respectively. Note that after 0 it takes the value 0.088087844. What is happening?
2) Which values of lambda should I select?
3) How can I obtain the r^2 value? In ridge regression I use the tool, but in lasso regression I do not know how to do it.
Thanks very much
Dear Charles
I tried to replicate the Poisson Regression example that you use on the website, using the Data Analysis Tool. I got a new sheet full of #VALUE! messages.
I would appreciate any tip on this.
Best regards
Jorge
Jorge,
If you send me an email with an Excel file containing your data and results I will try to figure out what is happening.
Charles
Dear Charles, thank you for your response. An email with data and output is on the way.
Best Regards
Jorge
Hi, Charles
Does the site have a LOESS function or adaptable utility?
Regards,
Rich
Rich,
Sorry, but the site does not yet describe the LOESS function or LOESS regression.
Charles
When do we use regression analysis?
Justine,
The simple answer is given on the referenced webpage. You need to read more of the other webpages about regression to get a more complete understanding.
Charles
Dear Dr Charles
How can we do Probit Analysis using Real Statistics?
Best Regards
M.R. Vaezi
Sorry, but I don’t support probit yet, only logit.
Charles
Thank you Charles for the quick feedback.
It should, but for one minor detail that I think is getting in the way.
We cultured the parasitic organism in 100 different flasks and measured trait A in 10% of the flasks. This gave us a large range/variance. All 100 flasks were then fed to hosts, and trait B was then measured in 10% of the hosts. We did not keep track of which parasite was given to which host. A parasite exhibiting a magnitude of 89 of trait A could have gone to hosts with a magnitude of 20 or 200 of trait B.
I could get the average magnitude of trait A and of trait B for one run, but the large variance/range makes me think this comparison may be unreliable. Is there a test that takes into account this large variance? I am looking at ANOVAs, but not sure if I am on the right path.
Additionally, out of 10 runs, the averages of trait A-trait B gave us multiple R = 0.52, R^2 = 0.27, with significance F = 0.083.
Trait A-Trait C: multiple R = 0.83, R^2 = 0.69, significance F = 0.0008
Trait B-Trait C: multiple R = 0.82, R^2 = 0.68, significance F = 0.001
I found it odd that A-C and B-C had moderate/strong correlation, but A-B looks horrible and/or failed the significance F.
Anyways, please keep up the good work on this site! I will find my answers here somewhere =)
Best,
Andrew Liem
This is an amazing reference source for amateur statisticians to grab a foothold. I have been browsing this site for the past week, learning as I go, to organize and make sense of my data. I hope to hear back from you about some suggestions.
I have a large data set to organize and visualize, but my statistical skills are quite lacking. I have been reading a few introductory statistics textbooks and know that I am looking into regression models and correlation.
This seems like a straightforward and easy plot to make, but there are a few complications. We have a parasite that has a life cycle spanning three disparate host vectors. We would like to see whether there is any correlation between the traits exhibited in each stage of the life cycle. For example, whether the presence of trait A in vector 1 can be used to quantitatively predict the presence of trait B in vector 2 and trait C in vector 3. We have been quantitatively measuring trait A, trait B, and trait C in their respective hosts, and I am now trying to connect these dots together. Traits A, B, and C are not dichotomous variables (either occurring or not occurring); I believe the term is interval. We quantitatively measure how much of the trait is present.
How each trait was measured in each host is, however, very different from the other two traits. We culture the parasite in vitro and take measurements from 10 flasks for trait A. We then feed these parasites to a colony of bugs and take <10% of the bugs to dissect and measure trait B. We mix the contents of the 10 flasks together, so we are not able to know which bugs ate from which flask. These bugs are then fed to another organism, and we dissect these organisms to collect data on trait C. (All measurements taken are quantitative.)
We have done this process 20x and I hope to show whether there is any correlative power between trait A, B, and C.
Please note that these experiments weren't intended to prove my hypothesis that there is a correlation. I joined the lab later and wanted to organize the dataset my team has. Unfortunately, none of us is a real statistician.
Is there any specific kind of regression model commonly used for culturing organisms with multiple life cycles? Any thoughts would be appreciated.
Andrew,
I don’t know of any special type of regression that is used for culturing organisms with multiple life cycles, but it seems that the usual regression techniques should work.
You should be able to do a regression with dependent variable B and independent variable A, and see whether this is useful in predicting trait B from trait A. The fact that the types of measurements are different shouldn’t a priori matter.
You can then try doing a regression with dependent variable C and independent variables A and B. If in the first regression the R-square value is close to 1, then you shouldn’t use both A and B in the second regression, since this will cause problems with collinearity.
Charles
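The collinearity check suggested above can be sketched as follows. For a regression with a single independent variable, R-square equals the squared Pearson correlation, so it is enough to correlate the per-run averages of traits A and B. The data values, the `pearson_r` helper, and the 0.9 cutoff for "close to 1" are all assumptions made for this illustration, not values from the thread.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical per-run averages of trait A and trait B (one value per run).
trait_a = [10.0, 12.0, 15.0, 11.0, 18.0, 20.0, 14.0, 16.0]
trait_b = [30.0, 28.0, 41.0, 35.0, 39.0, 52.0, 33.0, 47.0]

# R-square of the simple regression of B on A.
r_squared_ab = pearson_r(trait_a, trait_b) ** 2

# Assumed cutoff for "close to 1"; the thread does not give a number.
CUTOFF = 0.9
use_both_predictors = r_squared_ab < CUTOFF

print(f"R-square of B on A: {r_squared_ab:.3f}")
print("Use both A and B to predict C:", use_both_predictors)
```

If `use_both_predictors` comes out false, a regression of C on just one of the two traits would be the safer second step.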
Here’s a subjective question: when aiming to forecast / predict continuous variables for business objectives (e.g. predicting the quantity of customer orders on a monthly basis), which statistical method do you suggest is most suitable? With the goal of maximizing prediction accuracy, what are your thoughts? I’ve tried linear / multiple regression but feel that I can still do better at modeling the customer behavior. Maybe time series forecasting will yield more precise results?
Ryan,
There is no “one size fits all” answer to your question. This is why there are so many different methods (linear regression, logistic regression, etc., etc.).
Charles
Is there a form of logistic regression that predicts continuous variables instead of a qualitative response?
Ryan,
I don’t know of any such version of logistic regression, although you may be referring to ordinal logistic regression. The dependent variable is ordinal, usually with a limited number of values but with a clear order.
Charles