If y is a dependent variable (aka the response variable) and x1, …, xk are independent variables (aka predictor variables), then the multiple regression model provides a prediction of y from the xi of the form
Topics
- Basic Concepts
- Matrix Approach to Multiple Regression Analysis
- Using Excel to Perform the Analysis
- Real Statistics Capabilities
- Regression using the Solver option
- Categorical Coding
- Testing Regression Coefficients
- Seasonal Forecasts using Regression
- Sample Size Requirements
- Alternative Approach to Multiple Regression Analysis
- Interpreting Regression Coefficients
In addition, some theoretical issues are described on the following webpage that may be of interest to some readers who know calculus:
Reference
Howell, D. C. (2010) Statistical methods for psychology (7th ed.). Wadsworth, Cengage Learning.
https://labs.la.utexas.edu/gilden/files/2016/05/Statistics-Text.pdf
Hi.
I have about 10 Likert scales (the mean of several individual Likert items). The sample size is relatively small (only 31 participants). I would like to explore their correlations and relationships. As for correlations, is Spearman better than Pearson in my case? As for relationships, I would like to explore whether scale 1-8 predicts scale 9 and 10. Can I use any form of regression analysis?
Looking forward to your help.
Kris,
1. Are you saying that you have 10 groups each with 31 data elements? Does each group represent an independent sample or do you have 10 data points for each participant? Do the samples contain data in Likert scale (e.g. 1, 2, 3, 4, 5) or are these data elements derived from a Likert scale and so can contain any numeric value between say 1.0 and 5.0 (e.g. 3.7)?
2. Which type of regression to use depends on the answers to the questions I posed above.
Charles
Hi, thanks for the quick reply.
I have made a 5-point Likert-scale questionnaire and invite 31 participants to complete it. In the questionnaire, I have classified all Likert items into 10 scales (i.e., 8 are independent variables, and 2 are dependent variables). Those scales are averaged, so they have numeric values (ranging from 1 to 5).
So, if I understand it correctly, yes, “I have 10 groups each with 31 data elements”,and “the samples contain data in Likert scale (e.g. 1, 2, 3, 4, 5) and these data elements derived from a Likert scale and so can contain any numeric value between say 1.0 and 5.0 (e.g. 3.7)”.
Hello Kris,
Since the data elements can take any numeric value from 1 to 5, you can use Pearson’s correlation. In Excel, the correlation between each pair can be calculated via the CORREL function. You can use the Real Statistics CORR array function to calculate all pairwise correlations at once. You can also use the Real Statistics Correlation Tests data analysis tool to perform hypothesis analysis.
You can use Multiple Regression to make predictions for Group 9 from groups 1 through 8 and then a separate regression analysis to make predictions about Group 10. Since the independent variable is constrained to a value between 1 and 5, you may get predictions outside this range.
Charles
Hi, Charles,
As for correlations, I will follow your advice and use SPSS to conduct it.
As for regression, today I found a file said that dependent variables being a Likert scale can only run logistic regression. All my dependent and independent variables are Likert scales, could i still use Multiple Regression as you advised? (see: https://www.st-andrews.ac.uk/media/ceed/students/mathssupport/Likert.pdf )
Plus, will a sample size (n=31) be proper to conduct Multiple Regression when there are 2 dependent variables and 8 independent variables?
Heartfelt thanks 🙂
Hi Kris,
From your previous response, you said that the data could take values such as 3.7 and not just 1, 2, 3, 4 and 5. This would mean that your data, while derived from a Likert scale, doesn’t follow a Likert scale.
If your data follows a Likert scale 1, 2, 3, 4, and 5 are the only values, then you could use Ordinal Regression.
Whether you use ordinal regression or multiple linear regression you need to carry out two such regressions, one for each dependent variable.
There is a multivariate version of these, whereby you perform both regressions at the same time (taking into account the covariance between
the two dependent variables).
Whether or not a sample of size 31 is sufficient depends on which regression you use and assumptions (namely effect size, alpha, power goal). See the following webpage regarding multiple linear regression:
https://www.real-statistics.com/multiple-regression/statistical-power-sample-size-multiple-regression/
Charles
Hi, Charles, so many thanks for your answer:)
Regression may not out of option, as I examine my data and find them violate normal distribution and possibly the sample size requirement.
Much as I wish to see how the independent variables affect independent ones, I may use correlation coefficient to describe relationships between variables, and cannot explore causality among them.
Kris
HI Charles,
Is there a “calculator” oe app on the web for calculating “confidence and prediction intervals” using two predictor variables.
Thanks in advance
Hi Prof. Hall,
I am not aware of such a web calculator, but I have not tried to search for one.
The Real Statistics software will provide this information. You can download the software for free at
https://www.real-statistics.com/free-download/real-statistics-resource-pack/
Charles
Dear
I was wondering if you can suggest a way to perform non-linear multiple regression without knowing the equation to use. I.e I would like to get a non linear equation without prior knowledge of the equation. One thing to keep in mind is that, I would like to do step wise non-linear multiple regression.
Thanks in advance
Generally, you can use Solver for these sorts of problems, but you would at least need to know something about the form of the equation.
Charles
Hi Sir,
Can you help me with this problem
I have 16 independent variable with no dependent variable
How can I find which independent variable are statistically significant?
Thank you
Hello Andro,
Statistical significance only makes sense with respect to some statistical test. Which statistical test are you referring to?
Charles
Thank you for your answer
I already do it with the help of my friend.
Hi
i have the following Data- a quadratic of power 4.
Pl tell me how to analyse and interpret this
Temp Average Force
5 13.061
10 15.046
20 17.339
30 14.300
40 12.154
Hi Vinayak,
See Polynomial Regression
Charles
Hi Charles,
For my report, I have 28 variables (including independent and control varibles), and when I use multiple regression method in excel, it says that I cannot put more than 16 varibles in‘Input X Range’. Therefore,I am very confused about how to handle those variables.
Yolande,
The Excel data analysis tool only handles 16 variables. You have the following choices:
1. Use Excel’s LINEST function. It handles more than 16 variables.
2. Use the Real Statistics Linear Regression data analysis tool. It handles more than 16 variables.
Both of these are described on the Real Statistics website.
Charles
HI bro can you provide me that data .actually i am beginner and wants to learn . i am not getting data
my mail id is – rajeshmirdul32@gmail.com
You can find all the workbooks with the data at
Examples Workbooks
Charles
Hi Charles,
I am selecting a best model out of 10 model tested by comparing all criteria (RMSE, RSE, AIC, MAE, Bias assessment from graph etc). I found model 3 with the best performance but all coefficients of this models are not significant. Can I say this model is best for predicting my response variable? Does model need to have all significant variables to be a best in predicting response variable?
Model 3: Y = a + b(x1*x2) + c(x3). This model shows lowest AIC, RMSE, MAE, less bias of residuals as compared to all other models. However, a and c are significant but b is not insignificant.
Thanks
-Mira
Mira,
It is possible that none of the variables are significant, although not so likely. In this case you should be able to get a model with lower AIC by removing one or more of these variables. I suggest that you double check your results. You can use a model with variables that are not significant for prediction (although one or more of the variables probably won’t have much of an impact on the prediction).
As far as which model is best at predicting, it is probably best to hold out some of your data (or get additional data) and not use it it to create your model(s). You can then see how well alternative models do at predicting the values of the y variable with this additional data. Since you know the correct values of the y variable, you can measure the accuracy of the predictions (using RMSE, MAE, etc.) for the predictions.
Charles
Dear charles,
Can you please explain how to find independent variables with the help of dependent variable.
Utilizing the data set starting with x2 in cell C1
x2 x1 Y
20 1.5 55
21 2 65
25 2.8 80
27 2.9 90
28 3.4 95
31 3.6 105
38 3.8 125
40 3.9 135
41 4 140
42 4.5 150
I have generated a chart with the x1 and Y values and then formatted the line and checked the box to display the formula and R2 values. I then successfully reproduce the coefficient(s) and y-intercept b values that Excel produces in a chart with by utilizing these formulas as an array.
Linear – =LINEST(E2:E11,D2:D11,TRUE,TRUE)
Logarithmic – =LINEST(E2:E11,LN(D2:D11),TRUE,TRUE)
Power – =EXP(LINEST(LN(E2:E11),LN(D2:D11),TRUE,TRUE)) and =(LINEST(LN(E2:E11),LN(D2:D11),TRUE,TRUE))
Exponential – =EXP((LINEST(LN(E2:E11),D2:D11,TRUE,TRUE))) and =(LINEST(LN(E2:E11),D2:D11,TRUE,TRUE))
2nd Order Polynomial – =LINEST(E2:E11,D2:D11^{1,2},TRUE,TRUE)
3rd Order Polynomial – =LINEST(E2:E11,D2:D11^{1,2,3},TRUE,TRUE)
4th Order Polynomial – =LINEST(E2:E11,D2:D11^{1,2,3,4},TRUE,TRUE)
5th Order Polynomial – =LINEST(E2:E11,D2:D11^{1,2,3,4,5},TRUE,TRUE)
I then modified the Excel chart X values to include the X2 values (Column C) which of course changed all of the formulas that the chart produces, I assumed that changing the D2:D11 values in the aforementioned formulas to C2:D11 I would again replicate the formula in the Excel charts, however I have not been successful, here is the outputs for a 3rd Order Polynomial that the Excel Chart formula produces: y = -0.0369×3 + 0.5332×2 + 8.7519x + 46.5
Can you provide suggestions on how I would modify the above formulas so I can calculate the coefficient(s) and y-intercept b values?
Brett,
If I understand correctly you want to do polynomial regression with more than one independent variable. Note that the situation is more complicated than for one variable. E.g. a 3rd order polynomial with two independent variables can take the form
y = b1 * x1^3 + b2 * x2^3 + b3 * x1^2 * x2 + b4 * x1 * x2^2 + b5 * x1^2 + b6 * x2^2 + b7 * x1 * x2 + b8 * x1 + b9 * x2 + b10.
You can create a linear regression model of this type using the approaches in both
https://real-statistics.com/multiple-regression/interaction/
and
https://real-statistics.com/multiple-regression/polynomial-regression/
Charles
Charles, Can you put the calculation Y= B0 + B1*x1 + B2*x2 … into a Excel Cell formula?
Brett,
I am not sure that I understand your question, but doesn’t the FORECAST function do what you are looking for?
Charles
After reading my question I can understand your confusion, please allow me to restate the question.
Utilizing the data set starting with x2 in cell C1
x2 x1 Y
20 1.5 55
21 2 65
25 2.8 80
27 2.9 90
28 3.4 95
31 3.6 105
38 3.8 125
40 3.9 135
41 4 140
42 4.5 150
I have generated a chart with the x1 and Y values and then formatted the line and checked the box to display the formula and R2 values. I then successfully reproduce the coefficient(s) and y-intercept b values that Excel produces in a chart with by utilizing these formulas as an array.
Linear – =LINEST(E2:E11,D2:D11,TRUE,TRUE)
Logarithmic – =LINEST(E2:E11,LN(D2:D11),TRUE,TRUE)
Power – =EXP(LINEST(LN(E2:E11),LN(D2:D11),TRUE,TRUE)) and =(LINEST(LN(E2:E11),LN(D2:D11),TRUE,TRUE))
Exponential – =EXP((LINEST(LN(E2:E11),D2:D11,TRUE,TRUE))) and =(LINEST(LN(E2:E11),D2:D11,TRUE,TRUE))
2nd Order Polynomial – =LINEST(E2:E11,D2:D11^{1,2},TRUE,TRUE)
3rd Order Polynomial – =LINEST(E2:E11,D2:D11^{1,2,3},TRUE,TRUE)
4th Order Polynomial – =LINEST(E2:E11,D2:D11^{1,2,3,4},TRUE,TRUE)
5th Order Polynomial – =LINEST(E2:E11,D2:D11^{1,2,3,4,5},TRUE,TRUE)
I then modified the Excel chart X values to include the X2 values (Column C) which of course changed all of the formulas that the chart produces, I assumed that changing the D2:D11 values in the aforementioned formulas to C2:D11 I would again replicate the formula in the Excel charts, however I have not been successful, here is the outputs for a 3rd Order Polynomial that the Excel Chart formula produces: y = -0.0369×3 + 0.5332×2 + 8.7519x + 46.5
Can you provide suggestions on how I would modify the above formulas so I can calculate the coefficient(s) and y-intercept b values?
Brett,
I see that you have resent your comment on 8 May 2017. I will respond by tomorrow.
Charles
I want to conduct a multilinear regression and both my independent (location, age, gender, livelihood, land holding type, education) and dependent variables are all categorical, and moreso, the dependent or response are multi-response where they had options of mentioning animals upto 9 individuals which i categorised as herbivore, carnivore and primate. How can I go about this to have a single equation which explains the variation each independent variable contributes to response variable
Sakimba,
If you have one dependent variable which is categorical, then you could use multinomial logistic regression. How to perform such a regression is described at Multinomial Logistic Regression.
The independent variables must be coded as dummy variables. E.g. a variable with k possible outcomes would be coded by k-1 dummy variables which take only values of 0 or 1.
Charles
Hi,
I was wondering how to perform a multiple regression analysis using MS Excel if there are gaps in the main dataset (dependent variable) and a response time between the dependent variable and the two predictor variables.
Basically, I need to determine the relationship between glacier area and climatic factors(air temperature, precipitation) in a multidecadal time series. There are gaps in the years for glacier area but no gaps in the climate data.
Thanks in advance.
Laz
See Handling with Missing Data
Charles
Hi. Am doing a research. The independent variables are 2 and dependent variable is one. Which test is apt? is it multiple regression?
Kamakshi,
It could be multiple linear regression, but it really depends on what sort of analysis you want to do.
Charles
I have survey research data that I’ve collected. It’s been converted into nominal data I.e numbers based on the likert replies. One dependent variable with three independent variables and a host of controls such as age, gender, etc. is the a simple way to test for inferences that the independent variables affect the dependent variable? And how do I use the controls to confirm they don’t change the findings?
Sounds like a fit for some sort of ANOVA or regression, but, as usual, “the devil is in the details”.
Charles
Can you offer a resource to reference?
The website is full of references. See Bibliography.
Charles
Charles I am one bcom student ,teachers giving one assignment about multiply regression analysis ,he say choose one industry and choosing this industrys one product .and collect the data obout his revenue .and show this products forecasting for next year.please help me tomorrow last date for submission
Akshay,
I am happy to answer questions and provide tutorials about various statistics topics, but you will need to do your own homework assignment.
Charles
Sir,
I am trying to make a credit scoring model for sub-prime loans. I would be looking at various variables such as length of employment, length of residence, income, credit utilization, debt-to-income ratio, length of credit etc. All this information we can gather from our past accounts and the outcome would be “paid as agreed”, “charged off”,” paid with minor collection activity”. How do I come up with a multi-variable regression model?
Thank you
Tanvir
Tanvir,
Is the outcome from the model a numeric credit score or an assignment of some evaluation (excellent, very good, good, etc.)? This will help determine whether a linear regression or logistic regression model (or some other model) is best.
Sorry, but I didn’t understand the part about “the outcome would be “paid as agreed”, “charged off”,” paid with minor collection activity”. Are you referring to the outcome from one of the independent variables or the regression model (i.e. the dependent variable)?
Charles
The outcome from the model would be a numeric score which would help us determine the risk assigned to a particular client.
The independent variables would be length of credit, credit utilization, debt-to-service ratio etc.
” Paid as agreed”, “charged off” would be the dependent variables
Tanvir,
The independent variable(s) is the generally considered the outcome from a regression model. You say that ” Paid as agreed”, “charged off” are the independent variables, but these are categorical, which seems to contradict the statement that you are looking for a numerical score.
Charles
Charles,
Perhaps you can help me. I am currently teaching a high school psychology class and I am thinking about having my students observe their teachers for one week and document their observations concerning the number of times teachers ask them to recall, comprehend, analyze, synthesize, and evaluate information. Once they have recorded the data, I would like for them to run a multivariate analysis against the number of years the teacher has taught. It’s been awhile since I have taken statistics, statistically speaking, how would I instruct them to accomplish the task? Your assistance will be very much appreciated.
Dawn, it would take me longer time than I have now to answer your question in detail, but you can look at the MANOVA webpage for more details.
Charles
In practice this assumption can virtually never be confirmed; fortunately, multiple regression procedures are not greatly affected by minor deviations from this assumption.
Dear Charles,
Thank you for the article. I would like to get your advice on a project I am working on.
I have monthly data for an oil index (ABC) which is priced based on other oil price indexes such as (WTI or Brent – nobody knows what exactly). For e.g. February price for ABC oil index will likely be priced based on January prices from other oil indexes.
I ran multiple linear regression using ABC oil index as Y and the other oil indexes as X variables in an attempt to obtain a formula so that I can forecast ABC oil index prices for the next month but results were not accurate enough.
Are there any other regression models I could run on excel to come up with formulas to forecast my ABC oil index?
Appreciate all the help I can get, thank you.
Lucas,
I suggest that you take a look at the Time Series Analysis webpage.
Charles
Hello! Your site is fantastic I must say. I’d like some assistance with excel modeling, my stats is a bit rusty. Here is the scenario:
– I have quarterly data of stock prices for stock ABC going back to 1990
– I have quarterly data for the same time period of one the the industries leading indicators (call this deliveries)
– when I plot the two data sets on a simple line graph, I can clearly see that they move in the same direction at the same time
– I have my own prediction on what the independent variable will be in the coming quarters (deliveries)
– how do I run regression in excel to determine a regression formula such that I can use my forecast delivery numbers by quarters to predict the future stock prices
– what type of regression do I use? Given I am using stock prices, values cannot be negative. I know I’m supposed to use a certain type of analysis for this but I can’t remember what.
Any help would be appreciated! Thanks
Brian,
It looks like you need to use regression with seasonality. You can learn more about this on the webpage
Regression with Seasonality.
You might also want to look at the information about Time Series Analysis
Charles
Hi charles,
I dont think Regression with Seasonality would be what i’m looking for, but time series seems right. There are many options on the Time Series page…. any advice on which method would be best for me to use?
Thanks,
Brian
Brian,
The Time Series Analysis described on the website looks at univariate time series (i.e. the regression depends on time and possible dependent variable values at earlier points in time). It sounds like in your situation deals with bivariate time series (i.e. the regression also depends on another independent variable besides time). It is for this reason that I suggested ordinary regression but taking seasonality into account.
Charles
Dear sir Good morning.
I would like to create a regression equation from the available experimental data in the following format.
Nu=C.A^x.B^y.C^z
But, i am getting the equations in the below format.
Nu = C+xA+yB+zc.
How to get the equation in “Nu=C.A^x.B^y.C^z” format using Excel or any other software?
Please help me. Thanks
Raja,
The trick is to take the natural log of both sides of the equation. This changes the equation to ln Nu = ln C + lnA*x + lnB*y + lnC*z. This takes the form of a linear regression. For more details see the following webpage
Power Regression
Charles
how to forecast input energy, billing energy , with T&D loss
Abhi,
You need to provide more information before I am able to respond.
Charles
Hello Sir!
I am trying to run MLR for my trace gases data. It is possible to run MLR with one dependent variable and one independent variable?
Yes. You can use the same tools as for multiple regression. See also the part of the website that relates to the simple case that you describe, namely:
Linear Regression
Charles
Hello,
I am trying to create a predictive model for earnings of automotive companies using a multiple regression model. I will be using 6 independent variables. However, there are 10 companies. Is there some sort of error I should consider. If you could give me name for this type of analysis I could do some of my own research.
or would it not matter if the data comes from different companies?
Aamir,
It depends on how you selected the companies (e.g. randomly) and how you plan to use the data. Does this mean that you have 10 values for each of the variables (based on the 10 companies)?
Charles
I ve picked the 10 biggest companies by units sold. its a time series data.
so each company will have 6 independent variables for the past 30 years. eg. earnings = B1x1 + B2x2….
so each variable has 30 values for 10 companies. i.e 300
the independent variables are rnd spending, profit margin, employee expenses…
I will use the model and do tests to conclude if R&D spending affects earnings.
Question:
would disregard the fact that they values for independent variable come from different companies? how can i take the company into account in my analysis?
thank you for your reply Charles. Really appreciate it.
Hello
I am working with time series data (daily data of rainfall) and have some missing data values. Out of 60 years of data data for few months are missing. I want to fill in those missing values using the multiple regression analysis technique. Can you please help me to do this using the real stat package?
Please look at the following webpage for how to do this
https://real-statistics.com/handling-missing-data/traditional-approaches-handling-missing-data/
Charles
Thank you for the reply. I see that this method can work great if the missing data are continuous in nature. What if the missing values are random instead of being in a sequence?
Thanks again!
Som
It really depends on how much of the data is missing. If only a little is missing listwise deletion could still work or you could simply generate random values for the missing data*provided data is truly random(.
Charles
Hi Prof,
I am trying to fit a multiple linear regression model to my data (200 observations of one dependent & 6 independent variables) using excel – data analysis tool pack.
I’ve also checked correlation between the variables, 3-4 pairs have correlation >= 0.45
My R^2 is very low 0.26 (adjusted R^2 is 0.23) shows 26% of explained variation; Significance F is very low (0.05 (say, 0.483737793 and 0.074517895) [For these variables correlation is 0.4577]
And the regression coefficient for all 6 variables are very low (say, -0.0027809, -0.027634038, 0.052207914, -1.74467E-06, -4.78887E-05, -9.98705E-06)
So I tried deleting one of the independent variable with p-value but still getting lower values of R^2, regression coefficients, one p-value > 0.05
I don’t find the correct path to obtain a suitable model.
Please suggest the steps to follow, while building a strong multivariate regression model.
Please email me (see Contact Us) an Excel file with your data so that I can understand the situation better.
Charles
Hi,
I am slowly picking up in learning regression techniques. I have a situation here like i need predict a rating score for labour (response variable) and I have 6 predictor variables of different types. Please suggest the steps to start with and complete the analysis, thus designing a predictive model.
Thanks!
You can find information about this subject at
Testing extra variables in multiple regression
Also relevant are
Interaction
Polynomial regression
Charles
Hi Charles,
I am trying to predict the next days stock price based on past data with todays open values.
A stock named ABC depends on Constituent Indices : CNX 500,CNX 200,CNX SMALLCAP,CNX MIDCAP INDEX.
For CNX 500 we have Open,High,Low, Close
For CNX 200 we have Open,High,Low, Close
For CNX SMALLCAP we have Open,High,Low,Close
For CNX CNX MIDCAP we have Open,High,Low,Close
For stock named ABC we have Open, High,Low, Last, Close
Totally i have 21 variables. Since there are 21 variables, i assumed atleast 9 data points for each variable. So i collected datapoints of 21 *9 = 189 data rows.
I wanted to do multiple regression but excel does not support for these 21 variables.
Could you please suggest the best approach to predict the next days price?
I have seen Interaction, Tolerance, VIF in the links you have provided. Does interaction also applies for the stock as well?
Thanks,
Venkat
Hi Charles,
Small correction, typo in earlier message
I am trying to predict the next day stock price based on past data with next day open values.
Thanks,
Venkat
Venkat,
Excel’s Regression data analysis supports up to 16 independent variables. Excel’s LINEST function can be used instead, and it supports up to 64 variables.
You are better off using the Real Statistics Multiple Linear Regression data analysis tool since it supports as many independent variables as you need and is easier to use than LINEST. The tool also generates the VIF values as well (Tolerance is just the reciprocal of VIF).
If you have two variables x and y, then the interaction is modeled via the variable x*y.
Caution: Since you are using stock market data, autocorrelation needs to be taken into account. I have provided the Durbin-Watson function in the Real Statistics Resource Pack to let you test whether there is significant autocorrelation, but have not yet explained how to revise the regression analysis to take autocorrelation into account; this will be done in one of the next release of the software and website.
Charles
Hi Venkat
You might want to use R (programming language) instead of Excel for Data Analysis. It’s brilliant!
Antonio
Hello,
I am running a multiple regression and am trying to figure out if using three variables instead of two will give me better results. I am trying to figure out the correlation between three independent variables. I can’t run a regression through excel without picking one of the variables as dependent so I was wondering if you have any tips. I am hoping to use the correlation to determine the variance inflation factor to see which additional variable is the least correlated to my other chosen independent variables.
Thank you!
Hello Gabriella,
To determine whether you will get a significantly better result with three variables rather than two, you can use the RSquareTest, as described on the webpage Testing the significance of extra variables on the model. This webpage will also describe the AIC statistic, which can also be used for this purpose.
You can calculate the variance inflation factor using the Real Statistics VIF function, as described on the webpage
Collinearity
Finally, with three you can calculate the multiple correlation using the Real Statistics MCORREL function, but you do need to pick one of the variables as the dependent variable.
Charles
Dear sir,
I am doing analysis on my epidemiological data about the association between dioxin in mother breast milk and their children hormone. I have some questions, could you please help me to find it out?
– I do the log 10 transform to improve normality first. Then, when I make the linear correlation, which data type should I use, log 10 transformed type of base date?
– The same with above, when I make a multiple regression analysis, which type of data should I use?
– Normally, if we find a significant linear relation, we will do multiple analysis to check that relation again. In my case, I found no significant linear relation between all variables, is it necessary for me to do the multiple analysis next?
Thank you very much for reading my questions.
Anh LT
Anh LT,
If you need the data to be normal then you should use the data after the log transformation, otherwise you can use the original data.
It is entirely possible to not find a linear relationship between x1 and y nor between x2 and y, yet find a linear relationship between x1, x2 and y. The following is a very simple example:
y x1 x2
2 4 0
3 0 6
4 -5.5 14
5 15 -5
The correlation for x1 and y is .41, p-value = .59 (regression)
The correlation for x2 and y is -.11, p-value = .89
The correlation for x1, x2 and y is .999, p-value = .039
Thus, you should do the multiple linear regression. In fact, there isn’t much point in doing the simple regressions.
Charles
Is there a way to perform regression analysis and correlation matrix without the data analysis pak installed. Not able to install at public library
Yes. Excel provides regression and correlation data analysis tools. These can be accessed at Data > Analysis | Data Analysis.
Charles
Dear sir,
Im trying to find out what method to apply when analysing results from a questionnare. I have 1 dependent variable and 3 independent variable. The dependet variable is privacy.
Many thanks
Hi Charles,
I am trying to analyze the correlation and regression between 1 dependent variable(market value of the stock) and 4 independent variables(external factors of the economy) where i need to do this for 20 companies. what model should i use? can i make one table with all the market values of 20 companies and external factors??please help
Kiran,
If I understand correctly, your data can be organized as follows:
Company Fact1 Fact2 Fact3 Fact4 Value
GM 34.2 12.5 2 10 34.5
etc.
You would have 20 rows of data. You could use multiple regression to predict the value of the stock based on the values of the four factors. The value of R in the output is the correlation.
Charles
Dear Sir,
What stat formula will I use if I have one independent variable (personal formation program of the school) and 3 dependent variables (behavior at home, in school and in the community? You may email me for the answer. Thanks.
gadzilva
It depends on what hypothesis you want to test.
Charles
I’m doing a multiple regression on 23 independent variables. Granted, many of them are interaction variables. The 23rd variable is not giving any coefficients nor standard error data, which is then causing errors in the following cells. I’ve tried running it again without the 23rd variable and the 22nd variable is fine, but when I add in the 23rd, it doesn’t work. Any thoughts?
David,
A possible explanation is collinearity (see Collinearity), but I’d have to see the data to figure out what the problem is. Presumably you are using the Real Statistics data analysis tool since the Excel Regression tool is limited to 16 independent variables.
Charles
I am using that and thank you for that tool. I’m pretty confident that it is Collinearity as the variable is actualy interacted between 3 other variables and one is a squared variable. (X2sqX3X4) but there are other collinear variables in the entire data set so I was surprised when it was the only one that did that.
Please,who can help me among you here guys in doing the stat of my research. it’s really nerve cracking for because i’m not really good in stat.E mail me please,I would really appreciate your help. =(
sir i am un able to aply the regression on 1 dependent and 3 independent variabls plz help me
Are you trying to do the reression usin the standard Excel Regression data analysis tool or are you using the Real Statistics Linear Regression data analysis tool or something else?
Charles
Dear Charles,
I am investigating the dependency of a set of PCA components (each component with 27 values, that would be my dependent variables) with regard to a set of design evaluations (12 values, independent variables), I would like to ask you which is the best regression method for datasets which don’t have the same dimension?
Many thanks in advance
Alicia
Alicia,
I don’t completely understand your question. Regression is normally performed on one dataset and so I don’t know how to intend to use multiple datasets, unless you you are simply referring to the data for each variable as a different dataset. In this case, perhaps by different dimensions you mean that some data is missing which is causing the sample sizes to be different. Please clarify these points.
Charles
Dear Charles,
Your Real Statistics Add-In tools are great! Just wondering if there is a way to constrict the constant to zero when using the Linear Regression tool?
Regards,
Jason
Jason,
At present I don’t provide any way of constricting the constant to zero. I understand that there is no general agreement for how to do this. Excel’s Regression data analysis tool does provide this capability, but I am not sure whether the approach used is widely accepted.
Charles
Dear Sir,
thank you very much for your response.
Now the results of regression analysis are updating as per the changes in the data set.
but if I change the no. of inputs (like- earlier I have used 50 data points and now if I try the same with 48 data points), then this regression analysis is not showing any results.
So kindly do the needful to resolve the issue.
Thank you once again for your support.
Regards,
Jai
Dear all,
I am Jai, I am using this regression analysis for Weibull calculation, but I am looking for some kind of procedure, by which the regression analysis will update automatically. Because currently if I change the input data, then only graph is changing (other calculations are not changing, so if I change the data then I have to run regression analysis once again)
I would really appriciate, if someone can provide me the details about autoupdation of Regression Analysis.
Regards,
Jai
Jai,
The analysis performed by Excel’s Regression data analysis tool does not update automatically when the input data is changed.
The Regression data analysis tool provided by the Real Statistics Resource Pack will update automatically when the data is changed. It also handles 64 independent variables (instead of only 16 variables supported by the standard Excel Regression tool).
You can download the Real Statistics Resource Pack for free by going to the webpage https://real-statistics.com/free-download/real-statistics-resource-pack/. Please make sure that you install the software as described on that webpage.
See webpage https://real-statistics.com/multiple-regression/multiple-regression-analysis/ for more information about how to use the Real Statistics Regression data analysis tool.
Charles
Dear Sir,
I need to carry out multiple regression analysis on ordinal (satisfaction measures) independent variables. There are three parameters each categorised into factors and subfactors (variables). Each subfactor includes multiple questions to get satisfaction rating. Thus summing up the scores from questions to subfactors; subfactors to factors; factors to each respective paramater and finally combined score of all three parameters considered as score of the aspect of interest. My first question is whether the method is correct, and second is which specific regression analysis method should I use. Sample size is 300 households within 16 clusters equally divided into two categories.
Charu,
The approach seems reasonable from what I understand, but with such little detail I cannot say whether the method is correct or not, nor which type of regression analysis to use.
Charles
Dear Sir
I’m preparing budget for my company (production & sales budget) for different products. I had data of 3 years of production & sales for different product. Can you please help me for future projection which statistical formula will be helpful
Thanks
Depending on what the data looks like multiple regression might be a way to make the projection. When you say that you have 3 years, does that means that you have three data elements for production & sales for each product? Or do you have monthly or weekly data for the 3 years? I ask since with only 3 data elements, projections will be somewhat limited.
Charles
3 dependent vars (surveys) and one independent var (performance)
Hi! I am doing my research using likert scale (effect of attitudes) as dependent variable to english performance( ind. var). What stat tool would I employ? thanks
It really depends on what you are trying to demonstrate. What hypothesis are you tryin to test? It is not sufficient to say how many variables you have.
Charles
Hello Mr. Zaiontz,
I am conducting a study with 4 independent variables and one dependent variables. These 4 independent variables include 3 survey data and the other one is method of teaching. The dependent var is grade. Can I use multiple regression? How am I going to come up with a regression model? Please help me…. thanks
survey questions utilizing Likert scale?
Please be more specific. What would you like to find out about such questions?
Charles
Niel,
Yes you can use multiple regression, although the assumptions will likely not be fully satisfied. As to how to do it, please read the website.
Charles
thanks a lot!
According to this formula
Poverty = 0.457 + .00142 ∙ Infant Mortality + .0363 ∙ White + 1.279 ∙ Crime
for a certain case of Alabama where Infant Mortality =9, White= 71 and Crime= 448
Poverty should be 0.457 + 0.01278 + 2.577 + 573 = 573
And not around 15.7 as it is in the table. The result is clearly off and contribution from Crime is significantly higher then contribution from first 2 factors. This would make sense if first 2 factors do not correlate at all with Poverty but i think this is not the case.
I might be wrong but it appears to me that something is a bit off…
You are correct. I made two typing mistakes when I copied the coefficients in Figure 3. The correct regression line should be:
Poverty (predicted) = 0.437 + 1.279 ∙ Infant Mortality + .0363 ∙ White + 0.00142 ∙ Crime
Thanks for catching this error.
Charles
Compressive Strength Tensile Strength % polymer added
25.33 2.44 0
28 2.69 1
33.63 3.11 2
34.52 3.3 3
32.74 3.06 4
28.89 2.78 5
25.93 2.55 6
24 2.12 7
21.62 1.93 8
thirdt column values are independent and remaining 2 columns r dedenedent how 2 do multiple regression? plz
The usual multiple regression model would have two independent variables and one dependent variable. Since you have two dependent variables and one independent, you can create two separate regression models with one dependent and one independent variable. If you want a single regression model with two dependent variables then you probably want multivariate regression. R provides this capability.
Charles
i am doing the same Ctrl-Shift-Enter but the result is not coming. final result is coming like this #Value.
for my problem Y= b0+b1x1 +b2x2 +b3x3 +b4x4 +b5x5 +b6x6
Y matrix=(22*1)
x matrix=(x1 x2 x3 x4 x5 x6)
x1….x6 matrix=(22*6)
please tell how i can solve this problem using multiple regression. using TREND and LINSET command.
Sir i am unable to get the result by applying the command =TREND(B4:B53,C4:E53,G6:I8) at figure 2 of multiple regression analysis.
please inform me how i will over come the problem and i will get the same result what you got at table 2.
TREND is an array formula. It is essential that you press Ctrl-Shift-Enter (and not just Enter) after inserting the formula. If this is not the problem, let me know.
Charles
Very useful and practical. Actually the section for coefficient analisys is sustantive.