Multiple Linear Regression

This app requires you to install a custom plug-in, which can be downloaded here.

The app gives an example of how to take a dataset and create a predictive model using a multiple linear regression.

Multiple Linear Regression Example

In this app's example, several cars have been test-driven and their results and technical specifications logged in the data set. You want to find how each factor affects the fuel consumption of a car and then create a model to predict the fuel consumption of cars in the future.

In a multiple linear regression, take the columns of a training set and divide them into independent variables and one dependent variable.

The independent variables are: distanceTravelled (how far the cars were driven in the test), price, horsepower, weight, acceleration, carLength, carWidth. The dependent variable (i.e., the one you want to predict) is mpg (miles per gallon).

The goal of this analysis is to combine all these columns into the equation below:

 

The Xs in the equation represent the independent variable columns. The Y is the dependent variable column.

The betas (e.g., β 0 , β 1 , β 2 , …, β Nare numbers that are trying to be found. The app finds these values and then makes predictions of Y (in this case, mpg).

Example Results

The results are the values of the beta coefficients from the equation. The app produces a list of value. The index of their position in the list is the same as the subscripted number next to each beta (e.g., β2 has index 2. This is the third position of the list as lists are zero-indexed).

All of these calculations take place in the first workbook. The second workbook uses the model (equation) to predict future results.

In the car example, the training data used was collected from an experiment trying and find an equation that could describe mpg in terms of the other factors such as distance travelled, car width, acceleration etc.

In the workbook “_02_Model Predictions”, it takes the above list as input, along with a prepared sheet of values to predict the effects of.

The goal is to predict the effects of changing the horsepower, weight, acceleration and traveling distance.

The following table was uploaded as a CSV.

Once Datameer has both the model and our new prediction values, you can combine them into the original equation and produce the predictions for mpg.

Connection and Import

The data used in this app is provided through the input file upload. It has 9 columns of manually-generated information and simulates car specifications and performance measured during testing.

The file upload is used as the input sheet in the _01_CorrelationMatrixInput workbook but can be swapped with real life data by adding the new data sheet. Use it to create a column dependencies sheet and then adjusting the following sheets to reference the new column dependencies.

Workbook

The calculation is carried out in the first workbook and the modeling in the second. The details of each sheet are outlined in this section.

_01_MultipleRegression

input - The input sheet, with the independent variables titled as X 1 , X 2 , etc. and the dependent variable as Y.

initializedInput - This is a copy of the input sheet, with any extraneous columns from the original data replaced with the 'interceptColumn'.

betaCoefficientInverse - This page takes the input from 'initializedInput', calculates the necessary matrix entries and performs the first linear algebra operation: the inverse of the product of the X matrix and the X transpose. 

Two custom functions are used in this sheet: Firstly, GROUP_MATRIX_TRANSPOSE_PRODUCT takes the input columns as elements. Then MATRIX_INVERSE is then applied.

UntransformedInverse - This sheet performs the second linear algebra step, which is the product of the X transpose with the vector of independent variables. Here, the custom function GROUP_MATRIX_TRANSPOSE_VECTOR_PRODUCT is used, which also takes the input columns as elements.

Joined - The first two calculations are brought together in this sheet, in preparation for the final calculation.

MultipleRegressionCoefficients - The final calculation is performed here, which is the product of the first two calculations. The custom function MATRIX_PRODUCT is applied to the two lists from Joined. This is a list, the entries of which are the entries of a the final beta vector and the desired result of the workbook. 

_02_ModelPredictions

MultipleRegressionCoefficients - Takes the last sheet from the previous workbook as input.

InputWithGroupKey - Duplicate of 'MultipleRegressionCoefficients, but with a group key added.

ProposedXvalues - These are the X values that are inputted in to the model. This is currently a series of randomly-generated numbers, but can be replaced with an external data sheet.

ProposalsWithModel - This is a join of the model and the proposed X values.

Predictions - The X values are entered in to the model and the results are displayed here. In this sheet you can also define the value N (default setting:10) for the following sheets.

_02_ ModelPredictions (demonstrative sheets)

The prediction analysis is finished with sheet 'Predictions', these extra sheets are included as a possible following course of action.

TopNpredictions - Takes the highest 'N' Y values from the prediction.

BottomNpredictions - Takes the lowest 'N' Y values from the prediction.

HighestPredictions - 'TopNpredictions' is joined back with 'ProposedXvalues' so the original X input values are listed alongside the predicted Y outcome. The 'RowIndex' is the join key.

LowestPredictions - 'BottomNpredictions' is joined back with 'ProposedXvalues' so the original X input values are listed alongside the predicted Y outcome. The 'RowIndex' is the join key.