# A Working Understanding of Regression

July 23, 2009

The units don’t match in g=τc^2.

I get this question a lot, and I am beginning to understand that there is a vast difference between most people’s understanding of analytical methods and their understanding of numerical methods. This is a question I will have to address in the next revision of the book. Allow me to explain by way of examples, because I have found that if I jump straight into regression some people don’t get it.

The standard distance traveled formula is

s = ut + (1/2) at^2 (1)

It shows that all the individual terms on the RHS must have the same units as the LHS. If this were a regression equation one would write it in the form:

s = p.(u.t) + q.(a.t^2) + r (2)

where p, q & r are coefficients of constant value, and r has the same units as s. This is a non-linear relationship in t. To adapt this equation for multiple-linear regression, one would combine the variables in the first and second terms into ‘meta’ (if that is the right word) variables, say x1 (=u.t) & x2 (=a.t^2), giving

s = p.(x1) + q.(x2) + r (3)

Regressing for p, q and r using the known values of u, t and a, in the form of x1 and x2, would give the following solution

p = 1: p has the numerical value of 1 and does not have any units

q = 1/2: q has the numerical value of 1/2 and does not have any units

r = 0: r has the numerical value of zero and has the units m (meters)
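The regression described above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the values of u, a and t are randomly generated (hypothetical), and the "measured" distances are computed from equation (1) without noise, so the fit should recover p, q and r almost exactly.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Hypothetical observations (not real measurements):
# initial speed u (m/s), acceleration a (m/s^2), elapsed time t (s)
n = 50
u = rng.uniform(0.0, 20.0, n)
a = rng.uniform(-5.0, 5.0, n)
t = rng.uniform(0.1, 10.0, n)

# "Measured" distances generated from equation (1), in meters
s = u * t + 0.5 * a * t**2

# Meta variables of equation (3); both carry the units of meters
x1 = u * t
x2 = a * t**2

# Design matrix: columns x1, x2 and a column of ones for the constant r
X = np.column_stack([x1, x2, np.ones(n)])
(p, q, r), *_ = np.linalg.lstsq(X, s, rcond=None)

print(f"p = {p:.6f} (no units)")
print(f"q = {q:.6f} (no units)")
print(f"r = {r:.6f} m")
```

Because x1 and x2 are already in meters, the fitted p and q come out as pure numbers, while r is in the units of s.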

When we compare equation (3) with equation (1), we note two points:

(a) You don’t write p = 1 in equation (1), even though the coefficient p still exists and is dimensionless.

(b) You don’t write the r term because r is zero, even though it has the units m.

If we were to insist on writing everything out, then equation (1) would always be written as,

s m = 1.u.t m + (1/2).a.t^2 m + 0 m (4)

This is technically correct, but not helpful.

Let’s take another example. In the RMBS (residential mortgage-backed securities) & CMBS (commercial mortgage-backed securities) sub-industries, the regression loan default model would take the form,

P(d) = a + b.t + c.t^2 + d.t^3 + e.t^4 + f.t^5 (5)

where t is the loan age in months and P(d), the probability of default, is dimensionless. Again, the units of each term on the RHS must be the same as the units of the LHS, which in this case means dimensionless. Therefore, the constants a, b, c, d, e & f would take on the following units,

a would be dimensionless

b would have the units month^-1

c would have the units month^-2

d would have the units month^-3

e would have the units month^-4

f would have the units month^-5
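One way to see that these units are real, and not just bookkeeping, is to change the time unit and refit. The sketch below is a hedged illustration: the coefficient values are made up (hypothetical, not taken from any actual default model). It fits the same curve with loan age in years instead of months and checks that the k-th coefficient is rescaled by 12^k, which is what month^-k = 12^k year^-k implies.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Hypothetical coefficients a..f of equation (5), with t in months.
# Units: a dimensionless, b month^-1, c month^-2, ..., f month^-5.
coef_months = np.array([0.001, 2e-3, -1e-4, 3e-6, -4e-8, 2e-10])

t_months = np.arange(0, 61, dtype=float)       # loan age in months
p_default = P.polyval(t_months, coef_months)   # dimensionless

# Refit the same curve with loan age expressed in years
t_years = t_months / 12.0
coef_years = P.polyfit(t_years, p_default, deg=5)

# Each coefficient should pick up a factor of 12^k (k = 0..5)
expected = coef_months * 12.0 ** np.arange(6)
units_consistent = np.allclose(coef_years, expected, rtol=1e-6)
print(units_consistent)
```

The probabilities themselves do not change; only the numerical values of the coefficients do, exactly as their units say they should.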

In equation (1) the coefficients were dimensionless. Here is an example where the coefficient is not dimensionless.

g = GM/r^2 (6)

A regression version of this equation would take the form,

g = a.M/r^2 +b (7)

Since every term on the RHS must have the same units as the LHS, this tells us that,

i. a has the numerical value of 6.67428 x 10^-11 and the units of a would be m^3 kg^-1 s^-2

ii. b has the numerical value of zero and the units of b would be m/s^2

Since b = 0, equation (7) is simply written as

g = a.M/r^2 (8)
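As a rough sketch of how equation (7) would be regressed, the Python snippet below generates accelerations from equation (6) and fits a straight line to them. The mass (roughly that of the Earth) and the list of distances are assumptions chosen only for illustration, and the "observations" are noise-free, so the fit should return the value of G quoted above and an intercept of essentially zero.

```python
import numpy as np

G = 6.67428e-11   # m^3 kg^-1 s^-2, the value quoted in the text
M = 5.972e24      # kg, approximate mass of the Earth (assumed for illustration)

# Hypothetical distances from the center of the mass, in meters
r = np.linspace(6.4e6, 4.0e7, 40)

x = M / r**2      # regressor of equation (7), units kg/m^2
g = G * x         # "observed" accelerations from equation (6), units m/s^2

# Degree-1 least-squares fit: slope a and intercept b of equation (7)
a, b = np.polyfit(x, g, 1)

print(f"a = {a:.5e} m^3 kg^-1 s^-2")
print(f"b = {b:.2e} m/s^2")
```

The regressor x has the units kg/m^2 and g has the units m/s^2, so the slope a must carry m^3 kg^-1 s^-2 for the two sides to match.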

That is, we observe 3 points:

- That the coefficients have compensating units.
- That when a coefficient has the numerical value of unity, it is not written into the formula, even if it is not dimensionless; it is ‘silent’ and does not appear in the formula.
- That the units of a coefficient do not have to be the same as those of the other coefficients or of the LHS.

Returning to g=τc^2: this equation was derived using multiple-linear regression, so the unsolved regression equation took the form,

g = a.(t1-t2)/(r1-r2) + b (9)

where a and b are regression coefficients. Since the units of each term on the RHS must be the same as those of the LHS,

a .(t1-t2)/(r1-r2) would have the units m/s^2

b would also have the units m/s^2

Since the time dilations t1 and t2 have the units of seconds (s), and the contracted distances r1 and r2 have the units of meters (m), the ratio (t1-t2)/(r1-r2) has the units s/m. This informs us that

a would have the units of m^2/s^3

b would have the units m/s^2
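The unit bookkeeping above can be checked mechanically. The following is a minimal sketch, not a full units library: units are represented as maps from base unit to exponent, and the helper `combine` is a hypothetical function written only for this illustration.

```python
def combine(u, v, sign=1):
    """Multiply (sign=+1) or divide (sign=-1) two units given as exponent maps."""
    out = dict(u)
    for base, exp in v.items():
        out[base] = out.get(base, 0) + sign * exp
    # Drop base units whose exponent cancelled to zero
    return {base: exp for base, exp in out.items() if exp != 0}

meter = {"m": 1}
second = {"s": 1}

# Units of g: m/s^2
g_units = combine(meter, combine(second, second), sign=-1)

# Units of x1 = (t1-t2)/(r1-r2): s/m
x1_units = combine(second, meter, sign=-1)

# a.x1 must have the units of g, so a has the units of g divided by x1
a_units = combine(g_units, x1_units, sign=-1)

# b stands alone on the RHS, so it has the units of g
b_units = g_units

print("a:", a_units)
print("b:", b_units)
```

Dividing m/s^2 by s/m leaves exponent 2 on m and exponent -3 on s, which is the m^2/s^3 stated for a.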

To solve for a and b using multiple-linear regression, equation (9) would take the form

g = a.(x1) + b (10)

where x1 = (t1-t2)/(r1-r2). The regression solution shows that

a has the numerical value of c^2, the square of the velocity of light

b has the numerical value of zero

and the regression takes the form

g m/s^2 = c^2.(x1) m/s^2 + 0 m/s^2 (11)

Since b is zero, it can be ignored, as in going from (3) to (1), and equation (11) can now be rewritten as

g m/s^2 = (x1).c^2 m/s^2 (12)

or, writing τ for x1, more elegantly

g=τc^2 (13)

I hope this clears up any confusion.