Log-normal distribution

Revision as of 01:12, 29 September 2018 by Lchrisman (Talk | contribs) (categories)


LogNormal(median, gsdev, mean, stddev)

Generates a sample with a lognormal distribution given «median» and «gsdev» (geometric standard deviation), or «mean» and «stddev» (standard deviation).

The logarithm of a lognormal random variable has a normal distribution. Lognormal distributions are useful for many quantities that are always positive and have long upper tails, such as concentration of a pollutant, or amount of rainfall.
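The relationship between the two distributions can be illustrated outside Analytica. The following is a minimal Python sketch, not Analytica code; the function name `sample_lognormal` and the parameter values are assumptions chosen for illustration. It draws normal values for the logarithm and exponentiates them, so every result is positive.

```python
import math
import random
import statistics

def sample_lognormal(median, gsdev, n, seed=0):
    """Draw n lognormal values by exponentiating normal draws."""
    rng = random.Random(seed)
    # ln(Y) is normal with mean ln(median) and standard deviation ln(gsdev).
    mu, sigma = math.log(median), math.log(gsdev)
    return [math.exp(rng.gauss(mu, sigma)) for _ in range(n)]

ys = sample_lognormal(median=10.0, gsdev=2.0, n=20000)
log_ys = [math.log(y) for y in ys]
print(min(ys) > 0)               # every draw is positive
print(statistics.median(ys))     # close to the requested median, 10
print(statistics.stdev(log_ys))  # close to ln(2), about 0.693
```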

A Normal distribution is symmetric around its mean:

If x := Normal(mean, sdev), then P(x <= mean - sdev) = P(x >= mean + sdev) ≈ 0.159.

Analogously, a lognormal distribution is ratio-symmetric around its median:

If y := LogNormal(median, gsdev), then P(y <= median/gsdev) = P(y >= median*gsdev) ≈ 0.159.
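The ratio symmetry can be checked numerically. This is a minimal Python sketch, not Analytica code; `lognormal_cdf` is a hypothetical helper and the values 10 and 2 are illustrative. Both tails equal the standard normal probability of falling more than one standard deviation below the mean, which is about 0.1587.

```python
import math
from statistics import NormalDist

def lognormal_cdf(y, median, gsdev):
    """P(Y <= y) for a lognormal with the given median and geometric std dev."""
    # ln(Y) is normal with mean ln(median) and standard deviation ln(gsdev).
    return NormalDist(math.log(median), math.log(gsdev)).cdf(math.log(y))

median, gsdev = 10.0, 2.0  # illustrative values
lower_tail = lognormal_cdf(median / gsdev, median, gsdev)
upper_tail = 1.0 - lognormal_cdf(median * gsdev, median, gsdev)
print(round(lower_tail, 4), round(upper_tail, 4))  # 0.1587 0.1587
```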

If you specify no parameters, it defaults to the standard lognormal, i.e., the distribution whose natural logarithm is a unit normal with mean 0 and standard deviation 1. This is equivalent to a «median» of 1 and a «gsdev» of e (about 2.718).

You can actually specify any two of the four parameters, from which it can compute the other two:

LogNormal(median: med, gsdev: gs) or just LogNormal(med, gs)
LogNormal(median: med, stddev: sd)
LogNormal(median: med, mean: mu)
LogNormal(mean: mu, stddev: sd)
LogNormal(mean: mu, gsdev: gs)
LogNormal(gsdev: gs, stddev: sd)

If you specify more than two parameters, it will give an error.
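The conversions between the four parameters follow from the standard lognormal identities. The sketch below is in Python, not Analytica, and is not necessarily how the function is implemented internally; the names `lognormal_log_params` and `all_four` are hypothetical. Writing mu and sigma for the mean and standard deviation of the logarithm, it uses median = exp(mu), gsdev = exp(sigma), mean = exp(mu + sigma^2/2) and stddev = mean * sqrt(exp(sigma^2) - 1). Requiring exactly two arguments is a choice made for this sketch.

```python
import math

def lognormal_log_params(median=None, gsdev=None, mean=None, stddev=None):
    """Return (mu, sigma), the mean and std dev of ln(Y), from any two parameters."""
    given = [p is not None for p in (median, gsdev, mean, stddev)]
    if sum(given) != 2:
        raise ValueError("specify exactly two of median, gsdev, mean, stddev")
    if median is not None and gsdev is not None:
        return math.log(median), math.log(gsdev)
    if median is not None and mean is not None:
        mu = math.log(median)
        # mean = median * exp(sigma^2 / 2); needs mean >= median.
        return mu, math.sqrt(2.0 * (math.log(mean) - mu))
    if median is not None and stddev is not None:
        # Solve w^2 - w - (stddev/median)^2 = 0 for w = exp(sigma^2).
        w = (1.0 + math.sqrt(1.0 + 4.0 * (stddev / median) ** 2)) / 2.0
        return math.log(median), math.sqrt(math.log(w))
    if mean is not None and stddev is not None:
        var = math.log(1.0 + (stddev / mean) ** 2)
        return math.log(mean) - var / 2.0, math.sqrt(var)
    if mean is not None and gsdev is not None:
        sigma = math.log(gsdev)
        return math.log(mean) - sigma ** 2 / 2.0, sigma
    # Remaining case: gsdev and stddev (needs gsdev > 1).
    sigma = math.log(gsdev)
    var = sigma ** 2
    mu = math.log(stddev) - var / 2.0 - 0.5 * math.log(math.exp(var) - 1.0)
    return mu, sigma

def all_four(mu, sigma):
    """Return (median, gsdev, mean, stddev) for ln(Y) ~ Normal(mu, sigma)."""
    mean = math.exp(mu + sigma ** 2 / 2.0)
    stddev = mean * math.sqrt(math.exp(sigma ** 2) - 1.0)
    return math.exp(mu), math.exp(sigma), mean, stddev

# Recover all four parameters from a mean and a standard deviation.
print(all_four(*lognormal_log_params(mean=5.0, stddev=3.0)))
```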

Like other distributions, you can also give one or more «Over» indexes. These cause it to generate an array of independent lognormal distributions over the specified index(es). For example,

LogNormal(m, gsd, Over: i)

Syntax:

LogNormal(median, gsdev, mean, stddev: Optional Positive; over: ... Optional Atom)

Parameter Estimation

Suppose X contains sampled historical data indexed by I, and consisting solely of positive values. To estimate the parameters of the best-fit LogNormal distribution, the following parameter estimation formulae can be used:

«median» := Median(X, I) or Exp(Mean(Ln(X), I))
«gsdev» := Exp(SDeviation(Ln(X), I))
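For readers who want to check the estimators on data outside Analytica, here is a minimal Python sketch. The name `fit_lognormal` is hypothetical, and the sketch assumes that SDeviation corresponds to the sample standard deviation (n - 1 denominator). It uses the Exp(Mean(Ln(X))) form for the median, which is the geometric mean of the data.

```python
import math
import statistics

def fit_lognormal(x):
    """Estimate (median, gsdev) of a lognormal from strictly positive data."""
    logs = [math.log(v) for v in x]           # raises ValueError if any v <= 0
    median = math.exp(statistics.mean(logs))  # geometric mean of the data
    gsdev = math.exp(statistics.stdev(logs))  # sample std dev of the logs
    return median, gsdev

# The logs of 2, 4, 8 are evenly spaced by ln(2), so the estimates
# come out at approximately median 4 and gsdev 2.
median_est, gsdev_est = fit_lognormal([2.0, 4.0, 8.0])
print(median_est, gsdev_est)
```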

A more general form, with one extra degree of freedom, is the LogNormal with an offset:

LogNormal(median, gsdev) - offset

This form can accommodate data sets containing negative numbers. The offset must keep every shifted value X + offset positive, so it is constrained by

offset > -Min(X, I)

To my knowledge, no closed-form formula for the offset exists, so finding its optimal value requires a 1-D search or optimization. However, I have found that the following heuristic estimation formulae come extremely close to the best-fit parameters with offset:

offset := -Min(X, I) + 2*(Median(X, I) - Min(X, I))/Sum(1, I)
median := Median(X + offset, I)
gsdev  := Exp(SDeviation(Ln(X + offset), I))
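The heuristic can be transcribed into Python for experimentation. This is a sketch, not Analytica code: `fit_offset_lognormal` is a hypothetical name, Sum(1, I) is taken to be the number of data points, and SDeviation is assumed to be the sample standard deviation. The sketch needs the median of the data to be strictly greater than its minimum, since otherwise the smallest shifted value is zero and its logarithm is undefined.

```python
import math
import statistics

def fit_offset_lognormal(x):
    """Heuristic (offset, median, gsdev) such that x + offset is roughly lognormal."""
    n = len(x)  # plays the role of Sum(1, I)
    lo = min(x)
    offset = -lo + 2.0 * (statistics.median(x) - lo) / n
    shifted = [v + offset for v in x]  # all positive when median(x) > min(x)
    median = statistics.median(shifted)
    gsdev = math.exp(statistics.stdev([math.log(v) for v in shifted]))
    return offset, median, gsdev

data = [-3.0, -1.0, 0.0, 2.0, 10.0]  # illustrative data with negative values
offset, median, gsdev = fit_offset_lognormal(data)
print(offset, median, gsdev)
```

With this data the offset works out to 3 + 2*3/5 = 4.2, which satisfies the constraint offset > -Min(X, I) = 3.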

