PoissonRegression

Revision as of 00:07, 14 January 2016 by Bbecane (Talk | contribs)


PoissonRegression(y, b, I, K, priorType, priorDev)

A Poisson regression model is used to predict the number of events that occur, «y», from a vector of independent data, «b», indexed by «K». The PoissonRegression() function computes the coefficients, «c», from a set of data points, («b», «y»), both indexed by «I», such that the expected number of events is predicted by this formula:

$ E[y] = \exp\left(\sum_k c_k b_k\right) $

The random component in the prediction is assumed to be Poisson-distributed, so that given a new data point «b», the distribution for that point is

Poisson(Exp(Sum(c*b,K)))
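
For reference, this distribution assigns the following probability to observing exactly n events, where the mean μ is the regression prediction. This is the standard Poisson probability mass function; the symbols n and μ are introduced here for illustration and are not parameters of the function.

$ P(y = n \mid b) = \frac{\mu^n e^{-\mu}}{n!}, \qquad \mu = \exp\left(\sum_k c_k b_k\right), \qquad n = 0, 1, 2, \ldots $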

Bayesian Prior

Poisson regression is extremely susceptible to overfitting, and when overfitting occurs, you end up with a model that is overconfident in its predictions. On new data points, overfitting pushes predicted probabilities too close to 0% or 100%, conveying confidence that is unwarranted. This problem can be mitigated by using a Bayesian prior, which can also be viewed as a penalty function on the coefficients.

The «priorType» parameter allows you to select a Bayesian prior. The allowed values are

  • 0: Maximum likelihood (i.e., no prior)
  • 1: Exponential L1 prior
  • 2: Normal L2 prior

The L1 and L2 priors penalize larger coefficient values, which biases the fit toward small coefficients. Each imposes a prior probability distribution over the possible coefficient values, independently for each coefficient. The L1 prior takes the shape of an exponential curve, while the L2 prior takes the shape of a normal curve. There is usually no obvious way to know in advance whether an L1 or an L2 prior would be better for your particular problem, and most likely the choice won't matter much.
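
To make the "prior as penalty" view concrete, here is a minimal Python sketch of the penalty each prior type would add to the negative log-likelihood. It is not Analytica's implementation. It assumes the L1 prior is a double-exponential (Laplace) distribution and the L2 prior a normal distribution, each centered at zero with standard deviation prior_dev, and that every coefficient (including the constant term) is penalized. The names neg_log_likelihood and penalty are hypothetical.

```python
import math

def neg_log_likelihood(coefs, rows, counts):
    """Poisson negative log-likelihood, dropping the log(y!) term,
    which does not depend on the coefficients."""
    total = 0.0
    for basis, y in zip(rows, counts):
        eta = sum(c * b for c, b in zip(coefs, basis))
        total += math.exp(eta) - y * eta
    return total

def penalty(coefs, prior_type, prior_dev):
    """Negative log-prior up to an additive constant (assumed parameterization)."""
    if prior_type == 0:
        # Maximum likelihood: no prior, so no penalty.
        return 0.0
    if prior_type == 1:
        # Laplace prior with standard deviation prior_dev has scale prior_dev / sqrt(2).
        return math.sqrt(2.0) * sum(abs(c) for c in coefs) / prior_dev
    if prior_type == 2:
        # Normal prior with standard deviation prior_dev.
        return sum(c * c for c in coefs) / (2.0 * prior_dev ** 2)
    raise ValueError("prior_type must be 0, 1, or 2")

def penalized_objective(coefs, rows, counts, prior_type, prior_dev):
    """Quantity a maximum a posteriori fit would minimize under these assumptions."""
    return neg_log_likelihood(coefs, rows, counts) + penalty(coefs, prior_type, prior_dev)
```

Under these assumptions, the L1 penalty grows linearly with the size of each coefficient and the L2 penalty grows quadratically, and both shrink as prior_dev increases.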

The «priorDev» parameter specifies the standard deviation of the prior, i.e., how quickly the prior probability falls off. Larger values of «priorDev» correspond to a weaker prior. If you don't specify «priorDev», the function makes a guess, which will typically be based on very little information. Cross-validation approaches can vary the «priorDev» parameter to determine the best prior strength for a problem (see the Logistic Regression prior selection.ana example model in the Data Analysis folder in Analytica for an example).

Weaker priors will almost always result in a better fit on the training data (and maximum likelihood, which uses no prior, should fit the training data at least as well as any prior), but performance on examples that don't appear in the training set can be quite different. Typically, performance on new data improves as the prior is weakened only up to a point, and then degrades as the prior is weakened further. This degradation is caused by overfitting.
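
The trade-off described above can be explored with a small leave-one-out cross-validation loop. The following Python sketch is illustrative only and is not how Analytica fits the model: it assumes a normal (L2) prior applied to both coefficients, handles just a constant term plus one predictor, uses a damped Newton iteration chosen for brevity, and runs on made-up data. All function names are hypothetical.

```python
import math

def objective(coefs, rows, counts, prior_dev):
    """Poisson negative log-likelihood (without log y!) plus a normal (L2) penalty."""
    nll = 0.0
    for basis, y in zip(rows, counts):
        eta = sum(c * b for c, b in zip(coefs, basis))
        nll += math.exp(eta) - y * eta
    return nll + sum(c * c for c in coefs) / (2.0 * prior_dev ** 2)

def fit(rows, counts, prior_dev, iters=50):
    """Damped Newton iterations for a two-coefficient model."""
    lam = 1.0 / prior_dev ** 2
    c = [0.0, 0.0]
    for _ in range(iters):
        # Gradient and Hessian of the penalized objective at the current point.
        g0, g1 = lam * c[0], lam * c[1]
        h00, h01, h11 = lam, 0.0, lam
        for (b0, b1), y in zip(rows, counts):
            mu = math.exp(c[0] * b0 + c[1] * b1)
            g0 += (mu - y) * b0
            g1 += (mu - y) * b1
            h00 += mu * b0 * b0
            h01 += mu * b0 * b1
            h11 += mu * b1 * b1
        # Newton direction from the closed-form inverse of the 2x2 Hessian.
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        # Halve the step until the objective does not increase.
        step, current = 1.0, objective(c, rows, counts, prior_dev)
        while step > 1e-8:
            trial = [c[0] - step * d0, c[1] - step * d1]
            if objective(trial, rows, counts, prior_dev) <= current:
                c = trial
                break
            step /= 2.0
    return c

def log_lik(coefs, basis, y):
    """Poisson log-likelihood of a single observation."""
    eta = sum(c * b for c, b in zip(coefs, basis))
    return y * eta - math.exp(eta) - math.lgamma(y + 1)

def leave_one_out_score(rows, counts, prior_dev):
    """Average held-out log-likelihood; higher is better."""
    total = 0.0
    for i in range(len(rows)):
        coefs = fit(rows[:i] + rows[i + 1:], counts[:i] + counts[i + 1:], prior_dev)
        total += log_lik(coefs, rows[i], counts[i])
    return total / len(rows)

# Made-up data for illustration: constant term plus one predictor.
rows = [[1.0, x] for x in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5)]
counts = [1, 0, 2, 3, 2, 6, 7, 12]

candidates = [0.1, 1.0, 10.0]
scores = {dev: leave_one_out_score(rows, counts, dev) for dev in candidates}
best_dev = max(scores, key=scores.get)
```

Here best_dev is the candidate prior standard deviation with the highest average held-out log-likelihood. In practice you would scan a finer grid of candidates and use your own data.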

Library

Advanced Math

Example

You have data collected from surveys on how many times TV viewers were exposed to your ads in a given week, and on how many times you ran ads in each time slot on those weeks. You want to fit a model to this data so that you can predict the distribution of exposures that you can expect in the future for a given allocation of ads to each time slot.

Each data point used for training is one survey response (from one person) taken at the end of one particular week (Training_exposures indexed by Survey_response). The basis includes a constant term plus the number of times ads were run in each time slot that week (Training_basis indexed by Time_slot_k and Survey_response).

Index Time_Slot_K := [1, 'Prime time', 'Late night', 'Day time']
Variable exposure_coefs := PoissonRegression(Training_exposures, Training_basis, Survey_response, Time_slot_K)

To estimate the distribution for how many times a viewer will be exposed to your ads next week if you run 30 ads in prime time, 20 in late night and 50 during the day, use

Decision AdAllocation := Table(Time_slot_K)(1, 30, 20, 50)
Chance ViewersExposed := Poisson(Exp(Sum(Exposure_coefs*AdAllocation, Time_slot_K)))

This example can be found in the Example Models / Data Analysis folder in the model file "Poisson regression ad exposures.ana".
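
The prediction step can also be checked by hand outside Analytica. The Python sketch below mirrors the ViewersExposed calculation for the allocation above. The coefficient values are hypothetical placeholders, not the fitted values from the example model, and the names are chosen only for illustration.

```python
import math

# Hypothetical coefficients in the order of Time_slot_K:
# constant term, 'Prime time', 'Late night', 'Day time'.
# These are NOT the fitted values from the example model.
exposure_coefs = [-1.0, 0.03, 0.01, 0.005]

# Ad allocation from the example; the leading 1 multiplies the constant term.
ad_allocation = [1, 30, 20, 50]

# Expected exposures: Exp(Sum(coefs * allocation, Time_slot_K)).
mean_exposures = math.exp(sum(c * a for c, a in zip(exposure_coefs, ad_allocation)))

def poisson_pmf(n, mean):
    """Probability of exactly n events under a Poisson distribution."""
    return math.exp(-mean) * mean ** n / math.factorial(n)

# Probability of 0 through 9 exposures, and of at least one exposure.
distribution = [poisson_pmf(n, mean_exposures) for n in range(10)]
prob_at_least_one = 1.0 - poisson_pmf(0, mean_exposures)
```

With real fitted coefficients substituted for the placeholders, distribution gives the chance of each exposure count for a single viewer next week.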

History

PoissonRegression was introduced in Analytica 4.5. In releases before 4.5, the Poisson_Regression function was available only to Analytica Optimizer users. The function described here supersedes that function and does not require the Optimizer edition.
