Difference between revisions of "User:Lchrisman/JQPN Cowlick"


Revision as of 23:06, 4 January 2020

This page has notes by Lonnie on an oddity that I've named the "Cowlick" of the fully-bounded Johnson QPN distribution from the paper:


This oddity impacts the implementation of DensUncertainLMH, where I was hoping to use their algorithm but ran into this glitch. Other than this, I am enthusiastic about their paper and still hope to use their algorithm. I want to have a robust built-in function in Analytica, but I need to explore whether this is something that needs to be fixed first. So this page contains my private notes, written so that I can understand what is going on and perhaps find a fix.

The cowlick refers to an artifact in their fully-bounded distribution in which the density goes to infinity at the upper and lower bounds. At first I thought it happened at the upper bound when n=1 (when B < ave(L,H)) and at the lower bound when n=-1 (when B > ave(L,H)). But the results below (in the absence of a mistake on my part) show that it is always there. (In the less extreme cases, the spike is so thin that you can only detect it by zooming in very closely around the bound.) This is an undesirable trait, and it causes what is essentially a step in the CDF at the upper bound, as if the continuous part of the CDF doesn't really reach the upper bound. It isn't a step in the strict sense (the CDF is continuous), but it flicks up so fast at the end that it acts like one. It would be better if it eased up to the bound more smoothly. I think there are extreme cases where it would need to approach an infinite slope at the bounds (such as where the 90th percentile gets arbitrarily close to the upper bound), but I'm seeing it where that doesn't seem necessary.

The figure here shows the cowlick in the illustrative example from their paper, where you can see the density jumping to infinity at the upper bound. [Figure: Cowlick graph.png] This graph is the same as the one that appears in their paper as Figure 7b, where you can make out this phenomenon if you look carefully. Here is their graph with my arrow pointing out the start of the cowlick. [Figure: Hadlock bickel fig7b.png]

The CDF for the distribution parameterized by (-11, -10, -9.99, -9, -8) is shown next, where the cowlick causes a "step" to occur in the CDF at the upper bound. It isn't quite the smooth CDF I had hoped for. [Figure: Cowlick CDF step.png]

The paper doesn't acknowledge this oddity, but it can be detected in their Figure 7b at the upper bound. The oddity does not happen with the unbounded and semi-bounded cases.

Cowlick ubiquity

Although the cowlick was quite obvious after I had implemented the algorithm, I found it to be non-trivial to prove that it happens. This is my attempt. I focus just on the upper bound first.

Theorem: In the fully-bounded Hadlock & Bickel JQPN distribution, the probability density $ f_B $ at the upper bound $ u $ and at the lower bound approaches infinity (in all cases) -- i.e.,
$ \lim_{x\rarr u} f_B(x) = \infty $
where $ f_B(x) = {{\partial F_B(x)}\over{\partial x}} $ and $ F_B(x) $ is the cumulative probability function given in Eq (8) of the paper.

Proof of the theorem

In the paper, the quantile function $ Q_B(p) $ is the inverse of the cumulative probability function $ F_B(x) $, where $ x=l $ corresponds to $ p=0 $ and $ x=u $ corresponds to $ p=1 $. Hence, the derivative of $ F_B(x) $ is the reciprocal of the derivative of $ Q_B(p) $ at the corresponding $ p $. Since the equations are a little simpler for the derivative of $ Q_B $, what I'll actually be showing is that when $ B<(H+L)/2 $

$ \lim_{p\rarr 1} {{\partial Q_B(p)}\over{\partial p}} = 0 $

which, once shown, will establish that $ f_B(x) $ approaches $ \infty $ as $ x \rarr u $ (the derivative of $ Q_B $ is positive, so its reciprocal goes to $ +\infty $).

$ Q_B $ is defined in Eq (7) of the paper as

$ Q_B(p) = l + (u-l) \Phi\left( \xi+ \lambda \sinh( \delta (\Phi^{-1}(p) + n c)) \right) $

where $ l, u $ are the specified lower and upper bounds for the bounded distribution, $ \Phi(x) $ is the standard normal cumulative probability function, and $ \xi, \lambda, \delta, n $ and $ c $ are defined in the paper as a function of the distribution parameters.
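To get a feel for Eq (7), here is a minimal Python sketch of $ Q_B(p) $ that takes $ \xi, \lambda, \delta, n $ and $ c $ as direct inputs. The function name `q_b` and all numeric values are hypothetical, chosen only for illustration; they are not computed from the paper's formulas for these parameters. The standard library's `statistics.NormalDist` stands in for $ \Phi $ and $ \Phi^{-1} $.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def q_b(p, l, u, xi, lam, delta, n, c):
    """Q_B(p) = l + (u - l) * Phi(xi + lam * sinh(delta * (Phi^{-1}(p) + n * c)))."""
    z = delta * (STD_NORMAL.inv_cdf(p) + n * c)
    return l + (u - l) * STD_NORMAL.cdf(xi + lam * math.sinh(z))


# Hypothetical values, NOT derived from the paper's parameter formulas.
params = dict(l=0.0, u=1.0, xi=0.0, lam=1.0, delta=1.0, n=1, c=1.2816)

# Print Q_B(p) and its remaining distance to the upper bound u.
for p in (0.01, 0.1, 0.5, 0.9, 0.99):
    x = q_b(p, **params)
    print(f"p={p:<5} Q_B(p)={x:.12f}  u-Q_B(p)={params['u'] - x:.3e}")
```

Printing the gap `u - Q_B(p)` shows how quickly the quantile runs into the upper bound as $ p $ grows, which is the behavior the rest of the proof examines.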

From their definitions, $ c>0, \delta>0, \lambda>0 $ always. When $ B<(L+H)/2 $ as assumed by the theorem, by definition $ n=1 $. The proof does not require more precision than this. (For the proof of the lower bound when $ B>(L+H)/2 $, $ n=-1 $.)

To simplify notation, I make this variable substitution:

$ z = \delta( \Phi^{-1}(p) + n c) $

yielding

$ Q_B(z) = l + (u-l) \Phi\left( \xi+ \lambda \sinh( z ) \right) $

and

$ {{\partial Q_B(p)}\over{\partial p}} = {{\partial Q_B(z)}\over{\partial z}} {{\partial z}\over{\partial p}} = {{\partial Q_B(z)}\over{\partial z}} \sqrt{2\pi} \delta e^{{1\over 2} \Phi^{-1}(p)^2 } = {{\partial Q_B(z)}\over{\partial z}} \sqrt{2\pi} \delta e^{{1\over 2}(z/\delta - n c)^2} $

where $ x\rarr u $ corresponds to $ p\rarr 1 $ and $ z \rarr \infty $, so

$ \lim_{p\rarr 1} {{\partial Q_B(p)}\over{\partial p}} = \lim_{z\rarr \infty} \sqrt{2\pi} \delta e^{{1\over 2} (z/\delta - n c)^2} {{\partial Q_B(z)}\over{\partial z}} $

and, for the lower bound, where $ x\rarr l $ corresponds to $ p\rarr 0 $ and $ z \rarr -\infty $,

$ \lim_{p\rarr 0} {{\partial Q_B(p)}\over{\partial p}} = \lim_{z\rarr -\infty} \sqrt{2\pi} \delta e^{{1\over 2}(z/\delta - n c)^2} {{\partial Q_B(z)}\over{\partial z}} $

and we need to show that these approach 0.
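The chain-rule factor $ {{\partial z}\over{\partial p}} = \sqrt{2\pi}\, \delta\, e^{{1\over 2}\Phi^{-1}(p)^2} $ used above can be sanity-checked numerically. The sketch below compares it against a central finite difference of $ z(p) $. The helper names and the values of `delta`, `n` and `c` are hypothetical and are not taken from the paper.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def z_of_p(p, delta, n, c):
    """The substitution z = delta * (Phi^{-1}(p) + n * c)."""
    return delta * (STD_NORMAL.inv_cdf(p) + n * c)


def dz_dp(p, delta):
    """Closed form: sqrt(2*pi) * delta * exp(0.5 * Phi^{-1}(p)^2)."""
    q = STD_NORMAL.inv_cdf(p)
    return math.sqrt(2.0 * math.pi) * delta * math.exp(0.5 * q * q)


delta, n, c = 0.7, 1, 1.2816  # hypothetical values for illustration only
h = 1e-6                      # finite-difference step in p

for p in (0.1, 0.5, 0.9):
    numeric = (z_of_p(p + h, delta, n, c) - z_of_p(p - h, delta, n, c)) / (2.0 * h)
    print(f"p={p}: finite difference={numeric:.8f}, closed form={dz_dp(p, delta):.8f}")
```

The two columns should agree to several digits; $ n $ and $ c $ only shift $ z $, so they drop out of the derivative.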

Take the derivative (I used Wolfram Alpha's online derivative calculator):

$ {{\partial Q_B(z)}\over{\partial z}} = {{(u-l) \lambda}\over{2 \sqrt{2 \pi}}} \left( e^z + e^{-z} \right) e^{-{1\over 2} \left( {1\over 2} \lambda \left(e^z - e^{-z}\right) + \xi\right)^2} $
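Since this derivative came from an online calculator, it is worth an independent check. The sketch below compares the closed form above against a central finite difference of $ Q_B(z) $. The function names and the parameter values are hypothetical, used only for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def q_of_z(z, l, u, xi, lam):
    """Q_B(z) = l + (u - l) * Phi(xi + lam * sinh(z))."""
    return l + (u - l) * STD_NORMAL.cdf(xi + lam * math.sinh(z))


def dq_dz(z, l, u, xi, lam):
    """Closed-form derivative of Q_B(z) with respect to z, as written above."""
    prefactor = (u - l) * lam / (2.0 * math.sqrt(2.0 * math.pi))
    inner = 0.5 * lam * (math.exp(z) - math.exp(-z)) + xi
    return prefactor * (math.exp(z) + math.exp(-z)) * math.exp(-0.5 * inner ** 2)


args = dict(l=0.0, u=1.0, xi=0.3, lam=0.8)  # hypothetical values
h = 1e-5                                     # finite-difference step in z

for z in (-1.0, 0.0, 0.5, 1.0):
    numeric = (q_of_z(z + h, **args) - q_of_z(z - h, **args)) / (2.0 * h)
    print(f"z={z:+.1f}: finite difference={numeric:.8f}, closed form={dq_dz(z, **args):.8f}")
```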

Expand the limit

$ \begin{array}{rcl} \lim_{p\rarr 1} {{\partial Q_B(p)}\over{\partial p}} &=& \lim_{z\rarr \infty} \sqrt{2\pi} \delta e^{{1\over 2}(z/\delta- n c)^2} {{\partial Q_B(z)}\over{\partial z}} \\ &=& \lim_{z\rarr \infty} \sqrt{2\pi} \delta e^{{1\over 2}(z/\delta - n c)^2} {{(u-l) \lambda}\over{2 \sqrt{2 \pi}}} \left( e^z + e^{-z} \right) e^{-{1\over 2} \left( {1\over 2} \lambda \left(e^z - e^{-z}\right) + \xi\right)^2} \\ &=& {1\over 2} (u-l)\delta \lambda \lim_{z\rarr \infty} \left( e^z + e^{-z} \right) e^{{1\over 2} (z/\delta- n c)^2} e^{-{1\over 2} \left( {1\over 2} \lambda \left( e^z - e^{-z}\right) + \xi\right)^2} \\ &=& {1\over 2} (u-l)\delta \lambda \lim_{z\rarr \infty} e^z e^{{1\over 2} \delta^{-2} z^2} e^{-{1\over 4} \lambda^2 e^{2z}} \\ &=& 0 \end{array} $

The fourth line keeps only the dominant term of each factor as $ z \rarr \infty $, using $ (e^z)^2 = e^{2z} $; the factors stay inside a single limit because they diverge or vanish individually. The last line follows because the combined exponent $ z + {1\over 2}\delta^{-2} z^2 - {1\over 4}\lambda^2 e^{2z} $ goes to $ -\infty $: the $ e^{-{1\over4} \lambda^2 e^{2z}} $ term goes to zero and dominates both $ e^z $ and $ e^{{1\over 2}\delta^{-2} z^2} $.
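The limit can also be examined numerically. Because the individual factors overflow or underflow quickly, the sketch below evaluates the logarithm of the full (unsimplified) expression for $ {{\partial Q_B(p)}\over{\partial p}} $ as a function of $ z $. The function name and all parameter values are hypothetical; they are not derived from the paper's formulas.

```python
import math


def log_dq_dp(z, l, u, xi, lam, delta, n, c):
    """log of 0.5*(u-l)*delta*lam * (e^z + e^-z) * exp(0.5*(z/delta - n*c)^2)
    * exp(-0.5*(lam*sinh(z) + xi)^2), computed term by term in log space."""
    log_exp_sum = abs(z) + math.log1p(math.exp(-2.0 * abs(z)))  # log(e^z + e^-z)
    return (
        math.log(0.5 * (u - l) * delta * lam)
        + log_exp_sum
        + 0.5 * (z / delta - n * c) ** 2
        - 0.5 * (lam * math.sinh(z) + xi) ** 2
    )


# Hypothetical values for illustration only.
params = dict(l=0.0, u=1.0, xi=0.3, lam=0.8, delta=0.7, n=1, c=1.2816)

for z in (1.0, 2.0, 5.0, 10.0, 20.0):
    print(f"z={z:5.1f}  log(dQ_B/dp)={log_dq_dp(z, **params):.6g}")
```

For these values the logarithm falls without bound as $ z $ grows, consistent with the derivative going to 0. Working in log space is a design choice to avoid floating-point underflow, not part of the proof.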

Note: I did not use the fact that $ n=1 $, so the same proof works for $ z \rarr -\infty $ (with $ e^{-z} $ taking the place of $ e^z $ as the dominant term), which, if correct, would mean that the cowlick always appears on both bounds. However, this is not what I observe, so something is not quite right yet.
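One way to probe this discrepancy is to look at the density as a function of $ p $ near the lower bound. The sketch below evaluates $ \log f_B(Q_B(p)) = -\log {{\partial Q_B(p)}\over{\partial p}} $ for progressively smaller $ p $, using the closed-form derivative from the proof. The function name and all parameter values are hypothetical and are not derived from the paper's formulas, so this only suggests where to look and does not settle the question.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def log_density_at_p(p, l, u, xi, lam, delta, n, c):
    """log f_B(Q_B(p)) = -log(dQ_B/dp), from the closed-form derivative."""
    q = STD_NORMAL.inv_cdf(p)
    z = delta * (q + n * c)
    log_dq_dp = (
        math.log(0.5 * (u - l) * delta * lam)
        + 0.5 * q * q                                    # from dz/dp
        + abs(z) + math.log1p(math.exp(-2.0 * abs(z)))  # log(e^z + e^-z)
        - 0.5 * (lam * math.sinh(z) + xi) ** 2
    )
    return -log_dq_dp


# Hypothetical values for illustration only (n=1 case).
params = dict(l=0.0, u=1.0, xi=0.3, lam=0.8, delta=0.7, n=1, c=1.2816)

for p in (1e-1, 1e-3, 1e-6, 1e-9, 1e-12):
    print(f"p={p:.0e}  log density={log_density_at_p(p, **params):.4g}")
```

With these hypothetical values the log density is small at moderate tail probabilities and becomes large only once $ p $ is extremely small, which may be why a spike at the lower bound is hard to see on a plot even when the limit is infinite.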