{ From user Lonnie, Model Intro_to_rank_correl at 25-Mar-2010 11:21:16 AM }
Softwareversion 4.3.0
{ System Variables with non-default values: }
Samplesize := 10K
Usetable := 0
Displayoutputs Run: ,
Typechecking := 1
Checking := 1
Saveoptions := 2
Savevalues := 0
Distresol := 500
{!40000|Att_contlinestyle Graph_primary_valdim: 4}
{!40000|Att_contlinestyle Graph_pdf_valdim: 1}
Model Intro_to_rank_correl
Title: Intro to Rank Correlation
Description: This model, created during the Analytica User Group Webinar on 25 March 2010, was used to illustrate the uses and estimation of Rank Correlation.
Author: Lonnie Chrisman~
Lumina Decision Systems
Date: Wed, Mar 24, 2010 11:51 PM
Saveauthor: Lonnie
Savedate: Thu, Mar 25, 2010 11:21 AM
Defaultsize: 48,24
Diagstate: 2,33,12,472,386,17
Windstate: 2,98,83,476,224
Fontstyle: Arial, 15
Fileinfo: 0,Model Intro_to_rank_correl,2,2,0,0,W:\Training\User Group Webinars\Rank-Correlation-Analysis.ana
Module Soccer_data
Title: Soccer Data
Description: Data was collected from the girls' varsity and JV soccer teams at a high school as part of a Master's project in sports kinesiology examining whether self-confidence and injury rates are statistically related. The athletes took a standard self-confidence survey before and after the season, and the coach tracked in detail all injuries sustained during the season. These records have been condensed here into an "injury score", which reflects both the number and the severity of each player's injuries during the season.~
~
This data is used here to demonstrate small-sample rank correlation analysis for the Analytica user group webinar.
Author: Lenka Berenova and Lonnie Chrisman
Date: Thu, Mar 25, 2010 9:16 AM
Defaultsize: 48,24
Nodelocation: 88,56,1
Nodesize: 48,24
Diagstate: 2,79,17,587,475,17
Index Player
Title: Player
Definition: ['V1','V2','V3','V4','V6','V7','V8','V9','V10','V11','V12','V13','V14','V15','JV1','JV2','JV3','JV5','JV6','JV7','JV9','JV10','JV11','JV12','JV13','JV14','JV15','JV16','JV17']
Nodelocation: 100,48,1
Nodesize: 48,24
{!40000|Att_previndexvalue: ['V1','V2','V3','V4','V6','V7','V8','V9','V10','V11','V12','V13','V14','V15','JV1','JV2','JV3','JV5','JV6','JV7','JV9','JV10','JV11','JV12','JV13','JV14','JV15','JV16','JV17']}
Constant Pre_season_self_conf
Title: Pre-season Self Conf.
Definition: Table(Player)(~
84,72,82,90,87,86,96,93,91,78,99,86,90,71,94,92,91,92,83,93,104,62,65,93,95,70,87,96,80)
Nodelocation: 100,104,1
Nodesize: 52,24
Valuestate: 2,260,2,416,568,0,MIDM
Constant Post_season_self_con
Title: Post-season Self Conf.
Definition: Table(Player)(~
90,88,91,82,«null»,90,100,«null»,98,67,98,76,97,76,100,93,«null»,98,78,100,«null»,71,86,«null»,89,79,93,88,83)
Nodelocation: 100,160,1
Nodesize: 52,24
Valuestate: 2,36,43,313,451,0,MIDM
Constant Injury_score
Title: Injury Score
Definition: Table(Player)(~
1,2,0,0,0,3,0,0,3,0,0,6,3,0,0,1,0,0,0,0,0,1,0,0,0,0,1,0,0)
Nodelocation: 100,216,1
Nodesize: 52,24
Valuestate: 2,515,11,360,558,0,MIDM
Variable Rc_pre_conf_vs__inj
Title: rc pre-conf vs. inj
Definition: RankCorrel( Pre_season_self_conf,Injury_score, Player )
Nodelocation: 240,104,1
Nodesize: 48,24
Valuestate: 2,499,272,416,303,0,MIDM
Function Rc_p_value(x,y : ContextSamp[I] ; I : Index=Run)
Title: rc p value
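Description: Approximate one-sided p-value for a monotone relationship between x and y, based on the Fisher z-transform of the sample rank correlation rc. Here n counts the pairs in which both x and y are non-null. The constant 0.4856 appears to be 0.5/Sqrt(1.06), which treats Sqrt(1.06/(n-3)) as the standard error of the transformed rank correlation. Assuming x and y are independent, CumNormal(z) is the probability of a sample rank correlation at or below rc, so small values point to a negative monotone relationship.~
Worked example: rc = -0.3 with n = 25 gives z = 0.4856*ln(0.7/1.3)*Sqrt(22), about -1.41, so the result is about 0.08.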
Definition: var n := Sum( x<>null and y<>null, I );~
var rc := RankCorrel(x,y,I);~
var z := 0.4856 * ln( (1+rc)/(1-rc) ) * Sqrt(n-3);~
CumNormal(z)
Nodelocation: 104,320,1
Nodesize: 48,24
Windstate: 2,506,216,476,224
Paramnames: x,y,I
Variable P_value_for_mono_rel
Title: p-Value for mono relation exists
Definition: Rc_p_value(Pre_season_self_conf,Injury_score,Player)
Nodelocation: 240,217,1
Nodesize: 48,40
Valuestate: 2,557,238,416,303,0,MIDM
Numberformat: 2,%,4,2,0,0,4,0,$,0,"ABBREV",0
Function Rc_dist(dataIndex : Index ; sampleRc : scalar)
Title: Rc dist
Description: Estimate the distribution for the underlying rank correlation given the measured sample rc and the number of samples (from the data index).
Definition: If IsSampleEvalMode then (~
var x := BiNormal(0,1,xy,sampleRc, over:DataIndex);~
RankCorrel( x[@xy=1], x[@xy=2], dataIndex )~
) else ~
sampleRc
Nodelocation: 248,320,1
Nodesize: 48,24
Windstate: 2,521,215,476,309
Paramnames: dataIndex,sampleRc
Index Xy
Title: xy
Definition: ['x','y']
Nodelocation: 248,384,1
Nodesize: 48,24
Objective Underlying_rc
Title: Underlying RC
Definition: Rc_dist(Player, Rc_pre_conf_vs__inj)
Nodelocation: 368,104,1
Nodesize: 48,24
Valuestate: 2,356,182,575,339,1,PDFP
Objective Prob_rc_0
Title: Prob rc>0
Definition: Probability(Underlying_rc>0)
Nodelocation: 368,184,1
Nodesize: 48,24
Valuestate: 2,495,251,416,303,0,MIDM
Numberformat: 2,%,4,2,0,0,4,0,$,0,"ABBREV",0
Objective Prob_independent
Title: Prob independent
Definition: Probability( abs(Underlying_rc) < 0.05 )
Nodelocation: 464,184,1
Nodesize: 48,24
Valuestate: 2,564,263,416,303,0,MIDM
Numberformat: 2,%,4,2,0,0,4,0,$,0,"ABBREV",0
Objective Prob_rc__0_3
Title: Prob rc<-0.3
Definition: Probability(Underlying_rc < -0.3)
Nodelocation: 368,248,1
Nodesize: 48,24
Valuestate: 2,551,226,416,303,0,MIDM
Numberformat: 2,%,4,2,0,0,4,0,$,0,"ABBREV",0
Close Soccer_data
Module Synthetic_dataset
Title: Synthetic Dataset
Author: Lonnie
Date: Thu, Mar 25, 2010 9:16 AM
Defaultsize: 48,24
Nodelocation: 200,56,1
Nodesize: 48,24
Diagstate: 2,235,36,550,276,17
Index Test_subject
Title: Test Subject
Definition: 1..25
Nodelocation: 96,48,1
Nodesize: 48,24
{!40000|Att_previndexvalue: [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25]}
Variable Data_x
Title: data x
Definition: Table(Test_subject)(~
6.6,5.3,7.8,5.9,3.6,3.7,8.4,9.699999999999999,3.6,3.6,10.2,7.3,5.9,4.2,3.4,9.1,8.1,2.3,7.5,4.2,8.9,3.4,6.6,3.9,5.4)
Nodelocation: 96,104,1
Nodesize: 48,24
Variable Data_y
Title: data y
Definition: Table(Test_subject)(~
5.9,8.300000000000001,8.800000000000001,2.5,0.8,5.6,11.6,8.5,0.1,9.4,11.3,10.7,10.4,3.1,1.3,0.1,6.1,1,3.1,8.1,5.6,9.1,0.7,3,0.6)
Nodelocation: 96,160,1
Nodesize: 48,24
Defnstate: 2,268,246,416,303,0,MIDM
Valuestate: 2,472,250,445,303,1,MIDM
Graphsetup: {!40000|Att_contlinestyle Graph_primary_valdim:4}
Numberformat: 2,D,4,2,0,0,4,0,$,0,"ABBREV",0
Objective Data_scatter
Title: data scatter
Definition: [data_y,data_x]
Nodelocation: 96,224,1
Nodesize: 48,24
Valuestate: 2,105,191,465,349,1,MIDM
Graphsetup: {!40000|Att_contlinestyle Graph_primary_valdim:4}
Reformval: [Test_subject,Undefined]
{!40000|Att_xrole: -2}
{!40000|Att_yrole: -1}
{!40000|Att_coordinateindex: Self}
Variable Rc_xy
Title: rc xy
Definition: RankCorrel(data_x,data_y, Test_subject )
Nodelocation: 224,128,1
Nodesize: 48,24
Valuestate: 2,505,270,416,303,0,MIDM
Close Synthetic_dataset
Library Multivariate_distrib
Title: Multivariate Distributions
Description: A library of multivariate distributions.~
~
In a multivariate distribution, each sample is a vector. This vector is indexed by the index passed as the I parameter of the functions in this library. A Mid value from a distribution function will therefore be indexed by I, while a Sample from a distribution function is indexed by both I and Run. These distribution functions can also be used within the Random function to generate a single Monte Carlo sample, which will be indexed by I.~
~
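Example (the index definition is illustrative): with Index I := 1..3, Mid(Uniformspherical(I)) is a single point indexed by I, Sample(Uniformspherical(I)) is indexed by I and Run, and Random(Uniformspherical(I)) draws one random point indexed by I.~
~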
This library also contains functions for generating correlated distributions. Correlate_with, for example, allows you to generate a univariate distribution with an arbitrary marginal distribution that has a specified rank correlation with an arbitrary reference distribution. Several functions may be used for generating serial correlations, where each distribution along an index is correlated with the previous point along that index.
Author: Lonnie Chrisman, Ph.D.~
Lumina Decision Systems~
~
With contributions by:~
John Bowers, US FDA.~
Max Henrion, Lumina Decision Systems
Date: Fri, Aug 01, 2003 7:12 PM
Saveauthor: Lonnie
Savedate: Tue, Nov 11, 2008 1:59 PM
Defaultsize: 48,24
Nodelocation: 328,56,1
Nodesize: 56,24
Nodeinfo: 1,1,1,1,1,1,0,0,0,0
Diagstate: 1,42,10,649,1009,17
Windstate: 2,401,199,483,316
Fontstyle: Arial, 15
Function Wishart( cv : Number[I,J,Run] ; n :positive ; I,J : Index ; ~
singleSampleMethod : optional hidden scalar)
Title: Wishart(cv,n,I,J)
Description: Suppose you draw N samples, X[I,R], from a Gaussian(0,cv,I,J) distribution, where R := 1..N indexes the samples. The Wishart distribution describes the distribution of sum( X * X[I=J], R ). This matrix, dimensioned by I and J, is called the scatter matrix. ~
~
A sample drawn from the Wishart is therefore a sample scatter matrix. If you divide that sample by (N-1), you have a sampled covariance matrix. ~
~
If you compute a sample covariance matrix from data and use it directly in your model, you will be ignoring sampling error. That may be insignificant if N is large. Otherwise, you may want to use:~
Wishart( SampleCV, N, I, J) / (N-1)~
instead of just SampleCV in your model. The extended variance will account for the uncertainty from the finite sample size that was used to obtain your sample CV.~
~
If you can express a prior probability on covariances in the form of an InvertedWishart distribution, then the posterior distribution, after having computed the sample covariance matrix (assumed to be drawn, by nature, from a Wishart), is also an InvertedWishart.
Definition: var T := if i0~
Each sample of a Dirichlet distribution produces a random vector whose elements sum to 1. It is commonly used to represent second-order probability information.~
~
The Dirichlet distribution has a density given by ~
k * Product( X^(alpha-1), I)~
where k is a normalization factor equal to~
GammaFn( Sum(alpha,I) ) / Product( GammaFn(alpha), I )~
~
The parameters, alpha, can be interpreted as observation counts. The mean is given by the relative values of alpha (normalized to 1), but the variance narrows as the alphas get larger, just as your confidence in a distribution would narrow as you get more samples.~
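For example, alpha = [2,3,5] and alpha = [20,30,50] both give mean [0.2, 0.3, 0.5], but the standard deviation of the first component drops from about 0.12 to about 0.04, using Variance = a_i*(a0-a_i)/(a0^2*(a0+1)) with a0 = Sum(alpha,I).~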
~
The Dirichlet lends itself to easy Bayesian updating. If you have a prior of alpha0, and you observe N
Definition: var a:=Gamma(alpha,singleSampleMethod:singleSampleMethod);~
a/sum(a,I)
Nodelocation: 272,120,1
Nodesize: 58,16
Windstate: 2,26,18,624,485
Paramnames: alpha,I,Over
Function Binormal(MeanVec :numeric[I,Run]; Sdeviations : positive[I,Run]; I:IndexType; correlationCoef : numeric[Run];~
Over : ... optional atomic ;~
singleSampleMethod : optional hidden scalar)
Title: BiNormal (m, s, i, c )
Description: A 2-D normal (or bivariate Gaussian) distribution with the indicated individual standard deviations (>0) and the indicated correlation coefficient. The index I must have exactly 2 elements, and Sdeviations must be indexed by I.
Definition: if size(I)<>2 then ~
Error("Index to BiNormal must have 2 elements")~
else begin~
var s := product(Sdeviations,I) * correlationCoef;~
Index J:=CopyIndex(I);~
Gaussian( meanVec, If I<>J Then s else Sdeviations^2, I,J,~
singleSampleMethod: singleSampleMethod )~
end
Nodelocation: 288,72,1
Nodesize: 78,16
Windstate: 2,2,24,525,540
Paramnames: MeanVec,Sdeviations,I,correlationCoef,Over
Function Multinomial(N:NonNegative ; theta:NonNegative ; I : IndexType;~
Over : ... optional atomic ;~
singleSampleMethod : hidden optional scalar )
Title: Multinomial (n, theta, i )
Description: Returns the Multinomial Distribution.~
~
The multinomial distribution generalizes the binomial distribution from two possible outcomes per trial to any number of outcomes. For example, if you roll a fair die N times, an outcome is the number of times each of the six faces appears. Theta gives the probability of each outcome, where sum(theta,I)=1 and index I is the list of possible outcomes. If theta doesn't sum to 1, it is normalized.~
~
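Example (Face and Theta are illustrative): with Index Face := 1..6 and Theta defined as Table(Face)(1,1,1,1,1,1), Multinomial(60, Theta, Face) models 60 rolls of a fair die. The equal weights are normalized to 1/6 each, so every face has an expected count of 10, and the six counts in each sample sum to 60.~
~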
Each sample is a vector indexed by I indicating the number of times the corresponding outcome (die number) occurred during that sample point. Each sample will have the property that sum( result, I ) = N.
Definition: var z := n;~
var k := size(I);~
~
var j:=cumulate(1,I) in I do begin~
Index I2 := j..k;~
var theta2 := Slice(theta,I,I2); /* unnormalized sub-process */~
var p := theta2/sum(theta2, I2);~
p := if IsNan(p) then 0 else p;~
var xj := Binomial(z,p[I2=j],~
singleSampleMethod:singleSampleMethod);~
z := z - xj;~
xj~
end~
Nodelocation: 117,120,1
Nodesize: 85,16
Windstate: 2,75,167,476,522
Paramnames: N,theta,I,Over
Function Correlate_dists(dists : Context[I,RunIndex] ; rankcorrs : numeric array[I,J] ; ~
I,J : IndexType;~
RunIndex : optional Index = Run )
Title: Correlate Dists (d, rc, i, j )
Description: Reorders the samples in dists so as to match the desired rank correlations between distributions as closely as possible. RankCorrs must be positive definite, and the diagonal should contain all ones.~
~
The result will be distributions having the same margins as the original input, but with rank correlations close to those of the rankcorrs matrix.
Definition: if not IsSampleEvalMode and Handle(RunIndex)=Handle(Run) Then~
dists {Mid mode}~
Else begin~
var u := if Handle(RunIndex)=Handle(run) ~
Then Sample(Gaussian(0,rankcorrs,I,J))~
Else Random(Gaussian(0,rankcorrs,I,J),Over:RunIndex);~
var dsort := sortIndex(dists,RunIndex);~
var urank := UniqueRank(u,RunIndex);~
dists[RunIndex=dsort[RunIndex=urank]]~
end
Nodelocation: 136,392,1
Nodesize: 100,16
Windstate: 2,301,193,557,477
Paramnames: dists,rankcorrs,I,J,RunIndex
Function Correlate_with( S, ref : Context[RunIndex] ; rc : scalar ; ~
RunIndex : optional Index = Run )
Title: Correlate With (s, ref, rc )
Description: Reorders the samples of S so that the result has a rank correlation with the reference sample ref close to rc. ~
~
Example: To generate a LogNormal distribution that is highly correlated with Ch1, use Correlate_With( LogNormal(2,3), Ch1, 0.8 )~
~
Note: This achieves a given unweighted rank correlation. If you have a non-default SampleWeighting of points, the weighted rank correlation may differ.
Definition: if IsSampleEvalMode or Handle(runIndex)<>Handle(Run) Then begin~
Index q := 1..2;~
var u := If Handle(RunIndex)=Handle(Run) ~
Then binormal( 0, 1, q, rc )~
Else Random(binormal(0,1,q,rc),Over:RunIndex);~
var rrank := UniqueRank(ref,RunIndex);~
var u1sort := sortIndex(u[q=1],RunIndex);~
var u2rank := UniqueRank(u[q=2],RunIndex);~
var ssort := sortIndex(S,RunIndex);~
S[RunIndex=ssort[RunIndex=u2rank[RunIndex=~
u1sort[RunIndex=rrank]]]]~
end ~
else {mid mode}~
S
Nodelocation: 128,312,1
Nodesize: 96,16
Windstate: 2,205,170,545,485
Paramnames: S,ref,rc,RunIndex
Function Uniformspherical(I : IndexType ; R : optional Numeric[I,Run] ;~
Over : ... optional atomic ;~
singleSampleMethod : optional hidden scalar )
Title: Uniform Spherical (i, r )
Description: Generates points uniformly on a sphere (or circle or hypersphere).~
Each sample generated is indexed by I -- so if I has 3 elements, the points will lie on a sphere.~
~
The mid value is unusual here, since no median point lies on the sphere. The center of the sphere is the natural middle value, but it is not in the allowable range, so an arbitrary point on the sphere is used instead: R/sqrt(size(I)) in every component (about 0.577*R when I has 3 elements).
Definition: if IsNotSpecified(R) then R:=1;~
var u := Normal(0,1,over:I,~
singleSampleMethod:singleSampleMethod); ~
var d := sqrt( sum(u^2,I) );~
ifall d=0 and @I then R/sqrt(size(I)) else r*u/d
Nodelocation: 328,168,1
Nodesize: 86,16
Windstate: 2,151,227,476,424
Paramnames: I,R,Over
Function Multiuniform(corr : Numeric[I,J,Run] ; I,J : IndexType ; lb,ub : optional Numeric[I,J,Run] ;~
Over : ... optional atomic ;~
singleSampleMethod : hidden optional scalar )
Title: MultiUniform ( c, i, j, lb, ub )
Description: The multi-variate uniform distribution.~
Generates vector samples (indexed by I) such that each component has a uniform marginal distribution and each pair of components has the correlation given by corr. Indexes I and J must have the same number of elements, and corr must be symmetric and obey a semidefinite condition: the transformed matrix [ 2*sin(30*corr) ] (angle in degrees) must be positive semidefinite. In most cases, this is roughly the same as requiring corr itself to be positive semidefinite. Lb and ub specify lower and upper bounds, either for all components or individually if these bounds are indexed by I. If lb and ub are omitted, each component has a Uniform(0,1) marginal.~
~
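For example, a target correlation of 0.5 uses an underlying Gaussian correlation of 2*sin(30*0.5) = 2*sin(15 degrees), about 0.518. The transform keeps the end points fixed: corr values of 0 and 1 map to 0 and 1.~
~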
The correlation specified in corr is true sample correlation - not rank correlation. ~
~
The transformation here is based on:~
* Falk, M. (1999), "A simple approach to the generation of uniformly distributed random variables with prescribed correlations," Comm. in Stats - Simulation and Computation 28: 785-791.
Definition: if IsNotSpecified(lb) then lb:=0;~
if IsNotSpecified(ub) then ub := 1;~
var R := if I=J then 1 else 2*sin(30*corr);~
var g := Gaussian(0,R,I,J,~
singleSampleMethod:singleSampleMethod);~
Cumnormal( g ) * (ub-lb) + lb
Nodelocation: 132,168,1
Nodesize: 100,16
Windstate: 2,67,106,608,611
Paramnames: corr,I,J,lb,ub,Over
Module Depricated_multi_var
Title: Deprecated multi-variate stuff
Description: Functions found in this module are here for legacy reasons. They existed in older versions of the Multivariate library, but have since become obsolete.
Author: Lonnie
Date: Mon, Apr 30, 2007 3:49 PM
Defaultsize: 48,24
Nodelocation: 80,944,1
Nodesize: 56,32
Function Samplecovariance(X ; I : Index ; J : optional Index ; R : Index)
Title: Sample Covariance
Description: This function is obsolete. In Analytica 4.0, the builtin function Variance can be used to compute a covariance matrix. The equivalent of this function would be: Variance( X, R, CoVarDim:I, CoVarDim2:J ).~
~
Returns a covariance matrix based on the sampled data, X, indexed by I and R. (I is the dimensionality of X, R corresponds to the samples). The result will be indexed by I and J -- supply J to be the same length as I.~
~
Note that the mean is simply Average(X,R), and doesn't warrant a separate function.
Definition: var I2 := if IsNotSpecified(J) ~
Then (Index K/((identifier of I)&"2") := I do VarTerm(K)) ~
Else VarTerm(J);~
var Z:=X-Average(X,R);~
var Zt := Z[@I=@I2];~
Sum(Z*Zt,R)/(size(R)-1)
Nodelocation: 80,48,1
Nodesize: 48,24
Windstate: 2,222,299,476,297
Paramnames: X,I,J,R
Function Samplecorrelation(X : array[I,R] ; I,J,R : IndexType)
Title: sample correlation
Description: This function is obsolete. A correlation matrix can be computed in Analytica 4.0+ using the built-in function Correlation. The equivalent of this function is Correlation(X,X[@I=@J],R).~
~
Returns a correlation matrix based on data in X, where each data point is a vector indexed by I, and the entries in the correlation matrix are the pair-wise correlations of the columns of data. A second index, J, of size identical to I, is required in order to index the 2-dimensional result.
Definition: var z:=x-average(x,R);~
var zt := slice(z,I,cumulate(1,J));~
sum(z*zt,R) / sqrt(sum(z^2,R) * sum(zt^2,R))~
Nodelocation: 208,48,1
Nodesize: 48,24
Windstate: 2,70,24,523,377
Paramnames: X,I,J,R
Close Depricated_multi_var
Text Multvar_te1
Description: Parametric Multivariate Distributions
Nodelocation: 160,40,-1
Nodesize: 136,12
Text Multvar_te2
Description: Creating an array of mutually correlated distributions:
Nodelocation: 232,368,-1
Nodesize: 200,16
Text Multvar_te3
Description: Creating a single univariate distribution correlated with another existing dist:
Nodelocation: 296,280,-1
Nodesize: 268,12
Function Normal_correl(m, s, r, y: Numeric ;~
over : optional atomic ;~
singleSampleMethod : optional hidden scalar )
Title: Normal_correl(m, s, r, y)
Description: Generates a normal distribution with mean m, standard deviation s, and correlation r with normally distributed value y. In a deterministic context, it will return m.~
~
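Worked example: Normal_correl(100, 15, 0.7, y) returns, for each sample, 100 + 15*(Sqrt(1-0.49)*Z + 0.7*(y - Mean(y))/Sdeviation(y)), where Z is an independent Normal(0,1) and Sqrt(0.51) is about 0.714. The two terms have variances 0.51 and 0.49, which sum to 1, so the result keeps standard deviation 15.~
~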
If y is not normally distributed, the result will also not be normal, and the correlation will be approximate. It generalizes appropriately if any of the parameters are arrays: the result array will have the union of the indexes of the parameters.
Definition: IF r<-1 OR r>1 THEN Error('Correlation parameter r in function Normal_correl(m, s, r, y) is outside the expected range [-1, 1].');~
IFOnly IsSampleEvalMode ~
THEN m + s * (Sqrt(1-r^2) ~
* Normal(Sameindexes( 0, m ), Sameindexes( 1, s ),~
singleSampleMethod:singleSampleMethod ) ~
+ r * (y - Mean(y))/Sdeviation(y))~
ELSE m
Nodelocation: 352,312,1
Nodesize: 108,16
Windstate: 2,102,90,503,416
Paramnames: m,s,r,y,over
Module Multivariate_interna
Title: Multivariate Internal Functions
Author: Lonnie
Date: Tue, May 01, 2007 9:29 PM
Defaultsize: 48,24
Nodelocation: 200,944,1
Nodesize: 52,32
Diagstate: 1,605,145,550,300,17
Function Sameindexes(x, y)
Title: SameIndexes(x,y)
Description: Returns an array with the same indexes as y, and value x in each cell.
Definition: IF y=y THEN x ELSE x
Nodelocation: 120,64,1
Nodesize: 80,20
Paramnames: x,y
Function Uniquerank(X : Array[I]; I : Index)
Title: UniqueRank
Description: Returns the Rank of X along I, but such that the rank assigned is unique for every element. Thus, when there are ties, instead of getting the same rank, as would happen with the Rank(X,I) function, the ranks will be assigned arbitrarily. Consider:~
[ 3, 1, 3, 2, 3, 2, 1 ]~
Ranks become:~
[5,1,6,3,7,4,2 ]
Definition: index Pos := @I;~
var s := SortIndex(X[@I=@Pos],Pos);~
var result := 1;~
for n:=Pos do ( result[@I=s[@Pos=n]] := n );~
result
Nodelocation: 272,64,1
Nodesize: 52,20
Windstate: 2,477,347,537,379
Paramnames: X,I
Close Multivariate_interna
Function Multinormal(m, s: Numeric; cm: ArrayType[i, j,Run]; i , j: IndexType ;~
Over : ... optional atomic ;~
singleSampleMethod : optional hidden scalar )
Title: Multinormal(m,s,c,i,j)
Description: A multi-variate normal (or Gaussian) distribution with mean m, standard deviation s, and correlation matrix cm. m and s may be scalar or indexed by i. cm must be symmetric, positive-definite, and indexed by i & j, which must be the same length.~
~
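For example, with s = [2, 3] and a correlation of 0.5 between the two components, the covariance matrix passed to Gaussian, cm*s*s[@i=@j], has diagonal entries 4 and 9 and off-diagonal entries 0.5*2*3 = 3.~
~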
Multinormal uses a correlation matrix. Compare with Gaussian, which also defines a multi-variate normal but which uses a covariance matrix.
Definition: Gaussian(m,cm*s*s[@i=@j],i,j,over,singleSampleMethod)
Nodelocation: 472,72,1
Nodesize: 84,16
Windstate: 2,391,248,512,343
Paramnames: m,s,cm,i,j,Over
Text Multvar_te4
Description: Reshaped distributions:
Nodelocation: 136,448,-1
Nodesize: 100,16
Function Dist_reshape(x : Numeric[R] ; newdist : all Numeric[R] ; ~
R : optional Index = Run )
Title: Dist_reshape(x, newdist)
Description: Reshapes the probability distribution of uncertain quantity x so that it has the same marginal probability distribution (i.e., the same set of sample values) as newdist, but retains the same ranks as x. Thus:~
Rank(Sample(x), Run) ~
= Rank(Sample(Dist_reshape(x, newdist)), Run)~
In a Mid context, it simply returns the mid value of newdist, with any indexes of x.~
~
The result retains any rank correlations that x may have with other predecessor variables. So, the rank-order correlation between a third variable z and x will be the same as the rank-order correlation between z and a reshaped version of x, i.e.~
RankCorrel(x, z) = RankCorrel(Dist_reshape(x, newdist), z)~
~
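Example (Cost is a hypothetical variable): if Cost is Normal(100, 10), then Dist_reshape(Cost, Triangular(80, 100, 140)) has the marginal distribution of Triangular(80, 100, 140), while the sample with the k-th smallest Cost value receives the k-th smallest Triangular sample value.~
~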
The operation may optionally be applied along an index other than Run.
Definition: IFOnly IsSampleEvalMode or Handle(R)<>Handle(Run) THEN BEGIN~
VAR dsort := SortIndex(newdist, R);~
VAR xranks := Rank(x, R);~
newdist[R = dsort[R=xranks]]~
END~
ELSE newdist * (x=x)
Nodelocation: 152,472,1
Nodesize: 116,16
Windstate: 2,102,90,646,469
Paramnames: x,newdist,R
Text Multvar_te5
Description: Arrays with serial correlation
Nodelocation: 208,532,-1
Nodesize: 168,12
Function Normal_serial_correl(m, s, r: Numeric; i: IndexType ;~
over : ... optional atomic;~
singleSampleMethod : optional hidden scalar )
Title: Normal_serial_correl(m,s,r,i)
Description: Generates an array over index i of normal distributions with mean m, standard deviation s, and correlation r between successive values over index i. You can give each distribution a different mean and/or standard deviation if m and/or s are arrays indexed by i. If r is indexed by i, r[i=k] specifies the correlation between result[i=k] and result[i=k-1]. (Then the first correlation, slice(r, i, 1) is ignored.)
Definition: Var x := Normal(0, 1,singleSampleMethod:singleSampleMethod);~
(FOR j := i DO ~
x := Normal_correl( 0, 1, r[i = j],x,~
singleSampleMethod:singleSampleMethod ) ) ~
* s + m
Nodelocation: 160,560,1
Nodesize: 120,16
Windstate: 2,353,325,540,383
Paramnames: m,s,r,i,over
Function Normal_additive_gro(x, m, s, r: Numeric; i: IndexType ;~
over : ... optional atomic ;~
singleSampleMethod : optional hidden scalar )
Title: Normal_additive_gro(x,m,s,r,i)
Description: Adds a normally distributed percent growth g with mean m and standard deviation s to x for each value of index i. The growth g for each i has serial correlation r with g for i-1.
Definition: x *( 1 + Cumulate(Normal_serial_correl(m, s, r, i,~
singleSampleMethod:singleSampleMethod), i))
Nodelocation: 159,600,1
Nodesize: 119,16
Windstate: 2,102,90,519,306
Paramnames: x,m,s,r,i,over
Function Normal_compound_gro(x, m, s, r: Numeric; t: IndexType ;~
over : ... optional atomic;~
singleSampleMethod : optional hidden scalar )
Title: Normal_compound_gro(x,m,s,r,t)
Description: An array of values over time index t, starting with value x and applying compound growth for each time interval, where the growth has normal uncertainty with mean m and standard deviation s. The growth g for each t has correlation r with g for t-1.
Definition: x * Cumproduct(IF t = Slice(t, 1) THEN 1 ELSE Normal_serial_correl(m, s, r, t, singleSampleMethod:singleSampleMethod ) + 1, t)
Nodelocation: 159,640,1
Nodesize: 119,16
Windstate: 2,102,90,529,366
Paramnames: x,m,s,r,t,over
Function Dist_serial_correl(x; r; i: IndexType ;~
over : ... optional atomic;~
singleSampleMethod : optional hidden scalar )
Title: Dist_serial_correl(x,r,i)
Description: Generates an array y over index i where each y[i] has a marginal distribution identical to x, and serial rank correlation of r with y[i-1]. If x is indexed by i, each y[i] has the same marginal distribution as x[i], but with samples reordered to have the specified rank correlation r between successive values. If r is indexed by i, r[i=k] specifies the rank correlation between y[i=k] and y[i=k-1]. Then the first correlation, r[i=1], is ignored.~
~
In Mid context, it returns Mid(x).~
~
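Example (Year is an illustrative index): with Index Year := 2010..2015, Dist_serial_correl(LogNormal(10, 2), 0.8, Year) gives each Year a LogNormal(10, 2) marginal. The underlying normals from Normal_serial_correl have a product-moment correlation of 0.8 between successive years, which corresponds to a rank correlation of about (6/pi)*arcsin(0.4) with arcsin in radians, roughly 0.79.~
~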
Note: The result retains no probabilistic dependence on x.
Definition: Dist_reshape(Normal_serial_correl( 0, 1, r, i, singleSampleMethod:singleSampleMethod ), x)
Nodelocation: 408,560,1
Nodesize: 120,16
Windstate: 2,302,78,477,447
Paramnames: x,r,i,over
Function Dist_additive_growth(x, g, r: Numeric; i: IndexType;~
over : ... optional atomic;~
singleSampleMethod : optional hidden scalar )
Title: Dist_additive_growth(x,g,r,i)
Description: Generates an array of values over index i, with the first equal to x, and successive values adding an uncertain growth with probability distribution g, and serial correlation r between growth[i = k] and growth[i=k-1]. x, g, and r each may be indexed by i if you want them to vary over i.
Definition: x + Cumulate(Dist_serial_correl( g, r, i, singleSampleMethod : singleSampleMethod), i)
Nodelocation: 407,600,1
Nodesize: 119,16
Windstate: 2,102,90,506,300
Paramnames: x,g,r,i,over
Function Dist_compound_growth(x, g, r; i: IndexType ;~
over : ... optional atomic ;~
singleSampleMethod : optional hidden scalar )
Title: Dist_compound_growth(x,g,r,i)
Description: Starts with x and applies a compound growth g for each value of index i. The growth g for each i has correlation r with g for i-1.
Definition: x * Cumproduct(~
IF i = Slice(i, 1) THEN 1 ~
ELSE (Dist_serial_correl( g, r, i, ~
singleSampleMethod:singleSampleMethod ) + 1)~
, i)
Nodelocation: 407,640,1
Nodesize: 119,16
Windstate: 2,102,90,489,307
Paramnames: x,g,r,i,over
Text Multvar_te6
Description: Distributions on Linear Regression coefficients
Nodelocation: 296,688,-1
Nodesize: 256,12
Function Regressionnoise( Y : Numeric[I,Run] ; B : Numeric[I,K,Run] ; I,K : Index; C : optional Numeric[K,Run] )
Title: RegressionNoise(Y,B,I,K,C)
Description: When you have data, Y[I] and B[I,K], generated from an underlying model with unknown coefficients C[k] and S of the form:~
~
Y = Sum( C*B, K) + Normal(0,S)~
~
This function computes an estimate for S. ~
~
When using in conjunction with RegressionDist, it is most efficient to provide the optional parameter C to both routines, where C is the expected value of the regression coefficients, obtained from calling Regression(Y,B,I,K). Doing so avoids an unnecessary call to the builtin Regression function.
Definition: if IsNotSpecified(C) Then C := Regression(Y,B,I,K);~
Var resid := Y - Sum(C*B,K);~
sqrt( Sum(resid^2,I) / (size(I)-size(K)) );~
Nodelocation: 384,736,1
Nodesize: 104,20
Windstate: 2,332,211,498,542
Paramnames: Y,B,I,K,C
Function Regressionfitprob( Y : Numeric[I,Run] ; B : Numeric[I,K,Run] ; I,K : Index; C : optional Numeric[K,Run] ; ~
S : optional Numeric[I,Run] )
Title: RegressionFitProb(Y,B,I,K,C)
Description: Once you've obtained regression coefficients C (indexed by K) by calling the Regression function, this function returns the probability that a fit this poor would occur by chance, given the assumption that the data was generated by a process of the form:~
~
Y = Sum( C*B,K) + Normal(0,S)~
~
If this result is very close to zero, it probably indicates that the assumption of linearity is bad. If it is very close to one, the data are consistent with the assumption of linearity.~
~
This is not a distribution function - it does not return a sample when evaluated in Sample mode. However, it does complement the multivariate RegressionDist function also included in this library.~
~
To use, first call the Regression function. You must then either know the measurement noise a priori or obtain it using the RegressionNoise function:~
~
Var E_C := Regression(Y,B,I,K);~
Var S := RegressionNoise(Y,B,I,K,E_C);~
Var PrThisPoor := RegressionFitProb(Y,B,I,K,E_C,S)
Definition: if IsNotSpecified(C) then C:=Regression(Y,B,I,K);~
if IsNotSpecified(S) then S:=RegressionNoise(Y,B,I,K);~
var resid := Y - sum(C*B,K);~
var n := size(I);~
var chi2 := sum( resid^2 / Mean(S)^2, I);~
GammaI( n/2 - 1, chi2/2 )
Nodelocation: 152,800,1
Nodesize: 112,20
Windstate: 2,287,69,586,548
Paramnames: Y,B,I,K,C,S
Close Multivariate_distrib
Close Intro_to_rank_correl