SipEncode


SipEncode(x, i, name, type, origin, version, useDll)

Encodes a Stochastic Information Packet (SIP), that is, a random sample of numbers, as XML, using the DIST™ 1.0 Distribution String standard introduced in:

Sam Savage, Stefan Scholtes and Daniel Zweidler (Feb 2006), "Probability Management - Part 1", OR/MS Today 33:1.

SipEncode(x) takes a 1-D vector of numbers, «x», by default an uncertainty sample indexed by Run, and returns an XML text value that encodes the vector in compressed form. The compression loses some precision, except in the type = "Binary" case, which is lossless when the vector contains only 0s and 1s. SipDecode decompresses the XML SIP and returns a vector of numbers. The result is usually very close to the original and preserves the average value precisely.

SipEncode fully supports the DIST 1.0 standard, and an improved 1.1 standard. It also offers more precise options that may be included in a proposed DIST 2.0 standard.

Parameters

«x» 
A vector of numbers indexed by «i», or by Run if «i» is omitted.
«i» 
(optional) The index of «x». Defaults to Run.
«name» 
(optional) The name of the distribution. If «x» is an identifier, it defaults to «x»; otherwise, when «x» is a general expression, it defaults to the name of the variable being defined with SipEncode in its definition.
«type» 
(optional) Determines how precisely the values are encoded in the compression, and must have one of the following values:
"Single" (or "S")
Uses about 1 1/3 characters per number and up to 256 distinct bins. It is a lossy encoding: the recovered values differ slightly from the originals.
"Double" (or "D")
Uses about 2 2/3 characters per number and preserves the original numbers with greater precision than "Single". Uses up to 65536 distinct bins.
"Binary" (or "B")
Encodes Bernoulli outcomes (true/false, i.e. 1 or 0), encoding 6 binary values per character.
"Integer" (or "I")
Not currently part of the DIST 1.1 standard, but expected to be part of the DIST 2.0 standard. Encodes integer values exactly. The number of characters per integer depends on the integer range (e.g., if the difference between min and max is less than 4096, the encoding uses 2 characters per integer).
"Float" (or "F")
Not part of the DIST 1.1 standard, but may be in DIST 2.0. It uses 5 1/3 characters per number and is exact up to a precision of about 9 decimal digits with numbers in the range of 10⁻³⁷ to 10³⁸ (an exact encoding of an IEEE 32-bit floating point number).
«origin» 
(optional) Documentation included in the xml specifying the source for the data or assessment.
«version»
(optional) Selects the DIST standard version: 1 for DIST 1.0, or 1.1 for DIST 1.1 (the default).
«useDll» 
(optional) See "Using an External DLL" below.
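
The characters-per-number figures quoted for each «type» are consistent with packing fixed-width binary values into text that carries 6 bits per character. The following sketch, written in Python rather than Analytica, reproduces that arithmetic. The 6-bit assumption and all names in it are illustrative: they are inferred from the figures above, not taken from the DIST specification, and the XML wrapper around the payload is ignored.

```python
from fractions import Fraction

# Assumption: the text encoding carries 6 bits per character (base64-style),
# which matches "6 binary values per character" for the "Binary" type.
BITS_PER_CHAR = 6

# Bits per encoded value, derived from the figures quoted above:
# 256 bins = 8 bits, 65536 bins = 16 bits, IEEE 32-bit float = 32 bits.
BITS_PER_VALUE = {"Single": 8, "Double": 16, "Binary": 1, "Float": 32}

def chars_per_number(type_name):
    """Average characters per encoded number, as an exact fraction."""
    return Fraction(BITS_PER_VALUE[type_name], BITS_PER_CHAR)

def payload_chars(type_name, n):
    """Characters needed for n numbers, rounded up (XML wrapper ignored)."""
    total_bits = BITS_PER_VALUE[type_name] * n
    return -(-total_bits // BITS_PER_CHAR)  # ceiling division

for t in ("Single", "Double", "Binary", "Float"):
    print(t, chars_per_number(t), payload_chars(t, 1000))
```

Under these assumptions, "Single" comes out at 4/3 characters per number, "Double" at 8/3, and "Float" at 16/3, matching the 1 1/3, 2 2/3 and 5 1/3 figures given for each type.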

Uses

SipEncode is useful when you want to store a vector of numeric data in an external database. An entire 1-D vector can be stored in compressed form in a single text field of a database record. This may save space in the database, relative to storing every number explicitly, and can be extremely convenient. When using it this way, remember that the compression is lossy. If your data is a Monte Carlo sample, where uncertainties were already assessed, the loss of precision is likely to be insignificant, which makes it a compelling way to store Monte Carlo samples.
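
To get a feel for how small the loss can be, the Python sketch below rounds a sample to the midpoints of equal-width bins and measures the worst-case error for 256 and 65536 bins, the bin counts quoted for "Single" and "Double". This is only an illustration of lossy binning in general: it is not the DIST binning algorithm, it does not preserve the mean the way SipEncode does, and the function names are hypothetical.

```python
import random

def quantize_roundtrip(values, bins):
    """Replace each value by the midpoint of its equal-width bin."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)  # a constant sample needs no binning
    width = (hi - lo) / bins
    decoded = []
    for v in values:
        k = min(int((v - lo) / width), bins - 1)  # bin index, clamped at the top
        decoded.append(lo + (k + 0.5) * width)    # bin midpoint
    return decoded

def max_abs_error(values, bins):
    """Largest absolute difference between original and decoded values."""
    decoded = quantize_roundtrip(values, bins)
    return max(abs(a - b) for a, b in zip(values, decoded))

random.seed(0)  # fixed seed so the sample is reproducible
sample = [random.gauss(0.0, 1.0) for _ in range(1000)]

err_single = max_abs_error(sample, 256)    # bin count quoted for "Single"
err_double = max_abs_error(sample, 65536)  # bin count quoted for "Double"
print(err_single, err_double)
```

In this simple scheme the error is bounded by half a bin width, so it shrinks in proportion to the number of bins. For a sample that already carries Monte Carlo sampling error, an error of that size is typically negligible.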

Using an External DLL

This function, and SipDecode, can be used without any external DLL. However, Analytica's built-in compression implementation can be replaced by one implemented in an external DLL. The hook exists to allow researchers within Sam Savage's research group to use their own implementation, or to change the implementation or encoding standard should the need arise.

To use an external DLL, you must set a registry value that points to the DLL file. To do this, run regedit.exe and navigate to the following key:

  • HKCU\Software\Lumina Decision Systems\Analytica\4.2 (if using Analytica 32-bit)
  • HKCU\Software\Lumina Decision Systems\Analytica\4.2x64 (if using Analytica 64-bit)

Then use New String Value to create a value named Sip.dll and specify the complete file path to the DLL file as its value. If you are using Analytica 32-bit, you must use a 32-bit DLL. Likewise, if you are using Analytica 64-bit, your DLL must be compiled as 64-bit. The DLL must export a function with the following prototype, which is then called by this function:

void __stdcall CompressDst(double sip[], int sipSize, wchar_t* sipName, DstType,
wchar_t* dstOrigin, wchar_t* outputStr)

If you have a DLL configured as described here, SipEncode calls the exported function CompressDst to perform the encoding by default. You can force the use of Analytica's built-in implementation by specifying the «useDll» parameter as false. If no DLL is configured, the specified DLL file is not found, or the DLL does not export a function named CompressDst, the «useDll» parameter is ignored and Analytica's built-in implementation is always used.
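
The selection rule in the previous paragraph can be summarized as a small decision function. The Python sketch below only restates that rule for clarity. The function and argument names are hypothetical and do not correspond to anything in Analytica.

```python
def uses_external_dll(configured, file_found, exports_compressdst, use_dll=True):
    """Return True if the external DLL would perform the encoding.

    configured          -- the Sip.dll registry value is set
    file_found          -- the file it points to exists
    exports_compressdst -- the DLL exports a function named CompressDst
    use_dll             -- the «useDll» parameter (the DLL is used by default)
    """
    dll_usable = configured and file_found and exports_compressdst
    if not dll_usable:
        # «useDll» is ignored; the built-in implementation is always used.
        return False
    return bool(use_dll)

# A working DLL is used by default, unless «useDll» is false.
print(uses_external_dll(True, True, True))
print(uses_external_dll(True, True, True, use_dll=False))
# Without a usable DLL, «useDll» has no effect.
print(uses_external_dll(True, False, True, use_dll=True))
```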

Version

The «version» parameter must be 1 or 1.1, for DIST 1.0 and DIST 1.1 respectively. The DIST 1.1 standard fixes some flaws in the binning algorithm that cause a loss of precision when numeric ranges are small. Analytica uses the 1.1 algorithm by default, but if you need to send a DIST to an application that uses the DIST 1.0 standard (supported by @Risk, XLSim, Frontline, and Oracle), you'll need to encode with version: 1.

  • To convert a DIST 1.0 to DIST 1.1, use: SipEncode(SipDecode(X))
  • To convert a DIST 1.1 to a DIST 1.0, use: SipEncode(SipDecode(X), version: 1)

In either case, you may need to add the index (if it is something other than Run) or other parameters as desired.

The version: 1.1 routine uses an improved binning algorithm for the "Single" and "Double" «type»s that works substantially better when the range of values is less than 2. Unlike DIST 1.0, the lossiness of the "Single" and "Double" encodings does not vary with the units of measurement. The error introduced by encoding and decoding a Normal(0, 1) sample is about 10% larger with version: 1 than with version: 1.1, and roughly 100% larger on a Uniform(0, 1) distribution. On a Uniform(0, 1m) distribution, the error with version: 1.1 is three orders of magnitude smaller than with version: 1, using the current prototype. When the range of the data (the maximum of «x» minus the minimum of «x») exceeds 2.0, the differences are small.

History

Introduced in Analytica 4.2.3.

See Also
