Parsing and formatting data
Analytica has tools for reading and writing data in most common formats, including CSV (comma-separated values in a table), XML, JSON, spreadsheets, and relational databases. This page is an overview of how to read and write files in CSV, XML, and JSON, plus some additional functions for parsing custom data formats. Normally, you start by reading a data file using ReadTextFile, or ReadFromURL to read from a web page, and then use functions described below to parse the data. To write a file, you use the corresponding function to generate the desired format, and WriteTextFile to generate the file. See sections on spreadsheets and databases for how to access data in those formats. You can also read and write binary files with ReadBinaryFile and WriteBinaryFile.
Data in rows and columns (CSV)
A CSV file organizes data as a table of cells arranged in rows and columns. Each cell contains one datum, which may be a number, a date, or text. The first line of the file may or may not contain column headings rather than actual data. The format is called CSV, for Comma-Separated Values, because the cells are often separated by commas. However, CSV files may also use other separators, such as the tab character. The ParseCSV and MakeCSV functions (introduced in Analytica 5.0) make it easy to parse or produce CSV.
Even though CSV is one of the most widely used data formats, there is no official CSV standard. All CSV conventions share the same table structure, but many details vary in addition to the separator. Key variations include when quotes are needed around cells, and which escape character (if any) allows a separator, new-line, or quote to appear within a single cell's text value. The ParseCSV and MakeCSV functions use Excel's conventions by default, and their optional parameters offer a lot of flexibility for handling other CSV conventions. They can also convert text to numbers or dates, and vice versa.
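The quoting conventions described above can be illustrated outside Analytica. The following is a minimal sketch in Python (not Analytica syntax), using the standard csv module, whose default dialect follows Excel's conventions. The sample text and the helper name parse_csv are assumptions made for illustration only.

```python
import csv
import io

# Sample text in Excel-style CSV (an assumption for illustration). The second
# cell of the data row contains a comma, a doubled quote, and an embedded
# new-line, so the whole cell is wrapped in quotes.
csv_text = 'Name,Note,Value\r\nWidget,"Large, ""deluxe""\nmodel",12.5\r\n'

def parse_csv(text, delimiter=","):
    """Return a list of rows, each a list of text cells."""
    # newline="" leaves line endings untranslated so the csv module can
    # handle new-lines that appear inside quoted cells.
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))

rows = parse_csv(csv_text)
header, data = rows[0], rows[1:]

# The same helper reads tab-separated text when given a different separator.
tab_rows = parse_csv("a\tb\r\n1\t2\r\n", delimiter="\t")
```

Every cell comes back as text, so converting to numbers or dates is a separate step in this sketch.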
To read and parse a CSV file that uses Excel conventions, including commas as separators, read the file with ReadTextFile and pass the resulting text to ParseCSV. The result is a 2-D array indexed by local indexes for the rows and columns. For a CSV file that uses a tab character as the separator, specify the tab separator through an optional parameter.
ParseCSV has many other options. You can use an existing index for the column index or row index instead of the local indexes (such as .Column), take the row index labels from a specific column in the data, handle other quoting conventions or different international/regional conventions, or extract only a subset of the columns. See ParseCSV for details.
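To write a 2-D array, x, to a CSV file, generate the CSV text with MakeCSV, giving it x and the indexes of x (such as I and J), and then write that text with WriteTextFile. To write a tab-separated file, specify the tab character as the separator. See MakeCSV for details on other options.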
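As a language-neutral illustration of producing CSV or tab-separated text from a 2-D table, here is a minimal sketch in Python (not Analytica syntax). The helper name make_csv, the nested-dictionary layout of the table, and the choice to leave the top-left header cell blank are all assumptions made for this example, not a description of MakeCSV.

```python
import csv
import io

def make_csv(table, col_labels, row_labels, delimiter=","):
    """Format a 2-D table (dict keyed by row label, then column label) as text."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, delimiter=delimiter)  # Excel dialect by default
    writer.writerow([""] + list(col_labels))       # header line of column labels
    for r in row_labels:
        writer.writerow([r] + [table[r][c] for c in col_labels])
    return buf.getvalue()

# Hypothetical data: one cell contains a comma, which forces quoting in CSV.
x = {"a": {"p": 1, "q": 2.5}, "b": {"p": "x,y", "q": 4}}

comma_text = make_csv(x, ["p", "q"], ["a", "b"])
tab_text = make_csv(x, ["p", "q"], ["a", "b"], delimiter="\t")
```

Note that the cell containing a comma is quoted only in the comma-separated output, because quoting is needed only when a cell contains the active separator, a quote, or a new-line.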
XML
The eXtensible Markup Language (XML) is a flexible (but verbose) standard for data encoding. The standard nails down all the encoding details but leaves the schema to the application, so the actual structure of the data is virtually unlimited. Hence, data in an XML file commonly has a rich structure that looks quite different from a rectangular array.
A single richly structured XML source may contain many pieces of information, each of which typically fits well into arrays and indexes. A good way to think about XML data is that you extract information from it through a series of "queries".
Typically, you read the XML file into an Analytica text variable, and then use the Microsoft XML DOM parser to generate an XML object that offers a rich set of queries, for example:
Variable XML_text := ReadTextFile("My File.XML")
Variable XmlDoc :=
    Var d := COMCreateObject("Msxml2.DOMDocument.3.0");
    d->async := False;
    d->loadXML(XML_text);
    If (d->parseError->errorCode <> 0) Then Error(d->parseError->reason);
    d
After loading the data into the XML DOM, you extract data via a series of queries using XPath expressions. XPath is an extremely rich and powerful query language, which makes it suitable for the wide variations in schema among XML files. The methods and properties of the XML parser are documented at Microsoft XML DOM Parser API. Here is one useful pattern that collects all the nodes matching an XPath query into an array:
Variable PeopleNodes :=
    Var nodes := xmlDoc->selectNodes("//*/row");
    Index Row := 1..nodes->length;
    nodes->item(Row-1)
The result is an array of IXMLDOMNode objects, indexed by Row. If each of these nodes is a <person> tag that contains a single <age> tag, you can extract the age with a further query on each node.
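The load, check, and query pattern can be illustrated outside Analytica. The following is a minimal sketch in Python (not Analytica syntax and not the Microsoft DOM API), using the standard xml.etree.ElementTree module, which supports a limited subset of XPath. The sample document, the <people> and <name> tags, and the helper name load_xml are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document with one <age> tag inside each <person> tag.
xml_text = """<people>
  <person><name>Ann</name><age>34</age></person>
  <person><name>Bob</name><age>51</age></person>
</people>"""

def load_xml(text):
    """Parse XML text, reporting the parser's message if the text is malformed."""
    try:
        return ET.fromstring(text)
    except ET.ParseError as err:
        raise ValueError(f"XML parse error: {err}") from err

root = load_xml(xml_text)

# First query: collect one node per <person> tag, at any depth.
person_nodes = root.findall(".//person")

# Second query, applied to each node: read the text of its <age> child.
ages = [int(p.findtext("age")) for p in person_nodes]
```

The two-step shape (select a list of nodes, then query each node) mirrors the idea of building an index over the nodes and extracting one value per node.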
Because XML schemas are so open-ended, there is no generic MakeXML function. However, it is relatively easy to write Analytica code that concatenates your information using the text concatenation operator (&) and JoinText. When including text, you should XML-encode it in case it contains XML-reserved characters. For numbers or dates, use NumberToText to control the number format.
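To show why encoding matters when XML is assembled by concatenation, here is a minimal sketch in Python (not Analytica syntax). It uses the standard library function xml.sax.saxutils.escape, which replaces &, <, and > with their entities. The tag names and the helper names person_xml and people_xml are assumptions made for illustration.

```python
from xml.sax.saxutils import escape

def person_xml(name, age):
    """Build one <person> element, encoding reserved characters in the name."""
    # The age is formatted explicitly as an integer so the output does not
    # depend on any default number format.
    return f"<person><name>{escape(name)}</name><age>{age:d}</age></person>"

def people_xml(people):
    """Concatenate one element per (name, age) pair inside a <people> root."""
    rows = "".join(person_xml(name, age) for name, age in people)
    return f"<people>{rows}</people>"

doc = people_xml([("Smith & Sons <Ltd>", 40)])
```

Without the escape call, the ampersand and angle brackets in the name would produce text that an XML parser rejects.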
A second method for creating XML, which we have found less convenient, is to use the same Microsoft DOM that we use for parsing XML. Methods within the DOM let you add and modify tags and content. After making these changes, you can write the XML to a file using the save method or extract the XML text using the text property.
JSON
In the simplest usage, you can parse a JSON data file with ParseJSON without specifying a schema.
The function infers the class structure from the data itself and, in general, returns tree-structured data using references and local indexes. There are several ways to map JSON collections into Analytica data structures. It is usually convenient to use existing indexes from your model for class instance data within the JSON. For more control over how the JSON data structures are processed, you can use an explicit schema that describes the file structure and how it maps to your own indexes. See Parsing JSON with a schema.
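The idea of schema-free parsing into a tree, followed by mapping a collection onto an index, can be illustrated outside Analytica. The following is a minimal sketch in Python (not Analytica syntax), using the standard json module. The sample document and its field names are assumptions made for illustration.

```python
import json

# Hypothetical JSON text: an object holding a collection of reading records.
json_text = (
    '{"site": "North", "readings": ['
    '{"day": 1, "temp": 20.5}, {"day": 2, "temp": 22.0}]}'
)

# Parsing without a schema yields a tree of dictionaries and lists whose
# shape is inferred entirely from the data.
doc = json.loads(json_text)

# Mapping the collection onto an index: the "day" field of each record
# plays the role of the index, and "temp" becomes the value along it.
days = [r["day"] for r in doc["readings"]]
temp_by_day = {r["day"]: r["temp"] for r in doc["readings"]}
```

Choosing which field serves as the index is exactly the kind of decision that an explicit schema makes for you.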
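To write data with a very simple structure to JSON, such as an array x with several indexes (for example I, J, K, and L), generate the JSON text with MakeJSON and write it with WriteTextFile. For details, see MakeJSON.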
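As a language-neutral illustration of writing an indexed array as JSON, here is a minimal sketch in Python (not Analytica syntax). The helper name array_to_json and the choice to nest objects by row label and then column label are assumptions made for this example, not a description of the layout MakeJSON produces.

```python
import json

def array_to_json(x, row_labels, col_labels):
    """Serialize a 2-D list as nested JSON objects keyed by row, then column."""
    nested = {
        r: {c: x[i][j] for j, c in enumerate(col_labels)}
        for i, r in enumerate(row_labels)
    }
    return json.dumps(nested, indent=2)

json_text = array_to_json([[1, 2], [3, 4]], ["a", "b"], ["p", "q"])
```

Keying by labels rather than by position keeps the index information in the file, so a reader can rebuild the array without knowing the original index order.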
Custom data formats
When another application uses a non-standard textual data format (i.e., something other than rows and columns of cells (CSV), Excel files, Analytica import-export format, XML, or JSON), you can "program" the parsing of the data yourself. Start by reading the textual data with ReadTextFile or ReadFromURL. You can then parse it using Text Functions, especially FindInText, SplitText, ParseNumber, and ParseDate. The task is often greatly facilitated by using Regular expressions.
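To show how a regular expression can do most of the work for a custom format, here is a minimal sketch in Python (not Analytica syntax), using the standard re module. The line format, with a bracketed date followed by a name = value pair, is a hypothetical format invented for this example, as are the helper names parse_line and parse_text.

```python
import re
from datetime import date

# One record per line, e.g. "[2021-03-04] flow = 12.5" (hypothetical format).
LINE = re.compile(
    r"^\[(\d{4})-(\d{2})-(\d{2})\]\s+(\w+)\s*=\s*([-+]?\d+(?:\.\d+)?)\s*$"
)

def parse_line(line):
    """Return (date, name, value) for a matching line, or None otherwise."""
    m = LINE.match(line)
    if m is None:
        return None
    year, month, day, name, value = m.groups()
    return date(int(year), int(month), int(day)), name, float(value)

def parse_text(text):
    """Split text into lines and keep only the lines that match the format."""
    return [rec for rec in map(parse_line, text.splitlines()) if rec is not None]

records = parse_text("[2021-03-04] flow = 12.5\n# comment\n[2021-03-05] flow=-3")
```

The steps are the same in any language: split the text into lines, match each line against a pattern, then convert the captured pieces to dates and numbers.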