People talk about data easily using natural language, but in software, things get complicated.  Database schemas, object models, spreadsheet labels, variables in program code, XML, UML, OWL, JSON ... each has its purpose in the tower of data Babel.

Truenumbers is a higher-level, universal approach to data, intelligible to humans and machines, that unifies the way users, developers and every part of a system talk about information, independent of implementation.  It is based on two principles:

  • Treating data as a collection of independent, atomic facts
  • Expressing those facts in a structured natural language that both humans and machines can read

Atomic theory

A truenumber is a data value sealed in a secure package together with its own description.  A single atom of data.  Once it is created it can't be changed, so you know that it hasn't been tampered with, and you can never lose track of what it means.  

Truenumbers can be created by the millions in programs and back office systems, but the TrueOffice add-in connects Excel, Word and PowerPoint to the Truenumbers system so you can make and use truenumbers directly using tools you're familiar with.

Truenumbers are specified in a simple language that's natural and easy to learn.  We'll talk about that language in more detail later, but here's an example.  Using the TrueOffice Excel add-in, we write:

                         "the estimated construction cost of a new data center = 35 USD millions".


TrueOffice sends this to the cloud where it's interpreted, stored and returned to your spreadsheet cell as a truenumber. The metadata DNA locked inside it is used to generate a cell comment.

The truenumber still looks like "35" to Excel, so the spreadsheet works normally for Excel users who don't have Truenumbers.  Copy it to Word or email and its DNA goes with it, where it can generate a footnote.

 

Having self-contained data like truenumbers opens up many new possibilities.  Similar to email, each truenumber has a subject, author and creation date to help you organize and search, so you could find all truenumbers that have "new data center" as their subject, or search for "costs" or "construction costs".  You can convert from USD to any other currency because the number knows it's in millions of dollars.  Truenumbers can also be tagged, which is the way to organize truenumbers on the fly and to create business processes without programming that provide better governance than your best enterprise software.

Talking facts

Now let's take a deeper look at the Truenumbers language.  Here's another example of a truenumber specification:

                  “the nominal height of the Chrysler building’s antenna is 71 feet”

This looks like human natural language, but is restricted to structured statements giving a value for a property of something.  Because it is structured, it supports storage, computation and search as well as or better than relational or object-based data can.  Truenumbers can express facts about anything, depending on the vocabulary you choose to use. No data model or schema is required. Let’s see exactly how this language is structured.

The heart of the language is the Structured Resource Descriptor (SRD), which is a way to encode noun phrases that are, in turn, used to write truenumber sentences.  An SRD is a path-like string of words separated by colon ( : ) and forward-slash ( / ) operators. The colon operator denotes an adjective-noun pair, basically an “IS-A” relationship. For example, building:Chrysler is the SRD for the phrase “Chrysler building”.  Note that the noun is first in the SRD – the reverse of the English phrase.

The slash operator acts like the preposition “of” used in English to denote belonging to, or a “HAS-A” relationship. The phrase “antenna of Chrysler building” is therefore equivalent to the SRD building:Chrysler/antenna and represents the subject of the example fact above.  The sentence begins with “nominal height of”, telling us that height:nominal is the property of the antenna this fact specifies.  We represent properties as SRDs too.  It’s OK to use SRDs in sentences instead of the equivalent phrases, so this fact could also be written:


                "building:Chrysler/antenna has height:nominal = 71 ft"

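The two operators can be illustrated with a short Python sketch. This is not part of the Truenumbers software; the function names parse_srd and srd_to_phrase are hypothetical, and the handling of several colons in a row (each later word treated as a further adjective) is an assumption based on the examples in this article.

```python
def parse_srd(srd):
    """Split an SRD into (noun, [adjectives]) pairs, one per slash-separated segment."""
    parsed = []
    for segment in srd.split("/"):
        # Within a segment the noun comes first, followed by its adjectives.
        noun, *adjectives = segment.split(":")
        parsed.append((noun, adjectives))
    return parsed


def srd_to_phrase(srd):
    """Render an SRD as an English-like phrase (assumed convention, see lead-in)."""
    parts = []
    for noun, adjectives in parse_srd(srd):
        # Adjectives follow the noun in an SRD but precede it in English.
        parts.append(" ".join(list(reversed(adjectives)) + [noun]))
    # The slash reads as "of", so the last segment comes first in the phrase.
    return " of ".join(reversed(parts))


print(srd_to_phrase("building:Chrysler/antenna"))  # antenna of Chrysler building
print(srd_to_phrase("height:nominal"))             # nominal height
```

Reading the SRD right to left recovers the English word order, which is why the noun-first convention costs nothing in readability.
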
So, a truenumber consists of the subject the fact is about, a property of that subject, and the value of that property. Truenumber sentences are convenient for people to read and write, and they correspond to a data structure in the computer. This structure can have different implementations for storing in different databases, but the most portable representation is JSON, the format used by the Truenumbers APIs. In simplified form, a JSON truenumber looks like the following (the actual format for the API is a bit more complex):

{
  "subject": "building:Chrysler/antenna",
  "property": "height:nominal",
  "value": {
    "type": "numeric",
    "magnitude": "71",
    "units": "ft",
    "dimension": "length"
  },
  "tags": [
    "location:USA/city:New_York",
    "project:demo:truenumber"
  ]
}
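
To make the link between a sentence and its data structure concrete, here is a minimal Python sketch that turns the SRD form of a sentence into the simplified structure above. It is an illustration only, not the Truenumbers parser or API: the regular expression, the function name parse_sentence and the restriction to plain numeric values are all assumptions, and the "dimension" field is left out because deriving it would need a table of units.

```python
import json
import re

# Hypothetical pattern for "<subject> has <property> = <magnitude> <units>".
SENTENCE = re.compile(
    r"^(?P<subject>\S+) has (?P<property>\S+) = "
    r"(?P<magnitude>-?\d+(?:\.\d+)?) (?P<units>.+)$"
)


def parse_sentence(sentence, tags=()):
    """Build a simplified truenumber dict from an SRD-form sentence."""
    match = SENTENCE.match(sentence.strip())
    if match is None:
        raise ValueError(f"not a recognized truenumber sentence: {sentence!r}")
    return {
        "subject": match["subject"],
        "property": match["property"],
        "value": {
            "type": "numeric",
            "magnitude": match["magnitude"],  # kept as a string, as in the example
            "units": match["units"],
        },
        "tags": list(tags),
    }


tnum = parse_sentence(
    "building:Chrysler/antenna has height:nominal = 71 ft",
    tags=["location:USA/city:New_York", "project:demo:truenumber"],
)
print(json.dumps(tnum, indent=2))
```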


In the JSON above, we notice an array of SRDs labeled “tags”. A truenumber can be decorated with any number of tags to enrich, qualify or classify it. The tags shown tell us that the Chrysler building is located in New York, and is associated with a truenumber demo project. Tags allow us to aggregate facts or create relationships among them.
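
As a rough illustration of aggregating by tag, the following Python sketch filters and groups a small list of truenumbers held as dictionaries. The helper names with_tag and subjects_by_tag are hypothetical, the SRDs for the data center fact are a guess at how that sentence might be encoded, and values are omitted because only the tags matter here.

```python
from collections import defaultdict

# Illustrative facts; the second entry and its tag are assumptions.
truenumbers = [
    {
        "subject": "building:Chrysler/antenna",
        "property": "height:nominal",
        "tags": ["location:USA/city:New_York", "project:demo:truenumber"],
    },
    {
        "subject": "data_center:new",
        "property": "cost:construction:estimated",
        "tags": ["project:demo:truenumber"],
    },
]


def with_tag(facts, tag):
    """Return the facts that carry the given tag."""
    return [fact for fact in facts if tag in fact.get("tags", [])]


def subjects_by_tag(facts):
    """Group subject SRDs under each tag they are decorated with."""
    groups = defaultdict(list)
    for fact in facts:
        for tag in fact.get("tags", []):
            groups[tag].append(fact["subject"])
    return dict(groups)


demo_facts = with_tag(truenumbers, "project:demo:truenumber")
print([fact["subject"] for fact in demo_facts])
print(subjects_by_tag(truenumbers))
```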

It’s SRD subjects, tags and properties that shape the knowledge in truenumbers. Given a bag of truenumbers, the SRDs tell us what subjects are being talked about, what sort of properties are of interest, and so forth. SRDs read left to right, from the general to the specific, so they also make excellent keys for indexing or wildcard searches, and they naturally form trees. These trees are useful tools for visualizing the vocabulary of a domain. As we gather more facts about New York, for example, we might find that the building tree has hundreds of branches if we’re talking about many buildings. But an SRD is computationally very lightweight, being only a string, so vocabularies can be dynamic, complex and large.
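
Because an SRD is just a string, both ideas can be sketched in a few lines of Python. The example below is an assumption-laden illustration rather than how Truenumbers indexes data: the SRDs other than building:Chrysler/antenna are invented vocabulary, build_tree and search are hypothetical helpers, and the wildcard matching uses shell-style patterns from Python's standard fnmatch module.

```python
import re
from fnmatch import fnmatchcase

# Illustrative vocabulary; only the first SRD appears in this article.
srds = [
    "building:Chrysler/antenna",
    "building:Chrysler/lobby",
    "building:Empire_State/antenna",
]


def build_tree(srd_list):
    """Nest SRDs into a dict tree, one level per word, keeping each operator."""
    tree = {}
    for srd in srd_list:
        node = tree
        # Tokens look like "building", ":Chrysler", "/antenna".
        for token in re.findall(r"[:/]?[^:/]+", srd):
            node = node.setdefault(token, {})
    return tree


def search(srd_list, pattern):
    """Return the SRDs matching a shell-style wildcard pattern, sorted."""
    return sorted(s for s in srd_list if fnmatchcase(s, pattern))


print(build_tree(srds))
print(search(srds, "building:Chrysler/*"))
print(search(srds, "building:*/antenna"))
```

The left-to-right, general-to-specific ordering is what makes the prefix pattern building:Chrysler/* pick out one branch of the tree.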