How truenumbers work
Truenumbers is a decentralized data representation intelligible to humans and machines. It unifies the way users, developers and every part of a system talk about information, independent of storage implementation. Truenumbers is based on these principles:
- Information is a collection of portable values each with its own description
- Descriptions are represented in a structured natural language for machines and people
- Domain knowledge can emerge dynamically from these descriptions
- Physical quantities and units of measure are built in to represent real-world data
A truenumber is an atom of data
We're accustomed to thinking of data as many values stored in a database, with a schema, object model or ontology giving meaning and authority to them collectively. When we extract values out of a database, their meaning and authority stay behind. But what if we turned that around so the data items themselves carried meaning and authority along with them?
A truenumber is a single, immutable data value with a description of itself encrypted inside. No central authority is needed to understand it, or to prove its validity. Truenumber descriptions are composable, so that having lots of truenumbers builds up a picture of a domain without any a-priori schema.
In this way, a collection of truenumbers in a data store is like an upside-down database, where the domain representation emerges dynamically from the values, instead of values conforming to a fixed data model.
Quick look: a truenumber in Excel
A truenumber is described in a simple language that's natural and easy to learn. We'll define that language in more detail later, but here's an example of making a truenumber manually using the TrueOffice Excel add-in. We simply write the following into a spreadsheet cell:
[estimated construction cost of the new data center = 35 USD millions].
This is sent to the cloud where it's compiled, stored, and returned to your spreadsheet cell as a truenumber. There, TrueOffice grabs the descriptive DNA inside and uses it to generate a cell comment automatically.
The truenumber looks like the number "35" to Excel, so the spreadsheet works normally for any Excel user, with or without Truenumbers. Copy it to Word or email and its DNA goes with it, where TrueOffice can use it to generate a sentence or a footnote.
Having self-contained data like truenumbers opens up many new possibilities. Each truenumber has a subject, author and creation date to help you organize and search them, so you could find all truenumbers that have "new data center" as their subject, or search for "costs" or "construction costs". Convert from USD to any other currency because our number knows it's in millions of dollars. Truenumbers can also be tagged, which is the way to organize truenumbers on the fly, and create business processes without programming that provide better governance than your best enterprise software.
Computer data is just bits. To make sense and be useful, there must be natural language descriptions of it, like labels on classes and program variables, concept names in ontologies and prompts in user interfaces. All these are just comments, or "descriptive names" for human developers or users, and ignored by system code. Truenumbers elevates the descriptive connection between data and domain to be the core of its data representation for the system as well as for people. Let's see how this works. Here's another example of a truenumber sentence:
“the antenna of the Chrysler building has nominal height = 71 feet”
This looks like human natural language, but Truenumbers is restricted to statements of a very specific form: each one gives a value for a property of something. One of the allowed sentence forms is <subject> has <property> = <value>. In the sentence above, the subject is "antenna of Chrysler building" and the property is "nominal height".
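The sentence form above can be illustrated with a small Python sketch. This is not the Truenumbers compiler. It is a hypothetical regular expression (the names SENTENCE and parse_sentence are invented here) that only splits the single form <subject> has <property> = <value> into its three parts.

```python
import re

# Assumed pattern for the one sentence form described in the text:
#   <subject> has <property> = <value>
# Hypothetical helper, not part of any Truenumbers API.
SENTENCE = re.compile(r"^(?P<subject>.+?) has (?P<property>.+?) = (?P<value>.+)$")


def parse_sentence(sentence: str) -> dict:
    """Split a truenumber sentence into subject, property and value strings."""
    match = SENTENCE.match(sentence.strip())
    if match is None:
        raise ValueError(
            f"not of the form '<subject> has <property> = <value>': {sentence!r}"
        )
    return {name: text.strip() for name, text in match.groupdict().items()}


parts = parse_sentence(
    "the antenna of the Chrysler building has nominal height = 71 feet"
)
print(parts["subject"])   # the phrase before " has "
print(parts["property"])  # the phrase between " has " and " = "
print(parts["value"])     # everything after " = "
```

A real parser would also need to handle the other allowed sentence forms and subjects that themselves contain the word "has"; this sketch does neither.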
Subject and property get encoded as special strings called Structured Resource Descriptors (SRD). An SRD is a path-like sequence of words separated by colon ( : ) and forward-slash ( / ) operators. The colon operator encodes an adjective-noun relationship so building:Chrysler would be the SRD for the phrase “Chrysler building”.
The slash operator acts like “of” in English when it means belonging to, or part of. So, the phrase “antenna of Chrysler building” has the SRD building:Chrysler/antenna, which is the subject of the sentence. The property measured by a truenumber is encoded as an SRD too. It’s OK to use SRDs in sentences instead of the equivalent phrases, so this example could also be written:
"building:Chrysler/antenna has height:nominal = 71 ft"
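To make the two operators concrete, here is a minimal Python sketch that turns an SRD back into an English-like phrase. The function name srd_to_phrase is hypothetical, and the treatment of a chain of several colons (each later word read as a further adjective) is an assumption, since the text only shows a single colon per segment.

```python
def srd_to_phrase(srd: str) -> str:
    """Render an SRD as a phrase (hypothetical helper, not a Truenumbers API).

    ':' encodes noun:adjective, so the words of a segment are reversed.
    '/' encodes whole/part, so segments are reversed and joined with "of".
    """
    phrases = []
    for segment in srd.split("/"):
        words = segment.split(":")
        # "building:Chrysler" -> "Chrysler building"
        phrases.append(" ".join(reversed(words)))
    # ["Chrysler building", "antenna"] -> "antenna of Chrysler building"
    return " of ".join(reversed(phrases))


print(srd_to_phrase("building:Chrysler/antenna"))
print(srd_to_phrase("height:nominal"))
```

Going the other way, from free-form phrase to SRD, is harder because it requires deciding which words are adjectives and which are nouns, so it is left out of the sketch.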
JSON and tags
So, a truenumber consists of the subject the fact is about, a property of that subject, and the value of that property. Truenumber sentences are convenient for people to read and write, and they correspond to a data structure for the computer. This structure can have different implementations for storing in different databases, but the standard representation is JSON, used by the Truenumbers APIs for systems generating and querying millions of truenumbers. In simplified form, a JSON truenumber looks like the following:
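As a rough illustration of the shape of such a structure, the Python sketch below builds a dictionary and serializes it with the standard json module. Every field name (subject, property, value, magnitude, unit, tags) and both tag SRDs are assumptions made for this example; the actual Truenumbers JSON schema may differ.

```python
import json

# Hypothetical field names and tag spellings, chosen only for illustration.
truenumber = {
    "subject": "building:Chrysler/antenna",
    "property": "height:nominal",
    "value": {"magnitude": 71, "unit": "ft"},
    "tags": [
        "location:NewYork",  # assumed SRD for "located in New York"
        "project:demo",      # assumed SRD for the demo project
    ],
}

encoded = json.dumps(truenumber, indent=2)  # text form sent over an API
decoded = json.loads(encoded)               # parsed back into a dict
print(encoded)
```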
In the JSON above, we notice an array of SRDs labeled “tags”. A truenumber can be decorated with any number of tags to enrich, qualify or classify it. The tags shown tell us that the Chrysler building is located in New York, and is associated with a truenumber demo project. Tags allow us to aggregate facts or create relationships among them. The subject, property and tag SRDs are part of the data and, like any path-like strings, they make excellent keys for indexing and search, and can be converted to phrases for user interfaces.
Emergent domain knowledge
SRD subjects, tags and properties shape the knowledge contained in truenumbers. Given a bag of truenumbers, the SRDs tell us what subjects are being talked about, what sort of properties are of interest, and so forth. SRDs compose naturally to form trees, useful tools for managing and visualizing the vocabulary of a domain. As we gather more facts about buildings, we might find that the building tree has hundreds of branches, one for each building. Yet, an SRD is computationally very light-weight, being only a string, so vocabularies can be complex and large. You can choose to lock down your vocabularies up front, let them grow organically or anything in-between.
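The way SRDs compose into trees can be sketched in a few lines of Python. The function build_tree is hypothetical, and the SRDs for the lobby and the Woolworth building are invented here to give the tree more than one branch. The sketch assumes that both the colon and the slash start a new level of the tree.

```python
import re


def build_tree(srds):
    """Merge SRD strings into one nested dict, one level per word."""
    tree = {}
    for srd in srds:
        node = tree
        # Assumption: ':' and '/' both descend one level in the tree.
        for word in re.split(r"[:/]", srd):
            node = node.setdefault(word, {})
    return tree


tree = build_tree([
    "building:Chrysler/antenna",
    "building:Chrysler/lobby",   # invented example
    "building:Woolworth",        # invented example
])
print(sorted(tree["building"]))  # the branches under "building"
```

Because each SRD is only a string, building or rebuilding such a tree is cheap, which is what lets the vocabulary grow without an up-front schema.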
Truenumbers are statements of measurement
In a database, a person's height would be a floating-point number, and the data model would imply in some way that the number was a height, and in what units, for example through a column name like HEIGHT_INCHES. Truenumbers, instead, have units of measure and tolerances built into all numbers, and the property measured is part of the data as an SRD. This lets data benefit from the fact that physical quantities are a kind of "standard" that real-world data naturally adheres to, giving data from differing domains a baseline for comparisons and cross-domain analytics.
This picture shows an error being reported when trying to create a truenumber where the units don't match the property. In this case, "ft" is known to be a unit of length, not area. It would have to be "ft^2" or "in^2" or "acres", etc. to create the number.
Which unit of area you choose doesn't matter, because Truenumbers' internal math compares all values in standard SI units no matter how they are expressed.
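The unit check and the SI comparison can be illustrated with a small Python sketch. The UNITS table and the to_si function are hypothetical and cover only a handful of units; they are not the Truenumbers math engine. The conversion factors themselves are the standard definitions (1 ft = 0.3048 m, 1 in = 0.0254 m, 1 acre = 4046.8564224 m^2).

```python
# unit -> (dimension, factor to the SI unit of that dimension)
# Hypothetical lookup table for illustration only.
UNITS = {
    "m": ("length", 1.0),
    "ft": ("length", 0.3048),
    "in": ("length", 0.0254),
    "m^2": ("area", 1.0),
    "ft^2": ("area", 0.3048 ** 2),
    "in^2": ("area", 0.0254 ** 2),
    "acres": ("area", 4046.8564224),
}


def to_si(magnitude: float, unit: str, expected_dimension: str) -> float:
    """Convert to SI, rejecting units whose dimension does not fit the property."""
    dimension, factor = UNITS[unit]
    if dimension != expected_dimension:
        raise ValueError(
            f"'{unit}' is a unit of {dimension}, not {expected_dimension}"
        )
    return magnitude * factor


# One acre and 43,560 square feet describe the same area once both are in m^2.
one_acre = to_si(1, "acres", "area")
same_in_ft2 = to_si(43560, "ft^2", "area")
print(abs(one_acre - same_in_ft2) < 1e-6)

# Using a length unit for an area property is rejected.
try:
    to_si(10, "ft", "area")
except ValueError as err:
    print(err)
```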
The truenumber internal math engine is both unit and tolerance aware, and can be used to define one truenumber as an expression combining other truenumbers. The example at left shows a truenumber for the area of an angled metal bar defined in terms of the bar dimensions. Tolerance on the bar length is reflected in the uncertainty on the resulting area.
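How a tolerance on one input can carry through to a derived value is sketched below in Python. The text does not say which propagation rule the Truenumbers engine uses, so this example assumes simple worst-case bounds, a plain rectangle rather than the angled bar, and values already in SI units. The class Measured and the bar dimensions are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measured:
    """A value with a symmetric tolerance (hypothetical, worst-case model)."""
    value: float
    tol: float = 0.0

    def __mul__(self, other: "Measured") -> "Measured":
        nominal = self.value * other.value
        # Evaluate the product at every combination of extreme inputs
        # and keep the largest deviation from the nominal result.
        corners = [
            (self.value + s * self.tol) * (other.value + t * other.tol)
            for s in (-1, 1)
            for t in (-1, 1)
        ]
        return Measured(nominal, max(abs(c - nominal) for c in corners))


length = Measured(2.0, 0.01)  # metres, with a tolerance (invented numbers)
width = Measured(0.05)        # metres, treated as exact
area = length * width         # square metres, tolerance inherited from length
print(area)
```

A statistical rule, such as adding relative uncertainties in quadrature, would give tighter bounds when several inputs have tolerances; worst-case bounds are used here only because they are the simplest to verify by hand.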