Data Science

The DIKW (data, information, knowledge, wisdom) pyramid is often quoted, though there is no consensus on what data, information or knowledge mean [1]. We can all agree that in computers they are just ones and zeros, but for ones and zeros to make sense and be useful, some interpretation must be imposed, such as “this 16-bit integer represents gallons of fuel remaining in the tank”. We can argue about whether a number, table column, class or attribute comprises data, information or knowledge, but it will always need human language to bind it to reality, usually by giving it a descriptive name like gallonsFuelRemaining.
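As a rough illustration of the point about interpretation, the sketch below decodes the same two bytes under three different conventions and then attaches a descriptive name to one reading. The byte values, the NamedValue class and the interpret helper are hypothetical, chosen only for this example; only gallonsFuelRemaining comes from the text above.

```python
import struct
from dataclasses import dataclass

# Two raw bytes, as they might sit in memory. On their own they mean nothing.
raw = bytes([0xFF, 0x38])

# The same 16 bits under three equally valid machine-level readings.
as_unsigned_be = struct.unpack(">H", raw)[0]  # big-endian unsigned: 65336
as_signed_be = struct.unpack(">h", raw)[0]    # big-endian signed: -200
as_unsigned_le = struct.unpack("<H", raw)[0]  # little-endian unsigned: 14591


@dataclass(frozen=True)
class NamedValue:
    """Hypothetical pairing of a decoded number with its descriptive name."""
    name: str   # the natural-language binding to reality
    value: int


def interpret(data: bytes, name: str, fmt: str = ">H") -> NamedValue:
    """Decode the bytes with an assumed format and attach a name to the result."""
    return NamedValue(name, struct.unpack(fmt, data)[0])


fuel = interpret(raw, "gallonsFuelRemaining")
print(fuel)
```

Nothing in the bytes selects one reading over another. The format string and the name are both supplied from outside, and only the name says what the number is about.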

Unfortunately, this binding to reality is, quite literally, all in our minds. We learn language in social contexts, and that shared learning is why we understand each other; it is not because of some external relationship between language and reality [2]. This means that no data structure of any kind can be more faithful to reality than what its nomenclature tells us in natural-language terms. This is a hugely important point. Models, whether expressed in UML or as ontologies, are perspectives on reality: simulated, purpose-built, well-characterized and computable worlds. But they are connected to the world only by the descriptive intent of the names in the model.

Ironically, computational models pay no attention to this vocabulary beyond its function of providing unique identifiers. Truenumbers takes the contrary view: description, being the intent and purpose of knowledge representation, should be the focus of a knowledge base. Computationally useful perspectives for inference, command, control or any other purpose can then be built on those descriptions ad infinitum. To construct a perspective a priori and then constrain knowledge representation to that perspective seems to us to be exactly wrong.

