Havel and the Semantic Web
Project Samarai's graph language Havel does not use Semantic Web technologies like RDF and OWL, and there are compelling technical and conceptual reasons why. Havel and the Semantic Web are not in competition; they take different approaches that fit their intended uses, and they can complement each other. The Semantic Web will be integrated into Havel: users will be able to enrich their infoverse with information available on the Semantic Web. Havel can be seen as a front-end to the Semantic Web, enabling consumers to use the Semantic Web in their daily lives.
The Semantic Web
As the W3C describes it, the Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. It promises a world in which semantic information drives autonomous machines that make our lives much easier.
In the Semantic Web, content providers publish semantic data in the form of content databases or as annotations to traditional web content. That data is consumed by client services and applications that, at least outside of the academic and business world, largely run on servers. Today, the Semantic Web is almost nonexistent on consumer devices; the “magic” usually happens somewhere in the cloud, and we only see the results in the form of web pages and client applications. This is a bottom-up approach. As a consequence, end users rarely come into contact with semantic data, and they almost never create semantic content outside the feature-box of applications. For consumers, the Semantic Web is invisible.
There is, of course, nothing inherently wrong with this approach. It is an inspiring and grandiose idea, and the Semantic Web is today an established standard that drives a lot of very useful things. But one can argue that the Semantic Web has failed to deliver on some of its promises. It is too challenging for developers to implement and too abstract for consumers to understand. This is very unfortunate, because the technology has enormous potential that remains largely unused.
Project Samarai
With Havel, Project Samarai takes a top-down approach to semantic information processing: We want to establish semantic information on consumer devices. We want to enable users to create and use semantic data in their daily lives. We want them to manage their contacts, shopping lists, pictures, projects, businesses and communities with semantic data. We believe that only when semantic information becomes omnipresent, when most of the data we use on a daily basis becomes semantic, will we be able to fulfill the promises of the Semantic Web.
In order to do that, we need semantic technology that runs on consumer devices. Running it exclusively in the cloud is not an option for us: besides the fact that the internet is not always and everywhere accessible, it would be a privacy nightmare and a waste of available processing power. Our technology is specifically designed for this approach. We use Universal Data and Havel instead of RDF, OWL and related technologies.
There are technical reasons for this, which I will discuss shortly. Before I do so, I want to point out that Universal Data and Havel are compatible with the Semantic Web. OWL and RDF can be imported into Havel, there will be mechanisms in the language that can implement rules, and a SPARQL-compatible query language or implementation will be available. The other way around, it will be possible to export Havel expressions to RDF. Havel does not want to be an enclosed world; importing from and exporting to other formats will be central to its development. But there are features in Havel that would be very difficult, if not impossible, to implement with other technologies.
Fig1: Information exchange between information universes, Semantic Web and closed sources
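To make the import path concrete, here is a minimal sketch of what mapping URI-based RDF triples onto UUID-based triples could look like. The names (UriTriple, UuidTriple, import_triple) are hypothetical illustrations, not actual Havel or Universal Data APIs:

```rust
use std::collections::HashMap;

/// An RDF-style triple of URIs (subject, predicate, object).
type UriTriple = (String, String, String);

/// A Universal-Data-style triple of 128-bit identifiers.
type UuidTriple = (u128, u128, u128);

/// Maps each URI to a stable 128-bit identifier, minting a new one
/// the first time a URI is seen. A real importer would derive the
/// UUID deterministically (e.g. a name-based UUID) instead of counting.
struct UriMapper {
    table: HashMap<String, u128>,
    next: u128,
}

impl UriMapper {
    fn new() -> Self {
        UriMapper { table: HashMap::new(), next: 1 }
    }

    fn id_for(&mut self, uri: &str) -> u128 {
        if let Some(&id) = self.table.get(uri) {
            return id;
        }
        let id = self.next;
        self.next += 1;
        self.table.insert(uri.to_string(), id);
        id
    }

    fn import_triple(&mut self, t: &UriTriple) -> UuidTriple {
        (self.id_for(&t.0), self.id_for(&t.1), self.id_for(&t.2))
    }
}

fn main() {
    let mut mapper = UriMapper::new();
    let rdf = (
        "http://example.org/Peter".to_string(),
        "http://xmlns.com/foaf/0.1/knows".to_string(),
        "http://example.org/Mary".to_string(),
    );
    let ud = mapper.import_triple(&rdf);
    println!("{:?} -> {:?}", rdf, ud);
}
```

Exporting would run the same mapping in reverse, emitting URIs (for example urn:uuid URIs) for identifiers that have no registered URI equivalent.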
UUID vs. URI
Perhaps our most obvious deviation from Semantic Web technology is the use of Universal Data instead of RDF. The two technologies are very closely related: both use triples of unique identifiers to express information. The main difference is that Universal Data uses UUIDs instead of URIs. The main reason is performance. Because of its focus on consumer hardware, Universal Data tries to be closer to the hardware than RDF. A UUID is a 128-bit binary, while a URI is a variable-length string; it is simply more efficient to process fixed-length binaries than strings.
Additionally, Havel's intended use as a language for consumers, as well as the way Havel expresses information, creates huge amounts of user objects and anonymous constructs that need globally unique identifiers. UUIDs are ideal for that. Solving this with urn:uuid URIs would be possible, but it would slow down processing compared to using binary UUIDs.
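As a rough illustration of the performance argument (not a benchmark of any actual Havel internals), compare identifier equality for a binary UUID against the same identifier in its urn:uuid string form: the former is a single fixed-size comparison, the latter a byte-by-byte scan of a 45-character string that also costs a heap allocation and more cache space.

```rust
// Illustrative only: identifier comparison cost, binary vs. string form.

fn main() {
    // A UUID as a native 128-bit integer: equality is one machine-level
    // comparison (two 64-bit compares on most hardware).
    let a: u128 = 0x6ba7b810_9dad_11d1_80b4_00c04fd430c8;
    let b: u128 = 0x6ba7b810_9dad_11d1_80b4_00c04fd430c8;
    assert!(a == b); // fixed-size compare

    // The same identifier as a urn:uuid URI: equality is a
    // variable-length, byte-by-byte string comparison.
    let ua = "urn:uuid:6ba7b810-9dad-11d1-80b4-00c04fd430c8".to_string();
    let ub = "urn:uuid:6ba7b810-9dad-11d1-80b4-00c04fd430c8".to_string();
    assert!(ua == ub); // O(n) compare over 45 bytes

    println!("binary form: {} bytes, string form: {} bytes",
             std::mem::size_of::<u128>(), ua.len());
}
```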
There is also a conceptual difference between RDF and Universal Data, or rather between RDF and Havel in this case. Havel can express information as well as processing instructions; we see Havel as a semantic abstraction layer between the hardware and applications rather than as an application itself, or as a kind of semantic instruction set. It is simply much more efficient if the source, the interpreter and the runtime environment can all work with one native data type.
While this is just speculation at this point, there is also the future possibility of hardware-accelerated semantic data processing, a technology that would most likely require a native data type for semantic tokens; a 128-bit integer seems to be a reasonable candidate for that.
There would, of course, have been very good reasons to go with RDF. Primarily, using an established format is good practice; I have tried to explain in the previous paragraphs why we diverged. Then there is human readability: XML can be edited with any text editor, whereas a Universal Data source is a binary format that requires specialized tools. We argue that this is only a question of how common Universal Data becomes. Nobody expects a JPEG image or a CAD file to be editable in a text editor; we use specialized applications for that. The same principle applies to Universal Data.
Havel vs. RDF/OWL
Havel has been designed as a universal, all-purpose language. It unifies information modelling, ontology, schema, rules and processing instructions. This universality is the single most important concept in the language design; it is what enables information-centric computing, protocol-free communication and semantic social networks. And it is this concept that will enable semantic information processing for consumers. Against this background, diverging from Semantic Web technologies is inevitable.
There are other things that Havel does differently. First and foremost, all information in Havel is subject to interpretation; facts and even meaning can change depending on context. Its expressive abilities allow highly complex abstractions of reality, reflecting details and subtleties that would be difficult to depict with other technologies. The interpretation process reduces this complexity to a desired level: it resolves contextual expressions, calculates dynamic values, creates structure, translates into natural language, applies rules and checks integrity. We believe this is a natural approach to information processing, as it resembles what happens in our own brains when we evaluate complex situations. The interpreter is not an external tool but an integral part of a comprehensive framework that allows the creation, search, editing, interpretation, analysis and execution of Havel content.
Fig2: A raw Havel expression: object hPeter has a contextual name; the contextual values are indirected and annotated
Fig3: The same object hPeter as in Fig2 after interpretation for the English language
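A minimal sketch of what such contextual interpretation could look like, assuming a simple language-tag context; the types and names (ContextualValue, interpret) are illustrative inventions, not Havel's actual model:

```rust
use std::collections::HashMap;

/// A value that differs by context; here the context is simply a
/// language tag. Havel's real contexts are far richer; this only
/// illustrates the reduction step.
struct ContextualValue {
    variants: HashMap<String, String>,
}

impl ContextualValue {
    /// Interpretation resolves the contextual expression to a single
    /// concrete value for the requested context.
    fn interpret(&self, context: &str) -> Option<&String> {
        self.variants.get(context)
    }
}

fn main() {
    let mut name = ContextualValue { variants: HashMap::new() };
    name.variants.insert("en".to_string(), "Peter".to_string());
    name.variants.insert("ru".to_string(), "Пётр".to_string());

    // Interpreting hPeter's name for the English language context.
    println!("{:?}", name.interpret("en")); // Some("Peter")
}
```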
Havel has a built-in upper ontology that is inspired by human logic. The language itself is part of that upper ontology; all concepts and grammars that together make up Havel are rooted in it. Many general constructs of human logic are implemented natively, and users can build custom functionality on top of them. The border between ontology, schema and data is less clearly defined in Havel than in the Semantic Web; this fuzziness enables expressive freedom and detailed semantics in user context.
Fig4: The border between Havel's grammars is fuzzy
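To illustrate the rooting idea in the simplest possible terms (the identifiers and the single "is-a" relation below are my own stand-ins, not Havel's upper ontology): a user-defined concept becomes meaningful by being traceable back to built-in concepts.

```rust
use std::collections::HashMap;

// Hypothetical identifiers; a real system would use 128-bit UUIDs.
const H_THING: u128 = 1; // root concept of the built-in upper ontology
const H_AGENT: u128 = 2; // a built-in concept rooted in hThing

fn main() {
    // A tiny "is-a" table (subject -> parent concept). In triple terms,
    // every entry is (subject, isA, parent) with the predicate fixed.
    let mut is_a: HashMap<u128, u128> = HashMap::new();
    is_a.insert(H_AGENT, H_THING); // built-in: Agent is a Thing

    // A user-defined concept, rooted in the upper ontology.
    const MY_CUSTOMER: u128 = 1000;
    is_a.insert(MY_CUSTOMER, H_AGENT); // user: Customer is an Agent

    // Every concept traces back to the root.
    let mut c = MY_CUSTOMER;
    while let Some(&parent) = is_a.get(&c) {
        println!("{c} is a {parent}");
        c = parent;
    }
}
```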
Havel will implement many information management and processing features natively. It will be able to qualify information, it will be able to integrate false and misleading information, and it will have mechanisms for automatically assessing the trustworthiness of information. A native state-and-flow framework will allow workflow management, and a governance framework will manage computer-aided, automated collaboration between different users and computers. Native functions, scripts, modules, event handling, agents, plug-ins and a UI framework will drive Havel-based applications. A sketch of what qualifying information might mean at the data level follows below.
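The structure below is my own hedged illustration, not Havel's actual design: a statement can itself carry qualifiers that record its source and an assessed trustworthiness, which is one way to keep false or misleading information without letting it masquerade as fact.

```rust
/// Illustrative only: a triple-shaped statement plus qualifiers.
struct Statement {
    subject: u128,
    predicate: u128,
    object: u128,
}

struct Qualified {
    statement: Statement,
    source: u128, // who asserted it
    trust: f32,   // 0.0 (untrusted) .. 1.0 (fully trusted)
}

fn main() {
    // "hPeter knows hMary", asserted by source 42 with moderate trust.
    let q = Qualified {
        statement: Statement { subject: 1, predicate: 2, object: 3 },
        source: 42,
        trust: 0.6,
    };

    // A consumer keeps the statement but weighs it by its trust value,
    // integrating unreliable information instead of discarding it.
    if q.trust < 0.5 {
        println!("treat as unreliable (source {})", q.source);
    } else {
        println!("{} -[{}]-> {} (trust {})",
                 q.statement.subject, q.statement.predicate,
                 q.statement.object, q.trust);
    }
}
```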
Havel's universality enables new approaches to application design and to information processing in general. Havel is a vision of application-independent data, information-centric computing, protocol-free communication and collaborative information networks.