Knodules Plaintext Microformat

The plain text representation of knodules is often the simplest way to describe terms and their relationships. This representation is also used implicitly in the HTML and XML microformats. The top level of the plaintext representation is simple. A plaintext knodule is a string of Unicode characters (in some encoding) which consists of one or more entries separated by semicolons. Each entry consists of one or more clauses separated by vertical bars. The plaintext representation collapses each run of whitespace into a single space character and removes leading and trailing spaces from all clauses.
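The top-level structure described above can be sketched in Python. This is a minimal illustration, not a reference parser: the function names are hypothetical, escaped separators are left in place for a later unescaping step, and dropping empty clauses and entries (for example after a trailing semicolon) is an assumption rather than something the format states.

```python
import re

def split_unescaped(text, sep):
    """Split text on sep, ignoring separators preceded by a backslash."""
    parts, current, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            # Keep the escape pair intact; unescaping is a separate step.
            current.append(text[i:i + 2])
            i += 2
        elif ch == sep:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts

def normalize_space(text):
    """Collapse each whitespace run to one space and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()

def parse_knodule(text):
    """Return a list of entries, each a list of normalized clauses."""
    entries = []
    for raw_entry in split_unescaped(text, ";"):
        clauses = [normalize_space(c) for c in split_unescaped(raw_entry, "|")]
        clauses = [c for c in clauses if c]  # assumption: empty clauses are dropped
        if clauses:
            entries.append(clauses)
    return entries
```

For example, `parse_knodule("dog | hound ; cat | feline")` yields two entries of two clauses each.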

Leading punctuation in the plaintext format is used to specialize entries and clauses. An entry without any leading punctuation is called a subject entry and its first clause identifies the subject. Leading punctuation can be escaped with a backslash, as can the separator characters (semicolon and vertical bar).

Terms. The most basic component of any knodule is the term. A term is a string of Unicode characters with no leading or trailing spaces, normalized by collapsing each whitespace sequence into a single space character (0x20) and by normalizing the Unicode encoding.
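Term normalization can be sketched as below. The text does not say which Unicode normalization form is intended, so the use of NFC here is an assumption, and `normalize_term` is a hypothetical name.

```python
import re
import unicodedata

def normalize_term(raw, form="NFC"):
    """Normalize a raw string into a term.

    Assumption: NFC is the intended Unicode normalization form;
    pass a different form if the application requires one.
    """
    # Collapse whitespace runs to a single space (0x20) and trim the ends.
    collapsed = re.sub(r"\s+", " ", raw).strip()
    return unicodedata.normalize(form, collapsed)
```

With this sketch, a decomposed "e" plus combining acute accent and the precomposed "é" normalize to the same term.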

Domains. Knodules are always specific to a particular context called a domain. Domains look just like Internet hostnames: names or numbers separated by periods. One special kind of term is defined relative to a particular domain: a dterm is a term which represents a single unique meaning with respect to a domain. Dterms are displayed like “this” in this document.

Dterms can be regular natural language words or phrases (like dog) if the word or phrase is unambiguous in the domain. Some dterms are natural compounds (like river bank), used when another term (like “bank”) may be ambiguous. Constructed dterms are artificial terms which disambiguate the meaning of a base term, for example bank:institution.

Translations. Every knodule has a default language, but the syntax $langid$string can be used to refer to a string in the language identified by langid (an ISO 639-1 language code). For example, “$fr$chien” refers to the French word “chien.” This syntax can be used anywhere a term (or dterm) occurs, as well as with descriptive glosses or references to external URIs or tags.
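A small sketch of how the language prefix might be separated from the string follows. It assumes two-letter language codes, and the `default_lang` parameter stands in for the knodule's default language; both the function name and that parameter are illustrative.

```python
import re

# Assumption: langid is a two-letter code, written between two dollar signs.
LANG_PREFIX = re.compile(r"^\$([A-Za-z]{2})\$(.*)$", re.DOTALL)

def split_language(text, default_lang="en"):
    """Return (langid, string); fall back to the default language."""
    match = LANG_PREFIX.match(text)
    if match:
        return match.group(1).lower(), match.group(2)
    return default_lang, text
```

Calling `split_language("$fr$chien")` separates the code `fr` from the string `chien`, while a string with no prefix is returned unchanged with the default language.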

Entries and clauses. The plain text representation of knodules consists of one or more entries separated by semicolons (";", Unicode 0x3B). Entries starting with punctuation may have special interpretations, but subject entries (which don't start with any punctuation) consist of a head dterm followed by zero or more clauses separated by vertical bars (|).

Kinds of clauses. Clauses describe relations of dterms with natural language synonyms, other dterms, and rules for disambiguating ambiguous natural language into dterms. Clauses are distinguished by their initial punctuation and any clause without initial punctuation indicates a simple synonym of the subject. The asterisk (*) and tilde (~) characters can be used as modifiers (called 'major' and 'minor' respectively) on either synonyms or other relations. When applied to synonyms, for example, a major synonym indicates a term which is commonly used for the corresponding dterm; a minor synonym indicates a term which is a 'search hook' that might indicate the subject but is not a real synonym.
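The synonym case can be illustrated with a short sketch. It handles only synonym clauses, and it assumes that a clause whose text (after an optional modifier) still begins with punctuation belongs to some other relation. The exception made for "$" (language prefix) and backslash (escape) is also an assumption, as is the function name.

```python
def classify_synonym(clause):
    """Return (strength, term) for a synonym clause, or None otherwise."""
    modifiers = {"*": "major", "~": "minor"}
    strength = "plain"
    body = clause
    if clause[:1] in modifiers:
        strength = modifiers[clause[0]]
        body = clause[1:]
    # Assumption: any other leading punctuation marks a non-synonym relation.
    if not body or not (body[0].isalnum() or body[0] in "\\$"):
        return None
    return strength, body
```

Under these assumptions `*dog` is a major synonym, `~puppy` a minor one, and `=*canine` is not a synonym clause at all.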

In addition to the synonym relation between dterms and terms, there are seven kinds of relationships between dterms:

=*dterm    logically equivalent
=~dterm    vaguely equivalent

There are eight kinds of simple clauses:

There are three kinds of compound clauses:

Strictness. A knodules application can interpret the microformat strictly, meaning that any non-head or non-synonym reference must be an explicit dterm (i.e. something which occurs in the head of an entry). A non-strict interpretation allows non-dterms to be used as dterms, provided that:

  1. the term is a synonym of a known dterm which is unambiguous in the domain;
  2. the strictness can be disabled generally, for a particular domain, or for a particular block of entries; and
  3. the application signals an exception when a term assumed to be unambiguous (1, above) is found to be ambiguous.
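The difference between strict and non-strict interpretation can be sketched as a lookup function. Everything here is illustrative: the data structures (a set of dterms and a mapping from synonym to the dterms it may denote), the function name, and the exception class are assumptions, not part of the format.

```python
class AmbiguousTermError(LookupError):
    """Raised when a term assumed to be unambiguous maps to several dterms."""

def resolve_reference(term, dterms, synonyms, strict=True):
    """Resolve a reference to a dterm.

    dterms:   set of known dterms (entry heads)
    synonyms: dict mapping a term to the set of dterms it is a synonym of
    """
    if term in dterms:
        return term
    if strict:
        # Strict mode: only explicit dterms are acceptable references.
        raise LookupError(f"not a dterm: {term!r}")
    candidates = synonyms.get(term, set())
    if len(candidates) == 1:
        return next(iter(candidates))
    if len(candidates) > 1:
        raise AmbiguousTermError(f"ambiguous term: {term!r}")
    raise LookupError(f"unknown term: {term!r}")
```

In non-strict mode a synonym such as "hound" resolves to its single dterm, while "bank" raises an exception if it is a synonym of more than one dterm.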

Complex entries. Entries starting with punctuation may be interpreted specially. The simplest of these consists of a normal entry preceded by an asterisk and indicates that the subject of the entry is a key concept in the domain. This fact may be used in analytics, presentation, or interaction.

The two current kinds of complex entries are disjoins and taxpaths.

Knodule Blocks are groups of entries together with comments or a small set of control statements. A control statement looks like setting=value. Currently the only valid settings are domain, lang, and strict.
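Recognizing a control statement might look like the following sketch. The function name is hypothetical, and tolerating spaces around the equals sign is an assumption; only the three setting names come from the text above.

```python
VALID_SETTINGS = {"domain", "lang", "strict"}

def parse_control(statement):
    """Return (setting, value) for a control statement, or None otherwise."""
    name, sep, value = statement.partition("=")
    name = name.strip()
    if not sep or name not in VALID_SETTINGS:
        return None
    return name, value.strip()
```

A line such as `domain=example.com` is recognized as a control statement, while an ordinary entry or an unknown setting returns `None`.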

Comments are either line comments or block comments. Line comments are prefixed by some whitespace and either "#" or "//". Block comments are enclosed by "/*" and "*/" and may be nested.
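Because block comments may be nested, a single regular expression is not enough; a depth counter is one simple approach. The sketch below covers block comments only, ignores backslash escapes, and silently drops everything after an unclosed "/*". All of these are simplifying assumptions, and the function name is illustrative.

```python
def strip_block_comments(text):
    """Remove /* ... */ comments, honoring nesting via a depth counter."""
    out, depth, i = [], 0, 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "/*":
            depth += 1
            i += 2
        elif pair == "*/" and depth > 0:
            depth -= 1
            i += 2
        else:
            if depth == 0:
                out.append(text[i])  # keep only text outside all comments
            i += 1
    return "".join(out)
```

Text is emitted only while the depth is zero, so an inner "*/" does not end the outer comment.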

Special characters. Any of the special characters can be escaped with a single backslash. Note that any special characters outside of their designated constructions can be used without escaping. For example, since the ampersand constructions are always followed by dterms (which don't have leading spaces), ampersands followed by spaces do not need to be escaped (e.g. as in “Penn & Teller”).

In addition, the standard C escape sequences (such as \n) are interpreted as the corresponding character, and an escaped newline acts as a line continuation, causing the newline and any succeeding whitespace to be ignored.
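The escaping rules can be sketched as a single pass over the text. The table below covers only a few of the C escape sequences, which is an assumption made for brevity, and the function name is hypothetical. Any other escaped character is taken literally, which handles escaped special characters such as semicolons and vertical bars.

```python
# Assumption: only this subset of the C escape sequences is handled here.
C_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "0": "\0", "\\": "\\"}

def unescape(text):
    """Interpret backslash escapes, including line continuations."""
    out, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch != "\\" or i + 1 == len(text):
            out.append(ch)  # ordinary character, or a trailing lone backslash
            i += 1
            continue
        nxt = text[i + 1]
        i += 2
        if nxt == "\n":
            # Line continuation: drop the newline and any following whitespace.
            while i < len(text) and text[i].isspace():
                i += 1
        else:
            out.append(C_ESCAPES.get(nxt, nxt))
    return "".join(out)
```

Unescaped special characters, such as the ampersand in “Penn & Teller”, pass through unchanged.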