The Syntax of TreeCode

Overview

TreeCode is a little more than a markup language, tending toward a programming language. It is a way to model information and computation in a format that is easy to read and write, suitable for hierarchical note taking and other ways of capturing data in structured form.

It emerged out of the desire to have one way of writing things (notes, code, data models, etc.) that was not too verbose and was easy to learn, with very few rules. Writing in all lowercase, without having to use the shift key, streamlines your typing so you can get your knowledge down as quickly as possible. Existing markup languages like XML, JSON, and YAML are too static and don't let you define things as concisely. TreeCode is indentation-based, and the typical style of a DSL is to write it in a somewhat repetitive way, to give quick visual cues as to the meaning of things. But the way you use it is entirely up to you; those are just style conventions.

Example

First, a few examples from existing code. The first is how you might define a "deck" (a "package" of TreeCode). The second is the first part of the Tao Te Ching captured in a tree. These examples are DSLs designed for specific purposes. As you will see in the specification section, TreeCode itself is independent of any DSL and simply defines a few idioms for defining trees of text.

Here is a package definition:

deck @termsurf/base
  mark <0.0.1>
  head <A TreeCode Package Manager>
  term link-text
  term computation
  term philosophy
  term information
  term platform
  term white-label
  term compiler
  face <Lance Pollard>, site <foo@bar.com>
  task ./task
  read ./note
  lock apache-2
  sort tool
  link @termsurf/bolt, mark <0.x.x>
  link @termsurf/nest, mark <0.x.x>
  link @termsurf/crow, mark <0.x.x>

The first block of the Tao Te Ching:

head <道德经>
  head <第一章>
    text <道可道,非恆道;>
    text <名可名,非恆名。>
    text <無名天地之始;>
    text <有名萬物之母。>
    text <故,>
    text <恆無,欲也,以觀其妙;>
    text <恆有,欲也,以觀其徼。>
    text <此兩者同出而異名,>
    text <同謂之玄。>
    text <玄之又玄,眾妙之門。>

Specification

Now we will go into the actual specification of the syntax. TreeCode is a minimal modeling language that is transformable into code. The file extension to be used is .tree, as in file.tree. It has the following syntax.

Term

The first thing to cover is terms. A term is composed of words separated by dashes. A word is composed of lowercase ASCII letters or numbers. So the following are all valid terms.

xo
hello-world
foo-bar-baz
abc123

The following is a valid term too, even though it starts with a number:

1xo

You just can't have terms that are only numbers. That would be a number.
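The term rules above can be sketched as a small check. This is only an illustration, not part of any TreeCode parser: the name isTerm is hypothetical, and it assumes that "only numbers" means the term contains no letter at all (so 1-2 is rejected along with 123).

```typescript
// Hypothetical helper: checks a string against the term rules described above.
// Assumption: a term needs at least one letter, otherwise it reads as a number.
const WORD = /^[a-z0-9]+$/

function isTerm(input: string): boolean {
  const words = input.split("-")
  // Every dash-separated word must be non-empty lowercase ASCII letters or digits.
  if (!words.every(word => WORD.test(word))) return false
  // Reject inputs made only of digits (and dashes).
  return /[a-z]/.test(input)
}

console.log(isTerm("hello-world")) // true
console.log(isTerm("1xo")) // true
console.log(isTerm("123")) // false
```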

You can nest them arbitrarily into trees. These are all trees.

hello world
this is a tree
this
  is
    a
      tree
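To make the indentation rule concrete, here is a minimal sketch of how indented lines could be turned into a nested structure. It is not the actual parser: the Sketch type and parseIndented name are made up for illustration, it assumes two spaces per level, and it keeps each line as plain text without splitting it further.

```typescript
// Hypothetical node shape for this sketch only; the real AST is in the Types section.
type Sketch = { text: string; nest: Sketch[] }

// Assumptions: two spaces per indentation level, no tabs, blank lines ignored.
function parseIndented(source: string): Sketch[] {
  const roots: Sketch[] = []
  // stack[d] holds the most recent node seen at depth d.
  const stack: Sketch[] = []
  for (const line of source.split("\n")) {
    if (line.trim() === "") continue
    const indent = line.search(/\S/) // index of the first non-space character
    const depth = Math.floor(indent / 2)
    const node: Sketch = { text: line.trim(), nest: [] }
    // Drop deeper entries, then attach under the nearest shallower node.
    stack.length = Math.min(depth, stack.length)
    const parent = stack[stack.length - 1]
    if (parent) parent.nest.push(node)
    else roots.push(node)
    stack.push(node)
  }
  return roots
}

console.log(JSON.stringify(parseIndented("this\n  is\n    a\n      tree")))
```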

You can write multiple nodes on a line, separated by commas:

this is, also a tree, and a tree

The same as:

this is
  also
    a tree
      and a tree

You can put things in parentheses too to make it easier to write on one line:

add(a, subtract(b, c))

The same as:

add a, subtract b, c

Size

You can use numbers ("sizes") in the system too:

add 1, 2

A signed integer is called a sided-size.

Comb

A comb is a decimal number.

add 1, subtract -2, 3.14

Text

A more complex structure is the text. A text is composed of a weaving of cords (strings) and terms. A cord is a contiguous sequence of arbitrary Unicode (UTF-8).

A simple template composed only of a string is:

write <hello world>

Or multiline text:

text <
  This is a long paragraph.

  And this is another paragraph.
>

Or even:

form user
  note
    <
      This is a long paragraph.

      And this is another paragraph.
    >

Nick

Then we can add interpolation ("nick") into the template, by referencing terms wrapped in curly brackets:

write <{hello-world}>

A more robust example might be:

moon <The moon has a period of roughly {bold(<28 days>)}.>

Note, though, that you can still use the angle bracket symbols in regular text without ambiguity. You just need to prefix them with backslashes.

i <am \<brackets\> included in the actual string>
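As a rough illustration of how a text breaks down into cords and nicks, the sketch below splits the inside of a text (the part between the outer angle brackets) into literal pieces and interpolations. The Piece type and splitText name are hypothetical. It assumes a backslash makes the next character literal and that curly brackets inside an interpolation are balanced, and it leaves the content of each nick unparsed.

```typescript
// Hypothetical piece types for this sketch; the real AST uses LinkCord and LinkNick.
type Piece = { form: "cord"; text: string } | { form: "nick"; code: string }

function splitText(body: string): Piece[] {
  const pieces: Piece[] = []
  let cord = ""
  let i = 0
  while (i < body.length) {
    const ch = body.charAt(i)
    if (ch === "\\" && i + 1 < body.length) {
      cord += body.charAt(i + 1) // the escaped character is kept literally
      i += 2
    } else if (ch === "{") {
      // Find the matching closing bracket by tracking depth.
      let depth = 1
      let j = i + 1
      while (j < body.length && depth > 0) {
        const inner = body.charAt(j)
        if (inner === "{") depth++
        else if (inner === "}") depth--
        j++
      }
      if (cord) pieces.push({ form: "cord", text: cord })
      cord = ""
      // If the bracket was never closed, keep everything up to the end.
      const end = depth === 0 ? j - 1 : j
      pieces.push({ form: "nick", code: body.slice(i + 1, end) })
      i = j
    } else {
      cord += ch
      i++
    }
  }
  if (cord) pieces.push({ form: "cord", text: cord })
  return pieces
}

console.log(splitText("The moon has a period of roughly {bold(<28 days>)}."))
```

For the moon example this yields a cord, a nick containing bold(<28 days>), and a final cord holding the period.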

Code

You can write specific code points, or codes, by writing the number sign (#), then a letter representing the code type, then the code itself.

i #b0101, am bits
i #o123, am octal
i #xaaaaaa, am hex

These can also be used directly in a template:

i <am the symbol #x2665>

This makes it so you can reference obscure symbols by their numerical value, or write bits and things like that. Note, though, that the parser only records these as codes, so the code handler would need to resolve them properly in the proper context.

An arbitrary base code can be produced with #<num>n<value>, like this for base 60:

#60n123
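Here is one way a code handler might turn these codes into numbers. It is a sketch under assumptions, not the reference behavior: parseCode and readDigits are hypothetical names, and the digit alphabet (0-9 followed by a-z) is a guess, since the text above does not say how digits above 9 are written in arbitrary bases.

```typescript
// Hypothetical result shape for this sketch.
type ParsedCode = { base: number; value: number }

// Assumption: digits are 0-9 then a-z, so only 36 distinct digit values exist.
const DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

function readDigits(text: string, base: number): number {
  let total = 0
  for (let i = 0; i < text.length; i++) {
    const digit = DIGITS.indexOf(text.charAt(i))
    if (digit < 0 || digit >= base) {
      throw new Error("invalid digit " + text.charAt(i) + " for base " + base)
    }
    total = total * base + digit
  }
  return total
}

function parseCode(input: string): ParsedCode {
  // Either a type letter (b, o, x) or an explicit base followed by "n".
  const match = /^#(?:([box])|(\d+)n)([0-9a-z]+)$/.exec(input)
  if (!match) throw new Error("not a code: " + input)
  const letter = match[1]
  let base: number
  if (letter === "b") base = 2
  else if (letter === "o") base = 8
  else if (letter === "x") base = 16
  else base = Number(match[2])
  if (base < 2) throw new Error("base must be at least 2")
  return { base, value: readDigits(match[3] || "", base) }
}

console.log(parseCode("#b0101")) // { base: 2, value: 5 }
console.log(parseCode("#60n123")) // { base: 60, value: 3723 }
```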

Knit

A knit is a selector, a way of digging down into terms. Knits look like paths, but they are really diving down into terms, if you think of it that way.

get foo/bar

Finally, you can do actual interpolations beyond property/array lookups:

get foo{bar}{{baz}}/{{{bing}}}boop

In theory, the number of brackets means the number of passes the compiler has to go through it, so if it's 2 brackets, that will be compiled to 1 bracket, and that 1 bracket will be evaluated at runtime.
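To illustrate the bracket counting, the sketch below lists each interpolation in a knit together with its number of brackets. The nickSizes name is hypothetical, and it assumes a nick opens with a run of N opening brackets and closes with the same number of closing brackets, with no nesting inside.

```typescript
// Hypothetical helper: finds each interpolation and counts its brackets.
function nickSizes(knit: string): Array<{ name: string; size: number }> {
  const found: Array<{ name: string; size: number }> = []
  const pattern = /(\{+)([^{}]*)(\}+)/g
  let match: RegExpExecArray | null
  while ((match = pattern.exec(knit)) !== null) {
    const open = match[1] || ""
    const close = match[3] || ""
    // Assumption: opening and closing runs must have the same length.
    if (open.length !== close.length) {
      throw new Error("unbalanced brackets near: " + match[0])
    }
    found.push({ name: match[2] || "", size: open.length })
  }
  return found
}

console.log(nickSizes("foo{bar}{{baz}}/{{{bing}}}boop"))
```

For the example above this reports bar with size 1, baz with size 2, and bing with size 3.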

Line

A line is a path, like a file path. Because paths are so common in programming, they don't need to be treated as strings but can be written directly. The special @ symbol is for referencing relative to some "scope" or context, which you would handle in your interpreter of TreeCode.

load @some/path
load ./relative/path.png
load /an-absolute/other/path.js
load **/*.js
hook /@:user

That is, they are just special strings. You can interpolate on them like strings as well with curly brackets.

Mime Type

The preferred mime-type for TreeCode is text/note.

Types

All of the TypeScript types below, which have a form, are part of the Link Tree, the AST for TreeCode. This is the exact structure of the AST.

export type LinkFold = {
  base?: Leaf
  head?: Leaf
}

export type LinkTree = {
  nest: LinkFork
  form: LinkName.Tree
}

export type LinkFork = {
  fold?: LinkFold
  nest: Array<
    | TreeCode
    | LinkFork
    | LinkSize
    | LinkCord
    | LinkNick
    | LinkCull
    | LinkComb
    | LinkCode
    | LinkKnit
  >
  base?: LinkFork | LinkNick | LinkCull
  form: LinkName.Fork
}

export type LinkComb = {
  form: LinkName.Comb
  bond: number
  base?: LinkCull | LinkFork
  leaf: Leaf
}

export type LinkCode = {
  bond: number
  mold: string
  base?: LinkCull | LinkFork
  form: LinkName.Code
  leaf: Leaf
}

export type LinkCull = {
  nest?: LinkFork | LinkSize | LinkKnit
  base?: LinkKnit
  form: LinkName.Cull
  fold?: LinkFold
}

export type LinkKnit = {
  base?: LinkFork
  nest: Array<LinkCull | LinkNick | LinkCord>
  form: LinkName.Knit
  fold?: LinkFold
}

export type LinkNick = {
  nest?: LinkFork
  base?: LinkKnit | TreeCode
  size: number
  form: LinkName.Nick
  fold?: LinkFold
}

export type LinkCord = {
  form: LinkName.Cord
  base?: TreeCode
  leaf: Leaf
}

export type TreeCode = {
  nest: Array<LinkCord | LinkNick>
  form: LinkName.Text
  base?: LinkCull | LinkFork
  fold?: LinkFold
}

export type LinkSize = {
  form: LinkName.Size
  bond: number
  base?: LinkCull | LinkFork
  leaf: Leaf
}

You'll notice that each element has a reference back to its parent, for easier traversal. And if it has nested content, it is in the nest property.

A LinkTree is at the top; this is what gets returned by the parser. A LinkFork is a slot for an element to go in the tree. Forks can be nested, and can nest every element except the LinkTree (which is the special base element).

A LinkKnit is a weaving of LinkCord, LinkNick, and LinkCull. This is the interpolation and such, which can be used in place of a simple term. Ultimately the knit gets resolved to either a term or a path, which the compiler reads to figure out what to do.

A LinkCord is a contiguous string. A LinkNick is the curly brackets and everything nested inside. And a LinkCull is the square brackets and everything nested inside.

A TreeCode is a weaving of LinkCord and LinkNick, inside the text-delimiting angle brackets. So it is strings separated by optional interpolation basically.

The LinkNick has a LinkFork inside of it, which is the space where nested elements can be added. The same goes for the LinkCull: it has a LinkFork inside of it, but the cull can also hold a LinkSize, an integer, for array lookups.

The "primitive" contiguous structures, like the numbers, strings, and codes, have a leaf property to access the token which was taken out of the text input stream. The non-primitive or complex nested types also have a fold property, which is used to keep track of a starting and ending leaf that sets its boundaries. This way the parser can highlight blocks of text in the source code.
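Since nested content always lives in the nest property, a generic traversal is short to write. The sketch below uses a simplified stand-in type, Seed, rather than the real types above, and the string form labels are placeholders (the actual LinkName values are not shown here). It only relies on nest being either a list or a single element. The sample tree is built by hand and is not real parser output.

```typescript
// Hypothetical, simplified stand-in for the node types above.
type Seed = { form: string; nest?: Seed | Seed[] }

// Visits a seed, then everything nested under it, depth first.
function walk(seed: Seed, visit: (seed: Seed, depth: number) => void, depth = 0): void {
  visit(seed, depth)
  if (seed.nest === undefined) return
  // nest is a list on some elements and a single element on others.
  const children = Array.isArray(seed.nest) ? seed.nest : [seed.nest]
  for (const child of children) walk(child, visit, depth + 1)
}

// A hand-built sample with placeholder form labels.
const sample: Seed = {
  form: "tree",
  nest: {
    form: "fork",
    nest: [{ form: "knit", nest: [{ form: "cord" }] }, { form: "size" }],
  },
}

walk(sample, (seed, depth) => console.log("  ".repeat(depth) + seed.form))
```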

So you can essentially do:

# nested seeds (seeds = elements)
seed/fold/base/text
seed/fold/head/text

# leaf seeds
seed/leaf/text

Conclusion

That is all there is to it! It is a simple way of defining trees of text, allowing for template variables inside text, and for basic primitives. It is then up to you to figure out what you want to do with it.