Announcing Nyth

Published on Aug 09, 2022

Several forevers later, the 15th competing standard, which should’ve taken a few evenings instead, is finally released: Nyth.

Nyth is a data serialization format which aims to be easy to use for both humans and machines.

# Lists and maps have designated indicators.
channels: =
	- red
	- green
	- blue
	- alpha

# ...which allows them to be empty without being an edge case. 
`good programming languages`: =

# Indentation is used to denote structure.
targets: +
	# Multiline scalars make it easy to embed formatted text.
	install: |
		./configure
		make
		make install
	|
	clean: |
		make clean
	|

# Nyth has raw and quoted scalars as well.
name: `Remilia Scarlet`
quote: "`Remilia Scarlet`? Using Touhou names is lazy."

To answer “what?” and “how?” questions, you are welcome to check out the specification and the reference implementation; this post is an attempt to explain why Nyth exists and why it’s this way.

…wait, what happened to nsl?

Well, nsl doesn’t sound as cool, does it? Nyth is much better in that regard.

For those wondering, “nyth” is “nest” in Welsh.

Alternatives

Why make a new language in the first place, when we already have so many battle-tested solutions for every use case, and even more if you dig a bit deeper? Well, I have NIH syndrome… I mean, the existing formats aren’t that good.

JSON is great, really, but it’s clearly designed to be machine-first, even if some people try to “improve” it by adding comments and other non-standard extensions. Just use some other language; config.json is an oxymoron.

YAML is usually what people first think of when they want to put their data into a nice-looking human-readable format. And indeed, it is basically an industry standard at this point, has a lot of implementations, and finds itself useful in multiple areas. What’s not to love?

For starters, its grammar is way too complex. Implicit typing is also not a good thing to have (remember the Norway problem?). Also, YAML is a superset of JSON… not a useful feature to have in my book, yet something your parser has to support.
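
For anyone who missed the Norway problem, here is a small Python sketch of it. It does not call any real YAML library; `resolve_plain` is a hypothetical stand-in that copies only the boolean spellings listed by the YAML 1.1 boolean type, which is enough to show a country code quietly turning into a boolean.

```python
# Spellings that the YAML 1.1 boolean type resolves implicitly.
YAML11_TRUE = {"y", "Y", "yes", "Yes", "YES", "true", "True", "TRUE", "on", "On", "ON"}
YAML11_FALSE = {"n", "N", "no", "No", "NO", "false", "False", "FALSE", "off", "Off", "OFF"}


def resolve_plain(scalar: str):
    """Hand-written stand-in for implicit typing; not a real YAML API."""
    if scalar in YAML11_TRUE:
        return True
    if scalar in YAML11_FALSE:
        return False
    return scalar


# A list of country codes: Sweden, Denmark, Norway.
countries = [resolve_plain(code) for code in ["SE", "DK", "NO"]]
print(countries)  # the last item is no longer a string
```

In Nyth a scalar is never retyped behind your back, so this class of bug can only appear if the application opts into it.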

StrictYAML tries to fix most of YAML’s problems, but is ultimately tied to Python and still carries some of the legacy of its parent.

There exist entirely new languages like NestedText and Nest, but their syntax doesn’t really sit well with me, although you, the reader, might find them appealing.

TOML tries to be a better INI, and in my opinion it succeeds at that, but a better INI is not what I’m looking for; such a “flat” format is simply not fit for data with more than two levels of nesting.

Finally, languages like XML, SDLang, KDL and their friends reason about the data using the concept of nodes and therefore solve a slightly different problem.

With all that being said, let’s move on.

Some of the design choices

Indentation

Indentation is done with tabs as they are objectively better; every other opinion on this topic is, of course, bad and wrong.

Scalars

Nyth has four types of scalars, each with its own set of capabilities and trade-offs.

Simple scalars are, well, simple: no whitespace, no quotes, no conflicts with other language constructs.

Raw scalars can include almost everything that doesn’t require escaping, except line breaks and backticks; they are what you will probably use for scalars with spaces and ambiguous characters.

# Not a map!
plus: `+`
comment?: `# wrong!`

Quoted scalars are something you’re already familiar with: they can contain anything, and whatever can’t be written as-is is encoded with escape sequences.

名前: "\u6708\u898b\u82f1\u5b50"
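
To see what that escaped value stands for, here is a tiny Python sketch. It assumes that a `\uXXXX` escape in a Nyth quoted scalar means the same thing as in JSON (an assumption on my part; the specification is the authority), so `json.loads` is borrowed purely as a stand-in decoder.

```python
import json

# Assumption: \uXXXX in a Nyth quoted scalar behaves like the JSON escape,
# so the JSON decoder is used here only as a stand-in, not as a Nyth parser.
quoted = r'"\u6708\u898b\u82f1\u5b50"'
decoded = json.loads(quoted)

print(decoded)       # the four characters behind the escapes
print(len(decoded))  # one character per escape sequence
```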

That leaves multiline scalars, which are the best fit for scripts, messages, and other formatted text.

script: |
	#!/bin/sh
	name="$(date +'%Y-%m-%d_%H-%M-%S').png"
	path="$HOME/images/screenshots/$name"
	args=
	if [ "$1" = "select" ]; then
		geom=$(my-slurp)
		if [ -z "$geom" ]; then exit; fi
		args="-g '$geom'"
	fi
	echo "$path"
	sh -c "grim $args -" | tee "$path" | wl-copy
|

What about folded scalars?

Folding isn’t as straightforward as it looks. Sure, you can replace every single line break with an ASCII space, but then texts in languages that don’t use spaces at all (e.g. Chinese) won’t get to enjoy that luxury.

Delete those line breaks and simply concatenate lines? That would mean either leading (ugly) or trailing (error-prone) whitespace, which is bad.

Add an option to pick the character for folding? That sounds like the opposite of “simple”, and wouldn’t allow mixing different languages within one scalar. (A very frequent problem, I know.)

So, as I believe that no feature is better than a bad one, Nyth doesn’t do anything to your multiline scalars. You can fold them yourself after parsing if you want.
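
Folding after parsing can be a few lines of application code. Below is a minimal Python sketch; `fold` and its `joiner` parameter are hypothetical names, and treating a blank line as a paragraph break is my assumption rather than anything Nyth prescribes.

```python
def fold(text: str, joiner: str = " ") -> str:
    """Join single line breaks with `joiner`; keep blank lines as paragraph breaks."""
    paragraphs = text.split("\n\n")
    return "\n\n".join(joiner.join(p.split("\n")) for p in paragraphs)


# The application knows which language it is dealing with, so it picks the joiner.
english = "Folding isn't as\nstraightforward as it looks."
chinese = "折叠并不像\n看起来那么简单。"

print(fold(english))      # lines joined with a space
print(fold(chinese, ""))  # lines joined with nothing
```

The caller chooses the joiner per scalar, which is exactly the decision the format itself refuses to make.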

Collection indicators

As shown in the example, maps and lists are indicated by + and = respectively. This is done for several reasons:

  • With explicit indicators, empty collections aren’t a special case.
  • It makes parsing easier.
  • Collections as list items look better with - + or - = instead of a lone -.
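
To make the “easier parsing” point a bit more concrete, here is a rough Python sketch; it is not the reference implementation. `classify` is a hypothetical helper that only looks at the text following `key: ` on one line, and it ignores comments, nesting, and validation entirely.

```python
def classify(value: str) -> str:
    """Name the kind of value that follows `key: ` on a single line."""
    if value == "+":
        return "map"
    if value == "=":
        return "list"
    if value == "|":
        return "multiline scalar"
    if value.startswith("`"):
        return "raw scalar"
    if value.startswith('"'):
        return "quoted scalar"
    return "simple scalar"


# Values taken from the examples in this post.
for value in ["=", "+", "|", "`Remilia Scarlet`", "`+`", "red"]:
    print(repr(value), "->", classify(value))
```

The kind of value is known before any indented line is read, so an empty list is simply a `=` with nothing under it.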

But if you have =, why is - required for list items?

It looks nice.

Scalar formats

Scalars in Nyth are effectively nothing more than strings, but you can convey numbers, booleans and other values with them. To help, Nyth has some predefined types and formats for Nyth users to agree on.

number: 123
`that was a number`: true
`and that was a boolean value`: `Yes, it was.`

Whether to interpret a scalar as a value of some type or not is entirely up to the user.

version: 9.2 # It makes sense to interpret this one as a string,
depth: 9.2   # and this one as a floating point number.
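
On the application side this could look like the Python sketch below. The `parsed` dictionary is a hypothetical parser result (every scalar is a plain string), the `enabled` key and the `as_bool` helper are made up for illustration, and accepting only `true` and `false` is this imaginary application's choice, not a rule quoted from the specification.

```python
# Hypothetical parser output: every Nyth scalar arrives as a plain string.
parsed = {"version": "9.2", "depth": "9.2", "enabled": "true"}


def as_bool(scalar: str) -> bool:
    """Accept only the exact spellings this application documents."""
    if scalar == "true":
        return True
    if scalar == "false":
        return False
    raise ValueError(f"not a boolean: {scalar!r}")


version = parsed["version"]           # kept as a string
depth = float(parsed["depth"])        # interpreted as a floating point number
enabled = as_bool(parsed["enabled"])  # interpreted as a boolean

print(type(version).__name__, type(depth).__name__, type(enabled).__name__)
```

The same text, 9.2, ends up as two different types because the application said so, not because the parser guessed.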

And of course, you can introduce your own formats, provided that they are really needed for your case and are properly documented.


That’s it for today. I hope you’ll find Nyth interesting enough to use in one of your projects, or to write an implementation for your favorite programming language. (As for me, I’ll try to make one for Hare.)

Thanks for reading!