Show HN: Jesth – Next-level human-readable data serialization format (github.com/pyrustic)
38 points by alexrustic on May 18, 2023 | 31 comments
Hi HN! I'm Alex, a tech enthusiast. I'm excited to show you Jesth, a next-level human-readable data serialization format.

This project started out as a markup language for writing the docstrings of functions that would ultimately be consumed by a documentation generator. Basically the idea was to split a docstring into sections like Description and Parameters. Each section would consist of a header in square brackets and a body (lines of text between two headers).

Here's what a docstring for a sum function would look like:

  This function takes in two integers a and b and returns their sum.

  [parameters]
  - a: First integer
  - b: Second integer

  [return] 
  Sum of a and b
The Description section in the example above is actually an anonymous section, i.e., a section with an empty header.

Meanwhile, I was thinking of a way to automate part of my dev workflow by storing commands in a file, grouped into tasks such as project creation, build, testing, release, et cetera. As with the markup language for my documentation generator, I would use square brackets to define the tasks. Thus, a task would consist of a header and a body, the body being a list of commands to be executed sequentially.

I built this project and named it Backstage. Here is a hypothetical backstage.tasks file:

  [release]
  & test
  & generate_doc
  & git_stuff
  & build
  # upload to PyPI
  $ twine upload --skip-existing dist/*

  [git_stuff]
  $ git add .
  $ git commit -m {message}
  $ git push origin master
The example above is illustrative only and would not work. It contains two sections, "release" and "git_stuff". Running the "release" task from the command line is equivalent to sequentially executing the commands in the "release" section.
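As a rough sketch of what "sequentially executing the commands" could mean, the snippet below flattens a task into an ordered list of shell commands. It assumes that a line starting with `&` refers to another task, that `$` introduces a shell command, and that `#` starts a comment; this is a reading of the example above, not Backstage's documented semantics, and `expand_task` is a hypothetical name.

```python
def expand_task(tasks, name, _stack=()):
    """Flatten a task into the ordered list of shell commands it would run."""
    if name in _stack:
        raise ValueError(f"circular task reference: {name}")
    commands = []
    for line in tasks[name]:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith("&"):
            # Assumption: '& other' means "run the task named 'other' here"
            commands.extend(expand_task(tasks, line[1:].strip(), _stack + (name,)))
        elif line.startswith("$"):
            # Assumption: '$ cmd' is a shell command
            commands.append(line[1:].strip())
    return commands


# Sections as a parser might hand them over: header -> list of raw lines
tasks = {
    "release": ["& git_stuff", "# upload to PyPI", "$ twine upload --skip-existing dist/*"],
    "git_stuff": ["$ git add .", "$ git push origin master"],
}
print(expand_task(tasks, "release"))
```

The sketch only builds the command list; actually running the commands (and substituting placeholders such as {message}) is left out.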

The documentation generator and the scripting language, despite the obvious similarity in their formats, did not share any parsing code. So, to stop repeating myself, I created a file format and its library named Jesth which stands for "Just Extract Sections Then Hack".

The library acts as an incomplete INI file parser that only hands the programmer the sections (as headers and their associated bodies which are lists of strings). No further interpretation of the data is done by the parser, allowing the programmer to unleash their creativity through useful hacks.
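To make the idea concrete, here is a minimal sketch of such a section splitter. It is not the Jesth library's code or API; `split_sections` and the header regular expression are assumptions made for illustration, and the real parser may treat edge cases differently.

```python
import re

# Assumption: a header is a line consisting only of text in square brackets
HEADER = re.compile(r"^\[(.*)\]\s*$")


def split_sections(text):
    """Return a list of (header, body) pairs; each body is a list of raw lines."""
    sections = [("", [])]  # the anonymous section (empty header) comes first
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            sections.append((match.group(1).strip(), []))
        else:
            sections[-1][1].append(line)  # no interpretation, just raw lines
    return sections


doc = (
    "This function takes in two integers a and b and returns their sum.\n"
    "\n"
    "[parameters]\n"
    "- a: First integer\n"
    "- b: Second integer\n"
    "\n"
    "[return]\n"
    "Sum of a and b\n"
)
sections = split_sections(doc)
for header, body in sections:
    print(repr(header), body)
```

Everything beyond this split (turning a body into a dictionary, a command list, and so on) is left to whoever consumes the sections.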

In its latest iteration, Jesth has matured and also includes a proper and extensively tested hack to convert a compatible section into a dictionary data structure, making Jesth my de facto preferred format for config files. I find Jesth more readable than TOML, YAML, and JSON.

Here is a Jesth document that encodes a dictionary data structure in its own section, alongside another section containing a prompt for ChatGPT:

  [prompt]
  I want you to act as a detective story writer. I will provide you with
  two dictionary data structures representing the profiles of two people.
  Your goal is to write a thrilling neo-noir story. My first request is:
  "guess who the killer and victim is from the profiles, then build a story
  that includes every detail of the profiles".

  [profile]
  # This section can be converted into a dictionary data structure
  name = 'Jane Doe'
  birthday = 2000-12-23T10:17:37Z
  photo_jpg = (bin)
      VGhpcyBpcyBub3QgYSBwaG90by4uLiBCdXQgdGhhbmsgeW91
      IGZvciB5b3VyIGludGVsbGVjdHVhbCBjdXJpb3NpdHkgOyk=
      ---
  books = (dict)
      romance = (list)
          'Happy Place'
          'Romantic Comedy'
      sci-fi = (list)
          'Dune'
          'Neuromancer'
  epitaph = (text)
      According to the law of conservation of energy,
      not a bit of you is gone;
      you are just less orderly.
      ---

  [profile]
  name = 'John Doe'
  birthday = null
  books = (list)
      'American Predator'
      'Mindhunter: Inside the FBI's Elite Serial Crime Unit'
You can learn more by reading the project's README and playing with the demo.

Let me know what you think of this project.



This looks surprisingly nice, which is a high bar to clear when it comes to my opinion of new config file formats.

Am I understanding correctly that this basically adds a type system to TOML? Have you considered calling it TypeTOML and making it a superset of TOML? (Maybe this is covered in the readme which I only skimmed.)

It would be cool if I could rename `config.toml` to `config.ttoml` and add types as I need them, similarly to how I can rename `script.js` to `script.ts` for iterative adoption of TypeScript. Although obviously this would require the consuming code to implement the Jesth (TypeTOML?:)) parsing, which would maybe defeat the point of iteratively adopting it (why bother with partial compatibility then?). Perhaps you could make it a _compatible_ superset, with types implemented using TOML comments so that existing TOML parsers can parse an (untyped) structure from a Jesth file by ignoring the comments, while Jesth can parse a typed structure from the same file.


Thank you for your kind words!

Note that any comparison with TOML, JSON, or YAML only concerns one of the capabilities of Jesth, namely the ability to convert a compatible section into a dictionary data structure.

A Jesth document does not need to contain a section intended to be converted into a dictionary data structure. Jesth can therefore be used, for example, as a markup language for docstrings (my closed-source documentation generator parses the source code to populate the 'docs' folder of my projects with Markdown files [1]).

The remarks below therefore apply only to Jesth sections intended to be converted into a dictionary data structure.

There is currently no type system, that is, a mechanism to ensure that values assigned to a certain key always conform to a specific data type. I'm thinking about it. Think about how we create relational database tables with SQL.

Jesth is not going to be a TOML superset; the two have incompatible underlying philosophies. For example, the design decisions behind Jesth accidentally created an unlimited pool of reserved words (headers with double square brackets on either side are reserved words), from which I used [[END]] to mark the end of a Jesth stream. TOML currently has no such thing, since it already uses double square brackets on each side for something that is trivially done in Jesth.
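A small sketch of how an end marker like [[END]] could be used when reading from a stream. This is only an illustration of the idea, not the Jesth library's implementation; `read_until_end` is a hypothetical helper.

```python
import io


def read_until_end(stream, marker="[[END]]"):
    """Collect lines from a text stream, stopping at the end marker."""
    lines = []
    for line in stream:
        if line.strip() == marker:
            break  # reserved header reached: the document ends here
        lines.append(line.rstrip("\n"))
    return lines


stream = io.StringIO("[greeting]\nhello\n[[END]]\nignored trailing data\n")
print(read_until_end(stream))
```

Anything after the marker stays unread in the stream, so several documents could in principle share one stream.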

[1] https://github.com/pyrustic/jesth/tree/master/docs/modules


How is the name of this project meant to be pronounced? "jest h" or "jezzith" are the first two things that come to mind (I'm not sure how to write out the second one phonetically)


I think it should be pronounced /dʒest/ [0]

[0] https://dictionary.cambridge.org/pronunciation/english/jest

Edit: I just updated the project's README to include this detail. Thanks for making me think about it!


This looks remarkably like the DOSBox conf format. I'm sure you have reinvented something from the 90s?

  [sdl]

  # fullscreen -- Start dosbox directly in fullscreen.
  # fulldouble -- Use double buffering in fullscreen.
  # fullresolution -- What resolution to use for fullscreen: original or fixed size (e.g. 1024x768).
  # windowresolution -- Scale the window to this size IF the output device supports hardware scaling.
  # output -- What to use for output: surface,overlay,opengl,openglnb,ddraw.
  # autolock -- Mouse will automatically lock, if you click on the screen.
  # sensitiviy -- Mouse sensitivity.
  # waitonerror -- Wait before closing the console if dosbox has an error.
  # priority -- Priority levels for dosbox: lowest,lower,normal,higher,highest,pause (when not focussed).
  #             Second entry behind the comma is for when dosbox is not focused/minimized.
  # mapperfile -- File used to load/save the key/event mappings from.
  # usescancodes -- Avoid usage of symkeys, might not work on all operating systems.

  fullscreen=false
  fulldouble=false
  fullresolution=
  windowresolution=2048x1536
  output=ddraw
  autolock=false
  sensitivity=100
  waitonerror=true
  priority=higher,normal
  mapperfile=mapper.txt
  usescancodes=true


Thanks for your comment, but I'm still not convinced, at least until I see how you nest collections with this format.


(Mod here - I added two spaces to the beginning of most of those lines so HN's software would format it like code. I hope that's ok!)


First off, this is very cool! So don't let my comments dissuade you...

But IME serialization tends to fall into two boats:

1. I want something akin to a config that's human readable but very simple and I don't want to think about strong typing

2. I'm serializing data and I want schemas, efficiency, etc.

I feel like (1) is pretty well handled by TOML and JSON and (2) is pretty well handled by flat/proto buffers, Thrift/Avro, Cap'n Proto, etc.

I guess I'm wondering where you see this being used.


Thank you for your comment!

Jesth belongs to the first boat, but only if you want it to.

If you need a section to represent a dictionary data structure, you should use the syntax designed for that, so you can later call the section's "make_dict" or "get_dict" methods to convert the raw lines (list of strings) into a dictionary object.

You are free to create your own hacks to convert a raw section into an object that suits your needs.
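For instance, a custom hack for the [parameters] section of the docstring example in the post could look like the sketch below. It works on a plain list of strings and does not use the Jesth API; `parse_parameters` is a hypothetical name, and the "- name: description" convention is taken from that example.

```python
def parse_parameters(body):
    """Turn lines like '- a: First integer' into {'a': 'First integer'}."""
    params = {}
    for line in body:
        line = line.strip()
        if not line.startswith("-"):
            continue  # skip blank lines and anything that is not a list item
        name, _, description = line[1:].partition(":")
        params[name.strip()] = description.strip()
    return params


# The raw body of a section, as a list of strings
body = ["- a: First integer", "- b: Second integer", ""]
print(parse_parameters(body))
```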

As for the schemas of the second boat, I'm thinking of designing a type validation schema, much like what we do when creating tables with SQL.


I'm looking for a good human readable format for my notebook app.

I don't want to use markdown, because markdown is very difficult to parse and I can't use existing parsers, because I want to extend markdown's syntax to support extended content types and revisions.

(I'm currently using djot)

I also looked at structured formats, like TOML and Hjson. What I don't like about them is that when the content contains a deeply nested structure, the document becomes unreadable.

For example, a blockquote can contain a paragraph, and a paragraph can contain another blockquote and paragraph, etc.

I can't tell if your format is practical for deeply nested structures.


Why is markdown difficult to parse or extend? I am using Markdown-it for instance and use custom parsers to extend it with custom shortlinks.



Hi! Thank you for your reply! I think the best way to be sure that Jesth will meet your needs is to try jesth-demo [0].

I think Jesth does a better job than TOML, YAML and JSON when it comes to nested structures or readability in general. JSON remains the boss of machine-to-machine communication, though!

It is also possible that my other project Exn [1] meets your needs.

[0] https://github.com/pyrustic/jesth-demo#readme

[1] https://news.ycombinator.com/item?id=34947927


Thank you very much. Here is my notebook app: https://github.com/shi-yan/Epiphany

Exn does look very related.


Epiphany looks cool! Exn does not yet support mathematical expressions and deliberately promotes editing of raw exonote text files.

It looks like under the hood you're using Webkit-like technology...

Could Epiphany embed programs like Exn does?


My first version was something like jupyter notebook https://www.youtube.com/watch?v=rQjBhsC3oi0

but I don't run it anymore. The one on github is a rewrite and it is closer to notion.

I removed the programming part, because I feel that a notebook is not the best environment for writing code. It may be ok for ad hoc programming, but for serious coding, I want an IDE. I may create multiple files instead of mixing everything together in a single page.


> My first version was something like jupyter notebook https://www.youtube.com/watch?v=rQjBhsC3oi0

It's a nice job you've done. I hope you know about the existence of Bartosz Ciechanowski's interactive articles [1][2].

> I removed the programming part, because I feel that a notebook is not the best environment for writing code. ... I may create multiple files instead of mixing everything together in a single page.

Exn does not mix source code with prose as in literate programming. In an exonote, you can embed a program (developed with an IDE) that is available in your current virtual environment, for example! [3]

[1] https://news.ycombinator.com/item?id=31261533

[2] https://news.ycombinator.com/item?id=33249215

[3] https://news.ycombinator.com/item?id=34965910


I had a similar problem recently and in the end I went for TOML, encoding the path in the section headers (square brackets), avoiding any indentation.


I'm not being negative, but I find it funny that this human readable serialization format has a name which is much less human readable than many other names.


What about CUE? (https://cuetorials.com/) It feels like it'd solve your problem and more.


Jesth is like a broken INI file parser that can only split a document into sections (each section consists of a header and a body which is just a list of strings).

Now, on top of that, I can write a hack to convert an arbitrary section to a dictionary data structure (provided the body of that section is written with a specific syntax designed for my hack).

I made this hack and included it in the Jesth library, so people can use it, much like the Python standard library is just there to help people not waste time rewriting the same algorithms for common tasks.

Jesth would be like JSON which is only about data. CUE, Dhall and Jsonnet jump on top of JSON to add some cool stuff.

I used Jesth for example to design a docstring markup language (consumed [1] by a documentation generator), as well as a scripting language [2].

I will soon publish a simple data validation mechanism for Jesth dict-sections (sections intended to be converted into a dictionary data structure). It might inspire people to create a more complex data validation or data constraint language on top of Jesth. This could be more readable than what is done elsewhere.

[1] https://github.com/pyrustic/jesth/tree/master/docs/modules

[2] https://github.com/pyrustic/backstage


I can't even pronounce the name, how am I going to read it?


It should be pronounced /dʒest/ [0]

[0] https://dictionary.cambridge.org/pronunciation/english/jest


After being frustrated with

- INI

- XML

- JSON

- YAML

- TOML

... I welcome our new Jesth overlords. Progress is unstoppable even if it happens in baby steps.


I always thought prettified json looks very human readable.


A Jesth dictionary section is natively prettified. Additionally, anything you can encode in JSON can be embedded in a Jesth section, sharing the same document with another section containing, for example, a poem or a ChatGPT prompt. What JSON does is just one of the things Jesth can do. For example, you won't use JSON as a markup language for, say, docstrings.

When it comes to machine-to-machine communication, JSON is more relevant.


Yes. On a similar note, the best thing about YAML is that you can just write JSON in your YAML files and it will work just fine.


Does it feel like updated ini?


It looks more like a broken INI file parser. And I could say "it's not a bug, it's a feature!", because it could be used to design a markup language for docstrings (consumed [1] by a documentation generator), which an INI file cannot do as well.

[1] https://github.com/pyrustic/jesth/tree/master/docs/modules


Cool, so I see you've rediscovered S-Expressions.


Your snark is unwarranted IMO. How is this any more similar to S-expressions than, say, TOML is? Or YAML?



