Show HN: Gemini web client in 100 lines of C (github.com/ir33k)
91 points by ir3k on July 19, 2023 | 45 comments
The Gemini protocol documentation claims that it is possible to write a basic web client in 100 lines of code, proving the protocol's simplicity. That is easy in a modern scripting language, but can it be done in ANSI C? Let the source code decide.

Someone suggested I share this silly project of mine with the HN community, so here it is. Enjoy.



It's a nice project, but - I guess as is tradition with the majority of C projects - it has resource leaks and buffer overflows. There is at least one resource leak, namely, `sfd` is not closed when certain `WARN()` invocations jump back to the `start` label; for example, when `gethostbyname()` fails (i.e. try a non-existent domain and observe that the sockets remain open with `lsof`). (It also seems to be leaked in the happy path, so presumably `SSL_set_fd()` does not take ownership.) And if the user simply presses enter, i.e. the input is an empty line, there is a buffer underflow in line 55 as `j` will be -1 initially.
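A minimal sketch of the leak fix described above, assuming a hypothetical helper `connect_or_close()` that is not part of gmi100: every failure path closes the descriptor before returning, so a retry loop that jumps back to `start` never accumulates open sockets.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not from gmi100): returns a connected socket,
 * or -1 with no descriptor left open. */
int connect_or_close(const struct sockaddr_in *addr)
{
    int sfd;

    assert(addr != NULL);
    sfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sfd < 0)
        return -1;
    if (connect(sfd, (const struct sockaddr *)addr, sizeof *addr) < 0) {
        close(sfd);     /* the error path releases the descriptor too */
        return -1;
    }
    return sfd;         /* the caller owns sfd and must close() it */
}
```

If, as presumed above, `SSL_set_fd()` does not take ownership of the descriptor, the happy path needs its own `close(sfd)` once the TLS session is torn down.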

Also

    addr.sin_addr.s_addr = *((unsigned long*)he->h_addr_list[i]);
is a potential buffer overflow where `long` is 64 bits since only the first four bytes of `h_addr_list[i]` can be accessed, and also potentially misaligned for `unsigned long`; and it will also not work correctly on big endian platforms where `long` is 64 bits. Using `memcpy()` would have avoided all these problems. I am really confused as to how you arrived at the conclusion that "yes, this is the way to copy 4 bytes from A to B".
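The `memcpy()` variant could look roughly like this. `set_ipv4()` is a hypothetical helper name, and the sketch assumes the caller has already checked that index `i` is within `h_addr_list`:

```c
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: copy the i-th address of a hostent into addr.
 * Returns 0 on success, -1 if the entry is not a 4-byte IPv4 address. */
int set_ipv4(struct sockaddr_in *addr, const struct hostent *he, int i)
{
    assert(addr != NULL && he != NULL);
    if (he->h_addrtype != AF_INET ||
        he->h_length != (int)sizeof addr->sin_addr)
        return -1;
    /* Copies exactly 4 bytes, with no alignment or sizeof(long)
     * assumptions; the bytes stay in network order. */
    memcpy(&addr->sin_addr, he->h_addr_list[i], sizeof addr->sin_addr);
    return 0;
}
```

The size comes from the destination field rather than from a cast, so the copy stays 4 bytes wide regardless of how wide `long` is on the platform.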

This sounds rude, I know, and I apologize; I don't want to single you/this project out personally. I am just frustrated that even today there are people who work on C/C++ projects seemingly without having made GCC's -fanalyzer/asan/ubsan/valgrind/etc. an important part of their development workflow.


This is such great feedback. Thanks a lot. OFC I will try to correct my mistakes.

Not to make any excuses, just for context: I'm a beginner in C programming and I have written only a few small programs. My tooling is basically nonexistent. I will try to improve tho.


Try "valgrind" to check for potential memory leaks, and also compile your C projects with -Wall and -Wextra; -pedantic is fine too.


There's a page on the net somewhere, which was on HN front-page a while ago too, that listed all the useful warning flags for gcc, with an explanation.

I'd look for it, as a C programmer. On phone now, so sadly don't have it handy.


If you or someone else finds the link to this, I would love this!


Not OP, but the one that sticks in my mind¹ with comments². Although, I also could've sworn I commented on it :/

¹ https://nullprogram.com/blog/2023/04/29/

² https://news.ycombinator.com/item?id=35758898


I looked into those points and:

1. `sfd` is now closed. This cost me an extra line of code tho so I had to improvise.

2. `j` was not initially -1, as even empty input contains one "\n" character. But it could go past 0, underflowing the buffer, when the buffer contained only whitespace characters (as in the case of a single "\n"). This was corrected.

3. About `h_addr_list[i]`: I'm quite sure I took it from an example in one of the man pages. Anyway, I replaced it with `memcpy`.

I also tried to use valgrind. This is OFC my first step but I will continue my studies on detecting memory leaks in future projects.
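For point 2, one way to trim trailing whitespace without ever indexing below zero is to keep an unsigned length and test it before each step. This is only a sketch with a hypothetical `rtrim()` helper, not the code that ended up in gmi100:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: strip trailing whitespace in place and
 * return the new length. Safe for "" and for all-whitespace input. */
size_t rtrim(char *buf)
{
    size_t n;

    assert(buf != NULL);
    n = strlen(buf);
    /* n > 0 is checked first, so buf[n - 1] never reads before buf */
    while (n > 0 && isspace((unsigned char)buf[n - 1]))
        n--;
    buf[n] = '\0';
    return n;
}
```

With a single "\n" as input the loop stops at `n == 0` and leaves an empty string.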


Nice to see someone else fall upon what I call funnel-based control flow, as opposed to stack-based control flow of most programs using functions. I used the same pattern in nobox: https://github.com/serprex/nobox/blob/master/nobox.c

Idea is that using goto your program can be mostly a loop where you make forward jumps to anywhere in the code & everything converges or jumps back to the start. It ends up making for very terse code since you're no longer passing values back & forth as much through structs/params/returns. You align variables & jump to code. Each label has a kind of ABI where the variables are the registers
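A toy illustration of that pattern, with hypothetical names and no relation to the actual nobox or gmi100 sources: commands set up the shared variables, jump forward to a label, and every path converges back on `start`.

```c
#include <assert.h>
#include <stddef.h>

/* Toy interpreter: '+' adds 1, '-' subtracts 1, 'd' doubles.
 * acc and n play the role of registers shared by all labels. */
int funnel(const char *cmds)
{
        int acc = 0, n = 0;
        char c;

        assert(cmds != NULL);
start:  c = *cmds++;
        if (c == '\0') return acc;
        if (c == '+') { n = 1; goto add; }
        if (c == '-') { n = -1; goto add; }
        if (c == 'd') { n = acc; goto add; }    /* doubling reuses add */
        goto start;                             /* unknown command: skip */
add:    acc += n;
        goto start;
}
```

The "ABI" of the `add` label here is simply that `n` must hold the amount to add before jumping to it.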


That program has made me realize there exists a single stylistic justification for 8 space tabs: you can fit up to six character goto labels into the indentation margins; you can put a label on any statement without adding a line.

:)


Yes, that was my reasoning. Also, there is an unwritten rule in this project that forbids me from going wider than 80 columns. With that, I initially had indentation set to 2 spaces, but later I realized that the code is indented at most 2 levels deep and it's better to make space for goto labels on the left.


https://github.com/ir33k/gmi100/blob/master/gmi100.c#L27 definitely threw me for a loop until I realized it was a line saving trick. It would be more readable to save lines elsewhere by exploiting the comma operator instead of essentially cramming irrelevant statements into a conditional.

For example:

  addr.sin_family = AF_INET;
  addr.sin_port = htons(1965);
Could become:

  addr.sin_family = AF_INET, addr.sin_port = htons(1965);


Ah yes, there are a couple of line-saving tricks like that. Mostly in for loops.

Thanks for the suggestion. I will go through the code again to see if I can save more space with normal code.

Actually, that was my workflow. Each time I managed to write something in a simpler way, I reverted a few tricks.


I remember gplaces, but it doesn't compile under OpenBSD. It needs lots of fixes to be built on systems besides GNU/Linux.

On golang, Bombadillo today supports "images" with Unicode-art and searching inside the "pages" a la ctrl-f in browsers, so it's better than Amfora and one of the best TUI clients ever. Also, it does gopher and https thru external tools like lynx.


Besides GNU/Linux, I compiled it on macOS. I made no effort to test on BSD systems or Windows. I included only a few standard libs, so I wonder: what exactly makes it incompatible with OpenBSD?


No, it was gplaces that didn't work. Yours works fine with just an extra #include for inet.


Impressive, the code is dense but not too hard to follow. And you even managed to cram in history!


What do you mean by web client? From the github it looks more like a command line program.


Web refers to Gemini here, it being a command-line client for the Gemini web.

These line-based browsers used to be more common, there were a few ones for the www but also ‘ftp’ has such a mode. As does my little ‘nostt’ Teletext reader.


I've only ever heard it called "Gemini", not "Gemini web".

"Gemini web" is a bad name because Gemini's not being part of the web was the main motive in Gemini's creation.

"Geminisphere" or "Geminiverse" would be fine with me as a name for the totality of Gemini servers considered collectively.


Gemini being part of the web is pretty baked-in. You can link to http:// pages from gemini, and gemini:// pages from HTTP. And all these links participate in the hyperlinked "Web".

It even supports mime types, you can serve text/gemini from an HTTP server, or a text/html file from a Gemini server.


That's true, and many Gemini clients support multiple protocols. But in the case of this "gmi100", only the gemini:// protocol is supported.


> Web refers to Gopher here, it being a command-line client for the Gopher web.

Gemini, not Gopher.


Oops, edited! Thanks.


If you're going to nitpick you could nitpick on the use of "web" (though, as others note, that's not wholly inaccurate), but there are plenty of command-line web clients. For example, wget and curl.


Yes, wget and curl are web clients. This is not.


Right, but that has nothing to do with it being a command line app, which is what OP was objecting to.


I wasn't objecting to anything. I was just confused because I initially thought it was a Gemini client web app.


Oh, that makes way more sense! I misunderstood you completely, my bad.


Yea, you are right. Poor choice of words on my side. One can argue about the definition of "web", but it would be much more precise to write "CLI client".


There’s nothing to argue. The "World Wide Web" or "Web" is defined as servers and clients communicating using the HTTP protocol. (Usually, it means that content is using HTML but this is not mandatory).

"Gemini" is defined as servers and clients communicating using the "Gemini" protocol (and content is usually in the gemtext format).

Both are part of the Internet, which is defined as an INTERconnection of NETworks, thus a physical worldwide network of computers. Any computer with a public IP (Internet address) is thus part of the Internet (in your house, it is usually your box that is part of the Internet, acting as a gateway for your computer).

So: the Internet is mostly a hardware network. You then join different parts of the Internet by running ad hoc software: Web, Gemini, Gopher, Mail, FTP, Usenet, etc.

Those have very clear and crisp definitions; there is absolutely nothing to argue about. A "Web Gemini client" means a browser capable of accessing both Gemini and the Web, which is not the case here. It is a "Gemini browser".


The word "web" is actually defined as "a complex system of interconnected elements." There is nothing to argue.

In spoken human language we have this concept called connotation, "web" being associated with http servers and clients is one example of this.

There is no rule that a word must only have one connotation, and Gemini is very much "a complex system of interconnected elements".

So basically, this is just your opinion and you aren't the authority on what the word "web" means, and it's okay to tell people that the word is often used in a context they might not have known about, but claiming that "there is nothing to argue" is not correct.


Meh. I can see why it's unpopular in the Gemini community, but I don't see anything wrong with referring to the "Gemini web".

It's basically the Web, but instead of HTTP it uses its own protocol over TCP, thus still within the Internet protocol suite. Qualifying it as the "Gemini web" already communicates that just fine and is less whimsical than Geminiverse or whatever.

I suppose I could see grounds for suggesting OP use the community-preferred vernacular as a personal preference, but that's about all the bite I see in this bark. Maybe that's reason enough to change it so you don't piss off Gemini's 12 users. I can grant that one.

Also, "Gemini web client" does make it sound like it's a web client for Gemini. I'd just change it to "a client for the Gemini web".


> A "Web Gemini client" means a browser capable of accessing both Gemini and the Web

It could also be a Gemini client running on the web, which is what I initially thought it would be from reading the title.


Yes, sorry for the confusion. I don't see any way to correct my mistake, as the edit button for the title and description is no longer available to me. At least I don't have this mistake on GitHub.


Nice, except the dependency on openssl.

It would have been a great opportunity to use BearSSL[0].

0. https://bearssl.org/


I found out about BearSSL after I had the first version of the client working, so I kept OpenSSL. I will remember to try BearSSL in my next project.


Is BearSSL still actively maintained? My approach is abstracting SSL away so the API can be exchanged with OpenSSL, Mbed TLS, etc.
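One way such an abstraction can be sketched is a small table of function pointers that the client code talks to. Everything below is hypothetical (`struct tls_backend`, `gemini_request()`, the fake backend) and only mimics the shape; a real backend would wrap the OpenSSL, Mbed TLS, or BearSSL calls behind the same two functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical interface; none of these names come from a real library. */
struct tls_backend {
    const char *name;
    int (*send)(void *ctx, const char *buf, size_t len);
    int (*recv)(void *ctx, char *buf, size_t cap);
};

/* Client code only sees the interface: send "<URL>\r\n", read the reply. */
int gemini_request(const struct tls_backend *be, void *ctx,
                   const char *url, char *reply, size_t cap)
{
    assert(be != NULL && url != NULL && reply != NULL);
    if (be->send(ctx, url, strlen(url)) < 0 || be->send(ctx, "\r\n", 2) < 0)
        return -1;
    return be->recv(ctx, reply, cap);
}

/* Stand-in backend so the sketch runs without any TLS library:
 * it records what was sent and answers with a canned header. */
struct fake_ctx { char sent[1100]; size_t used; };

static int fake_send(void *ctx, const char *buf, size_t len)
{
    struct fake_ctx *f = ctx;
    if (len > sizeof f->sent - 1 - f->used)
        return -1;
    memcpy(f->sent + f->used, buf, len);
    f->used += len;
    f->sent[f->used] = '\0';
    return (int)len;
}

static int fake_recv(void *ctx, char *buf, size_t cap)
{
    static const char reply[] = "20 text/gemini\r\n";
    size_t n = sizeof reply - 1;
    (void)ctx;
    if (n > cap)
        n = cap;
    memcpy(buf, reply, n);
    return (int)n;
}

const struct tls_backend fake_backend = { "fake", fake_send, fake_recv };
```

Swapping libraries then means providing another `struct tls_backend` instance; `gemini_request()` itself does not change.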


It is a slow-moving project[0], but not dead.

0. https://bearssl.org/gitweb/?p=BearSSL;a=summary


This is the most unreadable code I may have ever seen. Congrats on successfully writing it and getting it to work.


Try the J source code, like https://github.com/jsoftware/jsource/blob/master/jsrc/v.c

  F1(jttable){A z,zz;I r,wr;
   F1PREFIP;ARGCHK1(w);
   // We accept the pristine calculations from ravel
   wr=AR(w); r=(RANKT)jt->ranks; r=wr<r?wr:r;  // r=rank to use
   //  obsolete  RZ(IRSIP1(w,0L,r-1<0?0:r-1,jtravel,z));  // perform ravel on items
   RZ(IRSIP1(w,0L,r-((UI)r>0),jtravel,z));  // perform ravel on items
   R r?z:IRSIP1(z,0L,0L,jtravel,zz);  // If we are raveling atoms, do it one more time on atoms
  } // ,."r y
Quoting https://corecursive.com/065-competitive-coding-with-conor-ho... :

> Yeah. I started a project a couple months ago. I’m slowly porting that code base, the J source code to C++ 20. And yeah, I think there should be a name for these types of code bases, because it’s not C, it’s like a macro variant of C, where it’s a CDSL, where 80% of your “library” is macros. I did a search, there’s 10,000 macros in the source code, and those macros are used like functions.


Yea, it's very hard to read. I'm surprised that people can actually follow the flow and understand the dirty tricks.

The most unreadable code would be code that was intentionally obfuscated. Here I at least try to pretend that it is readable in some parts.

Thanks for the kind words.


Arthur Whitney, is that you? [1]

[1] https://code.jsoftware.com/wiki/Essays/Incunabulum


The way he seems to have written 'C' is to use the C preprocessor and Macros to create a domain specific language and then write the program in that! Mind Blown!
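A tiny illustration of that style, written in the spirit of the linked essay rather than copied from it. The macro and type names are assumptions, and `sum()` is a made-up example:

```c
#include <assert.h>
#include <stddef.h>

typedef long I;
#define R return
/* DO(n, x): run statement x for i = 0 .. n-1 */
#define DO(n, x) { I i = 0, _n = (n); for (; i < _n; ++i) { x; } }

/* With the macros in place, a loop reads almost like a one-liner. */
I sum(const I *v, I n)
{
    I s = 0;
    assert(n == 0 || v != NULL);
    DO(n, s += v[i])
    R s;
}
```

Once a few dozen such macros exist, most of the program is written in terms of them, which is what makes the result read like a separate language.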


This is cool. I asked ChatGPT4 to explain to me what it does, and it gives a useful overview:

This is a minimal Gemini protocol client written in C. The Gemini protocol is an application-level internet protocol for serving hypertext documents over secure connections.

The code includes various standard C libraries and the OpenSSL library for secure connections. The `WARN` macro definition is used to print warning messages to `stderr` and jump to the `start:` label.

In the `main` function, it defines variables and initializes certain structs, like `struct sockaddr_in addr` that is used for internet socket addresses. It then opens a connection with the Gemini server on port 1965 (the standard port for the Gemini protocol). The connection is encrypted using the OpenSSL library functions. It also opens two file streams, one for history (`.gmi100`) and another for a temporary file. The file names and locations are generated by the `tmpnam()` function.

The main loop reads commands from `stdin`. Depending on the command, it can execute system commands (when the input starts with '!'), navigate through the history (when input starts with a digit), process a variety of single-letter commands (like 'q' for quit, 'r' for refresh, 'c' to print current URI, 'u' for up a level, 'b' for back in history), or interpret the input as a URL or a part of a URL to be navigated to.

For URL navigation, it creates a TCP socket connection with the Gemini server, sets up an SSL connection over that socket, and sends a request to the server for the resource identified by the URL. The client handles the server response depending on the status code. For example, if the status code is '1' it's a request for input and if it's '3' it's a redirect. It prints the response to `stdout` and also stores it in the temporary file.
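As a rough sketch of that status handling: a Gemini response header starts with a two-digit status, and the first digit selects the class (1 input, 2 success, 3 redirect, 4 temporary failure, 5 permanent failure, 6 client certificate). `status_class()` is a hypothetical helper, not the code gmi100 uses:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns the status class 1..6 and points *meta
 * at the text after the status, or returns -1 for a malformed header. */
int status_class(const char *hdr, const char **meta)
{
    assert(hdr != NULL && meta != NULL);
    if (!isdigit((unsigned char)hdr[0]) || !isdigit((unsigned char)hdr[1]))
        return -1;
    if (hdr[0] < '1' || hdr[0] > '6')
        return -1;
    *meta = hdr + 2 + (hdr[2] == ' ');  /* skip the separating space */
    return hdr[0] - '0';
}
```

A client would then switch on the result: prompt the user for class 1, follow `meta` as a URL for class 3, and treat `meta` as a MIME type for class 2.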

For `text/*` type responses, it additionally parses any lines that start with '=>' as links and assigns them a number, storing this in the temporary file for later navigation. If the line starts with '=>' but doesn't have a description, it goes back to reading the rest of the description.
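Parsing a gemtext link line can be sketched as follows. A link line is `=>`, optional whitespace, a URL, and then an optional label. `parse_link()` and its buffer parameters are hypothetical and simpler than what gmi100 does in its 100 lines:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 and fills url/label when line is a
 * gemtext link line with a URL that fits, otherwise returns 0.
 * An over-long label is truncated; a missing label yields "". */
int parse_link(const char *line, char *url, size_t urlcap,
               char *label, size_t labcap)
{
    size_t n;

    assert(line != NULL && url != NULL && label != NULL);
    assert(urlcap > 0 && labcap > 0);
    if (strncmp(line, "=>", 2) != 0)
        return 0;
    line += 2;
    line += strspn(line, " \t");        /* optional whitespace */
    n = strcspn(line, " \t\r\n");       /* URL ends at whitespace */
    if (n == 0 || n >= urlcap)
        return 0;
    memcpy(url, line, n);
    url[n] = '\0';
    line += n;
    line += strspn(line, " \t");
    n = strcspn(line, "\r\n");          /* label runs to end of line */
    if (n >= labcap)
        n = labcap - 1;
    memcpy(label, line, n);
    label[n] = '\0';
    return 1;
}
```

A client can number each line for which this returns 1 and keep the URLs in a list, so that typing a number navigates to the corresponding link.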

After processing the server response, it uses the `less` command to display the content of the temporary file, allowing the user to scroll through it if it's longer than the screen. Before each new iteration of the main loop, it flushes the temporary file and goes back to the `start:` label. If the user quits, the function returns 0 and the program terminates.

Overall, this program is a command-line based Gemini client that allows a user to navigate the geminispace, issue queries, and view text documents served over the Gemini protocol. It also keeps a history of visited URIs in the `.gmi100` file.

source: https://sharegpt.com/c/xscNWg6


Wow, this is actually a very good description of the program flow. There are only a few small mistakes. For example:

> navigate through the history (when input starts with a digit)

Should be more like: navigate to a link from the currently displayed page (when input starts with a digit).

> It prints the response to `stdout` and also stores it in the temporary file.

The response is not printed to `stdout` at all. It is only stored in the temporary file.

Also, the possibility of using the first program argument to override the `less` command was ignored.

But overall very good job.



