
Can the veterans of the 90s SSL Wars explain the issues with ASN1/DER/BER? Looking it up today, it seems like a pretty smart and extensive serialization system, and I have to wonder why new systems like Google Protobufs chose to reinvent the wheel.

Conversely, how have modern systems avoided the pitfalls (if any) of ASN1/DER/BER?



I know of at least one problem with ASN.1. The string encodings other than UTF-8 are terrible. Most of the string encodings are very limited and weird subsets of ASCII that nobody actually uses anymore. ASN.1 itself doesn't define the encodings and just refers to other standards.

The problem with this is probably most notable with the T.61 encoding which changed over the years and since ASN.1 references other standards nobody is quite sure exactly what you have to support to have T.61 actually work right.

Within X.509 certificates, though, nobody bothers to actually implement T.61; in practice the T.61 tag is just used for ISO-8859-1.

There are a bunch of gory details around this mess in this (now quite old) write-up here: https://www.cs.auckland.ac.nz/~pgut001/pubs/x509guide.txt

Since that write up I believe UTF-8 is pretty much the expectation for character encoding for X.509.

I documented some of the quirks around 6 years ago when I took an existing X.509 parser and improved it for use in certificate trust management in Subversion: http://svn.apache.org/viewvc/subversion/trunk/subversion/lib...

Basically, ASN.1 wasn't well defined, and it only works when people agree to use only certain features and to interpret things a particular way where the spec is ambiguous.

It's also notoriously difficult to parse well. It's very easy to have bugs in your parser, even if you're only implementing the subset needed for X.509, and especially if you're doing so in a non-memory-safe language.

I can't speak for why Google invented Protobufs, but I can't imagine anyone sane picking up ASN.1 for anything modern and deciding that this is what they want to use.


For the string encoding thing, however, it does have UTF-8 and you should not use anything else to express arbitrary human text anyway.

PKIX actually leverages the weird encoding restrictions to our benefit. It defines two kinds of names which things might have on the Internet (you can and should stop trying to name things which are actually on the Internet some other way): DnsNames and IpAddresses.

IpAddresses, since they're either 32-bit or 128-bit arbitrary bit values, are just represented as either 32-bit or 128-bit arbitrary bit values. So you cannot express the erroneous IPv4 address 100.200.300.400 as an IpAddress, which means you can't trip up somebody's parser with that nonsense address.

DnsNames use a deliberately sub-ASCII encoding from ASN.1 which can express all the legal DNS names (all A-labels and the ASCII dot . are permissible) but can't express lots of other goofy things, including most Unicode. So a certificate issuer, even if they're completely incompetent, cannot write a valid DnsName that expresses some garbage IDN as Unicode. Hopefully they read the documentation and find out they need to use A-labels (Punycode), but if not they're prevented from emitting some ambiguous gibberish.

Even in forums where you'd once have expected pushback, "Just use UTF-8" is becoming the norm. Microsoft, for example: once upon a time you'd get at least some token resistance; today they're likely to agree. So ASN.1 ends up no worse off for having half a dozen bad ways to write text you shouldn't use, compared to, say, XML, HTML, and so on.


Agreed, although doing the right thing helps in specific applications but not so much in the general case. You're very often stuck with other people's MIBs/specs and encoders, trying to make sense of a) what they're allowed to put on the wire and b) what they actually do, and under what circumstances.


A couple of years ago I ran into the same confusion of the "TeletexString"/"T61String" data type in ASN.1. After going down the rabbit hole of what is T.61 and trying to map it to Unicode, I reread the ASN.1 (X.690) spec and realized that the authors never actually referenced T.61. Ever since the first edition of ASN.1 in 1988, those strings have not used T.61. They use a character set that is easily mapped to Unicode - https://www.itscj-ipsj.jp/ir/102.pdf, a subset of US ASCII.

Not to say the rest of the spec is notably better. If fully implemented, it requires supporting escape codes in strings to change character sets. I've never seen valid escape codes in real world data, but it probably exists.

As the original article shows, ASN.1 has lots of other challenges and complexity. Trying to write a code generator that supports all the complexity is no trivial task and the only open source one I've seen only generates C code. Protobuf has the advantage of having modern language support (including multiple type safe and memory safe languages).


Eh... It does have a transitive normative reference to T.61, but only by way of special restrictions on the use of three characters.

T61String is defined in terms of ISO 2022, with the default G0 (graphic) character set set to ISO-IR-102 (as you linked). ISO-IR-102 defines the set of graphical characters, but also places a condition on the use of 3 of them by reference to T.61. It also requires that the C0 control character set default to ISO-IR-106, and the C1 set to ISO-IR-107.

The net effect is that the default character set of T61String is almost the T.61 character set, except that to get the T.61 character set, you need to include the escape sequence to set G1 to ISO-IR-103. ESC 2/9 7/6

A conforming T61String implementation does need to support the escape sequences and resulting encodings from ISO-IR-6, ISO-IR-87, ISO-IR-102, ISO-IR-103, ISO-IR-106, ISO-IR-107, ISO-IR-126, ISO-IR-144, ISO-IR-150, ISO-IR-153, ISO-IR-156, ISO-IR-164, ISO-IR-165, ISO-IR-168.

Since the control character sets include shift prefixes etc, properly parsing T61Strings into Unicode is non-trivial.

This is actually a pretty good reflection of the complexity in ASN.1. Technically, the ASN.1 spec proper only requires that a T61String support exactly the set of characters specified in the above registrations; it does not mandate any particular format for them. It is the BER encoding rules that require ISO 2022 be used to encode these. A different encoding could specify that all strings are encoded as UTF-8, with the different string types being just various subsets of allowed characters.


Heimdal's ASN.1 compiler generates C code. It also generates bytecode with C bindings. Two options.

Also, I've made it generate JSON dumps of the ASN.1 modules. My goal is to eventually replace the C-coded backends that generate C / bytecode with jq-coded backends that can generate C, Java, Rust, etc.


> Basically ASN.1 wasn't well defined and it only works well when people agreed to only use certain features or to interpret things in a particular way when ambiguous.

ASN.1 has always been as well defined as, or better defined than, its competition. The ITU-T specs for it are a thing of beauty not often equaled outside the ITU-T.

That said, for a long time the ASN.1 specs were non-free, and that hurt a lot. Also, the BER family of encoding rules stunted development of open source tooling for ASN.1.


> I can't imagine anyone sane picking up ASN.1 for anything modern and deciding that this is what they want to use.

Part of my curiosity stems from Apple using it as part of their bootable file-format: https://www.theiphonewiki.com/wiki/IMG4_File_Format

But as you say, I have to assume they're using it in a very constrained way.


> Part of my curiosity stems from Apple using it as part of their bootable file-format: https://www.theiphonewiki.com/wiki/IMG4_File_Format

I could only speculate, but I wonder if part of the reason is that DER is completely unambiguous and therefore suitable for cryptographic uses. It's also very easy to decode without a specification (it's a TLV format). Apple are almost certainly using ASN.1 compilers for their mobile devices and security layers already (even if they ship FOSS implementations, I'd be surprised if they aren't checking their work against commercial compilers), so there's overlap there. Rolling their own format would just have been an unnecessary extra failure point.


One should not design cryptographic protocols so that they require canonical encodings.

Instead one should write tooling that produces decoders that preserve the original encoding of signed data.


> Instead one should write tooling that produces decoders that preserve the original encoding of signed data.

That's an interesting idea. How do you evaluate the tradeoffs in this design? I.e., what does it buy you compared to saying that you need to sort in tag order, for example? (Assume that you have something like an automatic tagging environment for sake of argument.)


Say you have a certificate, and it's supposed to be encoded in DER, which is canonical, but for some reason the issuing CA has a crappy encoder and produced something slightly not-DER-but-still-BER. Well, because certificates are supposed to be DER, you can just reject it.

But if you wanted to accept it, you couldn't validate the signature by simply re-encoding the `tbsCertificate` field -- you'd come up with a DER encoding that doesn't match the original. So instead you want your codec to preserve the original encoding of the `tbsCertificate` even as it returns the decoded `tbsCertificate`, and now you can validate the signature.

This is easier said than done, because the encoding of the `tbsCertificate` is buried in the encoding of the Certificate, so you can't easily get at it without writing a partial decoder, or without support from the ASN.1 tooling.

This is what Heimdal's ASN.1 compiler does: it lets you request that for `TBSCertificate` you get a `_save` field that has the original encoding of that value, and just that value (not the outer `Certificate`).

The only trade-off is that you're wasting memory for a while, as you now keep around both the decoded value and its original encoding. But once you're done validating the signature, you can release the memory used for tracking the original encoding.

Sorting by tag is not involved here, and neither is automatic tagging.


> The string encodings other than UTF-8 are terrible.

Well, yes, because ASN.1 predates Unicode.


Oh where to begin?

ASN.1 really demands code generation. Unfortunately lots of nonconforming stuff has to be dealt with. The concept of encoding rules and the module tagging scheme make for a pretty big number of possible representations.

The language semantics of ASN.1 don't really map to anything well, particularly around default fields and structures that can vary.

Newer systems don't have encoding rules and pick a semantics that matches a target language much more closely.


> ASN.1 really demands code generation.

Nope, nyet, bzzt. Proofs by counter-example:

- OpenLDAP has a printf/scanf-like approach to BER encoding

- Heimdal has an ASN.1 compiler that generates code, yes, but also alternatively generates bytecode that gets interpreted at run-time.

> The language semantics of ASN.1 don't really map to anything well, particularly around default fields and structures that can vary.

You are ill-informed. Proof by counter-example:

- there are ASN.1 encoding rules that produce natural XML (XER) and JSON (JER)

- "default fields" are supported (the relevant keyword is `DEFAULT`, naturally)

- "structures that can vary" -- if you mean unions, it's got that (the relevant keyword is `CHOICE`), and if you mean "extensions", it's got extensibility markers (which effectively act like a CHOICE between an octet string of unknown stuff and the extensions known at module compile time).


I have worked on code that took the OpenLDAP approach. It sucked, leading to partial parsing and processing. The rest of your reply misunderstands the kind of semantics I'm talking about. It's not that we can't produce XML or JSON; it's that programming languages often don't have types that map naturally to all of ASN.1. A DEFAULT value that isn't nil doesn't work in Go, for example.


Oh, I agree. I don't like the printf/scanf-like approach to BER encoding. In fact, it's awful.

The point I was making is that code generation is not the only option for ASN.1 or any encoding.

Also, ASN.1 types map very well onto C (surprise):

- OCTET STRING -> struct with pointer and length in bytes

- BIT STRING -> struct with pointer and length in bits

- INTEGER (constrained) -> some stdint.h integer type

- INTEGER (unconstrained) -> struct with pointer to array of uint64_t, array element count, and boolean to indicate if signed or unsigned

- REAL -> double or some arbitrary precision real library's type

- most string types -> pointer to array of char, or counted byte string type

- SEQUENCE OF and SET OF -> struct with pointer to array and count of elements

- SEQUENCE and SET -> struct

- CHOICE -> struct with discriminant enum and union of alternatives

- tags -> ignore

- OPTIONAL -> pointer

- DEFAULT -> nothing special

- NULL -> int (whatever)

- BOOLEAN -> unsigned int, bool, maybe a bitfield of unsigned integer type so that all booleans can be compressed, etc.

- OBJECT IDENTIFIER and RELATIVE OBJECT IDENTIFIER -> struct with pointer to DER encoding, and length in bytes

- extensibility markers -> [hard to make this pithy, but it can be handled just fine]

That covers like 99% of it. Suffice it to say that there's a very natural mapping of most of ASN.1 onto C.

Things like classes and object sets aren't types but can guide the tooling to provide automatic encoding and decoding through open types (typed holes).

BTW, `SET` is silly. `SET OF` is only of interest if you have arrays where order doesn't matter and you want a canonical encoding, but since one should not depend on canonical encodings, `SET OF` is also silly. IMO both should be deprecated (they can't be removed, but hey).


> ASN.1 really demands code generation.

On this specific point: isn't this also the case for other high-performance serialisers? Google Protobufs, Apache Thrift, anything going through Rust's Serde...


Not really. You can trivially encode or decode protobuf or thrift at runtime, given a message specification, and this isn't uncommon in the wild. It's just that you usually expect messages which are well-defined at build time, so why not generate code?


No, it's not. There is no reasonable syntax/IDL/schema/whatever you want to call it for which you wouldn't have a choice of implementing by code generation or by bytecode generation.

How is that not obvious? It would be like saying that "the problem with LISP is that it has to be interpreted", or that "the problem with C is that it can only be compiled to object code", when both such statements are clearly incorrect because of real-life counter-examples.

But there is something special to ASN.1. Instead of seeing that there's nothing new under the Sun when it comes to data encoding and schemata, and that there hasn't been anything new in that field really since S-expressions, ASN.1 has engendered a special hatred that blinds everyone to things that they would grant as obvious in other cases.


For most of them there isn't nonconformant data out in the wild that you also need to live with. The combination is what's unholy.


Also expect to pay to read the spec.


ASN.1 standards are free: https://www.itu.int/rec/t-rec-x.680/en

Many, though not all, specifications that use ASN.1 are also freely available. I've been out of telecom for a while, so I don't know the status of the newer standards, but when I was working in the business, GSM MAP and MMS were the only proprietary ones that were an issue.


GSM standards are also freely available --- look at 3gpp.org or etsi.org --- the biggest problem is finding which ones actually contain what you're looking for.


The ITU-T ASN.1 specs have been free for a very long time now. They used to be non-free, and that was a big problem with ASN.1, but that was decades ago.


There is NO problem with ASN.1 itself except a bit of ugliness. There are SERIOUS problems with DER/BER/CER and with all tag-length-value schemes -- this includes protobufs!

ASN.1 is just syntax and semantics. There are encoding rules that produce textual representations (GSER), XML (XER), and JSON (JER), and there are XDR-style encoding rules (PER and OER, but with 1-octet units instead of 4-octet units, plus efficient representation of optional fields).

In fact, you can make ASN.1 encoding rules that are based on NDR and XDR and which work for all of IDL and XDR and that subset of ASN.1 that is covered by the semantics of IDL and XDR, and you can extend that to cover all of ASN.1 if you want.

I should know these things, as I maintain an ASN.1 compiler and I intend to eventually teach it to do XDR and NDR.

Really, there's nothing about data schemas that you can express in JSON, CBOR, IDL, XDR, S-expressions, or any schema language you want, that you can't express in ASN.1, or, if there is, it's got to be a pretty niche feature and easily added to ASN.1 anyways. Even functions (RPCs) can be expressed in ASN.1 with some conventions, and routinely are, because it's really just a request/response protocol.

But every year someone invents a new thing because of how stupid, tired, and old ASN.1 is (or, rather, is perceived to be), or because of how complex ASN.1 is and how there's a paucity of tools. So they reinvent the wheel (often badly), producing a wheel for which there is instantly a paucity of tools.


Personally, I think that people just like to reinvent things. I don't want to sound shitty (or have kentonv show up again to scold me for it) but I get the feeling that, a lot of the time, it's just that simple.

https://news.ycombinator.com/item?id=20725550


To me that is a specious argument. It's like asking why Python was invented when Cobol could suffice.

The dozens of ASN.1 specs are absolutely hideous and entrenched in obsolete telecom jargon. If the sole goal of Protobuf was to avoid having Google engineers be required to refer to dozens of ASN.1 specs whenever disagreements or confusion arose, it would have been 100% worth it for that reason alone.


First, let me confess that I don't have enough experience with ASN.1 or Protobufs to have an informed opinion.

The supporting argument for the "because it's there" hypothesis for why people reinvent things (in IT) is that they do it so often.

Even if all the newer message/serialization systems are better than ASN.1, they're not all better than each other, eh? Why so many? Same goes for chat systems, programming languages, etc.


There has been a lot more new stuff in the world of programming languages, even recently, than there has been in the world of data schemata and encoding rules.

That said, most of the innovation in programming language theory has been around Haskell and related languages, and it has not justified languages like Golang or Python. DSLs in general are justified regardless of whether they are innovative in terms of programming language theory.


The ASN.1 specs are beautiful. They are beautifully written, better than anything the IETF produces because the ITU-T is an expensive standards development organization that can afford to have people who only do this sort of thing.

The ASN.1 specs are very readable. Much easier to read than many important RFCs.


ASN.1 was too broad. There is immense value in a more constrained specification that does not include so many hazardous serialization types and antiquated string formats.

Now, should Protobufs or Thrift simply have been constrained versions of ASN.1? I think there is a view of software engineering where this would have been an ideal outcome, but almost universally when we see too-big standards, they are declared "dangerous" and avoided like the plague before they are downscoped.


ASN.1 in 1984 was not too broad. It was too simple, and it was too targeted to tag-length-value encoding rules (which are stupid -- TLV is a crutch that is only maybe useful when you lack a compiler, which early on was the case).

ASN.1 today is as broad as it needed to evolve to be because its users needed it.


There is value in throwing away cruft, especially cruft that comes from the IT Middle Ages (before we decided to drop non-8-bit word sizes, before UTF-8 became the almost universal string encoding, etc.).


Maybe, but ASN.1 is a Chesterton's fence.

Before you throw it away and reinvent it badly, acquaint yourself with it.

And you might notice that ASN.1 has a long history, but it's thoroughly modern today, and much more so than many alternatives to ASN.1 that have been created even in recent times.


I agree with this, and I think that overall the Chesterton's Fence principle should be applied more in software engineering.

What's hard is finding a set of "thoroughly modern" ASN.1 implementations that work together, and trusting that they will do so. The name is overloaded by the years of revisions and cruft.


Fabrice Bellard has a proprietary ASN.1 compiler that looks very modern and very very featureful.

Heimdal's ASN.1 compiler is getting to where we should separate it and make it a standalone project. It's really quite featureful, and it's also getting to where adding support for PER, OER, NDR, XDR, XER, JER, and other things should be quite easy: just write a bytecode interpreter for each of those. I only need NDR and JER, so I'll be adding those (already it can dump values as JSON, but it's not quite JER compliant).

Also, I've begun adding support for dumping ASN.1 modules (not just values) as JSON with an eye towards rewriting the codegen and bytecode generator in jq, and then maybe adding support for targeting languages other than C. It really helps to have a decent implementation to stand on for this, and I am really standing on the shoulders of giants here.


ASN.1 is extremely complicated and hard to implement correctly. All ASN.1 implementations I've seen are either specialized (know how to work only with a very specific message), or slow, buggy and expose equally complicated APIs. Modern systems like protobufs tend to use much simpler encodings & specs which are easier to understand and implement correctly.


Having come from the web, I spent a few years during the late 90s/early 2000s in an industry running on ASN.1. I was initially surprised by how enamoured most of my coworkers were with ASN.1 and its tools, but it grew on me too: the pleasure of interacting only with a protocol specification regardless of the implementation language or the intricacies of the remote party, the guarantee that there could be no invalid messages received or emitted, and the automatic generation of tests and tools eventually balanced out the inconvenience of not being able to readily read data on the wire (this was before every human-readable protocol got encrypted) and of not being able to start coding up front.

It was like going from runtime type checking to static type checking: initially inconvenient, but paying dividends after a short while.

So why did this tech disappear, if it was ultimately better than the later alternatives (textual protocols, schema-less serializers, and eventually Protobuf, which reinstated some form of efficient encoding and type checking)?

As uncannily often happens with technological evolution, the reason is probably not to be found in its technical issues (which basically all boil down to: designed by committee).

ASN.1 was just a bit too inconvenient, and the free code-generation tools were just not quite good and robust enough. The approach of designing your types and protocols and putting your code-production tool-chain in place before being able to ship anything was at odds with the mood of the day, which was to let the cheap junior dev fire up his code editor during the coffee break of the first design planning meeting and build a half-baked prototype that would already be sold to the customer by the time he hit :wq. To move fast and break things, ASN.1 got in the way.

So did formal specifications in general, code-analysis tools, even basic type checking: all thrown out the window during the same period for their extra weight, extra time-to-market, and extra hiring cost. Text protocols outcompeting saner alternatives because they are initially simpler (SIP vs. H.323, anyone?), schema-less data formats predominating almost entirely because you can start hacking quicker, and so on, are all attributable to that cultural, rather than technical, trend, I believe.

Now it seems the industry is slowly recovering from these excesses. Maybe because of the damage that was done, but more likely because of the end of cheap hardware progress, encryption everywhere, and massive data volumes (that's what made Google come up with better protocols than HTTP and better formats than human-readable text, after all).


I owned the Microsoft ASN1 library for a while around 2005. It was a maintenance nightmare and I spent a lot of time fixing static analysis derived issues.

That said, I always found the standard quite interesting with different encodings based on the degree of prior shared info or format. My assumption is that not-invented-here is part of the why it’s not used.


I own Heimdal's ASN.1 compiler. It's a pleasure.


I used the Netscape/Mozilla NSS library quite a bit, and one problem I found with it is that all of the DER encoding/decoding was written by hand. They should have generated all that boilerplate from the ASN.1 modules written in the specs (later RFC 2459, but at the time a hodge-podge of scattered specs).

Hand-coding works okay when the data is what you expect. But when you throw malformed certificates at it, you have to catch all the edge cases. Generated code would have covered many more of them.


Those libraries were originally written in the early/mid 90s. Don’t recall much in the way of code generation tools that would take those specs and generate the code at the time.

Spent a bunch of time working with and adding to those libraries.


10 different string encodings is one problem.


Is it? You pick the one that fits your use; these days that's normally UTF8String.


Can one use UTF8?

The 90s were rough on text encoding, but it seems pretty settled now.


> Can one use UTF8?

For new standards, yes. But ASN.1 was first specified in the '80s, and backwards compatibility is a thing. So really it depends on what you're doing: if you can start with a subset of ASN.1, which I think is done in MDER[0] and OER[1], you have a bit more freedom. But if you're working in legacy formats and standards that operate internationally, you could run into problems.

[0]: https://www.iso.org/standard/66717.html

[1]: see among others https://www.ntcip.org/document-numbers-and-status/


Kerberos implementations generally just-send-whatever in IA5String fields. That means Windows sends UTF-8, and MIT Kerberos and Heimdal send whatever the user's locale uses. Windows doesn't normalize or anything. It works in that a) it interops when using ASCII names, b) it interops when using non-ASCII names in UTF-8 locales on Unix. It violates the spec, but it works.


Stick to UTF8String. ASN.1 predates Unicode.


Or IA5String, if you know ahead of time that you only need ASCII.


Or do what many implementors do: just send whatever you have as whatever string type the protocol spec requires.


No veteran of the 90s SSL wars, but I once upon the time was tasked with fixing security bugs in a custom protocol backend server which used ASN.1 for purposes that one would probably use protobuf nowadays.

The quality of existing open source libraries to parse ASN.1 leaves a lot to be desired.


When I first saw protobufs, I wondered exactly the same thing.

There’s an “XER” if you want a human-readable XML encoding, too.



