This is one of the many reasons there should be a universal data standard using ...

dundarious · on Sept 11, 2023

Parsing the data formats had zero contribution to the problem. They had a problem running an algorithm on the input data, and error reporting when that algorithm failed. Nothing about JSON would improve the situation.

rglover · on Sept 11, 2023

Yes, but look at the data. The algorithm was buggy because the input data is a nightmare. If the data didn't look like that, it's very unlikely the bug(s) would have ever existed.

dundarious · on Sept 11, 2023

ADEXP sounds like the universal data standard you want then. The UK just has an existing NATS that cannot understand it without transformation by this problematic algorithm. So the significant part of your suggestion might be to elide the NATS specific processing and upgrade NATS to use ADEXP directly.

Using a JSON format changes nothing. Just adds a few more characters to the text representation.

schainks · on Sept 11, 2023

I have seen a bad outages caused by valid JSON whose consumer implemented something incorrectly.

I agree with dundarius that "doing this in JSON" would not have changed the likelihood the bug could have manifested.

rglover · on Sept 11, 2023

No change at all? I find that hard to believe. There's also a data design problem here, but the structure of JSON would aid in, not subtract from, that process.

The question at hand is: "heavily structured data vs. a blob of text as input into a complex algorithm, which one is preferred?"

Unless you're lying, you'd choose the former given the option.

dundarious · on Sept 11, 2023

The issue is using both ADEXP and ICAO4444 waypoints, and doing so in a sloppy way. For the waypoint lists, there is no issue with structurelessness -- the fact that they're lists is pretty obvious, even in the existing formats. Adding some ["",] would not have helped the specific problem, as the relevant structure was already perfectly clear to the implementers. I am not lying when I say the bug would have been equally likely in a JSON format in this specific case.

schainks · on Sept 11, 2023

Now I'm wigging out to the idea of how the act of overcoming the inertia of the existing system just to migrate to JSON would spawn thousands of bugs on its own — many life-threatening, surely.

schainks · on Sept 11, 2023

These old standards ARE heavily structured data, despite what their formatting or lack of punctuation suggests.

numpad0 · on Sept 11, 2023

To me and XML-ified this would look more nightmarish than the status quo... it's just brief, space separated and \n terminated ASCII. No need to overcomplicate things this simple.

zimpenfish · on Sept 11, 2023

> The algorithm was buggy because the input data is a nightmare.

No, the algorithm was "buggy" because it didn't account for the entry to and exit points from the UK to have the same designation because they're supposed to be geographically distant (they were 4000Nm apart!) and the UK ain't that big.

smarx007 · on Sept 11, 2023

There are already standards like XML and RDF Turtle that allow you to clearly communicate vocabulary, such that a property 'iso3779:vin' (shorthand for a made-up URI 'https://ns.iso.org/standard/52200#vin') is interpreted in the same way anywhere in the structures and across API endpoints across companies (unlike JSON, where you need to fight both the existence of multiple labels like 'vin', 'vin_no', 'vinNumber', as well as the fact that the meaning of a property is strongly connected to its place in the JSON tree). The problem is that the added burden is not respected at the small scale and once large scale is reached, the switching costs are too big. And that XML is not cool, naturally.

On top of that, RDF Turtle is the only widely used standard graph data format (as opposed to tree-based formats like JSON and XML). This allows you to reduce the hoop jumping when consuming responses from multiple APIs as graph union is a trivial operation, while n-way tree merging is not.

Finally, RDF Turtle promotes use of URIs as primary identifiers (the ones exposed to the API consumers) instead of primary keys, bespoke tokens, or UUIDs. Followig this rule makes all identifiers globally unique and dereferenceable (ie, the ID contains the necessary information on how to fetch the resource identified by a given ID).

P.S.: The problem at hand was caused by the algorithm that was processing the parsed data, not with the parsing per se. The only improvement a better data format like RDF Turtle would bring is that two different waypoints with the same label would have two different URI identifiers.

seabass-labrax · on Sept 11, 2023

Furthermore, there are already XML namespaces for flight plans. These are not, however, used by ATC - only by pilots to load new routes into their aircrafts' navigation computers.

I'm not sure whether there is an existing RDF ontology for flight plans; it would probably be of low to medium complexity considering how powerful RDF is and the kind of global-scale users it already has.

fbdab103 · on Sept 11, 2023

Airport software predates basically every standard on the planet. I would not be surprised to learn that they have their own bizarro world implementation of ASCII, unix epoch time, etc.

tjohns · on Sept 11, 2023

Yes, FPL messages are sent over AFTN, which uses ITA-2 Baudot code instead of ASCII: https://en.wikipedia.org/wiki/Baudot_code

The keyboards used by ATC don't even allow entering symbols: https://www.reddit.com/media?url=https%3A%2F%2Fi.redd.it%2F1...

(There is a modern replacement for AFTN called AMHS, which replaces analog phone lines with X.400 messages over IP... but the system still needs to be backwards compatible for ATC units still using analog links.)

vb-8448 · on Sept 11, 2023

It won't fix anything. JSON is the "standard" today, 15 years ago it was XML and in 15 years we will have protobuf or another new standard.

rglover · on Sept 11, 2023

Correct. The other "leg" of a solution to this problem would be to codify migration practices so stagnation at the tech level is a non issue long-term.

recursive · on Sept 11, 2023

You could do all that stuff.

But after you did it, you'd still have exactly the same problem. The cause was not related to deserialization. That part worked perfectly. The problem is the business logic that applied to the model after the message was parsed.

vb-8448 · on Sept 11, 2023

> codify migration practices

I think this won't work: no one really wants to touch a system that works, and people will try to find any excuse to avoid migrating. The reason of this is that everyone prefers systems that work and fails in known way rather new systems that no one knows how can it fail.

rglover · on Sept 11, 2023

Does the system work if it randomly fails and collapses the entire system for days?

People generally prefer to be lazy and to not use their brains, show up, and receive a paycheck for the minimum amount of effort. Not to be rude, but that's where this attitude originates. Having a codified process means that attitude can't exist because you're given all of the tools you need to solve the problem.

vb-8448 · on Sept 11, 2023

> Having a codified process means that attitude can't exist because you're given all of the tools you need to solve the problem.

Yes, but in real life doesn't work. Processes have corner cases. As you said, people are lazy and will do everything to find the corner case to fit in.

Just an example from the banking sector. There are processes (and even laws) that force banks to use only certified, supported and regularly patched software: there are still a lot of Windows 2000 servers in their datacenters and will be there for many years.

nemetroid · on Sept 11, 2023

There are several XML formats for expressing flight plans, most notably ARINC 633 and FIXM.

0xffff2 · on Sept 11, 2023

Broadly speaking I think this is done for new systems. What you need to identify here is how and when you transition legacy systems to this new better standard of practice.

rglover · on Sept 11, 2023

I'd argue in favor of at least an annual review process. Have a dedicated "feature freeze, emergencies only" period where you evaluate your existing data structures and queue up any necessary work. The only real hang up here is one of bad management.

In terms of how, it's really just a question of Schema A to Schema B mapping. Have a small team responsible for collection/organization of all the possible schemas and then another small team responsible for writing the mapping functions to transition existing data.

It would require will/force. Ideally, too, jobs of those responsible would be dependent on completion of the task so you couldn't just kick the can. You either do it and do it correctly or you're shopping your resume around.

tristor · on Sept 11, 2023

The problem is systems written in the 1970s in FORTRAN to run on Mainframes don't speak JSON.

rglover · on Sept 11, 2023

Great. It should be fixed by replacing the FORTRAN systems with a modern solution. It's not that it can't be done, it's that the engineers don't bother to start the process (which is a side-effect of bad incentive structure at the employment level).

mprovost · on Sept 11, 2023

No migration of this magnitude is blocked because of engineers not "bothering" to start the process. Imagine how many approvals you'd need, plus getting budget from who-knows how many government departments. Someone is paying for your time as an engineer and they decide what you work on. I'm glad we live in a world where engineers can't just decide to rewrite a life or death system because it's written in an old(er) programming language. (Not that there is any evidence that this specific system is written in anything older than C++ or maybe Ada.)

tristor · on Sept 11, 2023

That's... not how that works. I take it you're probably more of a frontend person than a backend person by this comment. In the backend world, you usually can't fully and completely replace old systems, you can only replace parts of systems while maintaining full backwards compatibility. The most critical systems in the world -- healthcare, transportation, military, and banking -- all run on mainframes still, for the most part. This is isn't a coincidence. When these systems get migrated, any issues, including issues of backwards compatibility cause people to /DIE/. This isn't an issue of a button being two pixels to the left after you bump frontend platform revs, these systems are relied on for the lives and livelihood of millions of people, every single day.

I am totally with you wishing these systems were more modern, having worked with them extensively, but I'm also realistic about the prospect. If every major airline regulator in the world worked on upgrading their ATC systems to something modern by 2023 standards, and everything went perfectly, we could expect to no longer need backwards compatibility with the old system sometime in 2050, and that's /very/ optimistic. These systems are basically why IBM is still in business, frankly.

tjohns · on Sept 11, 2023

Many of them have been upgraded. In the US, we've replaced HOST (the old ATC backend system) with ERAM (the modern replacement) as of 2015.

However, you have to remember this is a global problem. You need to maintain 100% backwards compatibility with every country on the planet. So even if you upgrade your country's systems to something modern, you still have to support old analog communication links and industry standard data formats.

fullspectrumdev · on Sept 11, 2023

Have you ever been involved in such a migration?

It’s invariably a complete clusterfuck.

rglover · on Sept 11, 2023

I haven't, but I'd love to. My approach wouldn't be very "HR friendly," though.

count · on Sept 11, 2023

Ah yes, migration through sheer force of will.

jakub_g · on Sept 11, 2023

It's trivial. Only took Amadeus hundreds of developers working for over a decade to migrate off TPF. /s

[0] https://amadeus.com/en/insights/blog/celebrating-one-year-fu...

rglover · on Sept 11, 2023

In some sense, yes. Notice that most of the responses to what I've said are immediately negative or dismissive of the idea. If that's the starting point (bad mindset), of course nothing gets fixed and you land where we are today.

My initial approach would be to weed out anyone with that point of view before any work took place (the "not HR friendly" part being to be purposefully exclusionary). The only way a problem of this scope/scale can be solved is by a team of people with extremely thick skin who are comfortable grabbing a beer and telling jokes after they spent the day telling each other to go f*ck themselves.

tristor · on Sept 11, 2023

Anyone who has worked with me knows that I have no issue coming in like a wrecking ball in order to make things happen, when necessary. I've also been involved in some of these migration projects. I think your take on the complexity of these projects (and I do mean inherent complexity, not incidental complexity) and the responses you've received is exceptionally naive.

The amount of wise-cracks and beers your team can handle after a work day is not the determinate factor in success. /Most/ of these organizations /want/ to migrate these systems to something better. There is political will and budget to do so, these are still inglorious multi-decade slogs which cannot fail, ever, because failure means people die. No amount of attitude will change that.

rglover · on Sept 11, 2023

> The amount of wise-cracks and beers your team can handle after a work day is not the determinate factor in success.

Of course it isn't. But it's a starting point for building a team that can deal with what you describe (a decade-plus long timeline, zero room for failure, etc). If the people responsible are more or less insufferable, progress will be extremely difficult, irrespective of how talented they are.

fbdab103 · on Sept 11, 2023

I guess we should rewrite it in Rust.

Airplane logistics feels like one of the most complicated systems running today. A single airline has to track millions of entities: planes, parts, engineers, luggage, cargo, passengers, pilots, gate agents, maintenance schedules, etc. Most of which was created all before best-practices were a thing. Not only is the software complex, but there are probably millions of devices in the world expecting exactly format X and will never be upgraded.

I have no doubt that eventually the software will be Ship of Thesus-ed into something approaching sanity, but there are likely to be glaciers of tech debt which cannot be abstracted away in anything less than decades of work.

seabass-labrax · on Sept 11, 2023

It would still be valuable to replace components piece-by-piece, starting with rigorously defining internal data structures and publically providing schemas for existing data structures so that companies can incorporate them.

I would like to point out that the article (and the incident) does not relate to airline systems; it is to do with Eurocontrol and NATS and their respective commercial suppliers of software.

masklinn · on Sept 11, 2023

The bug here was a processing one, having the data in json would make no difference.

elzbardico · on Sept 12, 2023

The problem was not in the format, but with the way the semantics of the data is understood by the system. It could be fixed-width, XML, json, whatever, and the problem would still be the same.