If you're going to do that then you might as well just use UUIDs, since you effectively reintroduce their downsides (a vanishingly small but nonzero chance of collisions, the computation involved in generating them, etc.)
The difference is that you can still use sequential IDs internally, while exposing hashed IDs to the outside. This protects your database from collisions under all circumstances, while in the absolute worst case, a single user might experience bugs because two external IDs collide.
This is a weird proposal. If you're using non-hashed IDs internally and exposing hashed IDs externally, you are going to need to map those (securely hashed) IDs back to internal IDs when the client hands them to you.
I guess you could do this with full table scans, hashing the IDs and looking for matches, but that would be horribly inefficient. You could maintain your own internal reverse index of hash -> ID, but then I have to ask what's the point? You aren't saving any storage and you're adding a lot of complexity.
Seems like if you want random unguessable external ids, you're always better off just generating them and using them as primary keys.
Also, you aren't protecting your database "from collisions under all circumstances" - there's no guarantee your hash won't collide even if the input is small.
Yes, it is more reasonable to derive external IDs from your structured/sequential internal IDs with encryption rather than hashing. Recovering the internal ID from the external ID is then computationally trivial, since the ID fits in a single AES block, and you don't have to worry about collisions.
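A minimal sketch of that approach in Python, assuming the pyca/cryptography package; the key, function names, and hex encoding here are illustrative choices, not a fixed scheme:

```python
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

# Hypothetical 128-bit key; in practice load a real secret from your config.
KEY = bytes(range(16))

def encrypt_id(n: int) -> str:
    """Map an internal sequential ID to an opaque external token."""
    block = n.to_bytes(16, "big")  # a 64-bit ID easily fits in one AES block
    enc = Cipher(algorithms.AES(KEY), modes.ECB()).encryptor()
    return (enc.update(block) + enc.finalize()).hex()

def decrypt_id(token: str) -> int:
    """Recover the internal ID from the external token: one block decryption."""
    dec = Cipher(algorithms.AES(KEY), modes.ECB()).decryptor()
    return int.from_bytes(dec.update(bytes.fromhex(token)) + dec.finalize(), "big")
```

ECB mode is acceptable here only because each plaintext is a single unique block; note the mapping is deterministic, so the same internal ID always yields the same external token.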
Yes, I tend to like this philosophy in database design, of internal sequential ids which are used for joins between tables etc. and an exposed "external reference". But I typically would use a UUID for my external reference rather than a hash of the internal id.
Doesn't that just add a whole lot of unnecessary complexity? If elements have multiple IDs, one of which should not be leaked to the outside, that's just asking for trouble in my opinion.
Is generating UUIDv4 or UUIDv7 really too much effort? I'd assume that writing the row to the database takes longer than generating the UUID.
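For a rough sense of the cost (a sketch; exact numbers vary by machine), the standard library makes it easy to measure: generating a v4 UUID is on the order of a microsecond, far below typical row-insert latency:

```python
import timeit
import uuid

# Average cost of one uuid4() call over 100k iterations.
per_call = timeit.timeit(uuid.uuid4, number=100_000) / 100_000
print(f"~{per_call * 1e6:.2f} microseconds per UUIDv4")
```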
It also means that once your hashing scheme leaks for whatever reason, or gets brute-forced because of some weakness in your system, it's game over: everybody will forever be able to predict future IDs, guess neighboring IDs, etc., unless you're willing to change the hash and invalidate all existing links to content on your site.
If I'm in a scenario where I think I need consecutive ids internally and random ones externally, I'll just have two fields in my tables.
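A sketch of that two-field layout using the stdlib sqlite3 module and a random external reference (the table and column names are made up for illustration):

```python
import sqlite3
import secrets

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        id          INTEGER PRIMARY KEY AUTOINCREMENT,  -- internal: joins, FKs
        external_id TEXT NOT NULL UNIQUE,               -- random: safe to expose
        name        TEXT NOT NULL
    )
""")

def create_user(name: str) -> str:
    """Insert a row and hand back only the external reference."""
    ext = secrets.token_urlsafe(16)  # 128 bits of randomness
    conn.execute("INSERT INTO users (external_id, name) VALUES (?, ?)", (ext, name))
    return ext

def get_user(external_id: str):
    """Look up a row by the ID a client handed us."""
    return conn.execute(
        "SELECT id, name FROM users WHERE external_id = ?", (external_id,)
    ).fetchone()
```

The UNIQUE constraint gives you the reverse index for free, so lookups by external ID stay cheap.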
Store just the sequential id, compute the hash on the edge.
This keeps your database simple and performant, and pushes complexity and work to the backend servers. This can be nice because developers are typically more at home at that layer, and scaling the backend can be a lot easier than scaling your database. But it also comes with the downsides listed in this thread.
Good point. Back when we did that we just used a reversible hash function (some would call it encryption). There are some simple algorithms meant for encrypting single integers with a reasonable key.
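One common construction for this is a small Feistel network, which turns any keyed one-way function into a reversible permutation over fixed-width integers. A sketch over 64-bit IDs (the key and round count are placeholders, and this toy is not a vetted cipher):

```python
import hashlib

KEY = b"server-side-secret"  # placeholder; keep the real key server-side

def _round(half: int, r: int) -> int:
    """Keyed round function: 32 bits of SHA-256 over key, round number, half-block."""
    digest = hashlib.sha256(KEY + bytes([r]) + half.to_bytes(4, "big")).digest()
    return int.from_bytes(digest[:4], "big")

def encrypt_id(n: int, rounds: int = 4) -> int:
    """Reversibly scramble a 64-bit integer with a balanced Feistel network."""
    left, right = n >> 32, n & 0xFFFFFFFF
    for r in range(rounds):
        left, right = right, left ^ _round(right, r)
    return (left << 32) | right

def decrypt_id(n: int, rounds: int = 4) -> int:
    """Run the rounds backwards to recover the original integer."""
    left, right = n >> 32, n & 0xFFFFFFFF
    for r in range(rounds - 1, -1, -1):
        left, right = right ^ _round(left, r), left
    return (left << 32) | right
```

Because it's a permutation, distinct inputs always map to distinct outputs, so there are no collisions by construction.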
I might be misremembering, but didn't YouTube do this in the early days? So yeah, that was what I was thinking of when replying, not a traditional hash function.