
I used to be very gung ho on IPFS, until I learned that the content ID does not depend solely on the content of the file. When one puts a file into the system one can choose among different hashing algorithms, which will obviously cause the content ID to differ. However, even when using the same algorithm, the content ID will change depending on how the file is chunked. I would expect any sane system to consistently produce the same hash/content ID for a file. I can see that if the system were moving from SHA2 to SHA3, content might end up stored twice. I don't know whether they have changed things so that a consistent content ID is produced or not.
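To make the chunking point concrete, here is a toy sketch in Python (deliberately NOT the real UnixFS/dag-pb encoding, just an illustration of the mechanism): hash each chunk, then hash the concatenation of the chunk hashes to get a root. The same bytes produce a different root as soon as the chunk size changes.

    import hashlib

    def toy_root_hash(data: bytes, chunk_size: int) -> str:
        # Toy Merkle root: hash each chunk, then hash the chunk hashes.
        # Illustration only, not IPFS's actual DAG format.
        chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
        leaf_hashes = [hashlib.sha256(c).digest() for c in chunks]
        return hashlib.sha256(b"".join(leaf_hashes)).hexdigest()

    data = b"x" * 1_000_000
    print(toy_root_hash(data, 256 * 1024))  # one root
    print(toy_root_hash(data, 512 * 1024))  # different root, identical bytes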


The content ID is not the hash of the content, it is the hash of the root of the Merkle DAG that carries the content.

Doing it like that has many advantages: being able to verify hashes as small blocks are downloaded rather than only after downloading a huge file, being able to de-duplicate data, and being able to represent files, folders, and any type of linked content-addressed data structure.

As long as your content is under 4MiB you can opt out of all this and have a content ID that is exactly the hash of the content.
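A minimal sketch of that opt-out path, assuming a CIDv1 with the raw codec and sha2-256 (the 0x01/0x55/0x12 byte values come from the public multicodec tables; I believe this matches what `ipfs add --cid-version=1 --raw-leaves` produces for small files, but treat it as a sketch):

    import base64
    import hashlib

    def cidv1_raw_sha256(data: bytes) -> str:
        digest = hashlib.sha256(data).digest()
        multihash = bytes([0x12, 0x20]) + digest  # sha2-256 code, 32-byte length
        cid = bytes([0x01, 0x55]) + multihash     # CIDv1, raw codec
        # multibase base32: lowercase, unpadded, 'b' prefix
        return "b" + base64.b32encode(cid).decode().lower().rstrip("=")

    print(cidv1_raw_sha256(b"hello"))  # raw sha2-256 CIDs start with "bafkrei"

Here the content ID really is a thin, deterministic wrapper around the hash of the bytes; the unpredictability only appears once the file is big enough to be chunked into a DAG.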


As I just replied to "cle", there are some disadvantages to doing it the way that it is, because one can't predict what content ID will be produced. Perhaps the hash of the entire contents of the file could point to the hash that currently serves as the content ID; that would solve this issue. To me, IPFS does not seem useful unless this issue is solved. Also, multiple hashes (from different algorithms) of the file could point to the content ID/Merkle DAG, so if both SHA2 and SHA3 were used and one of them had a security issue, just use the one that is still OK.
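What you're describing could live as a secondary index alongside IPFS. A hypothetical sketch (my own names, not any existing IPFS API): keep a table from the plain whole-file hash, under several algorithms, to whatever CID(s) the file was published as.

    import hashlib

    # Hypothetical index, not part of IPFS: plain file hash -> known CIDs.
    plain_hash_to_cids: dict[str, set[str]] = {}

    def register(data: bytes, cid: str) -> None:
        # Index under several algorithms, as proposed above, so a weakness
        # in one algorithm doesn't take the whole mapping down.
        for algo in ("sha256", "sha3_256"):
            digest = hashlib.new(algo, data).hexdigest()
            plain_hash_to_cids.setdefault(f"{algo}:{digest}", set()).add(cid)

    def lookup(algo: str, digest: str) -> set[str]:
        return plain_hash_to_cids.get(f"{algo}:{digest}", set())

Of course the hard part is where such an index lives and who maintains it; the mapping itself is not verifiable the way a CID is.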


How would you produce the same hash for different encodings of data?


I'm not sure that I follow what you are asking. I would expect that if sha2-256 is used then the content ID would be the same. However, depending on how the content is chunked, the content ID will change. Two disadvantages that I see:

1. If new packages are produced for an open source release, could I check whether a copy is available via IPFS? No, because one can't predict how it would be chunked. So one would have to download the file and then derive a content ID, and one can only tell whether it is available if one knows the chunking algorithm that was used.

2. If I want to push a package or other binary, can I figure out whether it is already available via IPFS? No, one can't (see the sketch below).
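To illustrate disadvantage 2: checking availability is easy once you have a CID (any public gateway will tell you), but deriving that CID from the file alone is the part you can't do without knowing the original add parameters. A sketch, assuming the public ipfs.io gateway:

    import urllib.error
    import urllib.request

    def available_on_gateway(cid: str, gateway: str = "https://ipfs.io") -> bool:
        # HEAD the gateway path for this CID. A 200 means some provider
        # has it; a failure proves little (gateways time out on cold CIDs).
        req = urllib.request.Request(f"{gateway}/ipfs/{cid}", method="HEAD")
        try:
            with urllib.request.urlopen(req, timeout=10) as resp:
                return resp.status == 200
        except (urllib.error.URLError, TimeoutError):
            return False

The missing piece is the cid argument: unless you know the chunker, hash function, and CID version used by whoever added the package, you can't compute it from the package bytes.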


Wow good to know!



