It won't stop a serious agency, but it does stop the lower-level ones that aren't going to go to that trouble.
We use these hashes to fight off spammers. A bloom filter won't do us any good because it will provide false positives.
An external service will be legally subject to subpoenas, so pawning it off doesn't solve the problem either.
You're right. Log anonymization is hard.
The only really good way to solve this is to throw this information away, which is also legal. Because of the spam considerations, we need this for now. I might decide it's not worth it though and just stop storing even these hashes.
> An external service will be legally subject to subpoenas, so pawning it off doesn't solve the problem either.
How about a distributed onion route of external corporations, such that N subpoena hops must be followed to get to an unhashed IP? Serious investigations would do the legwork required and catch a serious bad guy, but nuisance clowns would be stopped from going too far. Kind of like a legal scrypt().
Instead of services, it could just be a reciprocal arrangement where everybody participating holds some of everybody else's data.
You don't even have to chain them; you can execute requests to 128 different services in parallel, and then XOR the results to obtain a single IP address "handle" for logging purposes.
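To make the parallel-XOR idea concrete, here is a rough sketch. Everything in it is an assumption for illustration: the 128 "services" are simulated locally, each is assumed to return an HMAC-SHA256 of the IP under its own secret key, and `query_service` and `ip_handle` are hypothetical names. In a real deployment each query would be a network call to an independent party.

```python
import hashlib
import hmac
import secrets
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the external services: each one holds its own
# secret key, which the logging site never sees.
SERVICE_KEYS = [secrets.token_bytes(32) for _ in range(128)]


def query_service(key: bytes, ip: str) -> bytes:
    """Simulate one remote service returning a keyed hash of the IP."""
    return hmac.new(key, ip.encode("ascii"), hashlib.sha256).digest()


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def ip_handle(ip: str) -> str:
    """Query every service in parallel and XOR the answers into one handle."""
    with ThreadPoolExecutor(max_workers=16) as pool:
        parts = list(pool.map(lambda key: query_service(key, ip), SERVICE_KEYS))
    handle = bytes(32)  # 32 zero bytes, the identity element for XOR
    for part in parts:
        handle = xor_bytes(handle, part)
    return handle.hex()
```

The same IP always maps to the same handle, so it still works for spam matching, but reversing a handle would need the cooperation (or subpoena) of every key holder, since each service's contribution is meaningless on its own.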
If your scheme becomes prevalent, someone will create a service that for $10k will brute-force $1k worth of hashed IP addresses. $10k and 4 days is comparable to a legal bill, so you're not deterring anyone but the most casual snooper.
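For a sense of why brute force is feasible at all: the whole IPv4 space is only 2^32 (about 4.3 billion) candidates. The sketch below assumes, purely for illustration, that the log stores an unsalted SHA-256 of the dotted-quad string; the thread does not say which hash is actually used, and a deliberately slow or keyed hash would raise the cost per guess. `hash_ip` and `brute_force` are hypothetical names, and the search is limited to one subnet so it finishes instantly.

```python
import hashlib
import ipaddress
from typing import Optional


def hash_ip(ip: str) -> str:
    """Assumed logging scheme: unsalted SHA-256 of the dotted-quad string."""
    return hashlib.sha256(ip.encode("ascii")).hexdigest()


def brute_force(target_hash: str, network: str) -> Optional[str]:
    """Try every address in `network` until one hashes to `target_hash`."""
    for addr in ipaddress.ip_network(network):
        candidate = str(addr)
        if hash_ip(candidate) == target_hash:
            return candidate
    return None  # the address is not in this network


# Example: recover an address from its logged hash by scanning a /24.
logged = hash_ip("198.51.100.23")
recovered = brute_force(logged, "198.51.100.0/24")
```

Scanning `0.0.0.0/0` instead of a /24 is the same loop over a larger range, which is why the cost of the hash function itself is what sets the attacker's bill.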
But if it isn't prevalent (which is likely to remain the case for the near future), and the regional agencies issuing subpoenas are technically pretty clueless (which I think they mostly are), then it will be effective. That's pretty near the best you can do, as long as you need to log IPs in some form for spam prevention.
Incidentally, a slightly more effective solution might be to put the 'did this IP visit recently' function in a secure microprocessor, rate limit it so it can't be brute-forced at more than a modest rate (in case of a DDoS you can always temporarily stop using it), and throw away the keys to reprogram it. That really will stop everyone but the NSA, but it's about a million times more difficult and expensive...
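A software-only sketch of that interface, to show the shape of the idea rather than its security: the class below keeps a secret key that is never exposed, answers only "have I seen this IP?", and refuses queries beyond a fixed rate using a token bucket. All names (`VisitOracle`, `record`, `visited`) are hypothetical, expiry of old entries is left out for brevity, and ordinary Python obviously provides none of the tamper resistance that real secure hardware would.

```python
import hashlib
import hmac
import secrets
import time


class VisitOracle:
    """Answers 'did this IP visit?' at a limited query rate."""

    def __init__(self, max_queries_per_sec: float, clock=time.monotonic):
        self._key = secrets.token_bytes(32)  # generated inside, never exported
        self._seen = set()                   # keyed hashes of recorded IPs
        self._rate = max_queries_per_sec
        self._clock = clock
        self._tokens = max_queries_per_sec   # token bucket starts full
        self._last = clock()

    def _tag(self, ip: str) -> bytes:
        return hmac.new(self._key, ip.encode("ascii"), hashlib.sha256).digest()

    def _take_token(self) -> bool:
        """Refill the bucket based on elapsed time, then try to spend a token."""
        now = self._clock()
        elapsed = now - self._last
        self._tokens = min(self._rate, self._tokens + elapsed * self._rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False

    def record(self, ip: str) -> None:
        """Remember a visit; only the keyed hash is stored."""
        self._seen.add(self._tag(ip))

    def visited(self, ip: str) -> bool:
        """Rate-limited membership query; raises when the limit is exceeded."""
        if not self._take_token():
            raise RuntimeError("rate limit exceeded")
        return self._tag(ip) in self._seen
```

At, say, 100 queries per second, enumerating all 2^32 IPv4 addresses through `visited` would take roughly 43 million seconds, which is well over a year. That is the point of the rate limit: normal spam checks stay cheap while exhaustive guessing becomes impractical.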