Use bcrypt, with a caveat. Lookup the user record and get their password hash. Then check to ensure the supplied password matches the pre-existing hash. The speed of the computation will only vary based on the length of the supplied plaintext password and leaks no more information than has already been supplied.
If the user record does not exist, you should still perform a check against a precomputed bcrypt hash that uses the same work factor as a real one, but throw away the results. This will incur the same performance penalty as a real check but won't leak information. This assumes that your login doesn't leak info already from saying "Sorry, that user does not exist" but instead says "Invalid username and/or password".
With regards to attempts to brute force a password, it's going to be slow with bcrypt and a sufficiently high work factor. However, you can improve this by implementing IP address banning. More than X failed attempts within Y minutes gets you a Z minute timeout period which you can easily implement in your database.
To prevent distributed attacks on a single account, if an account has more than M failed attempts within N minutes suspend the account and send the account owner an email with a reactivation link. The reactivation link must be designed to be immune to the above attacks as well. When clicked, the system should temporarily ignore the account suspension status from their IP address only, subject to normal failed attempt banning.
Password reset links should similarly not reveal whether or not an account exists, but send an email to the requested address anyways. For example, the email could read "A password reset was requested for your email address [foo@example.com] but an account with that email address does not exist in our system". IP-based banning should apply to the password reset email as well to prevent someone flooding everyone on the internet with emails from your system. If someone has to try more than 10 email addresses to find their own account, their IP address should get banned and they should be prompted to call or email customer service for assistance.
"The speed of the computation will only vary based on the length of the supplied plaintext password and leaks no more information than has already been supplied."
The devil's in the details.
General a way to compare the passwords (chunks of memory) would be memcmp or strcmp. The default versions of these functions break out of the checking loop on the first inequal byte.
Unless they're using a cryptographically safe memcmp (which should XOR the memory to 0 or similar), the speed of the computation will vary based on how many leading characters match between hash(supplied password) and hash(saved password).
If the hash function is known, the attacker could use a chosen set of plaintext that they know give a different leading value once hashed. One of these inputs will take slightly longer than the rest. The attacker now knows the first byte of the hash. They repeat the previous step as many times as computationally possible (it will get harder to find inputs such that hash(input) starts with a chosen substring as that substring increases in length).
Then finally, using the known start of the hash value, use a dictionary attack to reduce the number of searches to a minimal set (by dropping all entries from the dictionary that don't start with the precomputed hash substring)
Reading this discussion on bcrypt and timing attacks, I have a question. Aren't systems like these designed so that even someone who has access to the hashed password, the randomly created salt and the hashing algorithm cannot find the password? If timing attacks help in some way, doesn't this mean some small compromise for this goal? Bruteforce attacks can of course be done faster by someone who has access but this seems to be independent from timing issues.
Exactly. Someone hacks your server, downloads your entire database containing email addresses and bcrypt-hashed passwords. They can plow through these at full-speed on their own system and will be limited to calculating a handful of hashes per second due to the computational intensity of calculating the bcrypt hash. Compare this to MD5 where they can generate tens of millions (or more) hashes per second, salt or no salt.
While using bcrypt is a good idea, why not just (also?) add a short (~0.5 second?) delay before responding to 'bad username/password' reply? This has the advantage that you aren't burning CPU cycles.
There was a fantastic article a year or more ago on HN about a vulnerability in a standard library where some loop in a function returned as soon as it failed, which means that the more wrong your hash guess was, the quicker it executed, but the difference was only a few clock cycles.
And then you think so what? There's no way an attacker can use that because all requests are transmitted over the internet where latencies are way, way bigger than a few clock cycles, right?
Wrong. Using statistical analysis over a vast amount of requests you can find out which ones execute a few clock cycles faster than others, and then you're home free.
Lesson learned: I'm not smart enough for security. :-)
Zomg. You have a discernable behavior. Adding more randomness would just give you the same easily-visible results after adding, say, 2x as many points, at which point you have this (expanded a little):
Because if the 0.5 second delay is even marginally different than the amount of time it takes to actually run bcrypt, the attacker could tell the difference between valid/invalid usernames. Unless that 0.5 second delay comes from an extremely precise (and fresh) measurement of how long bcrypt takes to run, it's just as bad (if not worse) than doing nothing at all.
Why not just time the checking function and then round it off to 0.5 seconds? So, regardless of how long the actual checking took, the function would pad with a sleep() call to 0.5 seconds. This sounds like it would work for everything...
Why bother? Odds are that a call to sleep(500) vs. sleep(500 - bcrypt_time) is going to be subtly different depending on timer resolution for your platform and so on. An attacker might be able to bog down your system through other means such that the difference becomes statistically significant or such that bcrypt takes longer than 500ms, then you're not only worse off but you wasted the time it took to implement something that only sounded like it was going to be good enough.
Avoid the pain and burn the CPU cycles for all code paths. If you find that bcrypt takes up proportionally too much CPU, reduce the work factor by 1 until it's acceptable. Keep in mind that the vast majority of logins will be legitimate. For the small number that aren't and persist with failed attempts, let their IP get banned then you don't need to spend any CPU cycles dealing with them -- in this case, it would be fine to sleep an arbitrary amount of time and then display the banned notification.
My earlier comment (which was for some reason downvoted) also solves this problem: you will always sleep for half a second, timer resolution is irrelevant, and if an attacker bogs down the system such that bcrypt takes longer than half a second, when the timer finished it would notice that bcrypt wasn't done and treat it as a failure.
It would be a bit of a hack compared to just using bcrypt every time, but it would save processing power, which may be useful in some contexts, like very energy-efficient devices, etc.
So if your system is heavily loaded, nobody will be able to login if bcrypt takes longer than half a second? If processing power is an issue, there are alternate solutions such as dynamically lowering the work factor (upon successful login, rehash the plaintext password they supplied using a decreased work factor) or offloading logins to a pool of servers.
Well, in this instance you wouldn't need any work factor, since it would always take half a second. And if a zero work factor hash is taking half a second to complete, something is very wrong... But granted, for almost all cases it would be best just to bcrypt in every circumstance.
A bcrypt work factor of 10 or 11 on a recent CPU will take about a tenth of a second to hash. Assuming little other workload, you can get about 10 valid logins/second before you'll hit a backlog. If your app is busy and the logins/second increases you'll eventually reach half a second per hash time. If you always return failure when the timer expires and bcrypt hasn't finished, you're doing a denial of service on your own users.
You seem to only be considering the case where the only logins you need to deal with are invalid logins. A busy and successful service will see the vast majority of logins being for legitimate, known users where the bcrypt check must consume CPU time. You have to design the system to be able to handle the workload from both good and bad logins without revealing information about a bad login to an attacker. Anything other than going through the same motions every time will leak information in ways you don't expect.
You can limit it to only checking X hashes per second. The attacker can, under heavy load, figure out how many people are trying to log in, but that doesn't help them figure out any passwords.
Also, there is no reason to implement an extensive backlog in the first place. It's far better for a system at 105% load to drop one in ten attempts than to have an infinitely growing queue.
If the user record does not exist, you should still perform a check against a precomputed bcrypt hash that uses the same work factor as a real one, but throw away the results. This will incur the same performance penalty as a real check but won't leak information. This assumes that your login doesn't leak info already from saying "Sorry, that user does not exist" but instead says "Invalid username and/or password".
With regards to attempts to brute force a password, it's going to be slow with bcrypt and a sufficiently high work factor. However, you can improve this by implementing IP address banning. More than X failed attempts within Y minutes gets you a Z minute timeout period which you can easily implement in your database.
To prevent distributed attacks on a single account, if an account has more than M failed attempts within N minutes suspend the account and send the account owner an email with a reactivation link. The reactivation link must be designed to be immune to the above attacks as well. When clicked, the system should temporarily ignore the account suspension status from their IP address only, subject to normal failed attempt banning.
Password reset links should similarly not reveal whether or not an account exists, but send an email to the requested address anyways. For example, the email could read "A password reset was requested for your email address [foo@example.com] but an account with that email address does not exist in our system". IP-based banning should apply to the password reset email as well to prevent someone flooding everyone on the internet with emails from your system. If someone has to try more than 10 email addresses to find their own account, their IP address should get banned and they should be prompted to call or email customer service for assistance.
Corrections and/or other login tips appreciated!