On the pages for rand() and uniqid(), as well as looking at the C code, they specifically state that these functions should not be used for generating secure tokens. They tend to generate predictable values. And the documentation for md5() states that it should not be used for password hashing. Granted we’re not hashing passwords when creating a CSRF token, but with the tooling available shouldn’t we be using functions that are more cryptographically secure?
…
The goal here is the random value. As such the hashing using hash_hmac() does not buy you a whole lot extra. The number of possible values in a 32 byte random string is 1.1579208923731619542357098500869e+77. That alone would seem to be enough for a CSRF prevention token. mt_rand() returns an integer which gives you about 4 billion possible numbers. While that will probably protect you, the other value will offer you better protection. There’s no sense in gambling with a smaller value if you have the ability to generate a larger value with virtually no additional cost.
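As a rough check of the numbers quoted above, this small sketch prints the size of both value spaces. The variable names are illustrative only. Note that mt_rand() called without arguments tops out at mt_getrandmax().

```php
<?php
// Size of the value space for a 32-byte random string: 2^(8 * 32) = 2^256.
$bytes = 32;
$bits = 8 * $bytes;
$tokenSpace = pow(2, $bits); // a float, since 2^256 overflows PHP integers

printf("%d bytes = %d bits, roughly %.4e possible values\n", $bytes, $bits, $tokenSpace);

// mt_rand() called without arguments returns at most mt_getrandmax().
printf("mt_rand() range: 0 to %d\n", mt_getrandmax());
```

Whatever the exact mt_rand() figure is on your build, it is dozens of orders of magnitude smaller than the 32-byte space.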
So it would seem that, for generating a proper token the code that you would really need is this:
$token = base64_encode(openssl_random_pseudo_bytes(32));
The only reason for the base64_encode() call is to make sure that the value provided will not break your HTML layout.
Looks like we need to update Aura.Session to use openssl when available and fall back to mt_rand() when it’s not. Via Generating secure cross site request forgery tokens (csrf).
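A minimal sketch of what such a fallback chain could look like. This is not the actual Aura.Session code; the function names csrf_random_bytes() and csrf_generate_token() are hypothetical. It also assumes that on PHP 7 or later you would prefer the built-in random_bytes(), which the quoted article does not mention.

```php
<?php
/**
 * Hypothetical helper: return $length random bytes from the best
 * source that is available on this PHP installation.
 */
function csrf_random_bytes($length = 32)
{
    // Assumption: on PHP 7+, random_bytes() reads from the OS CSPRNG.
    if (function_exists('random_bytes')) {
        return random_bytes($length);
    }

    // OpenSSL, as suggested in the quoted article.
    if (function_exists('openssl_random_pseudo_bytes')) {
        $strong = false;
        $bytes = openssl_random_pseudo_bytes($length, $strong);
        // Only accept the result if OpenSSL reports a strong source.
        if ($bytes !== false && $strong) {
            return $bytes;
        }
    }

    // Last resort: mt_rand(). Predictable, so only better than nothing.
    $bytes = '';
    for ($i = 0; $i < $length; $i++) {
        $bytes .= chr(mt_rand(0, 255));
    }
    return $bytes;
}

/** Hypothetical helper: a base64-encoded 32-byte token, safe to embed in HTML. */
function csrf_generate_token()
{
    return base64_encode(csrf_random_bytes(32));
}
```

Checking the $strong flag matters because openssl_random_pseudo_bytes() can report that it fell back to a weaker algorithm.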
Good to know!
How about using mcrypt_create_iv?
Or /dev/urandom?
See the comments in that article. It looks like OpenSSL is treated as best, mcrypt with urandom as good, and (mt_)rand as worst.
You should use mcrypt_create_iv as the fallback. With PHP 5.3+ it should almost always be available.
A few notes about random bytes.
It is a bit misleading to say that openssl_random_pseudo_bytes() is “better” (security-wise) than any other method that relies on /dev/urandom (or its equivalent on Windows). Reading straight from /dev/urandom and fetching bytes some other way that itself uses /dev/urandom are practically equal.
Care should be taken to avoid the quirks of each method when fetching random bytes: for example, openssl_random_pseudo_bytes() blocking on certain versions, /dev/urandom not being available on Windows, and security issues with mcrypt_create_iv() (using MCRYPT_DEV_URANDOM) on certain versions on Windows.
Be sure to mention that on the origin site.
The CSRF token can end up inside a URL, so the proper call would be
rtrim(strtr(base64_encode(openssl_random_pseudo_bytes(32)), '+/', '-_'), '=')
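The same transformation wrapped in a pair of helpers, as a sketch. The names base64url_encode() and base64url_decode() are hypothetical and not part of PHP itself. The decoder restores the stripped padding before calling base64_decode().

```php
<?php
/** Hypothetical helper: base64 with the URL-safe alphabet and no padding. */
function base64url_encode($bytes)
{
    // Swap '+' and '/' for '-' and '_', then drop the trailing '=' padding.
    return rtrim(strtr(base64_encode($bytes), '+/', '-_'), '=');
}

/** Hypothetical helper: reverse of base64url_encode(); returns false on bad input. */
function base64url_decode($text)
{
    // Restore padding so the length is a multiple of four again.
    $paddedLength = strlen($text) + (4 - strlen($text) % 4) % 4;
    $padded = str_pad($text, $paddedLength, '=');
    return base64_decode(strtr($padded, '-_', '+/'), true);
}
```

A 32-byte input always encodes to 43 characters drawn from A-Z, a-z, 0-9, '-' and '_', so the result needs no further escaping in a query string.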
My understanding is that a CSRF token should never be used in a URL.
You can use a CSRF token in URLs where, for example, requests that should be made using POST are being done using GET. Personally, I’d fix the underlying problem of using GET for state changes but that may not always be feasible in legacy apps.
Blocking on openssl_random_pseudo_bytes() was an issue prior to PHP 5.3.4. It may also be slow on Windows, since the lack of a /dev/urandom source leaves us with two other options to generate entropy, which are not as fast.
mcrypt_create_iv() can be used as an alternative if you pass MCRYPT_DEV_URANDOM as an option. Otherwise it uses the blocking /dev/random source.
Did try and post this on the original blog item, but seems it’s not happening..
I don’t like the reliance on random numbers.
I actually think the first suggestion of a HMAC is on the right path, but again not hashing random bytes.
The $data argument to hash_hmac should be made up from serialised data. This should include the full URI to where the form is to be posted, the session id, and any hidden values in the form.
This provides not only CSRF protection, but also another layer of validation to parts of the form.
The $key parameter for the CSRF could be a site wide secret, and do away with needing to use $_SESSION at all.
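A minimal sketch of the HMAC-based token this comment describes. The function names, the choice of SHA-256, and the JSON serialisation are all assumptions made for illustration. The comment only specifies what goes into $data and that $key is a site-wide secret.

```php
<?php
/**
 * Hypothetical helper: derive a form token from the target URI, the
 * session id and the form's hidden values, keyed with a site-wide secret.
 */
function hmac_form_token($secret, $uri, $sessionId, array $hidden)
{
    // Sort by field name so the same fields always serialise identically.
    ksort($hidden);
    $data = json_encode(array(
        'uri'    => $uri,
        'sid'    => $sessionId,
        'hidden' => $hidden,
    ));
    return hash_hmac('sha256', $data, $secret);
}

/** Hypothetical helper: recompute the token and compare in constant time. */
function hmac_form_token_is_valid($secret, $uri, $sessionId, array $hidden, $submitted)
{
    if (!is_string($submitted)) {
        return false;
    }
    $expected = hmac_form_token($secret, $uri, $sessionId, $hidden);
    // hash_equals() requires PHP 5.6 or later.
    return hash_equals($expected, $submitted);
}
```

Nothing is stored in $_SESSION here: the server recomputes the token from the submitted request. Hidden values arrive as strings in a real POST, so they should be strings when the token is first generated too.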
A CSRF token should be unguessable. The token is the secret being kept from attackers. Thus a random 32 byte string is perfect. You can improve that (a little) by making tokens form specific (i.e. one token per unique form), since it limits the impact of having any one token disclosed. However, this is getting into improbable territory, though it is worth doing in the event that the improbable does happen (e.g. an attacker gains access to browser history and reads a CSRF token from a past GET request).
Since the token is itself the secret, hashing it has no impact whatsoever. It just makes the hash the secret that needs keeping. Hashing is nearly always false security. What does this mean?
1. You don’t need to include session ID. That also increases the risk of implementation errors leaking session IDs.
2. You don’t need a server secret (i.e. salt) because hashing adds no security other than to obscure the session ID being used. See how this is adding complexity over just using a simple random string?
3. Hidden values are optional in forms so that would only impose an extra salting factor some of the time.
4. Other attempts at avoiding strong random tokens, e.g. uniqid(), are known to produce predictable values (or at least values easier to pin to a specific brute forcible range), since uniqids are generated using the server time (linear) and any extra entropy (set 2nd param to TRUE) is generated using an internal linear congruential generator, which is widely panned as being predictable.
The core of the issue here is simple – why NOT just use a random string? Answer: There is no reason not to. It’s easily superior to all other methods.
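The random-string approach argued for above can be sketched as follows, with one token per form. The function names and the 'csrf' array key are hypothetical. The session is passed in as a plain array so the sketch can run outside a web request; in an application you would pass $_SESSION. It assumes PHP 7 or later for random_bytes().

```php
<?php
/** Hypothetical helper: create and remember a random token for one form. */
function csrf_issue_token(array &$session, $formId)
{
    // 32 random bytes, base64-encoded so the value is safe inside HTML.
    $token = base64_encode(random_bytes(32));
    $session['csrf'][$formId] = $token;
    return $token;
}

/** Hypothetical helper: check a submitted token against the stored one. */
function csrf_check_token(array $session, $formId, $submitted)
{
    if (!is_string($submitted) || !isset($session['csrf'][$formId])) {
        return false;
    }
    // Constant-time comparison; no hashing, salt or session id involved.
    return hash_equals($session['csrf'][$formId], $submitted);
}
```

Because each form id has its own token, a token leaked from one form cannot be replayed against another.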
The result of a HMAC with known data and a secret key is unguessable. The random 32 byte string is far from perfect as it relies on $_SESSION storage.
HMACs don’t leak data.
Including the hidden values as part of the data to be hashed by the HMAC prevents an attacker from changing those values after they were sent, because the attacker can’t create a valid token for the altered values.
That misses the point of CSRF tokens. They prevent an attacker from making the user’s browser submit a form automatically (e.g. via XSS on another website or a phishing website). In those cases, the attacker does not have access to the user’s session ID for the target site. You can defeat CSRF protections on the target site by attacking it in other ways. For example: perform a Man-In-The-Middle attack on their connection, use an XSS vulnerability (any site) to perform a Man-In-The-Browser attack, exploit an XSS vulnerability on the target site (request form, get token, reuse token to submit form), or perform session hijacking to steal the session ID and masquerade as the user.
CSRF is a low level attack, and a CSRF token defends against one specific vector: automatic form submissions from visited sites other than the target site.
$_SESSION is local server storage, so I’m not sure why you distrust it. The token must be known by both the server and the client. Session data on the server is presumed secure (short of a server break-in), and including the token in a form returned by the client is secure (short of MiTM, MiTB and local XSS). In other words, CSRF tokens are only defeatable using a higher level attack.
Again, random string is random string. Hash is random string (need salt) + sensitive data (must be protected). You’re adding an ingredient for zero gain and more risk. More risk === bad.
Guess we’re going to have to disagree, because I completely disagree with your assumption that it is a “risk”.
An HTTP POST request is just a message, and using a message authentication code seems completely logical to me, especially when the sender of the message and the receiver are one and the same (the webapp).