Escaping and sanitizing user input in PHP

Posted on July 23, 2010

I recently answered a question on Quora, a question-and-answer website that I frequent. The poster asked, “What are best practices for escaping or sanitizing user input in PHP?” People seemed to appreciate the answer I wrote, so I’ll post it here and elaborate on it a bit more.

Why is it important to sanitize user input?

If you’re not careful with user input, your website might be open to code injection, directory traversal or similar attacks. Information supplied by users can never be assumed safe.

Examples of user input are submitted forms (e.g. comments), URL parameters (?q=example) and server-side scripts pulling in third-party data, such as an RSS feed importer.

HTML and JavaScript

To make strings safe for HTML (without breaking Unicode text) use htmlentities().

<?php $safe = htmlentities($unsafe, ENT_QUOTES, 'UTF-8'); ?>

This encodes all special HTML characters. This approach is better than blacklisting specific elements such as <script> or just opening angle brackets (<). Do not use strip_tags(), str_replace() or regular expressions to filter HTML and JavaScript: it is easy to miss obscure vulnerabilities and leave them exploitable.
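As a rough illustration of the difference, the sketch below escapes a script tag with htmlentities() and then runs strip_tags() with an allowed-tags list over a link. The helper name escape_html() is hypothetical and only wraps the call shown above.

```php
<?php
// Hypothetical helper name; it only wraps the htmlentities() call shown above.
function escape_html($unsafe)
{
    return htmlentities($unsafe, ENT_QUOTES, 'UTF-8');
}

// Angle brackets and both kinds of quotes become entities.
$escaped = escape_html("<script>alert('XSS');</script>");
// $escaped is now: &lt;script&gt;alert(&#039;XSS&#039;);&lt;/script&gt;

// strip_tags() with an allowed-tags list keeps the attributes of the tags it
// allows, so the onclick handler survives the filter unchanged.
$filtered = strip_tags('<a href="#" onclick="alert(1)">link</a>', '<a>');
```

The escaped string can be printed inside HTML as plain text, while the filtered link would still run its onclick handler.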

When automatically turning a URL into a clickable link, check that it starts with a protocol such as “http://” (regex: /^[a-z]+:\/\//i) and make sure javascript: links never work; allowing only known schemes such as http and https is the simplest way to do that. Also be careful with quotes and closing angle brackets, as they can break out of the HTML attribute.

These examples demonstrate how unfiltered links can be dangerous:

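<?php
// Each assignment is a separate example of a dangerous value.
$url = 'javascript:alert(\'XSS\');';                      // script runs on click
$url = 'http://example.com" onclick="alert(\'XSS\');';    // breaks out of the attribute
$url = 'example.com"><script>alert(\'XSS\');</script>';   // breaks out of the tag
?>

<a href="<?php echo $url ?>">Click here</a>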
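A minimal sketch of one way to guard against the three examples above, assuming only http and https links should be allowed. The helper name safe_link_url() is hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper: returns the URL escaped for use inside an href
// attribute, or null when it does not start with http:// or https://.
function safe_link_url($url)
{
    // Anchored check, so "javascript:" and scheme-less values are rejected.
    if (!preg_match('/^https?:\/\//i', $url)) {
        return null;
    }

    // Escape quotes and angle brackets so the value cannot leave the attribute.
    return htmlentities($url, ENT_QUOTES, 'UTF-8');
}

$a = safe_link_url('javascript:alert(\'XSS\');');
$b = safe_link_url('http://example.com" onclick="alert(\'XSS\');');
$c = safe_link_url('example.com"><script>alert(\'XSS\');</script>');
```

Here $a and $c come back as null, so no link should be printed for them, and $b no longer contains a raw double quote.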

URLs

To pass values as a parameter to a URL, use rawurlencode().

<?php $url = 'http://example.com/?k=' . rawurlencode($v); ?>

This function is nearly identical to urlencode() but follows RFC 3986: spaces are encoded as %20 rather than as a plus sign (+).
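To make the difference between the two functions concrete, here is a small comparison. The value 'rock & roll' is just an example chosen for illustration.

```php
<?php
$v = 'rock & roll'; // example value, chosen for illustration

$raw   = rawurlencode($v); // rock%20%26%20roll
$plain = urlencode($v);    // rock+%26+roll

// The ampersand is encoded either way, so it cannot start a new parameter.
$url = 'http://example.com/?k=' . $raw;
```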

MySQL database queries

When storing strings in a MySQL database, use mysql_real_escape_string().

<?php $safe = mysql_real_escape_string($unsafe); ?>

Often addslashes() is used instead, but this is not enough to prevent SQL injection attacks. If you want to learn why, I recommend reading this blog post by Chris Shiflett and the third chapter of the guide to PHP security by Ilia Alshanetsky (PDF, 130KB).

This function requires a MySQL connection.
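The mysql_* functions, including mysql_real_escape_string(), were removed in PHP 7. A common alternative today is a prepared statement, sketched below with PDO. This is a swapped-in technique rather than the approach from the original answer, and it assumes the pdo_sqlite extension is available: an in-memory SQLite database stands in for MySQL so the example runs on its own. The table and column names are made up for illustration.

```php
<?php
// Assumption: SQLite in memory stands in for MySQL here; for a real MySQL
// server only the DSN and credentials passed to PDO would differ.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE comments (body TEXT)');

$unsafe = "Robert'); DROP TABLE comments; --";

// The value is bound as data and is never spliced into the SQL string.
$stmt = $db->prepare('INSERT INTO comments (body) VALUES (?)');
$stmt->execute(array($unsafe));

// Read the row back to confirm it was stored verbatim.
$stored = $db->query('SELECT body FROM comments')->fetchColumn();
```

With a bound parameter there is no escaping step to forget, which is the main reason this style is usually preferred over building the query by hand.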

Magic Quotes

If your web app suffers from unwanted backslashes appearing in content (e.g. “today\’s weather”), this is probably due to double escaping, most likely caused by PHP’s now-deprecated Magic Quotes.

Magic Quotes is a feature that automatically escapes user input, intended to help beginners write more secure code. Because it is not always enabled, and not always needed, code that relies on it is less portable, and undoing it requires excessive use of stripslashes().

To recursively undo the effects of Magic Quotes, use this function at the beginning of your scripts:

<?php
function undo_magic_quotes($v)
{
    return is_array($v) ? array_map('undo_magic_quotes', $v) : stripslashes($v);
}

if ( function_exists('get_magic_quotes_gpc') && get_magic_quotes_gpc() )
{
    $_GET    = array_map('undo_magic_quotes', $_GET);
    $_POST   = array_map('undo_magic_quotes', $_POST);
    $_COOKIE = array_map('undo_magic_quotes', $_COOKIE);
}
?>

Directories

And finally, be careful when including files based on user input.

<?php
// Don't do this
require($_GET['file']);

// And especially not this
echo file_get_contents($_GET['file']);
?>

This is dangerous because a user can request any file on the server and execute or view its code, e.g. “?file=../../config.php”. This is called a path traversal attack. One solution is to use basename(), which strips the directory path from a file name.
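A minimal sketch of the basename() approach, combined with a list of allowed file names as an extra assumption of mine: basename() alone would still let a user request any file that sits in the include directory itself. The helper name and the file names are hypothetical.

```php
<?php
// Hypothetical helper: returns a file name that is safe to include, or null.
function allowed_page($name, array $allowed)
{
    // "../../config.php" becomes "config.php"
    $file = basename($name);

    // Only names on the explicit list are accepted.
    return in_array($file, $allowed, true) ? $file : null;
}

$pages = array('about.php', 'contact.php'); // example names

$ok  = allowed_page('about.php', $pages);        // 'about.php'
$bad = allowed_page('../../config.php', $pages); // null
```

The caller would then include the file from a fixed directory only when the result is not null.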

These were just a few basics; there is a lot more to web application security. OWASP is a great resource if you want to learn more.

Comments (4)

  • Great post. If I read this correctly, I should also always use header() to specify UTF-8 when using UTF-8 on output (to protect against http://shiflett.org/blog/2005/dec/googles-xss-vulnerability)?

    Posted by svenn on July 23, 2010

    • I always do, didn’t know it was exploitable though.

      Posted by ElbertF on July 24, 2010

  • Nice article! I suppose there is a misprint in the “MySQL database queries” section: the ‘=’ is missing.
    Also, might htmlspecialchars() be better than htmlentities()? There is a problem with htmlentities() if there are non-Latin characters in the string.

    Posted by Kremchik on April 28, 2011

  • Can you elaborate more on why you should not use strip tags? Perhaps a link with more information?

    Posted by Greg on January 24, 2012
