Input Validation Strategy - Black vs. White-listing - Following the White Rabbit Blog

[This post is a little lengthy, but necessarily so.  Get a beverage, sit back, and learn something] 

I recently spent some time in front of a group of development-oriented professionals, and the talk I gave broke down at a certain point, so I felt like I needed to write this one up. What happened was not entirely unexpected, but I was a little surprised at the tenacity of the group in their arguments. At one point I felt like I was going to be chased into the windmill by the villagers and burned... The good news is these developers were willing to listen, which is all I ask for.

The debate over whether to default-deny or default-allow extends well beyond the web application security world. In fact, the principle is applied all over the real world: night clubs, airport [in]security (*chuckle*), and plenty of other examples abound. Perhaps the quintessential example of default-deny (white-listing) is the way 99% of the world's firewalls operate. When we all started building networks we would block the bad stuff and allow everything else. Over time (and quite quickly) we security folks realized we were getting beaten, badly, because the bad guys could come up with attacks faster than we could close off ports, so we changed our approach. The new approach was to deny everything by default and only allow what we knew was OK, or at least semi-trusted. Over time this became the standard, and I feel it's time for the Web Application Development community to start thinking this way or face (or continue to face) the same harsh lessons we firewall jockeys learned back in the day.

Let's first address the concepts, just to make sure everyone has the same baseline. The two main concepts at odds are white- vs. black-listing for input validation and sanitization. A quick explanation of the two works like this:
  • Black-listing: Allow anything, and create a list (blacklist) of disallowed characters or character combinations (typically enforced with a regular expression, or regex)
  • White-listing: Disallow everything except specifically identified character sets and combinations (typically enforced with a regular expression, or regex)
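To make the two definitions concrete, here is a minimal Python sketch. The patterns and function names are my own illustrative assumptions, not a recommended rule set; the point is only the difference between "reject if something bad appears" and "accept only if everything is known-good".

```python
import re

# Hypothetical blacklist: default-allow, reject only known-bad patterns.
BLACKLIST = re.compile(r"<script|--|;", re.IGNORECASE)

# Hypothetical whitelist: default-deny, accept only letters, digits, spaces
# and a few punctuation marks, 1 to 64 characters long.
WHITELIST = re.compile(r"[A-Za-z0-9 .,_-]{1,64}")


def passes_blacklist(value: str) -> bool:
    """Accept unless a disallowed pattern appears anywhere in the value."""
    return BLACKLIST.search(value) is None


def passes_whitelist(value: str) -> bool:
    """Accept only if the entire value is made of allowed characters."""
    return WHITELIST.fullmatch(value) is not None
```

With these particular patterns, a payload such as `<img src=x onerror=alert(1)>` gets through the blacklist because nobody thought to list it, while the whitelist rejects it simply because `<`, `=`, and `(` were never allowed in the first place.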
Now that you have the basics down, let’s cover the question of which is appropriate. Of course, everyone has their personal take on this topic, but I honestly do feel like there is a right answer here. I’ll present the facts and reserve my personal comment and recommendation for the end.

Since most web applications are built with maximum user operability and compatibility in mind, black-listing generally sounds like the better idea at first. As soon as validation is brought up to developers, the question of complexity rears its ugly head. Why not just allow everything and have some “security device” (software, hardware, whatever) do the security checking? The simple answer is this: if you rely on a third-party “band-aid” device, you’re in trouble from the start. Security must be done at the heart, in the belly of the beast, inside the application. Where else does full knowledge of application content and context live?

Having addressed complexity, and taking it as a given (some added complexity is inherently necessary), we have to look at the requirements of the application to figure out which method of validation is feasible. At the end of the day there is no one-size-fits-all solution to this problem. Each individual application must be analyzed and addressed page by page, form by form, field by field. The general rule still governs the task of validation, though: simplicity is preferred. Always remember the KISS (Keep It Simple, Stupid) principle when coding… or building anything, for that matter.

There are issues here that can realistically make either option viable. A free-form text field may need some tolerance (it may require the characters < and >, which are known to be used in XSS, or Cross-Site Scripting), and a name field may have to accommodate Seamus O’Malley (the apostrophe is a great SQL injection attack staple).
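The "field by field" idea can be sketched as a small table of per-field white-lists. Everything here (the field names, the patterns, the `validate_form` helper) is a hypothetical illustration, assuming a form with a name, a five-digit ZIP code, and a quantity; a real application would derive its rules from its own requirements.

```python
import re

# Hypothetical per-field whitelist rules; names and patterns are assumptions.
FIELD_RULES = {
    # Letters, spaces, hyphens and apostrophes, so "O'Malley" is accepted.
    "name": re.compile(r"[A-Za-z][A-Za-z '\-]{0,63}"),
    # Exactly five ASCII digits.
    "zip_code": re.compile(r"[0-9]{5}"),
    # One to four ASCII digits.
    "quantity": re.compile(r"[0-9]{1,4}"),
}


def validate_form(form: dict) -> dict:
    """Return {field: bool}; fields without a rule are denied by default."""
    results = {}
    for field, value in form.items():
        rule = FIELD_RULES.get(field)
        results[field] = rule is not None and rule.fullmatch(value) is not None
    return results
```

Note the design choice: a field with no rule fails validation, which is default-deny applied to the form itself. Also note that letting the apostrophe through for Seamus means the value still has to be handled safely when it reaches the database (bound query parameters, for instance); the white-list narrows the input but does not make it harmless.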

But think of it this way: pretend you own a night club (work with me here). You, the owner, hire a bouncer and tell him to monitor carefully who gets into your club. You start by saying no one in shorts and a T-shirt, only to later find people on your dance floor wearing ball caps. You add ball caps to the disallowed list, only to notice sandal-clad patrons. You add sandals, only to find cut-off jeans… and on and on. Finally you get annoyed and create a new policy: only people wearing formal dress clothes are allowed in, and everyone else stays out. This is a much healthier approach than continually trying to keep up with the next unwanted trend. The same goes for the development of web applications. You don’t want to spend your days and weeks into eternity updating your “blacklist file” with all the things that are disallowed and building regular expressions to disallow them. You’re never going to be done, and there will always be some permutation of an attack that will slip past you.
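The bouncer's ever-growing list translates directly into code. The sketch below uses three successive versions of a hypothetical blacklist, each one patched after the previous version was bypassed; the patterns and sample payloads are illustrative assumptions, not a catalogue of real-world attacks.

```python
import re

# Three successive versions of a hypothetical blacklist.
BLACKLIST_V1 = [re.compile(r"<script")]
BLACKLIST_V2 = BLACKLIST_V1 + [re.compile(r"<script", re.IGNORECASE)]
BLACKLIST_V3 = BLACKLIST_V2 + [re.compile(r"onerror\s*=", re.IGNORECASE)]

# Sample inputs an attacker might try, one variation after another.
ATTEMPTS = [
    "<script>alert(1)</script>",
    "<ScRiPt>alert(1)</ScRiPt>",
    "<img src=x onerror=alert(1)>",
    "<svg onload=alert(1)>",
]


def blocked(value, blacklist):
    """True if any pattern in the blacklist matches the value."""
    return any(pattern.search(value) for pattern in blacklist)


def slipped_past(blacklist):
    """Return the sample attempts that the given blacklist fails to stop."""
    return [attempt for attempt in ATTEMPTS if not blocked(attempt, blacklist)]
```

Each patch closes exactly one hole, and even the third version still lets the `<svg onload=...>` variant through. That is the sandals-and-cut-off-jeans problem in regex form.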

By now the benefits of white-listing should be apparent, but what if you run into cases where a simple white-list isn’t appropriate? What if you do have to allow most characters in the English character set? Are there cases where the only real and viable approach is to build black-lists? The answer to this last question is an emphatic yes. Doing only one or the other often either fails entirely or becomes very difficult to work with. For example, if you have to allow the greater-than (>) and less-than (<) characters, you should write regular expressions to make sure that those characters aren’t part of a script tag… right? My point is this: you’re never going to win trying to keep up with the hackers by building a black-list alone. I can personally guarantee you this. If you’re extremely lucky – and very good at security/programming – you may be able to hit a 30% effectiveness with black-listing. That’s still overwhelmingly poor… I would hope you understand that. But in conjunction with a well-defined white-list, a black-list can help make your application safe today and hold up better tomorrow. If Cross-Site Scripting (XSS) is what you’re worried about, and the field only ever needs to hold a number, then you can feel pretty safe if your server-side validator throws out anything that isn’t a numeric character [0-9]. You can build code that is resilient to future attacks (not 100% future-proof, mind you).
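Both cases from this paragraph can be sketched in a few lines of Python: a numbers-only field handled by a white-list alone, and a free-text field that must allow < and > and therefore layers a small black-list on top of a white-list. The character sets, length limit, and function names are assumptions chosen for illustration, and the black-list here is a second layer, not a complete XSS defense.

```python
import re

# Numeric field: whitelist of ASCII digits only, at least one digit.
DIGITS_ONLY = re.compile(r"[0-9]+")

# Free-text field (hypothetical rules): a whitelist that includes < and >,
# followed by a blacklist for anything that looks like the start of a tag.
COMMENT_ALLOWED = re.compile(r"[A-Za-z0-9 .,!?'<>=()\n-]{1,500}")
COMMENT_DENIED = re.compile(r"<[A-Za-z/!]")


def is_numeric_id(value: str) -> bool:
    """Accept only values made up entirely of the characters 0-9."""
    return DIGITS_ONLY.fullmatch(value) is not None


def is_acceptable_comment(value: str) -> bool:
    """Apply the whitelist first, then the blacklist as a second layer."""
    if COMMENT_ALLOWED.fullmatch(value) is None:
        return False
    return COMMENT_DENIED.search(value) is None
```

With these rules, `x < 10 and y > 3` is accepted, while `<img src=x onerror=alert(1)>` passes the character white-list but is stopped by the black-list because `<` is immediately followed by a letter. The white-list does most of the work; the black-list only has to cover the narrow gap left by the characters you were forced to allow.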

So there you have it… the low-down on validation based on white/black-listing. Which is appropriate for your application? Only you and your security team will be able to determine that based on specification, functional requirements, and security need.

Posted 06-26-2008 5:00 PM by RafalLos

Comments

Jeremiah Grossman wrote re: Input Validation Strategy - Black vs. White -listing
on 06-26-2008 10:00 PM

Good article, common situation, makes total sense. The only thing that appeared a little curious to me was....

"If you’re extremely lucky – and very good at security/programming – you may be able to hit a 30% effectiveness with black-listing. "

Seems like you are playing a little fast and loose with the stats. In my experience blacklist effectiveness will vary widely from situation to situation. To the point where the stats pretty much don't matter.

Anyway, good work!

Russ McRee wrote re: Input Validation Strategy - Black vs. White -listing
on 06-27-2008 5:10 PM

Rafal,

You nailed it here. Cisco's John Stewart spent a lot of time on this yesterday in a keynote at FIRST. "The total good is smaller than the total unknown and bad." Hmm...sounds like a reasonable argument for ye olde whitelist...better coverage, more success. DENY ALL first, allow as needed thereafter. KISS that.

Erik Čerpnjak wrote re: Input Validation Strategy - Black vs. White -listing
on 07-16-2009 9:42 PM

I found the article to be a very good guide for how my future input checking will look. You made a good point about not being able to detect and exclude all possible permutations of bad input. From what I have read everywhere on the internet, security specialists are also leaning toward using white lists. Some also say that using both approaches is THE safest way of checking whether an input is safe to execute or not. But everyone should probably ask themselves whether the application really needs both approaches. If time, effort, and money are not an issue, then it is safe to say that integrating both is not a bad idea.

So thank you for this article.
