[This post is a little lengthy, but necessarily so. Get a beverage, sit back, and learn something]
I recently spent some time in front of a group of development-oriented professionals. The talk I gave broke down at a certain point, and I felt I needed to write this one up. What happened was not entirely unexpected, but I was a little surprised at the tenacity of the group in their arguments. At one point I felt like I was going to be chased into the windmill by the villagers and burned... The good news is these developers were willing to listen, which is all I ask for.
The debate over whether to default-deny or default-allow extends well beyond the web application security world. In fact, the principle is applied all over the real world: night clubs, airport [in]security (*chuckle*), and many other examples abound. Perhaps the quintessential example of default-deny (white-listing) is the operation of 99% of the world's firewalls. When we all started building networks we would block the bad stuff and allow everything else. Quite quickly we security folks realized we were getting beaten, badly, because the bad guys could come up with attacks faster than we could close off ports, so we changed our approach. The new approach was to deny everything by default and allow only what we knew was OK, or at least semi-trusted. Over time this became the standard, and now I feel it's time for the Web Application Development community to start thinking this way or face (or continue to face) the same harsh lessons we firewall jockeys learned back in the day.
Let's first address the concepts, just to make sure everyone has the same baseline. The two main concepts at odds are white-listing and black-listing for input validation and sanitization. A quick explanation of the two works like this:
- Black-listing: Allow anything, and create a list (blacklist) of disallowed characters or character combinations (typically done through a regular expression, or RegExpr)
- White-listing: Disallow everything except for specifically identified character sets and combinations (typically done through a regular expression, or RegExpr)
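
To make the two definitions concrete, here is a minimal Python sketch. The patterns and function names are my own illustrative assumptions, not rules from any particular framework; a real application would pick its own character sets.

```python
import re

# Hypothetical blacklist: characters we have decided are "bad".
BLACKLIST = re.compile(r"[<>'\";]")

# Hypothetical whitelist: letters, digits and spaces only, 1 to 40 characters.
WHITELIST = re.compile(r"[A-Za-z0-9 ]{1,40}")


def passes_blacklist(value: str) -> bool:
    """Default-allow: accept unless a known-bad character is present."""
    return BLACKLIST.search(value) is None


def passes_whitelist(value: str) -> bool:
    """Default-deny: accept only if the whole value matches the allowed set."""
    return WHITELIST.fullmatch(value) is not None
```

Note the asymmetry: a value such as `a|b` sails through the blacklist simply because nobody thought to list the pipe character, while the whitelist rejects it without ever having heard of it.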
Now that you have the basics down, let’s cover the question of which is appropriate. Of course, everyone has a personal take on this topic, but I honestly feel there is a right answer here. I’ll present the facts and reserve my personal comment and recommendation for the end.
Since most web applications are built with maximum user operability and compatibility in mind, blacklisting generally sounds like the better idea at first. As soon as validation is brought up with developers, the question of complexity rears its ugly head. Why not just allow everything and have some “security device” (software, hardware, whatever) do the security checking? The simple answer: if you rely on a 3rd party “bandaid” device, you’re in trouble from the start. Security must be done at the heart, in the belly of the beast, inside the application. Where else does full knowledge of application content and context live?
Having addressed complexity, and taking it as a given (some added complexity is inherently necessary), we have to look at the requirements of the application to figure out which method of validation is feasible. At the end of the day there is no one-size-fits-all solution to this problem. Each individual application must be analyzed and addressed page by page, form by form, field by field. The general rules still govern the task of validation, though: simplicity is preferred. Always remember the KISS (Keep It Simple, Stupid) principle when coding… or building anything, for that matter. There are realistic cases that can make either option viable, such as free-form text fields that need some tolerance (they require the characters < and >, which are known to be used in XSS, or Cross-Site Scripting), or a name field that has to accommodate Seamus O’Malley (the ‘ is a great SQL injection attack staple).
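
Here is one way the "field by field" idea could look in Python. The field names, patterns and length limits below are hypothetical, chosen only to illustrate that each field gets the narrowest whitelist it can live with, including a name field that tolerates O'Malley.

```python
import re

# Hypothetical per-field whitelists. Each pattern is as narrow as the field allows.
FIELD_RULES = {
    "age": re.compile(r"[0-9]{1,3}"),
    "zip_code": re.compile(r"[0-9]{5}"),
    # Names need the apostrophe (O'Malley), hyphen and space, so they are
    # allowed here on purpose and nowhere else.
    "last_name": re.compile(r"[A-Za-z][A-Za-z' -]{0,49}"),
}


def validate_form(form: dict[str, str]) -> dict[str, bool]:
    """Return a pass/fail verdict for every submitted field."""
    results = {}
    for field, value in form.items():
        rule = FIELD_RULES.get(field)
        # Fields with no rule are rejected: default-deny covers field names too.
        results[field] = rule is not None and rule.fullmatch(value) is not None
    return results
```

Letting the apostrophe through for names is a deliberate tolerance. It means whatever builds the SQL statement still has to treat that value as data, so the whitelist is one layer and not the whole defense.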
But think of it this way: pretend you own a night club (work with me here). You, the owner, hire a bouncer and tell him to monitor carefully who gets into your club. You start by saying no one in shorts and a T-shirt, only to later find people on your dance floor wearing ball caps. You then add ball caps to the disallowed list, only to notice sandal-clad patrons. You then add sandals, only to find cut-off jeans… and on and on. Finally you get annoyed and create a new policy: only people wearing formal dress-clothes are allowed in, everyone else stays out… This is a much healthier approach than trying to continually keep up with the next unwanted trend. The same holds in the development of web applications. You don’t want to spend your days and weeks into eternity trying to continually update your “blacklist file” with all the things that are disallowed, and building regular expressions to disallow them. You’re never going to be done, and there will always be some permutation of an attack that will slip past you.
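
The bouncer's ever-growing list can be sketched in a few lines of Python. The two blacklist versions and the three sample payloads are illustrative assumptions (the payloads are common textbook XSS strings), not a catalogue of real-world attacks.

```python
import re

# Version 1 of a hypothetical blacklist: block script tags.
blacklist_v1 = re.compile(r"<script", re.IGNORECASE)

# Version 2, written after the first bypass: also block onerror handlers.
blacklist_v2 = re.compile(r"<script|onerror\s*=", re.IGNORECASE)

payloads = [
    "<script>alert(1)</script>",
    "<img src=x onerror=alert(1)>",
    "<svg onload=alert(1)>",
]


def blocked(pattern: re.Pattern, value: str) -> bool:
    """True when the blacklist pattern catches the value."""
    return pattern.search(value) is not None


# One verdict per payload, for each version of the blacklist.
v1_results = [blocked(blacklist_v1, p) for p in payloads]
v2_results = [blocked(blacklist_v2, p) for p in payloads]
```

Version 1 catches only the first payload and version 2 catches the first two, so a version 3 is already overdue. That is the ball caps, sandals and cut-off jeans all over again.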
By now the benefits of white-listing should be apparent, but what if you run into cases where a simple white-list isn’t appropriate? What if you do have to allow most characters in the English character set? Are there cases where the only real and viable approach is to build black-lists? The answer to this last question is an emphatic yes. Just doing one or the other often either fails entirely or becomes very difficult to work with. For example, if you have to allow the greater-than (>) and less-than (<) characters, you should write regular expressions to make sure that those characters aren’t part of a script tag… right?
My point is this: you’re never going to win trying to keep up with the hackers by building a black-list alone. I can personally guarantee you this. If you’re extremely lucky, and very good at security/programming, you may be able to hit 30% effectiveness with black-listing. That’s still overwhelmingly poor… I would hope you understand that. But in conjunction with a well-defined white-list, a black-list can make your application not only safe today but also more resilient to future attacks (not 100% future-proof, mind you). If Cross-Site Scripting (XSS) is what you’re worried about, you can feel pretty safe if your server-side validator throws out everything except numeric characters [0-9] on a field that should only ever hold a number.
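
A rough Python sketch of both ideas from this section: a numeric-only field, and a free-text field where a whitelist does the heavy lifting and a small blacklist covers the characters you were forced to allow. The patterns, the 500-character limit and the function names are all assumptions for illustration.

```python
import re

# Whitelist for a field that should only ever hold a number.
DIGITS_ONLY = re.compile(r"[0-9]+")

# Hypothetical whitelist for a free-text comment: letters, digits, spaces and
# some punctuation, including the < and > the field legitimately needs.
COMMENT_ALLOWED = re.compile(r"[A-Za-z0-9 .,!?'<>()=/-]{1,500}")

# Secondary blacklist, applied only to what the whitelist already let through.
COMMENT_DENIED = re.compile(r"<\s*/?\s*script", re.IGNORECASE)


def is_numeric(value: str) -> bool:
    """Accept only non-empty strings made entirely of the digits 0-9."""
    return DIGITS_ONLY.fullmatch(value) is not None


def is_acceptable_comment(value: str) -> bool:
    """Whitelist first, then the blacklist as a second check."""
    if COMMENT_ALLOWED.fullmatch(value) is None:
        return False
    return COMMENT_DENIED.search(value) is None
```

With these rules `if a < b then b > a` is accepted, while `<script>alert(1)</script>` passes the whitelist and is then caught by the blacklist. The script-tag pattern is deliberately simple and would miss other vectors, which is exactly why it sits behind the whitelist instead of standing alone.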
So there you have it… the low-down on validation based on white/black-listing. Which is appropriate for your application? Only you and your security team will be able to determine that, based on specification, functional requirements, and security needs.
Posted 06-26-2008 5:00 PM by RafalLos