June 13

What's wrong with ASP.NET? HTML encoding

The problem

Back when ASP.NET was first introduced, I had pretty high hopes that the new controls would support automatic HTML encoding. Unfortunately, there was very little of this, and most of it was more than a bit lukewarm (more on this later). In some ways, things have improved a bit in v. 2.0, but they're considerably worse in others.

Before you read any further, you might want to ask yourself which ASP.NET controls perform HTML encoding for you and under what circumstances they do so. If the answer doesn't leap to mind, you may already have a first inkling that there's a little problem with API consistency and/or the documentation. Then again, maybe you've never worried about HTML encoding in your web applications. If so, I'd strongly recommend that you read up on HTML injection and cross-site scripting. A good starting point might be CERT Advisory CA-2000-02.

We'll look at which controls perform HTML encoding soon. First, we need to nail down some concepts, because not all encoding is created equal. You may already be aware that HTML, URLs, and client-side script use different encodings. For the sake of simplicity, the remainder of this post refers mainly to HTML encoding, although the other two forms of encoding merit consideration as well.

There is more than one flavour of HTML encoding, even within ASP.NET. The first is exposed via the System.Web.HttpUtility.HtmlEncode methods. These encode the characters >, <, &, and ", as well as any characters with codes between 160 and 255, inclusive. The other main flavour used by ASP.NET is "attribute" encoding, which is exposed via the System.Web.HttpUtility.HtmlAttributeEncode methods. In ASP.NET 1.1, these encode only the & and " characters. In ASP.NET 2.0, they encode the characters <, &, and ".
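To make the difference between the two flavours concrete, here's a minimal Python sketch that models the character lists above. It's only a model built from this description, not the real .NET code, and the function names are hypothetical. It does show what slips through each flavour: the single quote is never touched, characters above 255 (such as €) pass straight through HtmlEncode, and attribute encoding leaves > alone.

```python
# A model of the two ASP.NET encoding flavours, based only on the character
# lists above. Function names are hypothetical; this is not the .NET source.

_ENTITIES = {"<": "&lt;", ">": "&gt;", "&": "&amp;", '"': "&quot;"}


def model_html_encode(text: str) -> str:
    """Encode <, >, & and ", plus any character with a code of 160-255."""
    out = []
    for ch in text:
        if ch in _ENTITIES:
            out.append(_ENTITIES[ch])
        elif 160 <= ord(ch) <= 255:
            out.append(f"&#{ord(ch)};")  # e.g. é becomes &#233;
        else:
            out.append(ch)  # everything else, including ' and €, passes through
    return "".join(out)


def model_html_attribute_encode(text: str, version: str = "2.0") -> str:
    """ASP.NET 1.1 encodes only & and "; ASP.NET 2.0 also encodes <."""
    targets = {"&", '"'} if version == "1.1" else {"<", "&", '"'}
    return "".join(_ENTITIES[ch] if ch in targets else ch for ch in text)


sample = "<b>café & 'tea'</b> > €5"
print(model_html_encode(sample))
# &lt;b&gt;caf&#233; &amp; 'tea'&lt;/b&gt; &gt; €5
print(model_html_attribute_encode(sample))
# &lt;b>café &amp; 'tea'&lt;/b> > €5
print(model_html_attribute_encode(sample, "1.1"))
# <b>café &amp; 'tea'</b> > €5
```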
Attribute encoding ought to be a superset of full HTML encoding that also encodes the single quote character, in case that's what happens to be wrapping the attribute value. However, as you may have noticed from the above, the ASP.NET version of attribute encoding is even wimpier than its full-encoding brother. To make matters worse, the full HTML encoding implemented by ASP.NET is no great shakes in the first place. Security isn't the only reason for HTML encoding: failing to encode everything outside the low ASCII range can hurt page readability when client browsers don't apply the correct code page (which happens more often than you might think, whether through the client's fault or the server's).

Now that we know what kinds of HTML encoding are available in ASP.NET, let's take a look at the encoding support offered by the built-in ASP.NET controls. The following table covers some of the more commonly used controls and properties. (There are, of course, many other controls and properties that one might wish to see encoded, but I've tried to keep the list down to things that most folks are likely to use reasonably frequently.)
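Here's why the missing single quote matters, as a small Python sketch. The helper names are hypothetical: `attr_encode_like_aspnet20` mimics the ASP.NET 2.0 attribute-encoding rule described above, and Python's standard `html.escape` (which also encodes the single quote) stands in for the superset I'd like to see. When the attribute value is wrapped in single quotes, the first version lets the payload close the attribute and add an event handler.

```python
import html


def attr_encode_like_aspnet20(value: str) -> str:
    # Hypothetical helper mimicking the ASP.NET 2.0 rule described above:
    # only &, < and " are encoded (& first, so the & in &lt; isn't re-encoded).
    return value.replace("&", "&amp;").replace("<", "&lt;").replace('"', "&quot;")


def attr_encode_superset(value: str) -> str:
    # Full encoding plus the single quote: html.escape with quote=True
    # encodes &, <, >, " and ' (the last as &#x27;).
    return html.escape(value, quote=True)


def render_input(encoded_value: str) -> str:
    # A single-quoted attribute value is perfectly legal HTML.
    return f"<input value='{encoded_value}'>"


payload = "x' onfocus='alert(1)"
print(render_input(attr_encode_like_aspnet20(payload)))
# <input value='x' onfocus='alert(1)'>   (the payload escapes the attribute)
print(render_input(attr_encode_superset(payload)))
# <input value='x&#x27; onfocus=&#x27;alert(1)'>   (it stays inside the value)
```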
Assuming you've actually taken the time to read the above, you might have noticed that there are five basic patterns of encoding usage:
If this strikes you as perhaps a wee bit inconsistent, you wouldn't be alone. Wouldn't it be great to see a consistent approach that telegraphs well and acts as a pit of success? If all the controls performed HTML encoding by default but allowed overriding when necessary (preferably via a single approach), the vast majority of developers writing for ASP.NET would end up generating more secure, more reliable applications with considerably less effort.

Workarounds

While we're all waiting for the ASP.NET team to eventually provide reasonable built-in support for HTML encoding, what can we do to ensure that our apps are protected from both HTML injection and character mis-rendering? A good starting point would be to fully encode all data (i.e.: anything not 100% known at compile time, and even some stuff that is) that will be pushed to the client browser. Unfortunately, as already mentioned above, the built-in encoding scheme leaves a little something to be desired.

Luckily, the ACE team folks at Microsoft have been working on a couple of tools that take a more robust approach to HTML (and URL and script) encoding. Rather than blacklisting a fixed set of potentially problematic characters, they whitelist a set of known safe characters (for HTML encoding: low ASCII a-z, A-Z, 0-9, space, period, comma, dash, and underscore) and encode everything else. This quite nicely takes care of both security and appearance issues, and you may wish to seriously consider using this approach rather than calling System.Web.HttpUtility.HtmlEncode to perform your HTML encoding.

Regardless of which HTML encoding approach you select, you're quickly going to run into a bit of trouble with double encoding if you simply start assigning pre-encoded text to control properties (e.g.: someTextBox.Text = HttpUtility.HtmlEncode(someString)). When dealing with malicious input, this is pretty much a non-issue.
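The whitelist idea is easy to sketch. The following Python is a hypothetical illustration, not the ACE team's code. In particular, emitting a decimal numeric character reference (&#N;) for everything outside the safe set is my assumption about the output format.

```python
import string

# Safe set from the description above: low ASCII a-z, A-Z, 0-9, space,
# period, comma, dash and underscore. Everything else gets encoded.
SAFE_CHARS = frozenset(string.ascii_letters + string.digits + " .,-_")


def whitelist_html_encode(text: str) -> str:
    """Hypothetical whitelist encoder: keep safe characters, and emit every
    other character as a decimal numeric character reference (&#N;)."""
    return "".join(ch if ch in SAFE_CHARS else f"&#{ord(ch)};" for ch in text)


print(whitelist_html_encode("Tom's café <b>"))
# Tom&#39;s caf&#233; &#60;b&#62;
```

Because the output is pure low ASCII, it renders the same whichever code page the browser ends up applying, which is the appearance benefit mentioned above.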
However, not all data that ought to be encoded is malicious, and you usually wouldn't want users seeing stuff like "a &gt; b" rather than "a > b". Unfortunately, if we want to avoid double encoding in the set of controls that perform non-overridable encoding (including attribute encoding), we need to use custom controls. To make matters worse, subclassing most of the controls in order to override their encoding behaviour can take rather a lot of work. In quite a few cases, simply starting from scratch would probably make more sense than trying to subclass the built-in controls.

Also, even for those controls where double encoding wouldn't be an issue (e.g.: Label, CheckBox), it's probably worth considering custom controls anyway, since the pain of authoring the custom control isn't likely to outweigh the cumulative effort of all the manual encoding calls you might make across all your projects.

Don't like these workarounds? Maybe it's time to start complaining...
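Here's a minimal Python sketch of both the double-encoding trap and the "encode by default, allow an override" design argued for above. `EncodingLabel` is a hypothetical stand-in for a custom control, not an ASP.NET class, and Python's `html.escape` plays the role of HttpUtility.HtmlEncode.

```python
import html


class EncodingLabel:
    """Hypothetical Label-like control: encodes its text by default, but lets
    callers opt out when the text has already been encoded."""

    def __init__(self, text: str, encode: bool = True) -> None:
        self.text = text
        self.encode = encode

    def render(self) -> str:
        body = html.escape(self.text) if self.encode else self.text
        return f"<span>{body}</span>"


raw = "a > b"
pre_encoded = html.escape(raw)  # "a &gt; b"

# Pre-encoded text in an always-encoding control gets encoded twice, so the
# browser displays the literal text "a &gt; b".
print(EncodingLabel(pre_encoded).render())  # <span>a &amp;gt; b</span>
# Opting out for pre-encoded text displays "a > b" as intended.
print(EncodingLabel(pre_encoded, encode=False).render())  # <span>a &gt; b</span>
# Raw text with the default setting is encoded exactly once.
print(EncodingLabel(raw).render())  # <span>a &gt; b</span>
```

The design choice here is that safety is the default: leaving the flag alone can cost you an ugly double-encoded string, but never an injection hole.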