Report information
Id: 22371
Status: resolved
Priority: 0
Queue: perl5

Owner: Nobody
Requestors: scrosby [at] cs.rice.edu
Cc:
AdminCc:

Operating System: (no value)
PatchStatus: (no value)
Severity: low
Type: unknown
Perl Version: (no value)
Fixed In: (no value)



To: perlbug [...] perl.com
Subject: Algorithmic Complexity Attack on Perl 5.6.1, 5.8.0
From: Scott A Crosby <scrosby [...] cs.rice.edu>
Date: 29 May 2003 15:33:11 -0500
Hello. We have analyzed this software to determine its vulnerability to a new class of DoS attacks related to a recent paper, "Denial of Service via Algorithmic Complexity Attacks." The paper discusses a new class of denial-of-service attacks that work by exploiting the difference between average-case and worst-case performance. In an adversarial environment, the data structures used by an application may be forced into their worst-case behavior. For instance, hash tables are usually thought of as offering constant-time operations, but with large numbers of collisions they degrade to a linked list, which may lead to a 100-10,000 times performance degradation. Because of the widespread use of hash tables, the potential for attack is extremely broad.

Fortunately, in many cases other limits on the system reduce the impact of these attacks. To be attackable, an application must have a deterministic or predictable hash function and accept untrusted input. In general, for the attack to be significant, the application must be willing and able to accept hundreds to tens of thousands of 'attack inputs'. Because of that requirement, it is difficult to judge the impact of these attacks without knowing the source code extremely well and knowing all the ways in which a program is used.

As part of this project, I have examined Perl versions 5.6.1 and 5.8.0 and have attacked the hash function. Thus any script that hashes untrusted input is vulnerable to our attack. Furthermore, the structure of the hash functions allows our fast collision-generation algorithm to work. This means that any Perl script that hashes a large number of keys from an untrusted source is potentially subject to a severe performance degradation. Our paper shows a 2000x degradation when hashing 90,000 strings. Depending on the application or script, this could be a critical DoS.

The solution for these attacks on hash tables is to make the hash function unpredictable via a technique known as universal hashing. Universal hashing is a keyed hash function where, based on the key, one of a large set of hash functions is chosen. When benchmarking, we observe that for short or medium-length inputs it is comparable in performance to simple predictable hash functions such as the ones in Python or Perl. Our paper has graphs and charts of our benchmarked performance.

I highly advise using a universal hashing library, either our own or someone else's. As history shows, it is very easy to make silly mistakes when attempting to implement your own 'secure' algorithm. The abstract, the paper, and a library implementing universal hashing are available at http://www.cs.rice.edu/~scrosby/hash/.

Scott
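To make the average-versus-worst-case gap concrete, here is a minimal, self-contained Perl sketch. It is not Perl's internal hash: it is a toy chained table built on the classic predictable "h = h*33 + c" string hash, for which colliding keys can be constructed by hand, and it times well-spread keys against keys forced into a single bucket.

    #!/usr/bin/perl
    # Toy illustration of average-case vs. worst-case hash table behaviour.
    # This is NOT Perl's internal hash: it is a small chained table using the
    # classic predictable "h = h*33 + c" string hash, for which colliding keys
    # are easy to construct by hand.
    use strict;
    use warnings;
    use Time::HiRes qw(time);

    my $NBUCKETS = 1024;

    sub weak_hash {
        my $h = 0;
        $h = ($h * 33 + $_) % 2**32 for unpack 'C*', $_[0];
        return $h % $NBUCKETS;
    }

    sub insert { my ($t, $key) = @_; push @{ $t->[ weak_hash($key) ] }, $key }

    sub lookup {
        my ($t, $key) = @_;
        my $bucket = $t->[ weak_hash($key) ] or return 0;
        for my $k (@$bucket) { return 1 if $k eq $key }
        return 0;
    }

    my $N = 5_000;

    # "Good" keys: ordinary strings, spread reasonably well over the buckets.
    my @good = map { "key$_" } 1 .. $N;

    # "Evil" keys: built from the blocks 'Ez' and 'FY', which contribute the
    # same value to an h*33+c hash (70*33 + 89 == 69*33 + 122), so every
    # combination of blocks lands in the very same bucket.
    my @blocks = ('Ez', 'FY');
    my @evil = map {
        my $i = $_;
        join '', map { $blocks[ ($i >> $_) & 1 ] } 0 .. 12;
    } 0 .. $N - 1;

    for my $set ([good => \@good], [evil => \@evil]) {
        my ($name, $keys) = @$set;
        my @table;
        my $t0 = time;
        insert(\@table, $_) for @$keys;
        lookup(\@table, $_) for @$keys;
        printf "%-4s keys: %.3f seconds\n", $name, time - $t0;
    }

On the 'evil' keys every lookup degenerates into a linear scan of one long chain, so the second timing is typically orders of magnitude worse than the first even though both sets contain the same number of keys.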
From: Jarkko Hietaniemi (via RT) <perlbug-followup [...] perl.org>
Date: 17 Jun 2003 19:46:38 -0000
Thanks for your extensive research on this. We have actually tinkered away at better hash functions for quite a while inside Perl, but so far we have mostly been worried about (1) speed (2) good spreading (3) speed (4) did I mention speed? DoS has not been our worry so far.

I am by no means a cryptographer or a (discrete) mathematician, but before making (yet another) change to the hash function, I would definitely like to learn a bit more of the art and our options and trade-offs. With my tongue firmly in my cheek, why should we trust *your* universal hashing function? :-) A quick googling (for "fast universal hashing") for example shows Krovetz and Rogaway's "Fast universal hashing with small keys and no preprocessing: the PolyR construction" from 2000.

Since the speed of one of Perl's most basic data structures, ta-dah-- hashes, depends on the speed of the hash function, I really would like the hash function to be as fast as possible. If we are "secure" against evil hash keys, but slow as a sloth, I don't know that we did much good.

As you point out, there are other ways to "DoS the language", either by constructing degenerate data for qsort (the latest Perl does not use qsort anymore, however, but instead mergesort, which should be rock solid against bad data), and especially by using exponential regular expressions. (Though of course that is usually more likely to be caused by a silly programmer than by malicious data.)

I am currently (fervently) hoping to get Perl 5.8.1 out soon(ish), and I don't know that I want to start changing the hash function at this late a stage. We currently use Bob Jenkins' algorithm, so in theory we could go for keying that (pseudo)randomly at Perl startup, but you advise against that kind of ad-hoc tweaking, understandably.
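As a point of reference for the keying idea, the construction can be sketched in plain Perl. This is only an illustration of Bob Jenkins' one-at-a-time hash with its initial state taken from a per-process pseudorandom seed, not the actual code in perl's hv.h, and it assumes a perl built with 64-bit integers so the 32-bit masking behaves as intended.

    #!/usr/bin/perl
    # Sketch of Bob Jenkins' "one-at-a-time" hash, seeded with a value chosen
    # once per process.  Illustrative only -- not the code in perl's hv.h.
    # Assumes a perl with 64-bit integers so the explicit 32-bit masking works.
    use strict;
    use warnings;

    my $MASK = 0xffffffff;            # emulate 32-bit unsigned arithmetic
    my $SEED = int rand(2**32);       # per-process pseudorandom key

    sub one_at_a_time {
        my ($str, $seed) = @_;
        my $h = $seed;
        for my $c (unpack 'C*', $str) {
            $h = ($h + $c)         & $MASK;
            $h = ($h + ($h << 10)) & $MASK;
            $h = ($h ^ ($h >> 6))  & $MASK;
        }
        $h = ($h + ($h << 3))  & $MASK;
        $h = ($h ^ ($h >> 11)) & $MASK;
        $h = ($h + ($h << 15)) & $MASK;
        return $h;
    }

    printf "%-3s %08x\n", $_, one_at_a_time($_, $SEED) for qw(foo bar baz);

Two runs of the script print different values for the same strings, which is the property that defeats precomputed collision sets: an attacker cannot tell which member of the seeded family a given process is using.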
To: perlbug-followup [...] perl.org
CC: Dan Wallach <dwallach [...] cs.rice.edu>
Subject: Re: [perl #22371] Algorithmic Complexity Attack on Perl 5.6.1, 5.8.0
From: Scott A Crosby <scrosby [...] cs.rice.edu>
Date: 18 Jun 2003 04:20:37 -0500
RT-Send-Cc:
On 17 Jun 2003 19:46:38 -0000, Jarkko Hietaniemi (via RT) <perlbug-followup@perl.org> writes:
> Thanks for your extensive research on this. We have actually
> tinkered away at better hash functions for quite a while inside
> Perl, but so far we have mostly been worried about (1) speed (2)
> good spreading (3) speed (4) did I mention speed? DoS has not been
> our worry so far.
> I am by no means a cryptographer or a (discrete) mathematician, but
> before making (yet another) change to the hash function, I would
> definitely like to learn a bit more of the art and our options and
> trade-offs. With my tongue firmly in my cheek, why should we trust
> *your* universal hashing function? :-)
No need at all. A full survey of universal hashing was beyond the scope of the paper. My advice is to wait a few months, until after I present at USENIX. With luck, I'll either learn about new fast functions at the conference, which I'll reference on the web page, or the paper may focus the academic community on designing better universal hash functions, or functions that are sufficiently strong for pragmatic programs.
> A quick googling (for "fast universal hashing") for example shows
> Krovetz and Rogaway's "Fast universal hashing with small keys and
> no preprocessing: the PolyR construction" from 2000. Since the
> speed of one of Perl's most basic data structures, ta-dah-- hashes,
> depends on the speed of the hash function, I really would like the
> hash function to be as fast as possible. If we are "secure" against
> evil hash keys, but slow as a sloth, I don't know that we did much
> good.
Actually, I did a quick retrofit of Perl to use the UHASH library and got about a 10% speedup when inserting one of the test files. (I think that was on my P4 laptop; I think my P2-450 saw a 15% slowdown.) I advise benchmarking it on real software; I've found that microbenchmarks can be fairly meaningless. Also, the paper shows that the difference between the hash functions we tested is almost completely dominated by the cost of an L2 cache miss: on the microbenchmark, XOR drops from being 6x as fast to only a bit faster.
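One rough way to do that comparison from Perl itself is to time a hash-heavy workload (bulk insertion plus lookups) under each build being compared, rather than timing the hash function in isolation; the core Benchmark module is enough for a sketch like the one below, where the key format and count are arbitrary.

    #!/usr/bin/perl
    # Rough macro-benchmark of hash-heavy work: bulk insertion and lookup of
    # many keys.  Run the same script under each perl build being compared;
    # the absolute numbers mean little on their own (cache effects dominate).
    use strict;
    use warnings;
    use Benchmark qw(timethese);

    my @keys = map { "user$_\@example.com" } 1 .. 100_000;

    timethese(10, {
        insert_and_lookup => sub {
            my %h;
            $h{$_} = 1 for @keys;
            my $hits = grep { exists $h{$_} } @keys;
        },
    });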
> As you point out, there are other ways to "DoS the language",
> either by constructing degenerate data for qsort (the latest Perl
> does not use qsort anymore, however, but instead mergesort, which
> should be rock solid against bad data),
Mergesort should be good.
> and especially by using exponential regular expressions.
> (Though of course that is usually more likely to be caused by a
> silly programmer than by malicious data.)
I've yet to find or successfully design an exponential regular expression myself, but identifying regular expressions that may, on carefully chosen input, exhibit exponential, quartic, cubic, or quadratic behavior is a topic I am actively researching and currently seeking related work on. I would like to be able to identify, for any regular expression, the worst-case input and the cost of that input. Would you happen to have any good references on Perl's regexp engine design, or information related to bad-case regular expressions?
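For what it's worth, the usual textbook construction for the exponential case is a nested quantifier forced to fail, for example /^(a+)+$/ against a run of 'a's followed by a character that breaks the match. The sketch below is only illustrative; how badly it blows up depends on the regexp engine and the perl version.

    #!/usr/bin/perl
    # Classic catastrophic-backtracking demonstration: nested quantifiers on
    # a string that cannot match.  On a naively backtracking engine the match
    # time roughly doubles with each extra 'a'; exact behaviour varies by
    # perl version.
    use strict;
    use warnings;
    use Time::HiRes qw(time);

    for my $n (18 .. 26) {
        my $subject = ('a' x $n) . 'b';   # trailing 'b' guarantees failure
        my $t0 = time;
        my $matched = $subject =~ /^(a+)+$/;
        printf "n=%2d matched=%d %.3f s\n", $n, $matched ? 1 : 0, time - $t0;
    }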
> I am currently (fervently) hoping to get Perl 5.8.1 out soon(ish),
> and I don't know that I want to start changing the hash function at
> this late a stage.
One thing to keep in mind is that someone else showed on perlmonks [1] how a carefully chosen CGI POST request of about 10k inputs (250kb of data) would consume about three minutes of CPU time. Were they to send 40k inputs, it would take almost an hour of CPU for the module to process that megabyte. This is rather severe.
> We currently use Bob Jenkins' algorithm, so in theory we could go
> for keying that (pseudo)randomly at Perl startup, but you advise
> against that kind of ad-hoc tweaking, understandably.
I do, but Jenkins' with a random secret key is what is used by the Linux kernel. I know of no guaranteed security properties of the hash function; I suspect it may have statistical weaknesses, but they probably don't matter in this case. It is probably 'good enough'.

If you think the above issue with CGI.pm is severe enough to warrant 5.8.1 being less vulnerable, may I suggest using a keyed variant of Jenkins' or ---if you want provable security--- a universal hash as a short-term fix, with the possibility or expectation of replacing it later with something different after considering all of the alternatives? One possibility for later is to change the infrastructure to allow a script to declare whether or not it wants to be resource-bounded on regexp matching and to use a secure hash function.

Scott

[1] http://www.perlmonks.org/index.pl?node_id=262468
From: Jarkko Hietaniemi (via RT) <perlbug-followup [...] perl.org>
Date: 18 Jun 2003 21:11:49 -0000
Attached is a patch that will patch a current development version of Perl (or 5.8.1-to-be) to pseudorandomly key the "Bob Jenkins" algorithm currently used. It seems to help against the hash "exploit code" (blowout.pl), unsurprisingly. That's the good news.

The bad news is that Data::Dumper (a standard Perl module) now gives different results for different runs of Perl, because the output depends on the ordering of the hash keys. I managed to munge the regression suite of Data::Dumper to work with this (amusingly, I used the same trick as we used for EBCDIC... the curious will check the patch to see how-- the downside is that the test is not quite that exact anymore).

The neat effect is that now this:

./perl -le '@a="a".."z";@a{@a}=();print keys %a'

gives a different result for each run. That _really_ should teach people not to expect any particular ordering of hash keys... If one needs to emulate the old behaviour, one can set the environment variable PERL_HASH_INIT.

One can get a 5.8.1-to-be snapshot at least for a while as described here:

http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2003-06/msg00475.html
Download hashinit.pat.v2b.gz
application/x-gzip 5.6k

Message body not shown because it is not plain text.
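Code that genuinely needs a stable ordering (regression tests, golden output, the Data::Dumper case above) can make the ordering explicit instead of relying on traversal order; a minimal sketch:

    #!/usr/bin/perl
    # With a randomly keyed hash function, traversal order can change from
    # one run of perl to the next.  Anything that needs a stable ordering
    # should sort the keys explicitly.
    use strict;
    use warnings;

    my %a;
    @a{ 'a' .. 'z' } = ();

    print join(' ', keys %a), "\n";       # order may differ between runs
    print join(' ', sort keys %a), "\n";  # deterministic: a b c ... z

Data::Dumper's $Data::Dumper::Sortkeys flag serves the same purpose for dumped structures, where available.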

To: perlbug-followup [...] perl.org
Subject: Re: [perl #22371] Algorithmic Complexity Attack on Perl 5.6.1, 5.8.0
From: Scott A Crosby <scrosby [...] cs.rice.edu>
Date: 18 Jun 2003 17:08:50 -0500
RT-Send-Cc:
On 18 Jun 2003 21:11:49 -0000, Jarkko Hietaniemi (via RT) <perlbug-followup@perl.org> writes:
> Attached is a patch that will patch a current development version of Perl
> (or 5.8.1-to-be) to pseudorandomly key the "Bob Jenkins" algorithm currently
> used.
There's a typo at this line:

+hash keys would cause Perl to consume large amounts of time because
+internal structure of hashes would badly degenerate. In Perl 5.8. the
                                                              ^^^^^^^
+hash function is randomly perturbed by a pseudorandom seed which makes

Other than that, the patch looks good.

Scott
Date: Thu, 19 Jun 2003 10:52:32 -0500
From: Dan Wallach <dwallach [...] cs.rice.edu>
To: Tels <perl_dummy [...] bloodgate.com>
CC: scrosby [...] cs.rice.edu, perl5-porters [...] perl.org
Subject: Re: [perl #22371] Algorithmic Complexity Attack on Perl 5.6.1, 5.8
RT-Send-Cc:
Adding length restrictions to the CGI's input might cause problems for people with large forms or those uploading files. If you do it, you probably want to make it a configuration parameter that somebody can turn on if/when they decide they absolutely need it.

Validating that all the form elements are actually desired by the application before inserting them in a hash table (your %goodones table) absolutely will solve the problem for you. The only question is whether there are CGI scripts out there that expect form elements to be there but which are never explicitly named (i.e., can you make sure %goodones isn't missing anything important). You could potentially break some existing scripts this way.

The correct answer, of course, is to fix the underlying hash function. As stop-gap measures, limiting the size of the input and limiting the input values to desired ones will certainly help address the problem.

Thanks, Dan
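A sketch of that whitelisting idea with CGI.pm follows; the parameter names and the %wanted/%goodones hashes are purely illustrative. Note the caveat raised later in this thread: CGI.pm itself still hashes every incoming parameter while parsing the request, so this protects the application's own data structures rather than the parser.

    #!/usr/bin/perl
    # Only copy parameters the application actually expects into its own
    # hash.  The expected names (%wanted) and the target hash (%goodones)
    # are illustrative.
    use strict;
    use warnings;
    use CGI;

    my $q = CGI->new;

    my %wanted = map { $_ => 1 } qw(name email subject message);

    my %goodones;
    for my $param ($q->param) {
        next unless $wanted{$param};       # drop anything not explicitly expected
        $goodones{$param} = $q->param($param);
    }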
To: Tels <perl_dummy [...] bloodgate.com>
CC: perl5-porters [...] perl.org, dwallach [...] cs.rice.edu
Subject: Re: [perl #22371] Algorithmic Complexity Attack on Perl 5.6.1, 5.8
From: Scott A Crosby <scrosby [...] cs.rice.edu>
Date: 19 Jun 2003 15:16:04 -0500
RT-Send-Cc:
On Thu, 19 Jun 2003 17:05:19 +0200 (CEST), Tels <perl_dummy@bloodgate.com> writes:
> Somebody should probably also forward this to the group which is re-writing
> the Matt Wright (sp?) CGI script collection. I forgot their URL and name...
> getting old.
The purpose of research is to generate and then disseminate results. If you can identify anyone else who would be interested in the paper, go ahead and inform them.
> * length restriction: when taking parameters from the user/browser/client,
> restrict the length of the data. For instance, a simple formmail script
> doesn't need to accept 40 Meg of data, when all it does is relay a short
> text message from a website user.
The catch is that a total input as low as 250kb is enough to DoS a server for several minutes; a 1MB threshold is enough for almost an hour. And the server cannot know what a correct limit would be for all scripts: a limit of 250kb is too small, for instance, to upload many PDF files.
> Drawbacks: the length reduction should be done *outside* the CGI script.
> Easy if you wrote your own server using Net::Server, hard if it is a script
> simply running under Apache. The reason is that once you have constructed a
> Perl scalar, you might already be running out of memory. OTOH, restricting
> the length to something like 10 Kbyte thwarts this attack effectively, right?
Perhaps. 10 KByte means at most a thousand or so attack inputs. The cost to the victim would be greatly reduced; at most only a few seconds. The cost to the attacker to generate the inputs is moderate, only a week or so of CPU time. It is still a relatively low-bandwidth attack, 10KB of data to DoS a server for a couple of seconds, but much less interesting.
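Where CGI.pm is the entry point, one knob for this kind of restriction is $CGI::POST_MAX (plus $CGI::DISABLE_UPLOADS for scripts that take no uploads). The 100 KB figure below is only an example; as noted above, no single limit fits every script.

    #!/usr/bin/perl
    # Cap the size of the request body before CGI.pm parses it into its
    # parameter hash.  The limit must be set before the CGI object is
    # created, and the 100 KB figure is only an example.
    use strict;
    use warnings;
    use CGI;

    $CGI::POST_MAX        = 100 * 1024;  # reject POST bodies larger than 100 KB
    $CGI::DISABLE_UPLOADS = 1;           # reject file uploads, if the script has none

    my $q = CGI->new;
    if (my $err = $q->cgi_error) {       # e.g. "413 Request entity too large"
        print $q->header(-status => $err);
        exit;
    }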
> * strict parameter checking: While I already implemented this (in a way
> that would be too complicated to explain on this margin here), the problem
> is that first all parameters are put into a hash, and then checked. The
> attack however works when the data goes into the hash, so *boom*
This also works, but requires changing the API.
> That would avoid the attack, right?
Probably yes, at least within the CGI script. However, if the parameters are inserted into a different hash table (say, in the webserver), then all bets are off.

Scott
Since the hash randomization is in for Perl 5.8.1, I'm marking the problem ticket as resolved.

