This is a patched version of zlib, modified to use
Pentium-Pro-optimized assembly code in the deflation algorithm. The
files changed/added by this patch are:
 
README.686
match.S
 
The speedup that this patch provides varies, depending on whether the
compiler used to build the original version of zlib falls afoul of the
PPro's speed traps. My own tests show a speedup of around 10-20% at
the default compression level, and 20-30% using -9, against a version
compiled using gcc 2.7.2.3. Your mileage may vary.
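
A rough way to check figures like these on other hardware is to time
the same compression job against a patched and an unpatched build. The
sketch below is only an illustration: the helper name time_runs, the
binaries ./minigzip.orig and ./minigzip.asm (copies of zlib's minigzip
test program taken from each build), and the input file sample.dat are
all assumptions. It also assumes a date command that supports +%s,
which gives whole-second resolution only, so use enough repetitions
for the totals to be meaningful.

```shell
#!/bin/sh
# Hypothetical helper, not part of zlib: run a command N times and
# print the total elapsed wall-clock time in whole seconds.
time_runs() {
    n=$1; shift
    start=$(date +%s)
    i=0
    while [ "$i" -lt "$n" ]; do
        # Discard the command's output; stop early if a run fails.
        "$@" >/dev/null || return 1
        i=$((i + 1))
    done
    end=$(date +%s)
    echo $((end - start))
}

# Intended use (file names are assumptions, see above):
#   time_runs 20 sh -c './minigzip.orig -9 < sample.dat'
#   time_runs 20 sh -c './minigzip.asm  -9 < sample.dat'
# Drop the -9 to compare at the default compression level.
```

Comparing the two totals at each level gives a percentage that can be
set against the ranges quoted above.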
 
Note that this code has been tailored for the PPro/PII in particular,
and will not perform particularly well on a Pentium.
 
If you are using an assembler other than GNU as, you will have to
translate match.S to use your assembler's syntax. (Have fun.)
 
Brian Raiter
breadbox@muppetlabs.com
April, 1998
 
 
Added for zlib 1.1.3:
 
The patches come from
http://www.muppetlabs.com/~breadbox/software/assembly.html
 
To compile zlib with this asm file, copy match.S to the zlib directory,
then run:
 
CFLAGS="-O3 -DASMV" ./configure
make OBJA=match.o
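
To confirm that the assembly routine actually made it into the
library, one option is to inspect the symbol table of the resulting
archive. This is a sketch under assumptions, not part of the zlib
build: it assumes GNU-style nm -A output, a static libz.a in the zlib
directory, and that match.S exports longest_match (with a leading
underscore on some platforms). The helper name asm_match_linked is
made up for this example.

```shell
#!/bin/sh
# Hypothetical helper, not part of zlib. Reads `nm -A` output on stdin
# and succeeds only if both of these hold:
#   - match.o defines longest_match as a global text symbol (T)
#   - deflate.o refers to longest_match as an undefined symbol (U),
#     i.e. the C version was not compiled into deflate.o
asm_match_linked() {
    out=$(cat)
    printf '%s\n' "$out" | grep -Eq 'match\.o:.* T _?longest_match$' &&
    printf '%s\n' "$out" | grep -Eq 'deflate\.o:.* U _?longest_match$'
}

# Intended use, from the zlib directory after the build:
#   nm -A libz.a | asm_match_linked && echo "assembly longest_match in use"
```

If the check fails because deflate.o carries its own longest_match,
the likely cause is that -DASMV did not reach the compiler.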
 
 
Update:
 
I've been ignoring these assembly routines for years, believing that
gcc's generated code had caught up with them sometime around gcc 2.95
and the major rearchitecting of the Pentium 4. However, I recently
learned that, despite what I believed, this code still has some life
in it. On the Pentium 4 and AMD64 chips, it continues to run about 8%
faster than the code produced by gcc 4.1.
 
In acknowledgement of its continuing usefulness, I've altered the
license to match that of the rest of zlib. Share and Enjoy!
 
Brian Raiter
breadbox@muppetlabs.com
April, 2007