Archive for July, 2007

Serious shortcomings with PHP5 get_headers() function

I was writing some code to find out if a file exists on a server and if it does, have it return the size in bytes.  I found a useful function built into PHP 5, get_headers().  For getting file sizes, it works flawlessly.  For situations where the file does not exist on the server, the behavior of this function was less than desirable.

Be forewarned, none of the user contributed get_headers() functions on the get_headers() documentation page on PHP.net will replicate the behavior of PHP 5’s get_headers() for URLs that use the ‘Location:’ redirect header or return File Not Found headers.

According to RFC 1945, “A user agent should never automatically redirect a request more than 5 times, since such redirections usually indicate an infinite loop.”  For true compatibility, a replacement function should be able to handle up to 5 Location redirects within one function call.  Only the native get_headers() function exhibits this behavior.  None of the user contributed functions on PHP.net handles the ‘Location’ redirection.

The native get_headers() function in PHP 5 and later will not return headers in some instances where the user contributed functions would.  For example, if the server returns a 404 status, get_headers() will throw a PHP warning.  Unfortunately, the 404 error can only be known by looking at the headers.  At first glance, all of the user contributed functions will return the 404 headers, which may be the desired effect but does not replicate the behavior of the native get_headers() function.
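Since the status is only visible in the headers, one way to spot a 404 is to read the status lines yourself. The following is a minimal sketch, not the native implementation. It assumes the flat, list-style array that get_headers($url) returns, where each followed redirect contributes its own "HTTP/..." status line. The helper name final_status_code() and the sample headers are hypothetical.

```php
<?php
/**
 * Return the status code of the last response in a header list,
 * or false if no status line is found.
 *
 * $headerLines is assumed to be a flat list of strings in which every
 * response starts with a status line such as "HTTP/1.1 200 OK".
 */
function final_status_code(array $headerLines) {
    $code = false;
    foreach ($headerLines as $line) {
        // A later status line (after a redirect) overrides an earlier one.
        if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

// Hand-written example: one redirect followed by a 404.
$sample = array(
    'HTTP/1.1 302 Found',
    'Location: http://example.com/missing.jpg',
    'HTTP/1.1 404 Not Found',
    'Content-Type: text/html',
);
var_dump(final_status_code($sample)); // int(404)
```

Taking the last status line rather than the first is deliberate: after a redirect, the first line only describes the redirect itself.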

The function I created is included below.  It works well if the file exists.  Unfortunately, the project I wrote it for also needs to verify whether the file exists on the server, so I will not be able to use this function there.

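<CODE>
function remotefsize($url) {
    // Only HTTP and HTTPS URLs are supported.
    $sch = parse_url($url, PHP_URL_SCHEME);
    if (($sch != "http") && ($sch != "https")) {
        return false;
    }
    // get_headers() returns false on failure, so check before using the result.
    $raw = get_headers($url, 1);
    if ($raw === false) {
        return false;
    }
    $headers = array_change_key_case($raw, CASE_LOWER);
    if (!array_key_exists("content-length", $headers)) {
        return false;
    }
    // After redirects there is one Content-Length per response; use the last one.
    if (is_array($headers["content-length"])) {
        return array_pop($headers["content-length"]);
    }
    return $headers["content-length"];
}
</CODE>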

Quick .htaccess to list files in directory on apache web server

If your web server supports .htaccess files and you can specify “Options” from within your .htaccess file, then the following is a quick one-line solution to your file listing needs.

So you just uploaded a ton of pictures to a web directory and you want a list of all the images.  Since the only types of files in the directory are images, the security risk of displaying the list diminishes.  In actuality, the list of images can be quite useful, especially when trying to find a specific one.

The answer is to create a one-line .htaccess file containing the following: Options Indexes

If you don’t have control over your Apache configuration files and this option is not available to you, don’t fret.  Chris Snyder created an excellent PHP script that allows you to list the contents of a web directory.

Link: http://chxo.com/scripts/image-list.php

This script is very useful.  I’ve customized it for my Subversion server so that it displays a list of the repositories in the root of the Subversion web server.
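To give a rough idea of what a directory-listing script involves, here is a minimal sketch. It is not Chris Snyder’s script. The function name list_images() and the extension list are assumptions made for illustration only.

```php
<?php
/**
 * Return the sorted names of image files found directly in $dir.
 * The extension list is an assumption; adjust it as needed.
 */
function list_images($dir) {
    $extensions = array('jpg', 'jpeg', 'png', 'gif');
    $images = array();
    $entries = scandir($dir);
    if ($entries === false) {
        return $images; // directory could not be read
    }
    foreach ($entries as $entry) {
        $ext = strtolower(pathinfo($entry, PATHINFO_EXTENSION));
        if (is_file($dir . '/' . $entry) && in_array($ext, $extensions, true)) {
            $images[] = $entry;
        }
    }
    sort($images);
    return $images;
}

// Print one escaped link per image in the script's own directory.
foreach (list_images(__DIR__) as $name) {
    echo '<a href="' . rawurlencode($name) . '">' . htmlspecialchars($name) . "</a><br>\n";
}
```

Dropped into the image directory, a script like this would print a plain list of links. File names are escaped before output because they come from the file system, not from the script itself.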

Save bandwidth and speed up downloads with Apache mod_deflate

I’ve been auditing Apache web logs using statistics gathered in AWStats. I found 6 different IP addresses that are using a lot of the server’s bandwidth. After looking at the logs, I discovered one of the IP addresses was a spammer and the rest were web robots, or bots. One bot used over 2.5GB of traffic last month. That is unbelievable. That is 2.5% of the month’s bandwidth. Of course Google and Yahoo combined have downloaded almost 30GB, but we want Google and Yahoo to index our sites. So what can we do to save our bandwidth and still provide the information to the search engines? Compress it with mod_deflate!

The mod_deflate module in Apache is not new. What is new is the trend to use it. The deflate module uses gzip compression, and the time spent compressing is small compared to the transfer time it saves. You can do the math: if your page is 1MB and takes 20 seconds to download uncompressed, while the compressed version is 250KB and takes 1 second to compress, 5 seconds to download, and 2 seconds to decompress, the compressed delivery takes 8 seconds in total and is faster. You can crunch numbers till you are blue in the face, but the basic premise holds true.
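The arithmetic can be written out as a small sketch. The timings are the illustrative figures from the paragraph above, not measurements, and the function name delivery_seconds() is hypothetical.

```php
<?php
/**
 * Total seconds to deliver a page: optional compression time on the server,
 * transfer time, and optional decompression time in the browser.
 */
function delivery_seconds($transfer, $compress = 0, $decompress = 0) {
    return $compress + $transfer + $decompress;
}

$plain = delivery_seconds(20);            // 1MB sent uncompressed
$compressed = delivery_seconds(5, 1, 2);  // 250KB sent compressed

echo "Uncompressed: {$plain}s, compressed: {$compressed}s\n";
echo "Saved: " . ($plain - $compressed) . "s\n";
```

With these numbers the compressed page arrives in 8 seconds instead of 20, a saving of 12 seconds.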

First, you need to make sure the mod_deflate module is enabled in Apache. Look for the following line in your Apache configuration files and uncomment it:

LoadModule deflate_module modules/mod_deflate.so

Then add the following line within a <Location> or <VirtualHost> section.

SetOutputFilter DEFLATE

The above will compress everything. If you have a lot of files on your web site that are already compressed, such as images, media, zip files, etc., you may want to add compression only for particular content types. To compress specific content types, replace the SetOutputFilter line with one or more of the following.

AddOutputFilterByType DEFLATE text/html
AddOutputFilterByType DEFLATE text/plain
AddOutputFilterByType DEFLATE text/xml
AddOutputFilterByType DEFLATE text/css
AddOutputFilterByType DEFLATE text/javascript
AddOutputFilterByType DEFLATE application/x-javascript
AddOutputFilterByType DEFLATE application/xml
AddOutputFilterByType DEFLATE application/x-httpd-php

If you work with other content types, you can add a line for each of them in the same way, as was done for application/xml above.

You can test your server by using the GID Network gzip test located here: http://www.gidnetwork.com/tools/gzip-test.php
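You can also check the response headers yourself: a compressed response carries a Content-Encoding header. The sketch below only inspects a list of header lines. The function name is_compressed() and the sample headers are assumptions for illustration.

```php
<?php
/**
 * Report whether a header list contains a gzip or deflate Content-Encoding.
 * $headerLines is assumed to be a flat list of "Name: value" strings.
 */
function is_compressed(array $headerLines) {
    foreach ($headerLines as $line) {
        if (preg_match('/^Content-Encoding:\s*(.+)$/i', $line, $m)) {
            $encoding = strtolower(trim($m[1]));
            if (strpos($encoding, 'gzip') !== false || strpos($encoding, 'deflate') !== false) {
                return true;
            }
        }
    }
    return false;
}

// Hand-written sample of what a compressed response might send.
$sample = array(
    'HTTP/1.1 200 OK',
    'Content-Type: text/html',
    'Content-Encoding: gzip',
    'Vary: Accept-Encoding',
);
var_dump(is_compressed($sample)); // bool(true)
```

Keep in mind that a server only compresses when the request advertises support with an Accept-Encoding header, so a test request has to send one.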

Now that your web server is using compression, you can focus on other things like programming or eating pizza.
