The following was originally posted on php.net/crc32/ under the ‘User Contributed Notes’ but was recently removed. Since the topic came up again in a conversation with a colleague, I am making this research available on my blog.
If you are trying to decide on a function for file verification, my conclusion is that md5_file() is the best all-around solution.
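As a quick illustration of md5_file() used for verification, here is a minimal sketch that checks a file against a published MD5 checksum. The helper name verify_md5() is hypothetical (it is not from the original php.net notes), and hash_equals() assumes PHP 5.6 or later.

```php
<?php
// Hypothetical helper (not from the original notes): check a file against
// an expected MD5 checksum, e.g. one published next to a download.
function verify_md5($path, $expected_md5)
{
    $actual = md5_file($path);
    if ($actual === false) {
        return false; // missing or unreadable file
    }
    // hash_equals() (PHP 5.6+) compares in constant time; strtolower()
    // accepts checksums published in upper-case hex
    return hash_equals(strtolower($expected_md5), $actual);
}

// Usage: write a small temporary file and verify it
$tmp = tempnam(sys_get_temp_dir(), 'md5');
file_put_contents($tmp, "hello world\n");
var_dump(verify_md5($tmp, md5("hello world\n"))); // bool(true)
var_dump(verify_md5($tmp, str_repeat('0', 32)));   // bool(false)
unlink($tmp);
```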
The file_crc() function that Bulk at bulksplace dot com posted on php.net/crc32/ is the most efficient solution on Windows for small and medium-sized files, most likely because file_get_contents() uses memory-mapping techniques. Unfortunately, on Linux (Fedora) md5_file() produced slightly better results.
sha1_file() is slower than md5_file() on large files. The time taken by the __crc32_file() function found on php.net/crc32/ grows linearly with the size of the file, and it is very slow, so I would avoid using __crc32_file(). The file_crc() function will fail at the file_get_contents() call if the file is larger than the php.ini memory_limit setting. On Windows, file_get_contents() does not seem to respect memory_limit, but I did run into the error ‘FATAL: emalloc(): Unable to allocate x bytes’ when testing ISO files.
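One way around the memory_limit problem is to avoid loading the whole file at once. The sketch below is not the php.net file_crc() or __crc32_file(); it is a hypothetical file_crc_streamed() that assumes PHP 5.1.2+ with the hash extension. It reads the file in fixed-size chunks and feeds them to the 'crc32b' hash, which in current PHP releases is the same CRC32 variant that crc32() computes.

```php
<?php
// Hypothetical sketch (not the php.net file_crc()): compute a file's CRC32
// by reading it in fixed-size chunks, so memory use is bounded by
// $chunk_size rather than by the file size. Assumes PHP 5.1.2+ (hash ext).
function file_crc_streamed($file, $chunk_size = 1048576)
{
    $handle = fopen($file, 'rb');
    if ($handle === false) {
        return false;
    }
    // 'crc32b' is the CRC32 variant that matches crc32()
    $ctx = hash_init('crc32b');
    while (!feof($handle)) {
        $chunk = fread($handle, $chunk_size);
        if ($chunk === false) {
            fclose($handle);
            return false;
        }
        hash_update($ctx, $chunk);
    }
    fclose($handle);
    // hash_final() returns 8 hex digits; convert them to the unsigned
    // decimal string format that file_crc() returns
    return (string) hexdec(hash_final($ctx));
}
```

Memory use stays roughly constant regardless of file size; its speed was not part of the tests in this post.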
I ran the following tests on both Windows XP and Fedora 4 machines.
<?php
// File verification tests by Angelo Mandato (angelo [at] mandato {period} com)
// __crc32_file() is very slow; uncomment the next line to test it yourself.
//require_once('crc32_file.php');
// Copy and paste the __crc32_file() code found on
// the php.net crc32 PHP manual page into a new file and save it
// as crc32_file.php in the same directory as this script.

// Get the current time in seconds, with microsecond precision
function GetMicrotime()
{
    list($usec, $sec) = explode(" ", microtime());
    return ((float)$usec + (float)$sec);
}

// file_crc() - function to test: reads the whole file into memory and
// returns its CRC32 as an unsigned decimal string
function file_crc($file)
{
    $file_string = file_get_contents($file);
    $crc = crc32($file_string);
    return sprintf("%u", $crc);
}

$Methods = array('sha1_file()', 'md5_file()', 'file_crc()');
if( function_exists('__crc32_file') )
    $Methods[] = '__crc32_file()';

$directory = '/path/to/directory/'; // Don't forget the trailing slash.

// Keep only regular files (skip '.', '..' and subdirectories) so that
// every method is timed on the same set of files
$files = array();
foreach( scandir($directory) as $file )
{
    if( $file != '.' && $file != '..' && is_file($directory.$file) )
        $files[] = $file;
}

// foreach replaces the original list()/each() loop: each() was
// deprecated in PHP 7.2 and removed in PHP 8.0
foreach( $Methods as $method_index => $method_name )
{
    $start_time = GetMicrotime();
    foreach( $files as $file )
    {
        switch( $method_index )
        {
            case 0: // sha1_file()
                $value = sha1_file($directory.$file);
                break;
            case 1: // md5_file()
                $value = md5_file($directory.$file);
                break;
            case 2: // file_crc()
                $value = file_crc($directory.$file);
                break;
            case 3: // __crc32_file()
                $value = __crc32_file($directory.$file);
                break;
        }
    }
    $end_time = GetMicrotime();
    printf("%s took %.03f seconds to calculate %d files.\n", $method_name, $end_time - $start_time, count($files));
}
echo "File verification tests completed.\n";
?>
In conclusion, md5_file() was the fastest all-around file verification function in PHP. I suspect that if a well-written crc32_file() function were incorporated into PHP, it would be the best way to verify files.
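PHP 5.1.2 and later include hash_file(), which can compute a CRC32 ('crc32b') directly from a file. Whether it beats md5_file() was not part of the tests above, so the following is only a minimal sketch of a hypothetical harness, bench_hash_file(), for timing hash_file() with several algorithms on one file. The numbers it prints depend entirely on your machine and PHP build.

```php
<?php
// Hypothetical helper (not part of the original tests): time hash_file()
// for several algorithms on one file, so a built-in CRC32 ('crc32b') can
// be compared against MD5 and SHA-1. Assumes PHP 5.1.2+ and $rounds >= 1.
function bench_hash_file($path, array $algos, $rounds = 3)
{
    $results = array();
    foreach ($algos as $algo) {
        $start = microtime(true);
        for ($i = 0; $i < $rounds; $i++) {
            $digest = hash_file($algo, $path);
        }
        $results[$algo] = array(
            'digest'  => $digest,
            'seconds' => (microtime(true) - $start) / $rounds, // average per run
        );
    }
    return $results;
}

// Usage: point $path at a real test file; this script itself is used here
$path = __FILE__;
foreach (bench_hash_file($path, array('md5', 'sha1', 'crc32b')) as $algo => $r) {
    printf("%-7s %s %.6f s\n", $algo, $r['digest'], $r['seconds']);
}
```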