Archive for the ‘News’ Category

Converting URL to Local File Path in Batch MySQL INSERT Query Using SUBSTRING_INDEX() and CONCAT()

I wanted to quickly take a URL (e.g. http://www.compiledweekly.com/somefolder/somefile.ext) and translate it to the local path (e.g. /home/user/public_html/somefolder/somefile.ext) while inserting multiple records into a new table. My first thought was to select all the records, use PHP to trim off the path, then insert the new record in the new table. There’s a better answer, use SUBSTRING_INDEX() and CONCAT() with a INSERT INTO table SELECT statement.

INSERT INTO new_table
SELECT CONCAT(‘/home/user/public_html/somefolder/’, SUBSTRING_INDEX(s.url, ‘/’, -1)) AS local_path
FROM source_table AS s
WHERE …

It does the job with out having to write a single line of PHP code!

Publish test

Publish test

Download This Episode

Compiled Weekly Recording from G1 Android Phone Try 2

This is a recording of Compiled Weekly from the recorder app in the
Google G1 Android phone. It came out pretty well. What do you think? I
may start podcasting this way all the time!

Attempt 2!

The first attempt I did not save the mp3 in stereo mode, which meant flash and quicktime among other players would not be able to play the file correctly. Problem fixed in this version.

Download This Episode

Compiled Weekly Recording from G1 Android Phone

This is a recording of Compiled Weekly from the recorder app in the Google G1 Android phone. It came out pretty well. What do you think? I may start podcasting this way all the time!

Download This Episode

Sun acquires MySQL

If you have not herd, Sun Microsystems is acquiring MySQL. I think this is a great move, especially since Sun has embraced the open source community as strong, if not stronger, than other large companies such as IBM, Yahoo and Google.

Article: http://www.mysql.com/news-and-events/sun-to-acquire-mysql.html

I think the move is a good one for MySQL. With the recent purchase of the InnoDB storage engine by Oracle, there has been some concern that MySQL could end up in the hands of a company that’s best interests do not involve the open source community.

The recent influx of participation in the development of MySQL by Google gives me a lot of hope that MySQL’s future will be a bright one. The next generation storage engine called Falcon hopes to become a replacement for InnoDB and many of the performance tweaks Google has implemented over the years will most likely find their way into the future versions of MySQL.

Will be on the 24hr podcast Friday at 2pm

I will be on the 24 hour podcast event planned for Friday, December 21 at 2pm.  I’ll be discussing download Statistics, web programming and answering questions anyone may have.

More information is available at the Geek News Central.

PHP code to format Program Name for 1-click zune subscription

ZuneI added the Zune 1-click subscription option to the RawVoice properties at www.blubrry.com and www.techpodcasts.com. I ran into an issue where show titles that contain special characters, such as quotes, would break the rest of the web page.

Below is the solution I implemented for the problem.

Fix

<a href=”< ?php
$title_for_zune = ereg_replace(‘[^A-Za-z0-9 ]‘, ”, $show_title);
$title_for_zune = str_replace(‘ ‘, ‘_’, $title_for_zune);
echo ‘zune://subscribe/?’.$title_for_zune.’=’.$feed;
?>” title=”Add to Zune”>Add to Zune</a>

Having spaces in the URL does not validate, so I replace spaces with the _ character. If you are not worried about HTML validation, you can remove the str_replace function call.

I am not sure who came up with this protocol for the add to Zune 1-click subscription, as it does not appear to be very well thought out. Most other sites like Digg only require the feed url, which will never contain characters that could break your web pages.  As Jason Van Orden points out, the format is more complex than it needs to be.

Serious shortcomings with PHP5 get_headers() function

I was writing some code to find out if a file exists on a server and if it does, have it return the size in bytes.  I found a useful function built into PHP 5, get_headers().  For getting file sizes, it works flawlessly.  For situations where the file does not exist on the server, the behavior of this function was less than desirable.

Be forewarned, none of the user contributed get_headers() functions on the get_headers() documentation page on PHP.net will replicate the behavior of PHP 5’s get_headers() for URLs that use the ‘Location:’ redirect header or return File Not Found headers.

According to RFC1945, A user agent should never automatically redirect a request more than 5 times, since such redirections usually indicate an infinite loop.  For true compatibility, the functions below should be able to handle up to 5 Location redirects within one function call.  Only the native get_headers() function exhibits this behavior.  None of the user contributed functions on PHP.net handle the ‘Location’ redirection.

The native PHP >= 5 get_headers() function will not return headers in some instances where the user contributed functions would.  For example, if the server returns a 404 status, get_headers() will throw a PHP warning.  Unfortunately, the 404 error can only be known by looking at the headers.  From first glance, all of the user contributed functions will return 404 headers, which may be a desired effect but does not replicate the behavior of the native get_headers() function.

The function I created is included below.  It works well if the file exists.  Unfortunately for the project I am using the code for, I also need to verify if the file exists on the server.  I will not be able to use this function.

<CODE>
function remotefsize($url) {
$sch = parse_url($url, PHP_URL_SCHEME);
if (($sch != “http”) && ($sch != “https”) ) {
return false;
}
$headers = array_change_key_case(get_headers($url, 1),CASE_LOWER);
if ((!array_key_exists(“content-length”, $headers)))
return false;
if( is_array($headers["content-length"]) )
return array_pop($headers["content-length"]);
return $headers["content-length"];
}
</CODE>

Save bandwidth and faster downloads with Apache mod_deflate

I’ve been auditing apache web logs from statistics gathered in AWStats. I found 6 different IP addresses that are using a lot of the servers bandwidth. After looking at the logs, I discovered one of the IP addresses was a spammer and the rest are from web robots or bots. One bot used over 2.5GB of traffic last month. That is unbelievable. That is 2.5% of the months bandwidth. Of course Google and Yahoo combined have downloaded almost 30GB, but we want Google and Yahoo to index our sites. So what can we do to save our bandwidth and still provide the information to the search engines? Compress it with Mod_deflate!

The mod_deflate module in Apache is not new. What is new is the trend to use it. The deflate module uses gzip compression and is relatively fast in comparison to the bandwidth time. You can do the math, but if your page was 1MB and took 20 seconds to download, and compressed it is 250k and takes 1 second to compress, 5 second to download, and 2 second to decompress, the compression method is faster. You can crunch numbers till you are blue in the face, but the basic premise holds true.

First, you need to make sure the mod_deflate module is enabled in apache. Look for the line in your apache configuration files and uncomment:

LoadModule deflate_module modules/mod_deflate.so

Then add the following lines within a <Location> or <VirtualHost> section.

SetInputFilter DEFLATE

The above will compress everything. Instead of compressing everything, especially if you have a lot of files on your web site such as images, media, zip files, etc.. that are already compressed, you may want to only add compression for particular content types. To compress specific content types, replace the SetInputFilter with one or more of the following.

AddOutputFilterByType DEFLATE text/html
AddOutputFilterByType DEFLATE text/plain
AddOutputFilterByType DEFLATE text/xml
AddOutputFilterByType DEFLATE text/css
AddOutputFilterByType DEFLATE text/javascript
AddOutputFilterByType DEFLATE application/x-javascript
AddOutputFilterByType DEFLATE application/xml
AddOutputFilterByType DEFLATE application/x-httpd-php

If you work with other content types such as application/xml, you can add those as well.

You can test your server by using the GID Network gzip test located here: http://www.gidnetwork.com/tools/gzip-test.php

Now that your web server is using compression, you can focus on other things like programming or eating pizza.

Apache and PHP performance tips, tricks and what I call database buffering

I’ve found a couple useful web sites to help tune Apache with PHP on web servers.

phpLens has a pretty detailed page on the subject that hits everything pretty well.

Link: http://phplens.com/phpeverywhere/tuning-apache-php

For the most part I agree with everything this page says.  I do have a few other suggestions though.

Addressing the 3rd suggestion on the phpLens tuning Apache and PHP post, I recommend turning compression on for specific content types using the mod deflate module in Apache 2.0 or greater.  Apache 1.3 uses a different compression module which is a bit more complicated to setup.  Here is a link on ClarkConnect that describes how to add the deflate support to your web site and track its statistics with awstats.  Don’t forget to compress javascript, css and xml as well as html.

Link: http://www.clarkconnect.com/wiki/index.php?title=Howtos_-_Activating_content_compression_on_Apache

Web server or module level PHP accelerators seem ideal for speeding up the performance of php scripts.  Though the 6th suggestion, I would debate whether caching pages in the PHP level is faster or not.  Libraries like Smarty are great for templating your pages but they have their limits.  If you are hosting a web site with constantly changing dynamic content, the caching built into Smarty is minimal at best.  Granted that your original PHP script was well written, the cached Smarty version technically can’t re-optimize your database queries, which is most likely the source of any delays with a dynamic web page.

One suggestion that is missing from the post is optimizing your SQL queries and methods how you select the data.  phpLens mentions the concept of output buffering.  What I am about to explain is similar and should be deployed together.  First and for most, when you are working with a result set of data from a database, loop through all the data and store it in variables as quickly as you can and free up that queries resource.  This will use the time you are connected to the database as efficiently as possible and prevent you from accidentally creating nested queries.  Since this practice is very similar to output buffering, perhaps we should call it ‘database buffering’.  The database may still buffer data as well, but your script knows better than the database what to do with the selected data and your script definitely knows what queries are going to happen next.  Database buffering practices just make sense.

I would also recommend testing your web sites for security issues.  Check out Nikto, it is a pretty powerful web server scanner that looks for vulnerabilities and security issues.

Link: http://www.cirt.net/code/nikto.shtml

Blubrry player!