emrahgunduz
always eats his vegetables

UTF8 BOM problems in PHP and XML

Yep, it’s hell if you’ve got the UTF8 BOM (byte order mark) at the beginning of your PHP or XML files. These files need to send their own headers before anything else. Because the BOM sits in the very first bytes of the file, it is sent as output before those headers, so the headers can no longer be sent properly and unintended errors might occur.

For PHP, the error will mostly be “Warning: Cannot modify header information”, and for XML, “XML declaration allowed only at the start of the document”. If you are getting header errors in WordPress (including the admin pages), the most probable cause is a byte order mark in your theme files (check your theme’s functions.php file first).
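If you are not sure which file started the output, PHP's built-in headers_sent() can report it. The following is a minimal sketch, not part of the original script; describeEarlyOutput() is a hypothetical helper name. When a BOM is the culprit, the reported position would typically be the first line of the affected file.

```php
<?php
// Hypothetical helper: reports where output started, if headers are already gone.
function describeEarlyOutput()
{
    $file = '';
    $line = 0;
    // headers_sent() fills $file and $line with the place where output began.
    if (headers_sent($file, $line)) {
        return "Output already started in $file on line $line";
    }
    return "No output sent yet; headers can still be modified";
}

// Call this right before the header() call that triggers the warning.
echo describeEarlyOutput(), "\n";
```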

How can you find and delete the BOM from text files? Most frameworks and editors include a setting for saving UTF8 files without a BOM. Check the help file, or ask at the forum or helpdesk of the tool you are using. Also, never use Notepad on Windows for development purposes. It inserts the BOM whenever you save your file in UTF8 format.
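If your editor cannot do it, deleting the BOM comes down to dropping the first three bytes (0xEF 0xBB 0xBF) of the file. A rough sketch of that idea follows; stripUtf8Bom() and removeBomFromFile() are hypothetical helper names, and you should back up your files before rewriting them in place.

```php
<?php
// Hypothetical helper: returns $contents without a leading UTF-8 BOM.
function stripUtf8Bom($contents)
{
    $bom = "\xEF\xBB\xBF";
    if (strncmp($contents, $bom, 3) === 0) {
        // The cast keeps the result a string on old PHP versions
        // when the file contains nothing but the BOM.
        return (string) substr($contents, 3);
    }
    return $contents;
}

// Hypothetical helper: rewrites a file in place only if it starts with a BOM.
// Returns true when the file was changed, false otherwise.
function removeBomFromFile($path)
{
    $contents = file_get_contents($path);
    if ($contents === false) {
        return false; // unreadable file, leave it alone
    }
    $clean = stripUtf8Bom($contents);
    if ($clean === $contents) {
        return false; // no BOM, nothing to do
    }
    return file_put_contents($path, $clean) !== false;
}
```

Calling removeBomFromFile() on each path the finder reports would then clean the files one by one.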

If you are dealing with hundreds of files, finding and deleting the BOM by hand is time consuming. So here is a PHP file I wrote for finding the files that you’ll need to correct. The script walks recursively through your home folder and its subfolders and checks the first bytes of every file for the BOM. When it finishes, it gives you a small list of the files that have a BOM.
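The check itself only needs the first three bytes of each file. As a minimal sketch of that idea (fileStartsWithBom() is a hypothetical name, not a function from the script), reading just those bytes avoids loading whole files into memory:

```php
<?php
// Hypothetical helper: true if the file at $path begins with the UTF-8 BOM.
function fileStartsWithBom($path)
{
    $handle = @fopen($path, 'rb');
    if ($handle === false) {
        return false; // unreadable files are treated as "no BOM"
    }
    $head = fread($handle, 3); // only the first three bytes are needed
    fclose($handle);
    return $head === "\xEF\xBB\xBF";
}
```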

Before running it, check the $HOME line and change it to your home directory. You can also try setting it to the document root or to the directory the script is in. If you are hosted on a Windows based machine, do not forget to change the $WIN line to 1, as the recursion needs backslashes as path separators on Windows.
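As an aside, PHP has a built-in DIRECTORY_SEPARATOR constant that already holds the right slash for the host, so the $WIN switch could in principle be avoided. A small sketch under that assumption, with joinPath() as a hypothetical helper name:

```php
<?php
// Hypothetical helper: joins a folder and an entry name with the host's separator.
function joinPath($folder, $entry)
{
    // Trim any trailing slash first so the result never has a doubled separator.
    return rtrim($folder, '/\\') . DIRECTORY_SEPARATOR . $entry;
}

// Example: build the path of functions.php next to this script.
echo joinPath(dirname(__FILE__), 'functions.php'), "\n";
```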

PS. Delete the file after use. It prints file paths from your domain’s root and subfolders.
PPS. The PHP file is a resource monster, so try not to run it during your host’s peak hours.

<?php
// Tell me the root folder path.
// You can also try this one:
//   $HOME = $_SERVER["DOCUMENT_ROOT"];
// Or this:
//   $HOME = dirname(__FILE__);
$HOME = dirname(__FILE__);

// Is this a Windows host ? If it is, change this line to $WIN = 1;
$WIN = 0;

// That's all I need
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>UTF8 BOM FINDER</title>
<style>
body { font-size: 10px; font-family: Arial, Helvetica, sans-serif; background: #FFF; color: #000; }
.FOUND { color: #F30; font-size: 14px; font-weight: bold; }
</style>
</head>
<body>
<?php
$BOMBED = array();
RecursiveFolder($HOME);

echo '<h2>These files have UTF8 BOM:</h2><p class="FOUND">';
foreach ($BOMBED as $utf) {
  echo $utf . "<br />\n";
}
echo '</p>';

// Recursive finder
function RecursiveFolder($sHOME) {
  global $BOMBED, $WIN;

  $win32 = ($WIN == 1) ? "\\" : "/";
  $folder = dir($sHOME);
  if (!$folder) return; // skip folders that cannot be opened
  $foundfolders = array();

  // Compare against false so an entry named "0" does not end the loop early
  while (false !== ($file = $folder->read())) {
    if ($file != "." and $file != "..") {
      if (filetype($sHOME . $win32 . $file) == "dir") {
        $foundfolders[] = $sHOME . $win32 . $file;
      } else {
        $BOM = SearchBOM(file_get_contents($sHOME . $win32 . $file));
        if ($BOM) $BOMBED[] = $sHOME . $win32 . $file;
      }
    }
  }
  $folder->close();

  // Descend into the subfolders collected above
  foreach ($foundfolders as $subfolder) {
    RecursiveFolder($subfolder);
  }
}

// Searching for BOM in files
function SearchBOM($string) {
  // The UTF8 BOM is the three bytes EF BB BF
  return substr($string, 0, 3) == pack("CCC", 0xef, 0xbb, 0xbf);
}
?>
</body>
</html>
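If you only care about certain file types, the same search can be sketched with PHP's SPL directory iterators instead of a hand-written recursion. This is an alternative sketch, not the script above; findBomFiles() and its $extensions parameter are hypothetical, and it assumes PHP 5.3 or newer.

```php
<?php
// Hypothetical helper: returns the paths under $root that start with a
// UTF-8 BOM, limited to the given lowercase extensions, e.g. array('php', 'xml').
function findBomFiles($root, array $extensions)
{
    $found = array();
    $walker = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($walker as $file) {
        if (!$file->isFile()) {
            continue;
        }
        $path = $file->getPathname();
        $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
        if (!in_array($ext, $extensions, true)) {
            continue; // not one of the requested file types
        }
        $handle = @fopen($path, 'rb');
        if ($handle === false) {
            continue; // unreadable file, skip it
        }
        $head = fread($handle, 3); // the BOM is exactly three bytes
        fclose($handle);
        if ($head === "\xEF\xBB\xBF") {
            $found[] = $path;
        }
    }
    sort($found);
    return $found;
}
```

For example, findBomFiles(dirname(__FILE__), array('php', 'xml')) would list only the PHP and XML files that carry a BOM.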

This post's short url is: http://emrg.me/6n

2 responses for UTF8 BOM problems in PHP and XML

  1. Hi there.
    I was looking for some files that had the BOM in them and considered many solutions. This one was, by far, the easiest. I customized the file to look only for some specific file extensions and it’s working flawlessly. Thanks.

  2. Tiep Nguyen says:

    you save my life, this is very helpful, thank you very much !

Author: Emrah Gunduz