emrahgunduz
always eats his vegetables

Take 2 on UTF8 BOM : Remove BOM with PHP

Some readers asked me about my post on UTF8 BOM problems in PHP and XML. They wanted to know whether the BOM could be removed from their files without damaging them, and whether PHP could do the job. They had hundreds of files with a UTF8 BOM in them, and removing it by hand would have been very time-consuming.

My answer was: of course. PHP can detect and remove the BOM in any file. We only run into this problem with text-based files, and the UTF8 BOM is always the same three bytes (0xEF 0xBB 0xBF) at the very start of the file, so a simple string cut does the trick. Applause for substr().
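To make the idea concrete, here is a minimal sketch of that check-and-cut step on a single string. The function name `strip_utf8_bom()` is made up for this example; the script at the end of the post does the same thing inline while walking through your folders.

```php
<?php
// Hypothetical helper: returns $text without a leading UTF-8 BOM.
// The UTF-8 BOM is always the same three bytes: 0xEF 0xBB 0xBF.
function strip_utf8_bom(string $text): string
{
    $bom = "\xEF\xBB\xBF";
    // Only cut when the string really starts with the BOM,
    // so content without one comes back untouched.
    if (strncmp($text, $bom, 3) === 0) {
        return substr($text, 3);
    }
    return $text;
}

// Usage: the visible text is identical, but the BOM version is 3 bytes longer.
$withBom = "\xEF\xBB\xBF" . "<root/>";
echo strlen($withBom), " -> ", strlen(strip_utf8_bom($withBom)), "\n"; // prints "10 -> 7"
```

Checking before cutting matters: a blind `substr($content, 3)` would eat the first three real characters of every file that never had a BOM.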

At the end of the post you can find my old BOM PHP code, tweaked a little. This time it not only finds the UTF8 BOM but also removes it, and the problem is out of your life.

Remember

GET A COMPLETE BACKUP OF YOUR FILES BEFORE YOU RUN THIS SCRIPT. Some files and software depend on the BOM to detect the content encoding. I won't accept any responsibility for how you use the code or what happens as a result. So, be careful.
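If you want an extra per-file safety net on top of the full backup, one option is to copy each file aside just before rewriting it. This is a minimal sketch, assuming a `.bak` copy next to the original is acceptable on your host; the function name `strip_bom_with_backup()` and the `.bak` suffix are my own choices, not something the script below does.

```php
<?php
// Hypothetical helper: strips a UTF-8 BOM from $path, keeping a copy
// of the untouched file at "$path.bak" first. Returns true if a BOM was removed.
function strip_bom_with_backup(string $path): bool
{
    $content = file_get_contents($path);
    if ($content === false || strncmp($content, "\xEF\xBB\xBF", 3) !== 0) {
        return false; // unreadable, or no BOM: leave the file alone
    }
    // Keep the original bytes before touching anything.
    if (!copy($path, $path . '.bak')) {
        return false; // no backup, no rewrite
    }
    return file_put_contents($path, substr($content, 3)) !== false;
}

// Usage on a throwaway file in the system temp folder.
$tmp = tempnam(sys_get_temp_dir(), 'bom');
file_put_contents($tmp, "\xEF\xBB\xBFhello");
var_dump(strip_bom_with_backup($tmp)); // bool(true)
var_dump(file_get_contents($tmp));     // string(5) "hello"
unlink($tmp . '.bak');
unlink($tmp);
```

Files without a BOM are never copied or rewritten, so the `.bak` files you end up with are exactly the ones that were changed.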

After this paranoid paragraph, here is the refurbished code. Just copy and paste it into a text file, save it as .php and run it. Don't use Notepad. Oh, what the hell, use it if you like: this baby will remove the BOM from itself too :D

<?php
// Tell me the root folder path.
// You can also try this one
// $HOME = $_SERVER["DOCUMENT_ROOT"];
// Or this
// dirname(__FILE__)
$HOME = dirname(__FILE__);

// No need to tell me whether this is a Windows host:
// PHP's DIRECTORY_SEPARATOR constant already holds the right separator.

// That's all I need
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>UTF8 BOM FINDER and REMOVER</title>
<style>
body { font-size: 10px; font-family: Arial, Helvetica, sans-serif; background: #FFF; color: #000; }
.FOUND { color: #F30; font-size: 14px; font-weight: bold; }
</style>
</head>
<body>
<?php
$BOMBED = array();
RecursiveFolder($HOME);
echo '<h2>These files had a UTF8 BOM, but I cleaned them:</h2><p class="FOUND">';
foreach ($BOMBED as $utf) {
  echo htmlspecialchars($utf) . "<br />\n";
}
echo '</p>';

// Recursive finder: checks every file in $sHOME, then walks each subfolder
function RecursiveFolder($sHOME) {
  global $BOMBED;
  $sep = DIRECTORY_SEPARATOR;
  $folder = dir($sHOME);
  if (!$folder) {
    return; // folder could not be opened (permissions, etc.)
  }
  $foundfolders = array();
  // Compare with false explicitly, so a file named "0" does not stop the loop
  while (false !== ($file = $folder->read())) {
    if ($file != "." && $file != "..") {
      $path = $sHOME . $sep . $file;
      if (filetype($path) == "dir") {
        $foundfolders[] = $path;
      } elseif (is_file($path)) {
        $content = file_get_contents($path);
        if ($content !== false && SearchBOM($content)) {
          // Remove the first three bytes (the BOM) and write the file back;
          // only report it if the write actually succeeded
          if (file_put_contents($path, substr($content, 3)) !== false) {
            $BOMBED[] = $path;
          }
        }
      }
    }
  }
  $folder->close();

  foreach ($foundfolders as $subfolder) {
    RecursiveFolder($subfolder);
  }
}

// Searching for BOM in files: true if the string starts with EF BB BF
function SearchBOM($string) {
  return substr($string, 0, 3) == pack("CCC", 0xef, 0xbb, 0xbf);
}
?>
</body>
</html>
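If you would rather run the cleanup from the command line, or want to see which files would change before anything is written, the same walk can be done with PHP's SPL iterators. This is a sketch under my own assumptions: the function name `find_and_strip_boms()`, the dry-run switch, the `--write` argument, the file name `strip_boms.php` and the extension whitelist are all hypothetical, not part of the script above. Limiting the walk to text extensions keeps images and other binary files out of harm's way.

```php
<?php
// Hypothetical CLI variant: walks $root recursively with SPL iterators and
// strips a leading UTF-8 BOM from files whose lowercased extension is in
// $extensions. With $dryRun = true it only reports; nothing is written.
function find_and_strip_boms(string $root, array $extensions, bool $dryRun = true): array
{
    $bom = "\xEF\xBB\xBF";
    $hits = [];
    // SKIP_DOTS drops "." and ".."; symlinked folders are not followed by default
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if (!$file->isFile() || !in_array(strtolower($file->getExtension()), $extensions, true)) {
            continue;
        }
        $path = $file->getPathname();
        $content = file_get_contents($path);
        if ($content === false || strncmp($content, $bom, 3) !== 0) {
            continue;
        }
        // In dry-run mode the write is skipped entirely
        if ($dryRun || file_put_contents($path, substr($content, 3)) !== false) {
            $hits[] = $path;
        }
    }
    sort($hits); // iteration order depends on the filesystem
    return $hits;
}

// Usage from a shell: php strip_boms.php /path/to/site
// Add --write as an extra argument to actually change the files.
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    $write = in_array('--write', $argv, true);
    $exts = ['php', 'html', 'xml', 'txt', 'css', 'js'];
    foreach (find_and_strip_boms($argv[1], $exts, !$write) as $path) {
        echo ($write ? 'cleaned: ' : 'has BOM: '), $path, PHP_EOL;
    }
}
```

Defaulting to a dry run means the first pass is read-only, which fits the backup warning: you can review the list, take your backup, and only then run it again with `--write`.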



Author: Emrah Gunduz