Sunday, November 28, 2010

Comparing Two Files in PowerShell

Sometimes you want to test whether two files are the same. You could compute MD5 or SHA hashes of the files, but that means reading both files in full, even when they already differ in the first few bytes. A byte-by-byte comparison can stop at the first difference, so it might be the fastest. After seeing some code on StackOverflow, I decided to "port" it to a PowerShell script. It might do some good for all the PS-ers out there.

function FilesAreEqual
{
   param([System.IO.FileInfo] $first, [System.IO.FileInfo] $second) 
   $BYTES_TO_READ = 8;   # one Int64 per comparison

   # Files of different length can never be equal.
   if ($first.Length -ne $second.Length)
   {
        return $false;
   }

   $iterations = [Math]::Ceiling($first.Length / $BYTES_TO_READ);
   $fs1 = $first.OpenRead();
   $fs2 = $second.OpenRead();

   $one = New-Object byte[] $BYTES_TO_READ;
   $two = New-Object byte[] $BYTES_TO_READ;

   try
   {
       for ($i = 0; $i -lt $iterations; $i = $i + 1)
       {
           # The 0 is the offset into the buffer; each Read advances the stream position.
           $fs1.Read($one, 0, $BYTES_TO_READ) | out-null;
           $fs2.Read($two, 0, $BYTES_TO_READ) | out-null;

           if ([BitConverter]::ToInt64($one, 0) -ne 
               [BitConverter]::ToInt64($two, 0))
           {
               return $false;
           }
       }
   }
   finally
   {
       # Close both streams even when a mismatch returns early.
       $fs1.Close();
       $fs2.Close();
   }

   return $true;
}

Calling it would be:

FilesAreEqual c:\temp\test.html c:\temp\test.html
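
For comparison, the hash-based check mentioned at the start of the post could look roughly like the sketch below. It assumes PowerShell 4.0 or later, where the Get-FileHash cmdlet is available. The function name FilesHaveSameHash is made up for this example and is not part of the original script.

```powershell
# Hypothetical helper for illustration only; requires PowerShell 4.0+ (Get-FileHash).
function FilesHaveSameHash
{
    param([string] $first, [string] $second)

    # Files of different length can never be equal, so skip hashing entirely.
    if ((Get-Item -LiteralPath $first).Length -ne (Get-Item -LiteralPath $second).Length)
    {
        return $false
    }

    # Both files are read in full here, even if they differ in the first byte.
    $hash1 = (Get-FileHash -LiteralPath $first -Algorithm SHA256).Hash
    $hash2 = (Get-FileHash -LiteralPath $second -Algorithm SHA256).Hash

    return $hash1 -eq $hash2
}
```

Hashing is mostly worth it when the hash can be stored and reused, for example when one file has to be checked against many others. For a single one-off comparison, the byte-by-byte loop avoids the extra work.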

7 comments:

  1. Are you sure?
    I see that you always read at the same offset, 0:
    $fs1.Read($one, 0, $BYTES_TO_READ)

    http://msdn.microsoft.com/en-us/library/system.io.filestream.read.aspx

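  2. Hey lumir, the offset is the index in the buffer where the bytes are stored, not a position in the file. After each read the position of the stream advances, so the next read continues where the previous one stopped. Thanks for your reaction. Kees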

  3. The $BYTES_TO_READ was wrong... I've changed it to 8.

  4. Thanks for this code snippet! It works perfectly :)

  5. Thank you for your code!

    We modified it a bit to work faster with "larger" files.

    function FilesAreEqual
    {
        param([System.IO.FileInfo] $first, [System.IO.FileInfo] $second)
        $BYTES_TO_READ = 4096;

        $tmp = $first.Length

        if ($first.Length -ne $second.Length)
        {
            return $false;
        }

        $iterations = [Math]::Ceiling($first.Length / $BYTES_TO_READ);
        $fs1 = $first.OpenRead();
        $fs2 = $second.OpenRead();

        $one = New-Object byte[] $BYTES_TO_READ;
        $two = New-Object byte[] $BYTES_TO_READ;

        try
        {
            for ($i = 0; $i -lt $iterations; $i = $i + 1)
            {
                # Read returns how many bytes were actually placed in the buffer.
                $secondIterations = $fs1.Read($one, 0, $BYTES_TO_READ);
                $fs2.Read($two, 0, $BYTES_TO_READ) | out-null;

                for ($j = 0; $j -lt $secondIterations; $j = $j + 1)
                {
                    if ($one[$j] -ne $two[$j])
                    {
                        return $false;
                    }
                }
            }
        }
        finally
        {
            # Close both streams even when a mismatch returns early.
            $fs1.Close();
            $fs2.Close();
        }

        return $true;
    }

    1. Edit:
      Line $tmp = $first.Length doesn't belong there (just for testing).

      My Bad.

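
The question in comments 1 and 2 about the 0 passed to Read can be checked with a small experiment. The sketch below is not from the original post: it writes 20 known bytes to a generated temporary file and reads them back in chunks of 8, always passing 0 as the buffer offset.

```powershell
# Write the bytes 1..20 to a temporary file (generated path, for illustration only).
$path = [System.IO.Path]::GetTempFileName()
[System.IO.File]::WriteAllBytes($path, [byte[]](1..20))

$fs = [System.IO.File]::OpenRead($path)
$buffer = New-Object byte[] 8

try
{
    # The second argument (0) is the index in $buffer where data is stored,
    # not a position in the file.
    $read1 = $fs.Read($buffer, 0, 8)
    $pos1 = $fs.Position
    $firstByte = $buffer[0]
    "Read $read1 bytes, position $pos1, buffer starts with $firstByte"

    # Same buffer offset, but the stream has moved on to the next chunk.
    $read2 = $fs.Read($buffer, 0, 8)
    $pos2 = $fs.Position
    $secondByte = $buffer[0]
    "Read $read2 bytes, position $pos2, buffer starts with $secondByte"

    # Only 4 bytes are left; the tail of the buffer keeps the previous chunk.
    $read3 = $fs.Read($buffer, 0, 8)
    "Read $read3 bytes, buffer is now $($buffer -join ',')"
}
finally
{
    $fs.Close()
    Remove-Item $path
}
```

The last read also shows why the final, shorter chunk still compares correctly in the original script: the unused tail of each buffer holds bytes from the previous chunk, and those were already found to be equal.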
