Saturday, November 20, 2010

Seek Position of a String in a File or FileStream

Yesterday I was working on a little bit of code to sniff out some XMP without having to know in advance at which position in the file it is stored. XMP, being just plain XML, can be found by matching a string. After some googling I found out that lots of people use code that reads the entire file into memory and then performs a regex or a string comparison. That's not going to work for me, because I have files of over 100 MB! So I wrote a class that reads the file byte by byte to search for that string. The method returns the position of the string in the specified file or FileStream.

I've wrapped the code in a simple class:

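using System;
using System.IO;
using System.Text;

namespace Kees.Blog.Examples.FileStreamSeek
{
    public static class Utils
    {
        /// <summary>
        /// Seeks the position of a string of data in the file.
        /// </summary>
        /// <param name="file">The path to a file.</param>
        /// <param name="searchString">A string to search for.</param>
        /// <returns>The position of the string or -1 if 
        /// the string could not be found.</returns>
        public static long Seek(string file, string searchString)
        {
            //open a file stream to perform the seek
            using (FileStream fs = File.OpenRead(file))
            {
                return Seek(fs, searchString);
            }
        }

        /// <summary>
        /// Seeks the position of a string in the file stream, starting at
        /// the current position. If the string is found, the stream is left
        /// at the start of the match; otherwise it is reset to where it was.
        /// </summary>
        /// <param name="fs">An open file stream.</param>
        /// <param name="searchString">A string to search for.</param>
        /// <returns>The position of the string or -1 if 
        /// the string could not be found.</returns>
        public static long Seek(FileStream fs, string searchString)
        {
            if (fs == null)
                throw new ArgumentNullException(nameof(fs));
            if (string.IsNullOrEmpty(searchString))
                throw new ArgumentException(
                    "The search string must not be empty.", nameof(searchString));

            //we compare raw bytes, so encode the search string the way
            //XMP is usually stored (UTF-8); this also keeps the position
            //arithmetic below in bytes
            byte[] search = Encoding.UTF8.GetBytes(searchString);
            long result = -1, stored = -1, begin = fs.Position;
            int position = 0, c;

            //read byte by byte (FileStream buffers the reads internally)
            while ((c = fs.ReadByte()) != -1)
            {
                //check if the byte matches the next byte of the search array
                if (c == search[position])
                {
                    //if the byte also matches the first byte of the search
                    //array, store the stream position right after it as a
                    //candidate restart point for later
                    if (stored == -1 && position > 0 && c == search[0])
                    {
                        stored = fs.Position;
                    }

                    //check if we're done
                    if (position + 1 == search.Length)
                    {
                        //correct the position for the array length
                        result = fs.Position - search.Length;
                        //set the position in the stream to the match
                        fs.Position = result;
                        break;
                    }

                    //advance the position in the array
                    position++;
                }
                //no match, check if we have a stored position
                else if (stored > -1)
                {
                    //go back to the byte right after the stored first byte;
                    //that first byte already matched, so continue at index 1
                    fs.Position = stored;
                    position = 1;
                    stored = -1; //reset stored position!
                }
                //no match, no stored position: restart the array, but the
                //current byte may itself be the start of a new match
                else
                {
                    position = (c == search[0]) ? 1 : 0;
                }
            }

            //reset the stream position if no match has been found
            if (result == -1)
            {
                fs.Position = begin;
            }

            return result;
        }
    }
}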
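The class above rewinds the stream whenever a partial match fails, so it needs a seekable stream. As an alternative, here is a sketch that uses the Knuth-Morris-Pratt algorithm (a different technique from the one in this post) and only ever reads forward, so it works on any System.IO.Stream. The class name ForwardOnlySearch is hypothetical, and the sketch assumes the text in the file is UTF-8 encoded. Unlike Seek, it leaves the stream positioned just after the match and returns an offset relative to where reading started.

```csharp
using System;
using System.IO;
using System.Text;

namespace Kees.Blog.Examples.FileStreamSeek
{
    //hypothetical helper class, not part of the original post
    public static class ForwardOnlySearch
    {
        /// <summary>
        /// Returns the offset (relative to the stream position when the call
        /// started) of the first occurrence of searchString, or -1 if it is
        /// not found. The stream is never rewound.
        /// </summary>
        public static long IndexOf(Stream stream, string searchString)
        {
            //assumption: the data in the stream is UTF-8 encoded
            byte[] pattern = Encoding.UTF8.GetBytes(searchString);
            if (pattern.Length == 0)
                throw new ArgumentException(
                    "The search string must not be empty.", nameof(searchString));

            //failure[i] = length of the longest proper prefix of
            //pattern[0..i] that is also a suffix of it
            int[] failure = new int[pattern.Length];
            for (int i = 1, k = 0; i < pattern.Length; i++)
            {
                while (k > 0 && pattern[i] != pattern[k])
                    k = failure[k - 1];
                if (pattern[i] == pattern[k])
                    k++;
                failure[i] = k;
            }

            long offset = 0;   //number of bytes consumed so far
            int matched = 0;   //number of pattern bytes currently matched
            int b;
            while ((b = stream.ReadByte()) != -1)
            {
                offset++;
                //on a mismatch, fall back to the longest prefix that is
                //still a valid partial match instead of re-reading bytes
                while (matched > 0 && b != pattern[matched])
                    matched = failure[matched - 1];
                if (b == pattern[matched])
                    matched++;
                if (matched == pattern.Length)
                    return offset - pattern.Length;
            }
            return -1;
        }
    }
}
```

The trade-off is a small precomputed table (one int per search byte) in exchange for never seeking backwards, which can matter for network or compressed streams that do not support setting Position.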

Example 1: finding the position by file path:

string file = @"AFilePath.jpg";
long position = Kees.Blog.Examples.FileStreamSeek.Utils.Seek(file, @"xap:CreateDate=""");

Example 2: finding the position in an open file stream:

string file = @"AFilePath.jpg";
using (System.IO.FileStream fs = System.IO.File.OpenRead(file))
{
    //on a match, fs is left at the start of the match;
    //otherwise it is reset to where it was
    long position = Kees.Blog.Examples.FileStreamSeek.Utils.Seek(fs, @"xap:CreateDate=""");
}
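The examples above only return the position. Because the search string `xap:CreateDate="` ends with an opening quote, one way to get at the actual value is to move just past the search string and read up to the closing quote. This is a minimal sketch, assuming the value is UTF-8 encoded; the class XmpValueReader, the method ReadUntilQuote and its maxLength limit are hypothetical names chosen for this example. A raw `"` always ends a double-quoted XML attribute value (inside one it must be escaped, e.g. as `&quot;`), so stopping at the first quote is safe.

```csharp
using System.IO;
using System.Text;

namespace Kees.Blog.Examples.FileStreamSeek
{
    //hypothetical helper class, not part of the original post
    public static class XmpValueReader
    {
        /// <summary>
        /// Reads bytes from the current stream position up to (but not
        /// including) the next double quote and decodes them as UTF-8.
        /// Returns null if the stream ends first or if more than maxLength
        /// bytes are read without finding the quote.
        /// </summary>
        public static string ReadUntilQuote(Stream stream, int maxLength = 1024)
        {
            using (var buffer = new MemoryStream())
            {
                int b;
                while ((b = stream.ReadByte()) != -1)
                {
                    //stop at the closing quote and decode what we collected
                    if (b == '"')
                        return Encoding.UTF8.GetString(buffer.ToArray());
                    //give up on suspiciously long values
                    if (buffer.Length >= maxLength)
                        return null;
                    buffer.WriteByte((byte)b);
                }
            }
            //end of stream reached without a closing quote
            return null;
        }
    }
}
```

Since Seek leaves the stream at the start of the match, a caller would first check that the returned position is not -1, then set `fs.Position = position + Encoding.UTF8.GetByteCount(searchString)` (where searchString is the string passed to Seek) and call `XmpValueReader.ReadUntilQuote(fs)`.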

Well... I hope it helps!

1 comment:

  1. It was very useful to me :)
    Mat