Filtering common starting/ending characters from array/list of strings

OK, so for example, I have an array of strings like these:

364VMS1029 
364VMSH920 
364VMSH192 
364VMSU839 
364VMN2382 
364VMR223
364VMR2X3 
364VMN829 
364VMN8757 
364VMN831

How can I get the program to dynamically recognise the characters common to all strings in the array (in this case 364VM) and filter them out?
If there are no common characters, then don’t do anything.

When you have a complicated problem that you can’t solve, try breaking it down into simpler problems and solve those.

Your problem rephrased:

Remove the common prefix and suffix from a list of strings.

A simpler problem would be:

Find the common prefix for a pair of strings.

This should be much simpler to solve, for example like this:

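// Returns the longest common prefix of two strings.
string GetPrefix(string first, string second)
{
    int prefixLength = 0;
    int maxLength = Math.Min(first.Length, second.Length); // Math needs "using System;"

    // Walk both strings in step until the characters differ or the shorter one ends.
    for (int i = 0; i < maxLength; i++)
    {
        if (first[i] != second[i])
            break;

        prefixLength++;
    }

    return first.Substring(0, prefixLength);
}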

Now that you have this, you can build back to the original problem:

Find the common prefix for a list of strings.

Here, it’s very helpful to realize that the prefix of three strings is the same as the prefix of (the prefix of the first two strings) and the third string. (Hmm, that sounds confusing; maybe it will be clearer in more formal notation: prefix(A, B, C) = prefix(prefix(A, B), C).)
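To make the fold concrete, here is a small sketch (assuming C# 9 or later for top-level statements) that applies the pairwise prefix one string at a time to the strings from the question and prints the running prefix after each step. The local GetPrefix() is an equivalent rewrite of the two-string method above, not a copy of it.

```csharp
using System;
using System.Collections.Generic;

// Two-string common prefix, equivalent to GetPrefix(first, second) above.
string GetPrefix(string first, string second)
{
    int length = 0;
    int max = Math.Min(first.Length, second.Length);
    while (length < max && first[length] == second[length])
        length++;
    return first.Substring(0, length);
}

string[] codes =
{
    "364VMS1029", "364VMSH920", "364VMSH192", "364VMSU839", "364VMN2382",
    "364VMR223", "364VMR2X3", "364VMN829", "364VMN8757", "364VMN831"
};

// prefix(A, B, C) = prefix(prefix(A, B), C): carry the running prefix forward.
string running = codes[0];
var steps = new List<string>();
for (int i = 1; i < codes.Length; i++)
{
    running = GetPrefix(running, codes[i]);
    steps.Add(running);
    Console.WriteLine($"after {codes[i]}: {running}");
}
```

The running prefix is 364VMS for the first three steps, shrinks to 364VM when 364VMN2382 is reached, and stays there for the rest of the list.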

This means that you can pass the GetPrefix() method above to the LINQ method Aggregate() to get the prefix of a whole list of strings:

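// Needs "using System.Linq;". Aggregate() without a seed throws
// InvalidOperationException if the sequence is empty.
string GetPrefix(IEnumerable<string> strings)
{
    return strings.Aggregate(GetPrefix);
}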
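One thing to watch with the Aggregate() version: without a seed, Enumerable.Aggregate() throws an InvalidOperationException for an empty sequence. A minimal sketch of one way around that, assuming an empty input should simply produce an empty prefix, is to route the sequence through DefaultIfEmpty(""). PairPrefix and CommonPrefix are hypothetical names; C# local functions can't be overloaded, so the two GetPrefix() methods get distinct names here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Two-string common prefix (same logic as GetPrefix(first, second) above).
string PairPrefix(string first, string second)
{
    int length = 0;
    int max = Math.Min(first.Length, second.Length);
    while (length < max && first[length] == second[length])
        length++;
    return first.Substring(0, length);
}

// Common prefix of a whole sequence; an empty sequence yields ""
// instead of the exception Aggregate() would otherwise throw.
string CommonPrefix(IEnumerable<string> strings) =>
    strings.DefaultIfEmpty("").Aggregate(PairPrefix);

Console.WriteLine(CommonPrefix(new[] { "364VMR223", "364VMR2X3" })); // 364VMR2
Console.WriteLine(CommonPrefix(Array.Empty<string>()).Length);         // 0
```

For a non-empty list DefaultIfEmpty() passes the elements through unchanged, so the result is the same as the plain Aggregate() call.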

The next step:

Remove the common prefix from a list of strings.

Now that we can find the common prefix, we can remove it using LINQ Select() and Substring():

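IEnumerable<string> RemovePrefix(IEnumerable<string> strings)
{
    var prefix = GetPrefix(strings);

    // Substring(startIndex) keeps everything after the prefix.
    // Note: strings is enumerated twice, so pass a materialized collection (array, list).
    return strings.Select(s => s.Substring(prefix.Length));
}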
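Putting the pieces together on the strings from the question gives something like the following sketch (C# 9+ top-level statements assumed; PairPrefix is a hypothetical name for the two-string GetPrefix() above). It copies the input into a list once, because RemovePrefix() reads the sequence twice, once for the prefix and once in Select(), which would repeat any work behind a lazily evaluated sequence.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string PairPrefix(string first, string second)
{
    int length = 0;
    int max = Math.Min(first.Length, second.Length);
    while (length < max && first[length] == second[length])
        length++;
    return first.Substring(0, length);
}

IEnumerable<string> RemovePrefix(IEnumerable<string> strings)
{
    // Materialize once so the sequence is not enumerated twice.
    var list = strings.ToList();
    if (list.Count == 0)
        return list;

    string prefix = list.Aggregate(PairPrefix);
    // Keep everything after the common prefix; with no common prefix
    // (prefix == "") the strings come back unchanged.
    return list.Select(s => s.Substring(prefix.Length));
}

string[] codes =
{
    "364VMS1029", "364VMSH920", "364VMSH192", "364VMSU839", "364VMN2382",
    "364VMR223", "364VMR2X3", "364VMN829", "364VMN8757", "364VMN831"
};

var trimmed = RemovePrefix(codes).ToList();
Console.WriteLine(string.Join(", ", trimmed));
// S1029, SH920, SH192, SU839, N2382, R223, R2X3, N829, N8757, N831
```

This also covers the "if there’s no common character, don’t do anything" requirement from the question, since an empty prefix removes nothing.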

This assumes you want to get a new sequence containing the filtered strings. If you want to modify an existing collection, use a for loop instead of the Select().
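For the in-place case, a for loop over an IList<string> (which both arrays and List<string> implement) could look like this sketch; RemovePrefixInPlace and PairPrefix are hypothetical names. Note that the prefix depends on the data: for just the three 364VMN8… strings below it is 364VMN8, not 364VM.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string PairPrefix(string first, string second)
{
    int length = 0;
    int max = Math.Min(first.Length, second.Length);
    while (length < max && first[length] == second[length])
        length++;
    return first.Substring(0, length);
}

// Overwrites each element with the part after the common prefix.
void RemovePrefixInPlace(IList<string> strings)
{
    if (strings.Count == 0)
        return;

    string prefix = strings.Aggregate(PairPrefix);
    for (int i = 0; i < strings.Count; i++)
        strings[i] = strings[i].Substring(prefix.Length);
}

string[] codes = { "364VMN829", "364VMN8757", "364VMN831" };
RemovePrefixInPlace(codes);
Console.WriteLine(string.Join(", ", codes)); // 29, 757, 31
```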

One last step:

Remove the common prefix and suffix from a list of strings.

I’ll leave this as an exercise for the reader. This answer contains simple code for reversing a string, which could be helpful. (I think you don’t need the code from other answers to that question, since it looks like your strings are ASCII-only.)
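For anyone who wants to check their answer to the exercise, here is one possible sketch (C# 9+ top-level statements; all names are hypothetical). It reverses strings with Array.Reverse on a char array, which is fine for ASCII but does not handle surrogate pairs or combining characters. It strips the prefix first and only then looks for a common suffix in what remains, so the prefix and suffix can never overlap (for example when all strings are identical).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string PairPrefix(string first, string second)
{
    int length = 0;
    int max = Math.Min(first.Length, second.Length);
    while (length < max && first[length] == second[length])
        length++;
    return first.Substring(0, length);
}

// Char-by-char reversal: fine for ASCII, not for surrogate pairs.
string ReverseString(string s)
{
    char[] chars = s.ToCharArray();
    Array.Reverse(chars);
    return new string(chars);
}

List<string> RemovePrefixAndSuffix(IEnumerable<string> strings)
{
    var list = strings.ToList();
    if (list.Count == 0)
        return list;

    // 1. Strip the common prefix.
    string prefix = list.Aggregate(PairPrefix);
    var rest = list.Select(s => s.Substring(prefix.Length)).ToList();

    // 2. Common suffix = reverse of the common prefix of the reversed strings,
    //    computed on the remainders so it cannot overlap the prefix.
    string suffix = ReverseString(rest.Select(s => ReverseString(s)).Aggregate(PairPrefix));
    return rest.Select(s => s.Substring(0, s.Length - suffix.Length)).ToList();
}

var result = RemovePrefixAndSuffix(new[] { "ID-001-X", "ID-042-X", "ID-7-X" });
Console.WriteLine(string.Join(", ", result)); // 001, 042, 7
```

With the strings from the question nothing changes in step 2, since their last characters already differ after 364VM is removed.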

You have to compare the strings character by character from the beginning to find the length of the longest common prefix, stopping as soon as you find a difference.
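A sketch of that idea: check column 0 in every string, then column 1, and so on, stopping at the first mismatch or as soon as any string runs out. CommonPrefixVertical is a hypothetical name; it takes an IReadOnlyList<string>, which arrays and List<string> implement.

```csharp
using System;
using System.Collections.Generic;

string CommonPrefixVertical(IReadOnlyList<string> strings)
{
    if (strings.Count == 0)
        return "";

    string first = strings[0];
    // Column i is checked in every string before moving on to column i + 1.
    for (int i = 0; i < first.Length; i++)
    {
        char c = first[i];
        for (int j = 1; j < strings.Count; j++)
        {
            // Stop if string j has no column i or differs there.
            if (i >= strings[j].Length || strings[j][i] != c)
                return first.Substring(0, i);
        }
    }
    return first; // every other string starts with all of "first"
}

string[] codes =
{
    "364VMS1029", "364VMSH920", "364VMSH192", "364VMSU839", "364VMN2382",
    "364VMR223", "364VMR2X3", "364VMN829", "364VMN8757", "364VMN831"
};
Console.WriteLine(CommonPrefixVertical(codes)); // 364VM
```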

(Depending on how expensive character access is compared with sorting, you might save time by first sorting the list lexicographically and then looking only at the first and last strings: whatever those two have in common is also common to every string that sorts between them. But only profiling will tell you whether this saves or adds time.)
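A sketch of the sort-first variant (CommonPrefixSorted is a hypothetical name). It sorts a copy with StringComparer.Ordinal, which orders strings by their char codes, so the argument that the first and last strings share whatever everything in between shares holds directly; a culture-aware sort is not a plain character-by-character order, so I would avoid it here. Whether this beats a direct scan is still something to measure.

```csharp
using System;

string CommonPrefixSorted(string[] strings)
{
    if (strings.Length == 0)
        return "";

    // Sort a copy so the caller's array keeps its order.
    var sorted = (string[])strings.Clone();
    Array.Sort(sorted, StringComparer.Ordinal);

    // Only the smallest and largest strings in ordinal order need comparing.
    string first = sorted[0];
    string last = sorted[sorted.Length - 1];
    int n = 0;
    while (n < first.Length && n < last.Length && first[n] == last[n])
        n++;
    return first.Substring(0, n);
}

string[] codes =
{
    "364VMS1029", "364VMSH920", "364VMSH192", "364VMSU839", "364VMN2382",
    "364VMR223", "364VMR2X3", "364VMN829", "364VMN8757", "364VMN831"
};
Console.WriteLine(CommonPrefixSorted(codes)); // 364VM
```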
