Conditionally initializing a string list

I am writing a program that needs to be able to process data from a number of different sources. The sources output data in a variety of formats. So, depending on which source is being used, I need to initialize a string list with a different set of column headers. The quantity of headers varies between 10 and 18 columns depending on the data source.

Essentially, I need a clean method to conditionally initialize a list. I have three working solutions but I am not sure which is the cleanest/best practice:

  • Example #1 requires that I add each header to the list line-by-line. With the number of potential sources and columns that need to be added for each, this list gets pretty long.
  • Example #2 requires that I use {} around each case statement to limit the scope so that I can declare the dataHeaders list individually in each case statement. I have never encountered a situation wherein I had to limit the scope to an individual case statement and as such, I suspect it is bad practice.
  • Example #3 seems like the best option. I actually implemented this one as I was writing this question. I don’t really see any drawbacks here but I thought I’d still pose the question anyhow.

Is there a better/cleaner way to do this? If not, which example is the best practice?

Example #1:

List<string> headers = new List<string>();
switch (dataSource)
{
    case "Source 1":
        headers.Add("header1");
        headers.Add("header2");
        headers.Add("header3");
        headers.Add("...");
        headers.Add("header18");
        break;
    case "Source 2":
        headers.Add("headerA");
        headers.Add("headerB");
        headers.Add("headerC");
        headers.Add("...");
        headers.Add("headerJ");
        break;
    default:
        // TODO: Handle bad sources.
        break;
}

Example #2:

List<string> headers = new List<string>();
switch (dataSource)
{
    case "Source 1":
        {
            List<string> dataHeaders = new List<string>() { "header1", "header2", "header3", "...", "header18" }
            headers = dataHeaders;
            break;
        }
    case "Source 2":
        {
            List<string> dataHeaders = new List<string>() { "headerA", "headerB", "headerC", "...", "headerJ" }
            headers = dataHeaders;
            break;
        }
    default:
        {
            // TODO: Handle bad sources.
            break;
        }
}

Example #3:

List<string> headers = new List<string>();
switch (dataSource)
{
    case "Source 1":
        headers.AddRange(new List<string>() { "header1", "header2", "header3", "...", "header18" })
        break;
    case "Source 2":
        headers.AddRange(new List<string>() { "headerA", "headerB", "headerC", "...", "headerJ" })
        break;
    default:
        // TODO: Handle bad sources.
        break;
}

2

A lot of times, you can really clean up a problem like this by expressing your data in a form with less syntax than a programming language, such as a CSV string. I don’t know C#, but an example in Scala is:

def headersForDataSource(dataSource: String): Option[Array[String]] = {
  val headers = """
    Source1,header1,header2,header3,...,header18
    Source2,headerA,headerB,headerC,...,headerJ
    """

  val lines = headers.lines map (_.trim split ',')
  lines find (_(0) == dataSource) map (_.tail)
}

You can see this makes your list of headers significantly more compact and easier to read, because it doesn’t clutter and intermingle it with the syntax of the data structure. It’s putting it in your preferred format, instead of the format forced on you by the programming language.

What is the issue you’re trying to fix?

If you’re looking to improve performance, then this is definitely a case of premature optimisation. Any of the three methods you give could fill a list with thousands of items in the fraction of a second it takes to lift your finger from the mouse after clicking on a button.

If you’re just trying to make your code less verbose, then a variation on option 3 is the best.

var headers = new List<string>();
switch (dataSource)
{
    case "Source 1":
        headers.AddRange(new [] { "header1", "header2", "header3", "...", "header18" })
        break;
    case "Source 2":
        headers.AddRange(new [] { "headerA", "headerB", "headerC", "...", "headerJ" })
        break;
    default:
        // TODO: Handle bad sources.
        break;
}

Note that you don’t have to specify the type of the list, or the arrays for the elements – these will be automatically inferred by the compiler.

You also can avoid switch and if using maps. I’m going to use Java syntaxis, but I guess it could be done in C# easily

Unmodificable (static or constant) Descriptor

Map<String,String[]> HEADERS_MAP = new HashMap<String,String[]>();
HEADERS_MAP.put("Source 1",new String[]{"header1","headerN"});
HEADERS_MAP.put("Source 2",new String[]{"headerA","headerZ"});
...
//Preventing Map from changes
HEADERS_MAP = Collections.unmodifiableMap(HEADERS_MAP);

Note: I think Maps are Dictorinary in C#

Header builder

//Each consumer will have it's onw list
return Arrays.asList(HEADERS_MAP.get(dataSource));

Edit:
I have edited by first answer @unholysampler’s comment.

Now Map stores String[] and builder returns a new fresh/unbound list. So builder consumer will be able to modify the content of the list and it will prevent code from unwanted changes. It could be done with Collections.unmodificableListbut that way adds little bit more code and complexity. It does code less readable.

2

I think a slightly modified option 2 is your best bet. Leave the declaration of headers outside the switch, and inside the switch just set headers = new List

You also don’t need the curly braces to create blocks for each case statement; C# will force you to have break at the end of each case.

The advantage of doing this is that if you add a new case, you’ll be warned that headers isn’t initialized if you try to use it later, where as your current #2 will just end up with an empty list, which may be a logic error to track down later.

Here is shorter version, combining suggestions of Peregrine and Karl Bielefeldt:

string hline;
switch (dataSource)
{
    case "Source 1":
        hline = "header1,header2,header3,...,header18";
        break;
    case "Source 2":
        hline = "headerA,headerB,headerC,...,headerZ";
        break;
    default:
        // TODO: Handle bad sources.
        break;
}
var headers = new List<string>();
headers.AddRange(hline.Split(','));

If you can refer to your different data sources by an index from a consecutive interval, I would consider to avoid the whole switch-case at all, and initialize hline like this:

 string hLine = new []{
   "header1,header2,header3,...,header18",
   "headerA,headerB,headerC,...,headerZ",
   "..."}[dataSourceIndex];

Direct answer

Based on the options presented, I’d err towards #2 (i.e. collection initializer), but dataHeaders is an unnecessary inbetween step (just assign the new list to headers directly) and I do prefer multiline here for readability’s sake.


Improving the approach

However, your suggested solutions all give me a magic string bee in my bonnet.

These headers might change one day, when one of the sources decides to update its own code, or when you shift towards using new sources. If you push this header configuration to a config file (of any sort you’d like), then you get the best of both worlds:

  • No more magic string header values.
  • Your code no longer needs to worry about how to initialize a bunch of string values.
  • Adding more sources no longer requires adding more code, which is OCP-friendlier.

There are plenty of ways to write this data to an external file. To keep things simple, I’m just going to assume a flat txt file whose filename is the source name.

// Source 1.txt

header1
header2
header3
...
header18

Your code can then parse this file, instead of needing to juggle all the values in the file.

public IEnumerable<string> GetHeadersForSource(string sourceName)
{
    var filepath = Path.Combine(myDirectory, $"{sourceName}.txt");

    if(!File.Exists(filepath))
    {
        // return string[0];
        // OR
        // throw new FileNotFoundException("...", filepath);
    }

    var headers = File.ReadAllLines(filepath)
                      .Select(line => line.Trim())
                      .Where(line => !String.IsNullOrWhitespace(line))
                      .ToList();

    return headers;
}

And now your entire switch case can be replaced with a single statement:

var headers = GetHeadersForSource(dataSource);

Disclaimer
This example focused on the header values. You’re still using a magic string for sources themselves, which should also be changed, but I considered to be out of scope for the posted question.

Why not use a Dictionary?

var headers = new List<string>();
var headersToAdd = new Dictionary<string, string[]>
{
    {"Source 1", new string[] {"header1", "header2"} },
    {"Source 2", new string[] {"headerA", "headerB"} }
};

headers.AddRange(headersToAdd["Source 1"]);
headers.AddRange(headersToAdd["Source 2"]);

1

If you prefer a functional style with immutability:

    static IEnumerable<string> GetHeaders(string dataSource)
    {
        switch (dataSource)
        {
            case "Source 1":
                yield return "header1";
                yield return "header2";
                yield return "header3";
                yield return "...";
                yield return "header18";
                yield break;
            case "Source 2":
                yield return "headerA";
                yield return "headerB";
                yield return "headerC";
                yield return "...";
                yield return "headerJ";
                yield break;
            default:
                // TODO: Handle bad sources.
                yield break;
        }
    }

2

Trả lời

Email của bạn sẽ không được hiển thị công khai. Các trường bắt buộc được đánh dấu *