Right now in my current company I must “parse” a csv and extract some data out of it (in the sense of data mining).
I surprised myself by defining the following data structure (Java 8):
static List<Map<String, Map<String, LocalDate[]>>> intervalStructure
Is this considered bad practice? Should I use a self-define class instead to store some data?
1
This is a dumb, stringly typed data structure. It is a:
- List
- of Maps
- keyed by String
- of Maps
- keyed by String
- of Local date array
- of Maps
This makes it difficult to create, traverse, and could very well get you into code that looks like:
foo.get(1).get("bar").get("qux")[42]
And guess what… you’ve got a null pointer exception that was thrown on that line.
Setting up the structure isn’t much fun either. That this is a rawish data structure means that you have to do all of the adds and deletes by hand working on the interfaces (that can be rather deep) rather than working with a Constructor or Builder.
You’ve linked the representation that is passed around in your code to the representation of the file. When one needs to change, the other needs to change – and possibly in multiple locations. Think of the joys you will have when you have an xml or json or yaml data file behind it and need to go through and change things.
This all in all is a bad idea.
A better idea would be to have one or more classes that exposes the logic of fetching the data. You have a Structure
. It exposes a get(String bar, String qux)
and returns back a List
. This allows you to isolate the traversal of the structure (if it is that) in private fields so that it can change as needed. It also exposes an add(String bar, String qux, LocalDate date)
which allows you to clearly isolate the logic for building it in a testable way.
You may find that you want other questions of that data that you want answered. Maybe you want a isInRange(...)
or contains(...)
or something else that you ask. Failing to have this as a nice class with corresponding methods to answer those questions, you will find that you either have a static method in a helper class (this is beginning to smell) that acts on that data.
public static boolean contains(List<Map<String, Map<String, LocalDate[]>>> arg, LocalDate value)
That looks as ugly as it is. Don’t do that.
If you don’t do that, you will have a copy of that logic each time you use it which is not at all DRY.
Either way, working on raw data structures like this is undesirable.
7
I’m going to respectfully disagree with Michael here, and some rather smart programmers, like Rich Hickey, tend to think along the same lines as me. The key here is that you take care not to pass this data structure all over the place, but you use it as a base layer from which you build the sort of abstractions that Michael talks about in his answer.
These sorts of data structures get a bad rap. They are extremely easy to manipulate, with highly reusable tools. Without knowing anything about your application other than that one line of code defining a data structure, I could write an interface to convert it to JSON, HOCON, or XML. I could store it in a database. I don’t have to write any custom serialization/deserialization code, which lets me reuse generic libraries. I can easily write queries and filters to give me nice strongly-typed class-based layers above this one. When my CSV file format changes, I have this much more flexible and dynamic layer of isolation between it and my more rigid and static classes above it.
In summary, don’t make it your sole data structure, but using it as an intermediate data structure can solve a lot of maintainability problems.
2