Using empty subclasses to add to the semantics of a class hierarchy?

I was wondering whether using (almost) empty derived classes to give additional semantics to a class hierarchy was a good idea (or practice) ?

Because a (not so short) example is always better than a long speech, let’s assume we’re writing a class hierarchy for SQL generation. You want to represent a data manipulation command (INSERT, UPDATE or DELETE), so you have two alternatives :

First version, with switch statement

enum CommandType {
    Insert,
    Update,
    Delete
}

class SqlCommand {
    String TableName { get; }
    String SchemaName { get; }
    CommandType CommandType { get; }
    IEnumerable<string> Columns { get; }
    // ... and so on
}

class SqlGenerator  {
    string GenerateSQLFor(SqlCommand command)
    {
        switch (command.CommandType) 
        {
            case CommandType.Insert:
                return generateInsertCommand(command);
            case CommandType.Update:
                return generateUpdateCommand(command);
            case CommandType.Delete:
                return generateDeleteCommand(command);
            default:
                throw new NotSupportedException();
        }
    }
}

Second version with VisitorPattern (Note that the question is NOT about whether the Visitor pattern should be used or adding new operations)

abstract class SqlCommand {
    String TableName { get; }
    String SchemaName { get; }
    IEnumerable<string> Columns { get; }
    // ... and so on
    abstract void Accept(SqlCommandVisitor visitor);
}

class InsertCommand : SqlCommand 
{
    overrides void Accept(SqlCommandVisitor visitor) 
    {
        visitor.VisitInsertCommand(this);
    }
}

class DeleteCommand : SqlCommand 
{
    overrides void Accept(SqlCommandVisitor visitor) 
    {
        visitor.VisitDeleteCommand(this);
    }
}

class SqlCommandVisitor  {
    void InitStringBuffer() { ... }
    string GetStringBuffer() { ... }

    string GenerateSQLFor(SqlCommand command)
    {
        InitStringBuffer();
        command.Accept(this);
        return GetStringBuffer();
    }

    void Visit(SqlCommand command) { ... }
    void VisitInsertCommand(InsertCommand command) { ... }
    void VisitDeleteCommand(DeleteCommand command) { ... }
    void VisitUpdateCommand(UpdateCommand command) { ... }
}

With both examples we achieve the same result, but :

In the first version, my code feels more DRY, even though polymorphism is generally preferable over a switch statement.
In the second version, I feel like I’m needlessly deriving SqlCommand just so that the code carries more meaning.

Another similar case is deriving a class from a generic class, just to give additionnal meaning to the collection, eg :

class CustomerList : List<Customer> { }

I’m left off wondering whether this is a good idea. Are there any pitfalls to doing so ?

I personally prefer the second version exactly because it adds meaning to the code, and helps me when I read my code later… But I clearly lack the experience to see any downside to this pattern.

Your second example has the benefit of no case statement, and no CommandType. This means you can a new statement type without having to modify a case statement. A case statement ought to cause an OOP programmer to pause a moment and consider whether the design should be changed to get rid of the case statement. A variable denoting a thing’s type ought to cause the programmer to pause a long while.

However, the visitor pattern doesn’t seem to be needed here. Simpler would be to use a class hierarchy, as you did, but then put the SQL generation inside that hierarchy rather than having it be a separate class:

class SqlCommand {
  abstract String sql;
}

class InsertCommand : SqlCommand {
  overrides String sql {...};
}

class DeleteCommand : SqlCommand {
  overrides String sql {...};
}

The code inside the sql functions will probably turn out to have shared behavior, or enough complexity, that moving the shared behavior and complexity into a separate SqlBuilder class would be good:

class InsertCommand : SqlCommand {
  overrides String sql {
    builder = SqlBuilder.new
    builder.add("insert into")
    builder.add_identifier(table_name)
    buidler.add("values")
    builder.add_value_list(column_names)
    ...
    return builder.sql
  };
}

The SqlCommand drives the process of creating the sql, pushing its details into the SqlBuilder. This keeps the SqlCommand and SqlBuilder decoupled, since the SqlCommands do not have to expose their entire state for the builder to use. This has the benefits of your second example, but without the complexity of the visitor pattern.

If it turns out that sometimes you need the sql built one way and sometimes another (perhaps you want to support more than one SQL back end), then you abstract the SqlBuilder class, like so:

class SqlBuilder {
  abstract void add(String s);
  abstract void add_identifier(String identifier);
  abstract void add_value_list(String[] value_list);
  ...
}

class PostgresSqlBuilder : SqlBuilder {
  void add(String s) {...}
  void add_identifier(String identifier) {...}
  void add_value_list(String[] value_list) {...}
  ...
}

class MysqlSqlBuilder : SqlBuilder {
  void add(String s) {...}
  void add_identifier(String identifier) {...}
  void add_value_list(String[] value_list) {...}
  ...
}

Then the sql functions use the builder passed into them, rather than creating one:

class InsertCommand : SqlCommand {
  overrides String sql (SqlBuilder builder) {
    builder.add("insert into")
    builder.add_identifier(table_name)
    buidler.add("values")
    builder.add_value_list(column_names)
    ...
    return builder.sql
  };
}

I prefer the first one, but maybe that’s just me – I like DRY, and like keeping things simple, and there’s nothing simpler than a switch statement for this. (I always think that, unless you have a good reason to refactor the switch into a polymorphic class, this refactoring is mainly to show off how clever code can be).

You are writing a lot more code to achieve the same result, and that requires a lot of cut and paste – and I find c&p coding is one place that bugs are more likely to appear. I also think trying to read the code is more difficult as it skips through all those classes and methods. But ultimately, its just down to taste – they both do the same thing.

For your second question.. it depends, once you derive from a class you’re responsible for checking all the plumbing works. For example, you do not derive from stl container classes because they don’t have virtual destructors, and thus are not designed for inheritance. Unless you know the potential pitfalls of the class you’re inheriting from, its much safer to contain the base class.

The other reason is just semantics, a CustomerList is not a List of Customers, so someone reading your code would have to think “is this the same as a List<> or have they modified the functionality?” If you do not make any modifications, then leave it as the underlying type, or use a typedef/using statement that at least just says “I’ve renamed this for readability only”.

The downside to having more classes in the hierarchy is that when you have to debug things it’s that much more complicated to figure out what’s going on. The deeper your class hierarchy, and the more levels that have actual code, the harder it is to understand the code when debugging (because you always have to be mindful of where you are in the call chain at any point, and what actual object you are, etc, etc).

The real question is: Does this make your code overall more readable? If you’ll only have that switch statement in one spot, then I’d say it’s a win. Whereas, if you’d otherwise end up duplicating that switch statement all over the place, then the second form is probably a win.

Using the type system (in a statically type language) to express semantics is exactly what it is meant for. Consider a type for database IDs that you subclass for each table, say a CustomerId, an OrderId and so on. The benefit is that you never mistake a CustomerId for an OrderId. But they are clearly empty subclasses. See the Google IO talk on GWT by Ray Ryan at around 7:00 for an example.

I like second approach better since when one looks into InsertCommand, this reads as “THERE IS an insert command”, which means that there should be another commands. When you read next, you see these commands and you totally understand the idea.

The first approach reads as “THERE IS a command… with some details in there…”.

No, it’s not a good practice in general.

Any time you encode information into the type system, you’re doing it wrong (in mainstream languages like C++, java, and C#). It then requires you to use a visitor pattern, or a pile of type checks to extract that information which is fragile at best.

Further, in the mainstream languages it makes it very awkward to store/change that information. If you for example wanted to execute a command entered by the user in some SQL IDE.

In short, if you are storing information – just use a variable; that’s what they’re there for.

Filed under: softwareengineering - @ 15:15

Thẻ: object-oriented, readability, semantics

Thiết kế website giá rẻ

Danh mục

Using empty subclasses to add to the semantics of a class hierarchy?