SQL queries in integration tests

Current situation

I have a situation that I find frustrating: the integration tests in my project (a RESTful API) contain both API HTTP calls and SQL queries. Whenever an API call is made that writes something to the DB (a POST, PUT or PATCH request), a hand-written SQL query is launched to compare the API response with the DB. So:

  1. Send POST request, receive response
  2. Send SQL query, receive response
  3. Compare the two responses

The integration tests live in a separate project, so it’s impossible to just use the main project’s ORM and unmarshal the DB responses into the business objects that we are using.

Issue

I have an issue with the hard dependency between the DB structure and the integration tests.

The project is young and the database structure is constantly changing. Whenever a change in the DB occurs, SQL queries in all affected integration tests go out of sync and must be manually updated. When the change is small, it’s usually not a big deal, but whenever e.g. a column is changed to a foreign key, all the affected queries suddenly start to require a join, the selected columns need table aliases to avoid ambiguous column names.

That becomes an issue especially when the tests themselves build the SQL queries from smaller building blocks: a WHERE clause might be passed as an argument to a method that outputs the SQL or maybe a second query might be build based on some parametres used in the first one.

Finally, whenever a collection is contained in the API response, that’s usually a join in the SQL that needs some additional processing (change n selected rows into a single row with a collection in one of the columns).

Atop of that, given the fact that the integration tests take a few hours to complete, there’s no immediate feedback that there is an issue, which prolongs the time needed to fix any issues.

In short, a small change in the DB structure often results in several man days wasted on updating the SQL and the SQL building logic in the integration tests.

Rationale

The rationale for having SQL queries in the integration tests seems to be that the team has encountered issues with the DB structure: the fetched data was correct, but it was saved incorrectly in the database.

That’s the explanation given to me when I raised the issue. I don’t buy it.

Solution

So what’s the best approach?

I believe that if we’re testing the API, we should rely on the API calls exclusively to compare the responses and eliminate the SQL altogether, so:

  1. Send POST request, receive response
  2. Send GET request for the affected entity, receive response
  3. Compare the two

However, if the team states that the database structure must be verified, I do not know what solution to offer them instead of what we have currently.

I would appreciate any insight, suggestions or links to (preferably short) articles that tackle this issue.

Sounds like a right convoluted codebase that is a nightmare. Is this a new thing, or a massive legacy piece of poop that you’re maintaining? If the former – you got other problems.

However. There’s nothing wrong with verifying the data store to see it contains what is expected. It is a kind of short-cut to verify future API calls will be correct (ie you say the correct data was returned from the API but the data stored was wrong – that suggests the wrong data will be returned eventually, possibly after the server is restarted and cache is cleared and re-populated with bad data that should have been saved correctly in the first place. Good luck finding that kind of error quickly, but if you check the DB, then it’ll be obvious what has gone wrong straight away. Remember data in the DB is the golden data, everything else is secondary).

So you have issues creating the integration tests. You could make the codebase easier to generate these, so anyone making a DB schema change or code change then knows which integration tests to update. Someone always has to update tests when code changes, make sure its the person changing the code that is responsible for updating the tests.

I would try desperately to reduce the coupling inside the codebase though, if 1 table is handled by 1 module and that module only, it makes it much easier to know which tests are involved.

The next thing to do is to split the integration tests into smaller pieces that can be run immediately, like unit tests. Then you’ll at least have some quicker feedback.

The 3rd option is not to check the results from the APIs but to have fixed data that runs against a fixed set of DB data, then you can simply check the DB against itself, rather than the returned API response.

1

Trả lời

Email của bạn sẽ không được hiển thị công khai. Các trường bắt buộc được đánh dấu *