This is not a question of getting out of UPDATE_ROLLBACK_FAILED, but of getting into it.
This issue rarely occurs on my Prod deployment, but when it does, the suggested solutions sound rather risky, because no one has actually “tested” how a particular strategy behaves.
For example, a stack update rollback gets stuck in UPDATE_ROLLBACK_FAILED because a particular Lambda function can’t be rolled back, as its older runtime has been discontinued.
I would have liked to be able to reproduce this state and see how the following strategy would have worked:
“Roll back by skipping the problematic resources”
On the next stack update, what happens to the resource that could not be rolled back?
- It is successfully ignored because it doesn’t need an update.
- A duplicate of that resource is created.
- CloudFormation throws an error.
While a direct answer about the above strategy is appreciated, it would be interesting to know whether there is a reliable way to reproduce the UPDATE_ROLLBACK_FAILED state.
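
For concreteness, the "skip the problematic resources" strategy quoted above corresponds, as far as I understand, to the ContinueUpdateRollback API with its ResourcesToSkip parameter. Below is a minimal sketch of that call, assuming a boto3 CloudFormation client is passed in. The function name, stack name, and logical ID are hypothetical placeholders and not taken from my actual deployment.

```python
UPDATE_ROLLBACK_FAILED = "UPDATE_ROLLBACK_FAILED"


def skip_and_continue_rollback(cfn, stack_name, logical_ids_to_skip):
    """Continue a failed rollback while skipping the given logical resource IDs.

    `cfn` is assumed to be a boto3 CloudFormation client, e.g.
    boto3.client("cloudformation"). Returns True if the request was issued,
    or False if the stack was not in UPDATE_ROLLBACK_FAILED to begin with.
    """
    stacks = cfn.describe_stacks(StackName=stack_name)["Stacks"]
    status = stacks[0]["StackStatus"]
    if status != UPDATE_ROLLBACK_FAILED:
        # Nothing to continue; the skip option only applies in this state.
        return False

    # Ask CloudFormation to resume the rollback without touching the
    # listed resources (here: the Lambda whose old runtime is gone).
    cfn.continue_update_rollback(
        StackName=stack_name,
        ResourcesToSkip=list(logical_ids_to_skip),
    )
    return True


# Hypothetical usage (names are placeholders):
#   import boto3
#   skip_and_continue_rollback(
#       boto3.client("cloudformation"), "my-prod-stack", ["MyLambdaFunction"]
#   )
```

This is the call I would like to rehearse against a throwaway stack before running it in Prod, which is why a reproducible way into the failed state matters.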