Taking Care of Business (Analysis): December 2012

I'm reading Micro, Michael Crichton's last novel, and I love the introduction. Here is an excerpt:

Perhaps the single most important lesson to be learned by direct experience is that the natural world, with all its elements and interconnections, represents a complex system and therefore we cannot understand it and we cannot predict its behavior. It is delusional to behave as if we can, as it would be delusional to behave as if we could predict the stock market, another complex system. If someone claims to predict what a stock will do in the coming days, we know that person is either a crook or a charlatan. If an environmentalist makes similar claims about the environment, or an ecosystem, we have not yet learned to see him as a false prophet or a fool.

Human beings interact with complex systems very successfully. We do it all the time. But we do it by managing them, not by claiming to understand them. Managers interact with the system: they do something, watch for the response, and then do something else in an effort to get the result they want. There is an endless iterative interaction that acknowledges we don't know for sure what the system will do--we have to wait and see. We may have a hunch we know what will happen. We may be right much of the time. But we are never certain. Interacting with the natural world, we are denied certainty. And always will be.

Working at an investment bank is challenging. There's constant pressure to deliver, not screw up, reduce operational risk (the new buzzword for the same things we've always wanted less of), reduce TCO (total cost of ownership, another buzzword), and of course produce new software that helps the business do deals or reduces their manual workload.

Here are a few of the challenges:

Maintaining legacy software and/or trying to decommission it
Satisfying audit & internal regulatory requirements that have little impact
Hiring button pushers rather than smart people who can actually debug issues

I will address the first two points in this post & leave the last for another day.

We have tons of old, legacy software whose original developers have moved on. I'm talking about software that just runs and that is mostly a black box. We feed these systems (i.e., keep the lights on, reboot the servers on a routine basis, monitor them) so they keep running, but occasionally they fail. That means I need to pull a developer off a new project to resuscitate the old software. This is painful because the developer is not very familiar with the software and it takes time to figure things out. Users may be impacted, and thus screaming and escalating to senior managers, which adds to the pressure. Note that most if not all of this software was not built with the latest agile practices like TDD, so there are no unit or integration tests to speak of, and the documents (if they even exist) are yellow & crusty.

Of course, making changes to this software is risky, because the developer could make a change that impacts another part of the software and break something else. And without a good suite of tests, how will you ever know? Given the time pressure to get this working again in production, we can't do adequate testing. So we release into production and hope for the best. And we document what we learned in a wiki or FAQ, so we can at least build our knowledge base (hopefully). Then we all context-switch back to our new software development.
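One way to get at least some safety net before touching untested code is a characterization test: record what the software does today, then check that your change didn't alter it. Here is a minimal sketch in Python. The legacy_report function and its trade fields are made up for illustration, a stand-in for whatever black box you are about to modify.

```python
import json
from pathlib import Path


def legacy_report(trades):
    """Hypothetical stand-in for a legacy routine nobody fully understands."""
    total = 0.0
    for trade in trades:
        # Quirk kept on purpose: negative quantities are silently skipped.
        if trade["qty"] < 0:
            continue
        total += trade["qty"] * trade["price"]
    return {"count": len(trades), "total": round(total, 2)}


def check_against_snapshot(func, inputs, snapshot_path):
    """Record func's output on the first run; compare against the record afterwards."""
    path = Path(snapshot_path)
    actual = func(inputs)
    if not path.exists():
        # First run: we only capture current behavior, right or wrong.
        path.write_text(json.dumps(actual, sort_keys=True))
        return True
    expected = json.loads(path.read_text())
    return actual == expected
```

Run it once before the change to capture the snapshot, then again after the change. It won't tell you the old behavior was correct, only that it stayed the same, but under time pressure that is often the question that matters.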

This is an oft-repeated pattern in my group, and I would be surprised if it were not universal. Morale slips too: we lose time fixing old software, and we disappoint both the users of the legacy software and the users who are expecting their new software on time. Developers get frustrated because they can't seem to get traction on the new software either.

So as a general rule of thumb, we should try to minimize modifications to the legacy code-base. If you need to do it, then do it. But let's look at one 'need' in more detail; this brings me to the 2nd bullet point above: Satisfying audit & internal regulatory requirements that have little impact

I'm currently on a project to satisfy an internal audit point (not a government regulatory item) that requires me to modify a legacy piece of software (which, by the way, we are trying to decommission). This change, I'm finding, is incredibly risky. One, it touches at least a dozen files across the board, and the software is responsible for getting all kinds of data intraday and at end of day. Two, the developer is in Singapore whereas the rest of the team is in NY, so we can't even talk in real time about the changes. Three, the legacy software will need to make use of other, new components, which may require a third party to be available on standby. Taken all together, it would have been wiser to get an exemption for this.

Unfortunately, this comes back to the way I presented the problem to my manager. I hadn't thought through all the ramifications of the change. I only knew I didn't think it was in our best interests. But I went to my manager and all I said was, "Look, we have this audit point but we want to decomm this software, so do we really want to do this?" and he said, "Yes, we need to." So off I marched.

My point is that I didn't have a strong enough view on the situation. At that moment in time, I only had a 'leaning', not a clear picture of what the whole thing would entail. If I had, I might have persuaded him that doing this wasn't in our best interests.

So what's the lesson here? When modifying legacy software, (1) get the scope of the change (how long it will take, impact assessment, how many files/modifications need to be made, etc), (2) assess overall risk, (3) if too risky, push back and make the case as persuasively as possible.
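To make that checklist a bit more concrete, here is a rough sketch in Python. The factors come from the audit-point story above; the ten-file cutoff and the "three concerns means push back" threshold are numbers I made up, so treat them as placeholders to argue about, not as a real risk model.

```python
from dataclasses import dataclass


@dataclass
class ChangeAssessment:
    files_touched: int
    has_automated_tests: bool
    team_in_same_timezone: bool
    needs_third_party_standby: bool
    system_being_decommissioned: bool


def count_concerns(change):
    """Count one point per concern; the cutoffs here are illustrative only."""
    concerns = 0
    if change.files_touched >= 10:
        concerns += 1
    if not change.has_automated_tests:
        concerns += 1
    if not change.team_in_same_timezone:
        concerns += 1
    if change.needs_third_party_standby:
        concerns += 1
    if change.system_being_decommissioned:
        concerns += 1
    return concerns


def recommendation(change, push_back_at=3):
    """Suggest pushing back once the concerns reach the (made-up) threshold."""
    return "push back" if count_concerns(change) >= push_back_at else "proceed"
```

The value isn't in the number itself. Filling in the fields forces you to do the scoping before the conversation with your manager, so you walk in with specifics instead of a hunch.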

Audit points, regulatory risk and operational risk are big buzzwords that make people jump and act immediately. But wait, not so fast: (1) those terms can be abused and inappropriately applied, especially if there are other agendas involved like politics, and (2) they should not all be treated equally. Some audit points just aren't as important as others, and we need to really understand what's important and what's not. It's similar to following the spirit of the law and not just the letter. So if satisfying this point is riskier than not satisfying it, well, that should be taken into account. We shouldn't just blindly do it because that's what the audit point says. Let's have a conversation about it and negotiate.

So in writing this up and thinking further about it, here is what I will do: re-examine the exact scope of the change and, if warranted, have the same conversation with my manager again, this time asking for an exemption. I'll fill him in on the exact risks and have him sign off by email if he still wants to proceed. That way, if there are issues, he can't say he didn't know about them. And I'll have done my due diligence.

Recent Posts

Sunday, December 30, 2012

Complex Systems

Saturday, December 15, 2012

Legacy Software