The ideal world

R&D, the Support Service and IT Ops collaborate
in the same room, on the same incident.

Investigation gets easy and fast.
Root cause gets identified in no time.
Resolution is just a matter of hours.

The reality

R&D, the Support Service and IT Ops each reside in their own space.
People work in silos.

And multiple support levels coexist: L1, L2...

Multiplying silos makes things worse.
Talented people are required to bridge them.

The production

The production environment is an excellent stronghold.

That is expected, for security reasons. No discussion.

You do not let strangers access it easily, not even colleagues.

So data access is under strict control: procedures must be followed.
Every investigation step takes time.

The geography

The actors are not always in the same geographical region.
Servers hosted in the US, support in India, R&D in Europe.

Communication gets slower because of the different time zones.
You’ll get your answer tomorrow morning.

Cultural differences can also affect communication.

Coordination is key.

The means

IT operators, support people and developers do not think the same way.
They basically do not speak the same language.

The tools at their disposal are really different:
 – Production tools are monitoring-oriented and very expensive.
 – Customer support often has home-made investigation tools.
 – R&D tools are powerful but not suited to production environments.

Issue resolution once again relies on talented people: those able to bridge the communication gap between teams and those able to take advantage of the various tools. Rare people.

The organizations

Different companies can be involved: barriers go up.
That means different internal process flows and company cultures.

Management and administrative rules can slow down investigations.

Team rotation is also a reality in support and production.
And staff turnover, especially in developing nations, cannot be ignored in the medium term.

Sorry, but your support contact has unfortunately left the company.
Yes, I know, he was very knowledgeable about the product.

The Ping Pong game

And finally come the interactions, known as the ping pong game.

Successive round trips, taking time to access partial information,
hoping R&D will get the right ball to catch the issue.

Some people play for time, asking useless questions
while waiting for knowledgeable people to appear.

At the end of the day, the end user expects the critical issue to be resolved already. If that is not the case, he must be informed of the root cause and the expected resolution time. Will the ping pong game allow it?

Behind the scenes, multiple failure factors are probably at play.

What are the needs?

How can incident management be made fast and efficient?