I spreadsheet therefore I am…

...probably extremely frustrated. And, frankly, on some kind of list, double underlined for wanton neologism and crimes against innocent nouns. There’s a nebulous link between that frustration and the argument I’ll be advancing about Excel. Specifically, its widespread use as a records management system is symptomatic, not causal.

Context is everything here; Excel is clearly brilliant at what it’s designed to do, from 20 cells to budget a weekly shop to complex multi-page epics forecasting global sales. Unfortunately, it’s even more successful at what it wasn’t designed for - a repository for hidden data factories. It has become de facto middleware, the ETL tool of choice and even a clumsy integration bus.

It’s such a powerful and flexible tool, Microsoft may as well market it with the strapline ‘When in doubt, paste it into cell A1’. Even those of us who proudly label ourselves data practitioners are in no way above the daily quick’n’dirty ‘stick it in a spreadsheet and we’ll do it properly later’. Later - that lovely cosy word which proxies laziness for rigour.

Mostly though I’m a man for whom a single version of trust and a data lineage you can look up in Debretts are a daily quest. Which is why much of my work recently has been developing the excellent HE Data Capability map for individual universities. This order from chaos starts by matching the generic capabilities with those of the real world by analysing each with a set of lenses around process, applications and data.

The application sweep tends to find Excel brushed from dark corners into the blinking light. It’s being repurposed for everything from master reference data lists and integration points between multiple systems to temporary (although with GDPR upon us, one probably should check exactly HOW temporary) cuts of data gluing disparate processes together.

The consequences are well known; dubious data quality, no mastering whatsoever, errors hidden in complex equations, etc. We could go on but it’s all been done a million times before. Probably documented in a spreadsheet would be my guess and that’s pretty much the point. In the same way that I refuse to countenance the idea that data is somehow bad - it isn’t, it’s not sentient, it is what it is because of what we do to it, it doesn’t KNOW - I’m starting to struggle with Excel being a metaphor for worst practice data management.

I’m not sure it is that at all. Rather it has become the love child of poor process, legacy applications and hard to use data. Most times I hear subject matter experts mutter that yes maybe that particular process does have a sprinkling of spreadsheet, I listen very carefully to the next sentence which explains why.

Take out the nefarious, the corporate rebel, the ‘I’m too special for any process but my own’ and we’re left with the frustration we started with. If I had a pound for every ‘I can’t get the data out of system X in a way we can use’ or ‘That process was written years ago and it just doesn’t work anymore’ and ‘we’ve tried to get it changed but nobody thinks it’s a priority’, I’d have… well enough to add up in a new spreadsheet for sure.

My experience of six sectors and twenty years of mostly crap data management doesn’t suggest good people get up in the morning and declare ‘today is the day I shall take some more data hostage’ or ‘That bloke Bob in accounts, he’s in so much trouble when he uses the numbers I’m going to send him’. It’s more a tired pragmatism in response to a whole load of broken stuff not in their control to fix.

It’s not close to best practice, it cannot scale, it will break eventually and it’s a million miles from sustainable, but the guilt needs to be shared around. We develop shiny applications then fix them in aspic while the organisation cheerfully rewrites all the original requirements. We build processes vertically in the organisation focusing on the boxes not the relationships between them. We hide data behind semantic pedantry, and an approach to the information life cycle chosen apparently to ensure the lowest data quality at the highest possible cost.

We’re not bad people either. It’s the whole Enterprise Architecture argument again which - for the sake of brevity and sanity - I shall summarise by the simple maxim that we never had time to do things right, but we always have time to do things twice. Data Governance is the latest poster child for this holistic view of an asset across the whole university, and it’s being well supported by resources such as the capability model.

All of which is for naught without major behavioral change. By which I mean people giving up their spreadsheets voluntarily and willingly. Not those same people being castigated for casting around for the application of last resort when expensive systems and difficult data have let them down.

If we want to break the spreadsheet culture for records management and dodgy ETL - and for any kind of sustainability and reuse we really must - then we need to offer something better. I’d like to see investment targeted at incremental improvements towards an agreed end state, rather than shiny new things attracting the big budgets. These are fanfared as great strides forward, but are often implemented as yet another digital silo.

So I don’t believe Excel is at the root of the problem. It’s symptomatic of something bigger. We need to move on to some new thinking. That’s going to require a shelving of personal agendas, an abandonment of some cherished positions, and an unrelenting focus on sustainability and not quick fixes. None of that is easy. If it were, we’d have done it already. What’s different now is the option of doing nothing is being increasingly recognised as a blocker to university goals.

The skill now is not to shoot the messenger.

Alex Leigh is director of the Leigh Partnership, enterprise architect, and data management and data governance specialist. This article was first published on LinkedIn.
