Monthly Archives: February 2012

Why I Don’t Trust Data.

Dilbert Cartoon

I’m a moron. Why? Because I prefer manipulative anecdotes to data.

Why doesn’t data get labelled manipulative too? I suspect it’s because most people think data tells the truth, that looking at raw data is somehow a window into the soul. I think data is more manipulative than any anecdote I’ve ever heard, and far more dangerous, given that so many people regard it as simple truth.

There are two ways in which I think data gets manipulated, and they are why it can’t always be trusted.

1. Data is mis-represented. Take this piece of research from Think With Google, and look at the top stat:

39% of smartphone shoppers use videos while researching and shopping.

39%? That certainly seems like a reasonably high number of people – we should all probably start filling YouTube with product videos and paying Google to promote them for us, right?

Well, on slide 3 of the actual presentation, they explain how the data was collected, and who qualified for the survey:

1186 respondents who qualified for the survey after indicating they had both a) shopped / purchased a tablet/TV/smartphone and b) watched some form of online video in the past 6 months

So hang on a sec. What we’re talking about here isn’t actually smartphone shoppers, but smartphone shoppers who have already watched some form of online video in the past 6 months; shoppers who have already exhibited the behaviour of watching online videos. Tapping into existing behaviour is much easier than forming new behaviours, so it’s absolutely no surprise to see this number up at almost 40%. But this isn’t actually justification to pursue product video as a strategy for our product, because we don’t know what percentage of our own product’s consumers watch online video at all.

So what do we know? All we know is that 39% of people who already watch online video also do so when researching smartphones. That is potentially quite a different claim from the one actually made. Note as well that presenting it accurately makes the number feel much less impressive: “wow, only 39% of people who already watch online video bother to watch product demos”, rather than “wow, 39% of smartphone shoppers watch product videos”.

Suddenly the context has shifted, and with it so has what the data appears to tell you.
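The gap between the two readings can be put as simple arithmetic. The sketch below assumes, purely for illustration, that shoppers who never watch online video never use product videos either. Under that assumption, the survey’s 39% only describes all smartphone shoppers if every one of them already watches online video. The watcher shares in the loop are hypothetical values, not figures from the survey, and the function name is made up for this example.

```python
# Back-of-envelope sketch: how a survey's screening step changes what a
# headline percentage means. The 39% comes from the survey quoted above;
# the "watches online video" shares below are HYPOTHETICAL, not real data.

SURVEY_RATE = 0.39  # share of *qualified* respondents (all already watch online video)


def share_of_all_shoppers(rate_among_qualified: float, share_who_watch_video: float) -> float:
    """Estimate the share of ALL smartphone shoppers who use product videos.

    Assumption (hypothetical): shoppers who never watch online video never
    use product videos, so
    P(uses video) = P(uses video | watches video) * P(watches video).
    """
    for value in (rate_among_qualified, share_who_watch_video):
        if not 0.0 <= value <= 1.0:
            raise ValueError("rates must be between 0 and 1")
    return rate_among_qualified * share_who_watch_video


if __name__ == "__main__":
    # Hypothetical shares of smartphone shoppers who watch any online video.
    for watchers in (0.5, 0.75, 0.95, 1.0):
        estimate = share_of_all_shoppers(SURVEY_RATE, watchers)
        print(f"if {watchers:.0%} watch online video: {estimate:.1%} of all shoppers use product videos")
```

Only the last line of output matches the headline claim; every other row shows how the figure shrinks once the screened-out shoppers are counted.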

And this problem is not just in the presentation of this data, but also in the way this data is then used. It may be that smartphone shoppers are a tech-savvy bunch, almost all of whom watch online video. But you can’t just take this statistic and apply it to other categories and sectors. Laundry detergent buyers won’t have the same profile as smartphone buyers. If you’re in the laundry detergent business, then this data is useless to you. But statistics probably don’t exist for laundry detergent shoppers, so people use what they can find. And what they can find is the wrong data for their category.

2. Data is mis-collected. Our behaviour sometimes leaves direct trails of data. For example, as we browse our way around a website, data is collected on what pages we visit, how long we spend on each page, what we click on each page and so on. No-one has to ask us how long we spent on a page. Other data, however, is collected by asking people directly what they think or do: we’re asked what we think about something, and if enough of us answer the same way for the result to be statistically significant, it becomes ‘truth’.

To me, this seems to ignore the whole field of behavioural economics, and the fact that human beings rarely behave like rational creatures. There are myriad ways that people could misrepresent themselves when asked to provide data, but below are just two that occur to me right now:

1. We answer questionnaires from the point of view of the ‘perfect me’. When you fill in a questionnaire, you’re not answering as who you actually are, but as who you’d like to be seen to be. The two are often similar, but usually different enough to skew the results in interesting ways. When I fill in a survey on recycling, for example, am I filling it in as the true me, or as the me I’d like to present to the world? And how different are those two points of view? Many people say the environment is important, and when surveyed say that they buy ‘green’ goods. And yet the survey numbers rarely match the actual check-out data: fewer people actually buy ‘green’ goods than say they do. Needless to say, if people aren’t answering the questions honestly in the first place, trusting your data is going to be a little tricky.

2. We want to please. Most of us are quite keen to please others, particularly those who are rewarding us for doing something. So if the people who completed the survey were persuaded to do so through some form of reward, whether actual payment, a competition entry or whatever, then the results you get back will be influenced by their desire to please. You won’t be looking at real results, but at the results the people completing the survey thought you wanted to see. You’ve just accidentally hired a bunch of ‘yes’ men.

It’s important to note that I’m not saying ‘all data is therefore pointless’, but that by presenting data as ‘truth’ we’re encouraging people to accept it wholesale, which leaves them open to more manipulation than any anecdote could ever cause. We need to be thorough in our interpretation of the numbers, and not blindly assume that, just because we call it data, it is somehow indisputable truth.