About data

I have recently watched the documentary The Great Hack, which explains the Cambridge Analytica scandal: how they used a great amount of data from Facebook to spread biased and fake messages and influence the outcome of Brexit and the US presidential election of 2016. The film focuses a lot on the role of big data-owning corporations such as Facebook, Google, or Amazon (the usual suspects). It hints at holding these companies responsible for what people do with the data they collect, and sort of portrays them as “evil”, perhaps too much so [1]. It also blames the technology and the “algorithms”, whatever those are. In my opinion, this is a very shallow analysis of what is actually going on, and I think they got it all upside down. Let me break down this problem into three parts.

The technique

People who do not understand what is going on behind the curtains of recommendation algorithms tend to see the spread of targeted manipulative information as an extremely bad thing (which it is), and music streaming services that suggest music you like as an extremely good thing (which it is). I am no expert in recommendation systems myself, but I know enough to know that the principles behind these two tasks are basically the same. So blaming and prohibiting the technique itself would mean that a lot of wonderful things we have today would no longer be available. And a huge field of research, which can be used for good, would go away. I don’t think this is a good direction to follow…
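To make the "same principles" point a bit more concrete, here is a toy sketch of my own. It assumes that a person and an item (a song or an ad) can each be described by a small vector of made-up features, and that "recommending" simply means ranking items by cosine similarity to the person. All names, features, and numbers are hypothetical; real systems are far more elaborate, and this is not how any particular service works.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank(user_profile, items):
    """Return item names sorted from best to worst match for this user."""
    return sorted(items, key=lambda name: cosine(user_profile, items[name]), reverse=True)

# Hypothetical feature axes: [likes_jazz, likes_rock, worried_about_economy]
user = [0.9, 0.1, 0.8]

# Hypothetical catalogue of songs.
songs = {
    "jazz_track": [1.0, 0.0, 0.0],
    "rock_track": [0.0, 1.0, 0.0],
}

# Hypothetical catalogue of political ads, described on the same axes.
ads = {
    "economy_fear_ad": [0.0, 0.0, 1.0],
    "unrelated_ad": [0.0, 1.0, 0.0],
}

# The exact same function picks the song and the ad.
top_song = rank(user, songs)[0]
top_ad = rank(user, ads)[0]
print(top_song, top_ad)
```

Nothing in `rank` knows whether it is choosing a song or a political message. The only difference is what is in the catalogue and what the person choosing the catalogue wants to achieve.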

The data

Our data is being collected all the time. I know that, you know that, everyone knows that. It is because of this data, and the techniques, that we can talk to our phones, that we can search for a photo by a keyword, that our watches can count our steps, and that we have many other conveniences that make our lives today very comfortable. The corporations that collect and use the data to develop these wonderful things know how sensitive this information is, and that everything should be thoroughly anonymized if it is to go into the hands of others (such as researchers).

In the case of Cambridge Analytica in particular, I think Facebook actually bears little of the blame [1]. The story plays out like this. A researcher at Cambridge University developed an app and provided it to Cambridge Analytica. The company, in turn, used this app to run a survey, and people had to agree that their data was going to be used for academic purposes. It turns out that, because of the way Facebook was designed, the app also collected data from all the friends of the person taking the survey. Now, I think this is a serious design flaw, and whoever saw that, the researcher or whoever was working on this at Cambridge Analytica, should have at least informed Facebook about it. Instead, they exploited it to get all the data, and kept quiet. Needless to say, there was no “academic purpose” there. So Facebook’s fault is this design flaw. They have openly apologized for that, and tightened their privacy rules since then (the EU’s GDPR has also come into force in the meantime). However, it is not fair to place the responsibility for how others used this data solely on them.

These big data companies tend to be already protective enough of their data. Ask any researcher who needs data how hard it is to get a fraction of what they have. So it is not like Facebook is completely irresponsible and is just allowing anyone to tap into their pool. By the way, the researcher who developed the app was actually a consultant for Facebook, which makes me even more suspicious about why he did not report such a breach in the first place.

The use

This is the crucial point in my opinion, and the point that is most overlooked. The problem, in the end, was not the existence of data or fancy techniques to process it. The problem is how this framework was used. And it is not like it was used for a super novel thing. The data was used for propaganda, and propaganda has been around for centuries. And people warning about the dangers of propaganda have also been around for a while now. For example, this quote from Everett Dean Martin: “Propaganda is making puppets of us. We are moved by hidden strings which the propagandist manipulates.” is from 1929. 1929!

The problem this time is that we had propaganda on steroids. Because of the amount of data Cambridge Analytica had, and the ability to quickly analyse it, they could provide the most targeted and efficient advertisements. If you think about it, this is the dream of *every* advertisement agency. Now, does that mean we should forbid propaganda? Well, I lean towards yes, but I also admit that there is such a thing as good propaganda [2]. So we should not discard it as an activity altogether. One could say propaganda/advertisement [3] should not be biased, but isn’t that an inherent characteristic of it? When people are trying to sell something (a product, an idea, an image), they kind of have to be biased towards that thing.

So what is it about the Cambridge Analytica case that bothered so many people (myself included)? I have had this lingering feeling that it crossed a line. But which line? After thinking a lot about it, I suspect this line is ethics.

First of all, these people were using exaggerated messages, sometimes blatant lies, to attack the opposition. Attacking the opposition is a low blow of desperate people, especially in politics [4]. This is a dirty move which I find disgusting. You want to advertise yourself? Fine. Show off your advantages, don’t hit the adversary. That makes me very angry.

Secondly, this was targeted to influence people on important decisions, which ideally should be taken with the minimum amount of bias. The only way people are going to start thinking critically is if they have to in order to form an opinion. If they are bombarded with information from only one side, it is brainwashing. I can see why a politician or government would think this is a good thing (“I know what is best for the people, I am just making them realize it faster”). However, if you stop to really think about it, this is a disservice to society as a whole. It simply creates citizens who follow the herd and are too lazy to think for themselves [5].

At this point, one possible solution would be to forbid advertisement related to elections and to any issue that would affect society as a whole. Honestly, I think this would be a good thing, and it could solve many problems we face today in the democratic process (not only the one presented here). But I need to think more about this one.

Ultimately, this whole situation is much more complicated than portrayed in “The Great Hack”. There are fundamental problems that we have been facing for many years now, and they are simply exacerbated by the effectiveness of a new method. The issue, as I see it, is not the method in itself, but the fundamental problem of advertising/propaganda. Where do we draw the line between good/acceptable propaganda and the unacceptable kind? Who should be responsible for regulating this? As usual, there is a whole area of study about persuasion, advertisement, etc. And, as usual, we will only pay attention and try to solve it when things explode and it becomes obvious that this is the problem. We never learn…

[1] And I hate Facebook…
[2] Government-funded campaigns for vaccination, anti-tobacco, more exercise, etc.
[3] I am using the two words interchangeably, since it is quite debatable what distinguishes one from the other. In Portuguese, for example, there is only one word to describe what would be two different things for an English-speaking person.
[4] Eleven years ago, this was an issue in the election of a mayor where I come from. I wrote about it here (in Portuguese), after hearing an admission that, still today, makes me furious.
[5] For example, do you think I just thought about all of these things at once? It took me weeks thinking about the documentary, discussing it with other people, and researching, until I formed this opinion (which could be changed in light of new information).