Ethics of Big Data

This is pretty accurate. Never expected to learn data science from a Dhamma talk.

Not being able to really opt out of these services is also correct: you generate data on behalf of your friends as well. Although it is now illegal in most countries (at least in the EU) to profile unregistered users, data laws are quite hard to enforce. They are also coming for your eye movement data through VR headsets (because eye tracking improves the quality of the images the headsets render).

The problem is that you cannot be an effective service without exploiting user data, and the more interactions users have, the more data they generate as a byproduct. For instance, it is possible to detect with high accuracy whether a person is walking up stairs, standing in a queue, or has certain kinds of mental health problems just by using the raw data from the accelerometer and gyroscope in their smartphone (i.e. the movements detected by a phone that is mostly in their pocket). The more user data you have, the less feasible anonymization becomes.

I’m interested in your views on how data mining can be done ethically. It might be helpful to others too.


“how data mining can be done ethically”.

imo this ship has sailed, made friends in every port, caught and distributed a large variety of resistant social diseases, lied to mother, and sank somewhere in the Bermuda Triangle. (No offense to Bermuda.)

Ethically would mean, IMO, fully informed, non-transferable consent, plus care for possible negative consequences, no matter the ignorance or trust offered by targets.

Good question. :slight_smile: I look forward to reading more opinions. May your practice flourish, and may all be freed from suffering.


Okay, that made me laugh.

But seriously, my point is that

  1. receiving and handling user data is inevitable
  2. solutions that work well must rely on insights drawn from user data
  3. even with the best intentions, partial data leaks and accidental discoveries of sensitive user information are inevitable
  4. even if you give detailed information to the users, they will most likely ignore it, and one of the points of data mining is to find patterns in data that were previously unknown, so it is impossible to list up front how user data will be used in the future

So it’s not just corporations selling user data for selfish reasons; it’s more like making stuff that people find useful in one of the most competitive environments while also inevitably opening up a Pandora’s box.

I am glad to participate in that laugh :slight_smile:

I think I see some of your concerns and interests in this topic, but imo humans simply lack the capacity at this time to handle big data or data mining skillfully (in a Buddhist sense). I really do think at this time that

Ethically would mean, IMO, fully informed, non-transferable consent, plus care for possible negative consequences, no matter the ignorance or trust offered by targets.

:slight_smile: maybe your interest might contribute to benefit many beings!

It’s frequently used in medical research, to prevent accidents and injury, and so on.
Those uses seem ethical to me.
To have the data and not use it seems ethically questionable.

It’s done ethically by considering the ways in which the data will be used.
In some cases it’s done by “sanitizing” the data to make it anonymous.
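As a rough illustration of what “sanitizing” can look like, here is a minimal Python sketch. The records and field names (`name`, `email`, `age`, `zip`, `diagnosis`) are made up for the example, and the approach shown (dropping direct identifiers, coarsening the rest, then checking k-anonymity) is only one common technique, not a description of what any particular institution does.

```python
from collections import Counter

# Hypothetical field names that identify a person directly.
DIRECT_IDENTIFIERS = {"name", "email"}

def sanitize(record):
    """Drop direct identifiers and coarsen the quasi-identifiers."""
    clean = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    decade = (clean["age"] // 10) * 10
    clean["age"] = f"{decade}-{decade + 9}"   # exact age -> ten-year band
    clean["zip"] = clean["zip"][:3] + "**"    # keep only the zip prefix
    return clean

def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

# Toy data, invented for illustration only.
records = [
    {"name": "A", "email": "a@example.org", "age": 34, "zip": "10115", "diagnosis": "flu"},
    {"name": "B", "email": "b@example.org", "age": 37, "zip": "10117", "diagnosis": "asthma"},
    {"name": "C", "email": "c@example.org", "age": 52, "zip": "20095", "diagnosis": "flu"},
    {"name": "D", "email": "d@example.org", "age": 58, "zip": "20097", "diagnosis": "diabetes"},
]

sanitized = [sanitize(r) for r in records]
k_before = k_anonymity(records, ["age", "zip"])
k_after = k_anonymity(sanitized, ["age", "zip"])
print(k_before, k_after)
```

With this toy data every raw record is unique on age and zip, while after coarsening each record shares its age band and zip prefix with at least one other. Even so, a small k is weak protection, which is part of why sanitizing alone does not settle the ethical question.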

In many cases the “size” of the data (big data, small data) is not relevant to the ethics. Because the idea of “big” is ambiguous by definition, a number of people in the so-called big data and data mining fields warn that the “big” label in “big data” may be more of a marketing ploy or an attempt to impress. Medium and small sized data can also be used for good or ill.

But do we mean the same thing?
Data mining can be prevented by not tracking and collecting the data in the first place.

Data mining is the process of discovering patterns in large data sets … Data mining - Wikipedia

While the term “data mining” itself may have no ethical implications, it is often associated with the mining of information in relation to people’s behavior (ethical and otherwise).
The ways in which data mining can be used can in some cases and contexts raise questions regarding privacy, legality, and ethics.

Good point.
There is a well-developed system of ethics for the data used in medical research. Hospitals and academia in Western countries have ethical review boards that have to approve the use of data that ends up in medical journals and for most government use.
There are similar considerations for data used in the social sciences.

So there are some good guidelines in place for some people and uses.

These are valid points. However, there are circumstances in which some data is generated during communication simply because of the nature of these processes (then again, one could always discard it right after the event). It also gets harder to properly anonymize data as the amount of data increases and more diverse datasets become available on the same user.
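To make the anonymization point concrete, here is a small Python sketch. The data and attribute names (`city`, `birth_year`, `phone_model`, `gym`) are invented for illustration; the only claim is the general mechanism, namely that each extra attribute linked to the same user tends to split people into smaller groups until many of them are unique.

```python
from collections import Counter

def unique_fraction(rows, columns):
    """Fraction of rows whose combination of values in `columns` is unique."""
    keys = [tuple(row[c] for c in columns) for row in rows]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(rows)

# Toy users with no names attached, made up for this example.
users = [
    {"city": "Berlin", "birth_year": 1985, "phone_model": "X1", "gym": "north"},
    {"city": "Berlin", "birth_year": 1985, "phone_model": "X1", "gym": "south"},
    {"city": "Berlin", "birth_year": 1985, "phone_model": "Y2", "gym": "north"},
    {"city": "Berlin", "birth_year": 1990, "phone_model": "X1", "gym": "north"},
    {"city": "Berlin", "birth_year": 1990, "phone_model": "Y2", "gym": "south"},
    {"city": "Berlin", "birth_year": 1990, "phone_model": "Y2", "gym": "south"},
]

attributes = ["city", "birth_year", "phone_model", "gym"]

# Link one more attribute at a time and see how many users become unique.
fractions = [unique_fraction(users, attributes[:n])
             for n in range(1, len(attributes) + 1)]

for n, frac in enumerate(fractions, start=1):
    print(attributes[:n], round(frac, 2))
```

In this toy set nobody is unique on city and birth year alone, but once phone model and gym are linked in, most of the “anonymous” users can be singled out.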

My point is that it’s pretty much inevitable for online businesses to rely on user data, so having Buddhist ethical guidelines on the topic would be useful.

Huh. That became a huge problem for me in my last years of teaching statistics and empirical research. I’ve been very unhappy not to have found any serious discussion of it from a Buddhist platform. I even began to run into ethical obstacles to motivating the students to get involved in it… Let’s see whether something comes out of this here (or elsewhere).

Nessie, your experience suggests that I’m going to appreciate hearing your perspective. But what, please, are the words “that” and “it” referring to?

Feynman - “that” means dealing seriously and ethically with the “big data” problem, and “it” means motivating the students to involve themselves in empirical surveys and statistical evaluation. I used at least the last session of our course to reflect on what we had learnt and on what the students would do later, in professional jobs where they would need to apply empirical tools to evaluate their clientele and/or their own professional setting, including the unavoidable problem of, for instance, becoming “indiscreet” about their clientele’s affairs. We also reflected on the general tendency toward gaining total insight into the matters of society and of living together, and on the tendency toward a kind of totalitarianism within that, along with its mental factors. Unfortunately I could not make contact with peer teachers to develop a position and methodology on that part of the problem, which easily comes into conflict with a sense of right livelihood. And so on. The last two years (2016-2018) were then especially difficult, also because of declining health and a declining ability to sort things out.