Big Data
Viktor Mayer-Schönberger and Kenneth Cukier
Overview: In this book, Viktor Mayer-Schönberger and Kenneth Cukier argue that the emergent ability to
process ever-increasing amounts of data allows us to shift from causation to correlation standards for
understanding the world around us. The authors work hard to convince us that the “what” of
correlation now trumps the “why” of causation. They argue that future operators will be part of a “big
data value chain” that will place a premium on the labor of data collectors and interpreters as “the”
value-added workers of the future. In addition, they reframe questions of privacy protection, shifting responsibility
and accountability for protecting “big data” onto the user.
-
In Chapter 1, Mayer-Schönberger and Cukier (M-S&C) use examples ranging from bird flu to Google to Farecast’s airfare predictions to lay out
their rationale for big data’s value. They use facial recognition as an example of how big data can provide inferred probabilities of
information to the analyst. They conclude the chapter noting three key shifts in the information age: we now have the ability to analyze
big data, we must lessen our desire for exactness, and we must move away from our search for causality.
-
Discussion: Do you agree with the authors’ assertion that “by changing the amount [of data], we change
the essence” (pg. 10), and will big data change the human experience as much as some of its messengers claim?
Does big data signal an evolutionary or revolutionary change in human interactions, and what is the practical
significance of either conclusion as it relates to the nature and character of war?
-
In Chapters 2 – 5, the authors argue that the “more” in “more data” provides
a value that trumps precision. In other words, scale has a value all its own: as sample sizes increase, so
does the utility of the information and findings to the big data analyst. They note the tremendous increase
in information rates since the days of Gutenberg’s press and the spread of networks with multiple interconnecting
communities, leading us back to the “statistical probability of occurrence” that Turing placed at the heart of breaking
the Enigma code in World War II. They note that today, Amazon can use one’s buying choices to predict future
purchases with a high level of accuracy. M-S&C conclude that the world is rapidly approaching a “datafication” of all
the information around us, such that virtually all human activity can become data for analysis.
-
Discussion: What are the underlying scientific assumptions about predictability and randomness built into most big
data applications? Are there fundamental limits to what we can capture and process as data, and the degree to which
we can extract value from it in terms of identifying correlation and causation? What kinds of problems will big data
driven approaches excel at solving, and in which cases are they likely to introduce new forms of confusion and
complexity rather than providing additional clarity?
-
In Chapter 6, M-S&C seek to convince us that data now has a value of its own, offering the researcher an “option
value”: like potential energy, data holds latent worth waiting to be released. Unlike energy, however, data does
not necessarily decrease in value with use. It is agnostic about who uses it, and analysts can extrapolate it in
new ways to extend its value into new fields of inquiry. That said, the authors note that some data does have a
“shelf life,” and its value can depreciate over time.
-
Discussion: Does the reality of cyber warfare, the normal degradation of data storage media and handling systems,
and the possibilities of intentional subversion and deception change the inherent value of big data? How can we
preserve that value in the face of such threats? How will the increased use of artificial intelligence affect our
ability to use open data and metadata?
-
In Chapters 7 – 9, the authors discuss implications, risks, and control issues as big
data expands into our lives. A key implication is a shift in future workforce requirements: data collectors,
providers, and operators [analysts] could displace today’s reliance on subject matter experts, because asking the
right question of the data should yield the better analysis. Mayer-Schönberger and Cukier foresee the following
potential risks with big data: privacy may suffer as big data leads to the de-anonymization of society’s members;
governments may find their “secure” data less secure than desired, or may use it in a “Minority Report” manner to
punish people based on probability rather than action; and a “dictatorship of data” may emerge in which one fixates
on the data and loses sight of the purpose for its use. These possibilities lead to their concerns over control and
a possible major societal shift in the tension between individual privacy and accountability for how one might use
big data.
-
Discussion: What are the potential impacts – positive and negative – as tasks usually performed by humans become
increasingly given to machines, and which tasks should perhaps never be delegated? How do we capture, analyze, and
evaluate risk in such a world? What unintended consequences are we already seeing as the “Internet of Things”
places computing power and data collection in places it has never been before, including basic appliances, tools, toys,
clothing, etc.? What are the legal and ethical implications of increased reliance on big data driven applications?
-
In their conclusion, the authors reiterate their concerns and highlight some hopes for the future. They restate
the need to depart from our previous reliance on causation and shift toward the correlative potential of
big data analysis. New tools and algorithms are arriving, they note, which they believe we can use to make
probabilistic determinations of future events. The authors reemphasize their concerns about privacy versus
accountability, while noting that ever more extensive data sets will become available in the future. They do
caution that the effect of big data on societies is something we have yet to determine, while recommending we approach
the future “use [of] this tool with a generous degree of humility…and humanity.”
-
Discussion: Are there any theoretical concepts or examples from military history that would help to usefully illuminate
the challenges of evaluating the claims of either big data proponents or their critics? As innovators attuned to the
fundamental nature of war and strategy, what approaches can we take to gain military advantage in the world of big
data without either assuming too much or too little about how big data will affect the nature and character of human
competition and cooperation?