Data Does Not Organize Itself


9 Feb, 2020

In researching the relation between weaving and computation, I ran across this astounding passage by Ellen Harlizius-Klück:

What the Digital Humanities features, is rather the digital processing and representation of data. The concept of the digital itself is just as little explored as manual aspects of programming, which always include the question of how data is classified. “Digital” has, strangely enough, the meaning of “objective” against “analog”, “hermeneutic”, or “interpretative”. On the Herrenhausener Conference in December 2013 was as predicted the end of theory or hypothesis-based research and Lev Manovich advised: “Do not start with research questions! Look at the data instead.” This I think is a dangerous misconception. The success of the CERN in discovering Higgs boson can teach us something quite different. In an article in DIE ZEIT of 2011, the Speaker of the detector team explained that out of 40 million data delivered by the LHC only 200 “interesting results” were used for evaluation. This makes 0.005 per thousand! And when is a result interesting? If it fits a previously formulated model or a hypothesis. Physicists are safe to ignore 99.9995, or rounded 100 per cent of their data due to a hypothesis. Data does not organize itself.

Lev Manovich’s book The Language of New Media, published almost twenty years ago, did much to propel the incipient field of digital studies. His turn toward big data in recent years is more problematic. I would agree with Harlizius-Klück that Manovich’s “theory < data” is a “dangerous misconception.” It’s also just wrong: data is always the result of theory; there is no data that is not already the result of a hypothesis, which is to say a kind of active mental speculation.

More interesting is Harlizius-Klück’s second claim, that the vast majority of data (a statistical totality!) had to be ignored by physicists at the Large Hadron Collider. Why? Because their theory said so. I’m reminded how this is all wrapped up in larger discussions of “normal science” and “epistemological breaks,” à la Thomas Kuhn and Gaston Bachelard.

People want to believe that we live in a purely empirical world. For instance, behavioral economics is all about nudging, Latour keeps insisting we need to read William James, and Deleuze’s radical empiricism has never been more popular. But data does not organize itself. As Vilém Flusser liked to point out, even a seemingly innocent piece of data is itself the result of a complex apparatus of technology and knowledge. Photographs, Flusser argued, are “the products of applied scientific texts” (Towards a Philosophy of Photography, 14), channeling, in a sense, what Marxists refer to as “general intellect.”

Data does not organize itself. A simple vignette proves it: if Google or Facebook could create, organize, and valorize data, they’d have their perpetual motion machine and they wouldn’t need any of us. The loop would be closed; production would be frictionless. So while “absence of evidence is not evidence of absence,” it’s pretty clear that if Google could do frictionless value production they would have done it already and they’d be stockpiling gold bullion on the moon. Truth is, Google et al. have figured out the third step but not the first two. They know how to valorize data. But they don’t know how to create it, and they don’t know how to organize it. They outsource those things, data creation and data organization, to chumps like us.

For more on the relation between weaving and computation I would suggest Harlizius-Klück’s article “Weaving as Binary Art and the Algebra of Patterns.”
