Data are imagined and enunciated against the seamlessness of phenomena (Gitelman, 2013, p.2).
Data is a word used across most spheres of life. Whether in mobile phone plans, empirical research or mathematical and computational analysis, data has become a ubiquitous concept in contemporary society. Despite the many different applications of the word, data has received surprisingly little critical conceptualisation. While my focus in this project is on digital data, it is important to understand the etymology of the word, as this frames how we understand and use it today.
Contemporary dictionaries define ‘data’ as ‘facts and statistics collected together for critical analysis’ (Oxford English Dictionary). However, the word ‘data’ comes from the Latin verb dare – to give – and emerged in the English language in the 17th century. Early usage of the word can be found in the work of Joseph Priestley – a polymath, theologian and pioneer of ‘data graphics’ (Rosenberg, 2013) – who described “historical facts” as data. The increasing use of the term in the 18th century aligned with the development of the ‘modern concepts of knowledge and argumentation’ (Rosenberg, 2013, p.15). Data was mainly used in mathematics and theology – an unsurprising coincidence given the almost religious faith placed in data-driven insights today.
Critical data scholar Rob Kitchin (2014a) has argued that, technically, data should be referred to as capta – that which is captured, not given. This is an important point. Even as early as the 17th century, data was being used to refer to information beyond the empirical – meaning information is ‘captured’ according to the motive and method of the individual. By the late 18th century, the connotations of the word had shifted to mean ‘facts in evidence determined by experiment, experience or collection’ (Rosenberg, 2013, p.33). During this time, data could be something that was subjectively derived or experienced, meaning it was no longer a given or a fact. Despite these shifts, the original definitions of the word have persisted. Contemporary definitions of data, like the one from the Oxford English Dictionary above, highlight an inherent relationship to facts, even though clearly not all data are facts (see also Floridi, 2008). This helps to explain just why ‘data’ is often thought to be resistant to deconstruction or questioning – the connotation that it is a given fact persists.
The sister terms fact and evidence, depicted in the Google Books Ngram image above, highlight the trends in usage across centuries. No doubt the latitude the word enjoys and the rise of the digital explain the exponential growth in use in the late 20th and early 21st centuries. Yet unlike a fact, data does not correspond with any ontological truth. For example, if a fact is proven wrong, it is no longer a fact, but false or incorrect data is still data (Rosenberg, 2013, p.18). In this sense, data is purely rhetorical. In some respects, this has been augmented by the increasing use of the word data as a mass noun with a singular verb form – i.e. ‘this data’ as opposed to ‘these data’ – and the virtually non-existent use of the singular ‘datum’. Indeed, data no longer exists as isolated factoids, but as ‘aggregative’ datasets (Gitelman, 2013).
Usage of the word data has clearly shifted and proliferated across time. While some research contexts, such as the humanities and the arts, seek to conserve the role of analogue data in empiricism (see Kitchin, 2014b), in the main we tend to think about data today as digital. What is significant about digital data is the many ways in which it can be generated, collected and used. Formatting of data is particularly important to argumentation. As Joselit (2015) explains, when it comes to data, ‘formatting is as much a political as an aesthetic procedure…determining a format thus introduces an ethical choice about how to produce intelligible information from raw data’ (p.268). In this way, data is a ‘matter for the disciplines’ as data are ‘cooked’ according to the ‘varied circumstances of their collection, storage and transmission’ (Gitelman, 2013, p.3).
Perhaps most confusing are computational definitions, in which data is defined as a collection of binary elements that are processed and transmitted electronically (Floridi, 2008). The main limitation here is that what data represents becomes confused with the format in which it is encoded. Data can be both digital and analogue, yet typically form takes precedence over substance, particularly when considering the focus of ‘big’ data on velocity, volume and variety (Laney, 2001). The logic of big data prioritises the digital form of data over the appropriateness and fairness of the representational form. Floridi (2008) proposes instead a ‘diaphoric’ definition of data, where data are meaningful because they capture and denote variability and difference, such as unique patterns and wavelengths.
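The distinction between what data represents and the format in which it is encoded can be made concrete with a small illustration. The Python sketch below is offered as one possible demonstration rather than anything drawn from the sources cited here: the temperature reading, the `temp_c` label and the three encodings are all hypothetical choices made for the example. It stores a single value in three different digital formats and shows that the stored bytes differ while the value they stand for does not.

```python
import json
import struct

# Hypothetical reading used purely for illustration: 21.5 degrees Celsius.
reading = 21.5

# Three digital encodings of the same reading.
as_text = str(reading).encode("utf-8")                      # plain text bytes
as_json = json.dumps({"temp_c": reading}).encode("utf-8")   # labelled, structured text
as_binary = struct.pack(">d", reading)                      # 8-byte IEEE 754 double

# The stored forms are different sequences of bytes.
print(as_text)
print(as_json)
print(as_binary)

# Decoding each form recovers the same represented value.
decoded = [
    float(as_text.decode("utf-8")),
    json.loads(as_json.decode("utf-8"))["temp_c"],
    struct.unpack(">d", as_binary)[0],
]
print(decoded)
```

Only the JSON form carries any indication of what the number refers to, and even that label is a choice made by whoever formatted the data. None of the three forms records how, why or by whom the reading was captured.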
So what does all this mean in the context of this project? First, while data are created, culled and used for specific purposes, the connotation that they are factual prevails. False data remains data. So too does data collected inappropriately or data processed with unjust algorithms. What needs to be remembered is that data is not given, but is instead built or created through human speculation and prejudice (Fuller, 2014). In this way, it is just as important to identify what is not captured through data, or what falls outside a dataset, as what is included within it. Second, the way data is formatted is key, particularly as data is so frequently used when developing knowledge and argument. Data are defined contextually and relationally, and ‘must always shift with argumentative strategy and context’ (Rosenberg, 2013, p.36). Third, while we typically think of data as digital, it is important to differentiate the form of data from that which it is representing. Given the ubiquity of the word and the slippage between definitions, specifying these differences is key.
References:
Floridi, L. (2008). Data. International Encyclopedia of the Social Sciences. W. A. Darity Jr. New York, Macmillan: 234-237.
Fuller, M. (2014). Data. The Johns Hopkins Guide to Digital Media. M. Ryan, L. Emerson and B. Robertson. Baltimore, MD, The Johns Hopkins University Press: 125-126.
Gitelman, L. (2013). Introduction. Raw Data is an Oxymoron. L. Gitelman. Cambridge, MA, The MIT Press.
Joselit, D. (2015). What to do with pictures. Mass Effect: Art and the Internet in the Twenty-First Century. L. Cornell and E. Halter. Cambridge, MA, The MIT Press.
Kitchin, R. (2014a). The Data Revolution: Big Data, Data Infrastructures, & Their Consequences. London, Sage.
Kitchin, R. (2014b). Big Data, new epistemologies and paradigm shifts. Big Data & Society April-June: 1-12.
Laney, D. (2001). 3D Data Management: Controlling Data Volume, Velocity, and Variety. Application Delivery Strategies. Retrieved from https://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf
Rosenberg, D. (2013). Data before the fact. Raw Data is an Oxymoron. L. Gitelman. Cambridge, MA, The MIT Press: 15-40.