Information is the new black

It must be the popularising effect of James Gleick’s new book “The Information”, because suddenly everyone I meet wants to talk about information: its history, its epistemology and Claude Shannon’s 1948 mathematical theory of communication (MTC), popularised in book form with Warren Weaver a year later, which became known as the mathematical theory of information. This is certainly good news for our information science course, where information has been considered from an academic perspective since 1961. I feel my time has come; all those hours spent memorizing equations to show that I truly, deeply understood how many signals you can push down a channel of a certain size, allowing for noise, have finally been rewarded, and I can now brandish my information-science credentials with a superior air of I told you so. Information is the new black, and everyone is wearing it.
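
(The result in question, if memory serves, is Shannon’s channel capacity theorem: a noisy channel of bandwidth W and signal-to-noise ratio S/N can carry at most C = W log₂(1 + S/N) bits per second.)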

I believed that I would forget Shannon’s theory entirely, as soon as the exam was over. It did not seem so relevant to my work at the time, which was with information resources in toxicology. Life, however, with a patient smirk, ensured that the ashes of the MTC rose like a phoenix 20 years later, when I was faced with presenting the mathematical good news to contemporary LIS students taking our Library and Information Science Foundation module as part of their masters. I dusted off my 1986 copy of Robert Cole’s “Computer Communications”, my notes still there in the margins of page 10, where I left them.

The issue I faced was one of presenting a definition of ‘information science’, and of outlining its history as a discipline, to modern LIS students. Many of the papers considering the origins of information science gaze back in time to illuminate Shannon’s equations with a rosy pink glow, suggesting that his theory somehow led to the birth of information science as a true science (Shera 1968, Meadows 1987). This was the story in the 1980s, but in the 21st century a more plausible thread is emphasized: the work of Kaiser, Otlet and Farradane on the indexing of documents, which suggests that the MTC was a bit of a red herring with respect to the history of information science. Rather, information science grew out of a need to control scientific information, coupled with the feeling amongst scientists that this activity was somehow separate from either special librarianship or documentation, the more continental term for dealing with the literature (see Gilchrist 2009, Vickery 2004, Webber 2003).

MTC

A look back at the original ideas and documents shows that Shannon’s work was built on that of Hartley (1928). Stonier (1990 p 54) refers to Hartley:

“.. who defined information as the successive selection of signs or words from a given list. Hartley, concerned with the transmission of information, rejected all subjective factors such as meaning, since his interest lay in the transmission of signs or physical signals.”

Consequently, Shannon used the term information, even though his emphasis was on signalling. The interpretation of the MTC as a theory of information was thus somewhat coincidental, but this did not prevent it being embraced as a foundation of a true ‘information science’.

Shannon himself suggested that there were likely to be many theories of information. More recently, authors such as Stonier (1992) and Floridi (2010) have reiterated that the MTC is about data communication rather than meaningful information.

Floridi (2010 p 42 and 44) explains:

“MTC is primarily a study of the properties of a channel of communication, and of codes that can efficiently encipher data into recordable and transmittable signals.”

“.. since MTC is a theory of information without meaning, (not in the sense of meaningless, but in the sense of not yet meaningful), and since [information – meaning = data], mathematical ‘theory of data communication’ is a far more appropriate description…”

He quotes Weaver as confirming:

“The mathematical theory of communication deals with the carriers of information, symbols and signals, not with information itself.”

Floridi’s definition of information as ‘meaningful data’ is more closely aligned with the field of information science as understood for our LIS-related courses. Whilst we can still argue over what is data and what is meaning, we can see that the MTC treats ‘information’ as a physical quantity, more akin to the bit than to the meaningful information handled by library and information scientists.

This difference is set out by Stonier (1990, p 17):

“In contrast to physical information, there exists human information which includes the information created, interpreted, organised or transmitted by human beings.”

Nonetheless, the MTC is still relevant to today’s information science courses because it has played a pivotal role in the subsequent definitions and theories about information per se. And it is rather hard to have information science without an understanding of ‘information’. Many papers have been written on theories of information, and on the relevance of such theories to information science (see, for example, Cornelius 2002).

MTC and other disciplines

The MTC provides the background for signalling and communication theory within fields as diverse as engineering and neurophysiology. At the same time that Shannon was writing, Norbert Wiener was independently considering the problems of signalling and background noise. Wiener (1948 p 18) writes that he and his collaborators:

“.. had to develop a statistical theory of the amount of information, in which the unit amount of information was that transmitted as a single decision between equally probable alternatives.”

Further (p 19), that

“This idea occurred at about the same time to several writers, among them the statistician R.A. Fisher, Dr. Shannon of the Bell Telephone Laboratories, and the author.”

Wiener decided to:

“call the entire field of control and communication theory, whether in the machine or in the animal, by the name Cybernetics”.

The relationship of information to statistical probability (the amount of information being defined in terms of probability) meant that information in Shannon and Wiener’s sense related readily to entropy (anecdotally, von Neumann is said to have suggested to Shannon that he use the term entropy, as it was already in use within the field of thermodynamics, but not widely understood).

“The quantity which uniquely meets the natural requirements that one sets up for ‘information’ turns out to be exactly that which is known in thermodynamics as entropy.”

Shannon and Weaver (1949) p 103

“As the amount of information in a system is a measure of its degree of organization, so the entropy of a system is a measure of its degree of disorganization; and the one is simply the negative of the other.”

Wiener (1948) p 18
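
For readers who would like to see the quantity behind these claims, Shannon’s measure of the information produced by a source emitting symbols x with probabilities p(x) is, as I remember it from those revision sessions:

H = − Σ p(x) log₂ p(x)

measured in bits, and best read as the average ‘surprise’ per symbol. Its formal resemblance to the entropy of statistical mechanics, S = − k Σ p(x) ln p(x), with k Boltzmann’s constant, is presumably why the borrowed term seemed so apt.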

The link between information and entropy had been around for some time. In 1929, Szilard wrote about Maxwell’s demon, which could sort out the faster molecules from the slower ones in a chamber of gas. Szilard concluded that the demon had information about the molecules of gas, and was converting information into a form of negative entropy.

The term ‘negentropy’ was coined in 1956 by Brillouin:

“… information can be changed into negentropy, and that information, whether bound or free, can be obtained only at the expense of the negentropy of some physical system.”

Brillouin (1956) p 154

Brillouin’s conclusion was that information is associated with order or organization, and that as one system becomes more organized (entropy decrease), another must become more disorganized (entropy increase).

Stonier (1992 p 10) agrees:

“Any system exhibiting organization contains information.”

A well-known anomaly becomes apparent, however, when, over 60 years later, we try to understand the correlation between information and either entropy or probability. A trawl through the original equations and explanations, and subsequent revisitations, reveals that an increase in information can be associated with either an increase or a decrease in entropy/probability, according to your viewpoint. Tom Stonier (1990) refers to this in chapter 5, but Qvortrup (1993) gives a more detailed explanation:

“In reality, however, Wiener’s theory of information is not the same, but the opposite of Shannon’s theory. While to Shannon information is inversely proportional to probability, to Wiener it is directly proportional to probability. To Shannon, information and order are opposed; to Wiener they are closely related.”
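
A toy example may make the difference in sign clearer. In Shannon’s scheme, the information carried by an outcome of probability p is −log₂ p: a one-in-eight chance delivers three bits, a certainty delivers none, so information rises as probability falls. Wiener, identifying the amount of information with the negative of entropy, treats the highly ordered, low-entropy state as the information-rich one. The quantity is essentially the same; the disagreement, as I read Qvortrup, is over which end of it deserves the name ‘information’.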

The correlation between the measurement of entropy and information did, however, lead to the separate field of information-physics, where information is considered to be a fundamental, measurable property of the universe, similar to energy (Stonier 1990).

This field stimulates much debate, and is currently enjoying what passes for popularity in science. A recent article in New Scientist tells how Shannon’s entropy provides a reliable indicator of the unpredictability of information, and thus of uncertainty, and how this has been related to the quantum world and Heisenberg’s uncertainty principle (Ananthaswamy 2011).
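
For anyone who prefers code to equations, a few lines of Python (a sketch of my own, not taken from any of the sources cited here) show how Shannon’s entropy behaves as a measure of unpredictability:

import math

def shannon_entropy(probs):
    # H = -sum(p * log2(p)) in bits, for a discrete probability distribution
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))    # fair coin: 1 bit per toss, maximally unpredictable
print(shannon_entropy([0.99, 0.01]))  # heavily biased coin: about 0.08 bits, almost predictable
print(shannon_entropy([1.0]))         # a certain outcome: 0 bits, no uncertainty at all

The flatter the distribution, the higher the entropy, and the less predictable the next symbol.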

Information-biology also appears to stem from work undertaken around the MTC. The connection between signalling in engineering and physiology was made by Wiener in the 1940s, and in 1944 Schrödinger, in his book “What is Life?”, made a connection with entropy as he considered that a living organism:

“… feeds upon negative entropy.”

Further that:

“.. the device by which an organism maintains itself stationary at a fairly high level of orderliness (= fairly low level of entropy) really consists in continually sucking orderliness from its environment.”

In the same book, Schrödinger outlined the way in which genetic information might be stored, although the molecular structure of DNA was not published until 1953, by Crick and Watson (see Crick 1988). The genetic information coded in the nucleotides of DNA is transcribed into messenger RNA and used to synthesize proteins. Information contained in genetic sequences also plays a role in the inheritance of phenotypes, so that informational approaches have been taken within the study of biology (see Floridi 2010, also for a discussion of neural information).

Information and LIS

For the purposes of our library and information science courses here at City University, we consider information as that which is ‘recorded for the purposes of meaningful, human communication’. Although I personally find Floridi’s definition helpful, information in our model is open to definition and interpretation, and is often used interchangeably with the term ‘knowledge’. In either case we regard the information as being instantiated within a ‘document’. The term ‘document’ also does not demand a definitive explanation; it merely needs to be understood as the focus of ‘information science’, its practitioners and researchers.

To complete the picture, when I became Programme Director for #citylis at City University London, I wanted to strengthen and clarify the way in which we defined ‘information science’, and particularly to explain its relationship with library science (Robinson 2009). I suggested that library science and information science were part of the same disciplinary spectrum, and that information science (used here to include library science) could be understood as the study of the information-communication chain, represented below:

Author —> Publication and Dissemination —> Organisation —> Indexing and Retrieval —> User

The chain represents the flow of recorded information, instantiated as documents, from the original author or creator to the user. The understanding and development of the activities within the communication chain is what library and information specialists do in both practice and research. As a point of explanation, I take organisation in the model to include the working of actual organisations such as libraries and institutions, information management and policy, and information law. Information organisation per se fits within the indexing and retrieval category.

Our subject is thus a very broad area of study, one which is perhaps better referred to as the information sciences. The question of how we study the activities of the model can be answered by applying Hjorland’s underlying theory for information science, domain analysis (Hjorland 2002). The domain analytic paradigm describes the competencies of information specialists, such as knowledge organization, bibliometrics, epistemology and user studies. The competencies or aspects distinguish what is unique about the information specialist, in contrast to the subject specialist. Further, domain analysis can be seen as the bridge between academic theory and vocational practice; each competency of domain analysis can be approached from either the point of view of research or of practice.

There are many definitions of information science, and there are other associated theories or meta-theories, the latter of which may also be associated with a philosophical stance. Nonetheless, the model portrayed above has proved to be a robust foundation for teaching and research, yet it is flexible enough to accommodate diverse opinions and debate as to what is meant by ‘information’. It allows for diverse theories of information.

It is interesting to reflect on whether ‘information’ as understood for the purposes of library and information science has any connection with ‘information’ as understood by physics and/or biology, or whether it is a standalone concept. Indeed, later authors such as Bateson (1972) have suggested that if information is inversely related to probability, as Shannon says, then it is also related to meaning, as meaning is a way of reducing complexity. Cornelius (2002) reviews the literature attempting to elucidate a theory of information for information science (see also Zunde 1981, Meadow and Yuan 1997).

At a recent conference in Lyon, Birger Hjorland (2011) gave a presentation considering the question of whether it was possible to have information science without information. He writes that there should at least be some understanding of the concept that supports our aims, but concludes:

“.. we cannot start by defining information and then proceed from that definition. We have to consider which field we are working in, and what kind of theoretical perspectives are best suited to support our goals.”

I agree with him. I do not think we can have information science without a consideration of what we mean by information – but information is a complex concept, and one that can be interpreted in several ways, according to the discipline doing the interpretation, and then again within any given discipline per se. It is not an easy subject to study, despite its sudden popularity. The literature of information theory is extensive, and scary maths can be found in most of it. Nonetheless, it is essential for anyone within our profession to have in mind an understanding of what we are working with; otherwise it is impossible to justify what we are doing, and we appear non-descript. Understanding information is like wearing black. Any colour will do, but black makes you look so much taller and slimmer.

References

Ananthaswamy A (2011). Uncertainty untangled. New Scientist, 30 April 2011, 28-31

Bateson G (1972). Steps to an ecology of mind. Ballantine: New York

Brillouin L (1956). Science and information theory. Academic Press: New York

Cornelius I (2002). Theorizing information science. Annual Review of Information Science and Technology 2002, 393-425

Crick F (1988). What mad pursuit. A personal view of scientific discovery. Penguin: London

Floridi L (2010). Information: a very short introduction. Oxford University Press: Oxford

Gilchrist A (2009). Editorial. In: Information science in transition. Facet: London

Hartley RVL (1928). Transmission of information. Bell System Technical Journal, vol 7 535-563

Hjorland B (2011). The nature of information science and its core concepts. Paper presented at: Colloque sur l’épistémologie comparée des concepts d’information et de communication dans les disciplines scientifiques (EPICIC), Université Lyon3, April 8th 2011. Available from: http://isko-france.asso.fr/epicic/en/node/18

Meadow CT and Yuan W (1997). Measuring the impact of information: defining the concepts. Information Processing and Management, vol 33(6) 697-714

Meadows AJ (1987). Introduction. In: The origins of information science. Taylor Graham: London

Qvortrup L (1993). The controversy of the concept of information. Cybernetics and Human Knowing, vol 1(4) 3-24

Robinson L (2009). Information science: communication and domain analysis. Journal of Documentation, vol 65(4) 578-591

Schrödinger E (1944). What is life? The physical aspect of the living cell. Cambridge University Press: Cambridge

Shannon CE and Weaver W (1949). The mathematical theory of communication. University of Illinois Press: Urbana

Shera JH (1968). Of librarianship, documentation and information science. Unesco Bulletin for Libraries, 22(2) 58-65

Stonier T (1992). Beyond information. The natural history of intelligence. Springer-Verlag: New York

Stonier T (1990). Information and the internal structure of the universe. Springer-Verlag: New York

Szilard L (1929). Über die Entropieverminderung in einem thermodynamischen System bei Eingriffen intelligenter Wesen. Zeitschrift für Physik, vol 53 840-856

Vickery B (2004). The long search for information. Occasional Papers no. 213. Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign

Webber S (2003). Information science in 2003: a critique. Journal of Information Science, vol 29(4) 311-330

Wiener N (1948). Cybernetics: or control and communication in the animal and the machine. Wiley: New York

Zunde P (1981). Information theory and information science. Information Processing and Management, vol 17(6) 341-347