Science and Information Theory, Second Edition by Leon Brillouin (Dover Phoenix Editions: Dover Publications) unabridged republication of the edition published by Academic Press, Inc., New York, 1962. 81 figures. 14 tables. Author Index. Subject Index.
A classic source for understanding the connections between information theory and physics, this 1962 work was written by one of the giants of twentieth-century physics. Leon Brillouin's Science and Information Theory applies information theory to a wide variety of problems—notably Maxwell's demon, thermodynamics, and measurement problems—and is appropriate for upper-level undergraduates and graduate students.
Brillouin begins by defining and applying the term "information." His topics include the principles of coding, coding problems and solutions, the analysis of signals, a summary of thermodynamics, thermal agitation and Brownian motion, and thermal noise in an electric circuit. A discussion of the negentropy principle of information introduces the author's renowned examination of Maxwell's demon. Concluding chapters explore the associations between information theory, the uncertainty principle, and physical limits of observation, in addition to problems related to computing, organizing information, and inevitable errors.
Also available from Dover
Information Theory and Statistics by Solomon Kullback. 416pp.
Information Theory by Robert Ash. 352pp. 66521-6
An Introduction To Information Theory by J. R. Pierce. 320pp.
Excerpt: A new scientific theory has been born during the last few years, the theory of information. It immediately attracted a great deal of interest and has expanded very rapidly. This new theory was initially the result of a very practical and utilitarian discussion of certain basic problems: How is it possible to define the quantity of information contained in a message or telegram to be transmitted? How does one measure the amount of information communicated by a system of telegraphic signals? How does one compare these two quantities and discuss the efficiency of coding devices? All of these problems, and many similar ones, are of concern to the telecommunication engineer and can now be discussed quantitatively.
From these discussions there emerged a new theory of both mathematical and practical character. This theory is based on probability considerations. Once stated in a precise way, it can be used for many fundamental scientific discussions. It enables one to solve the problem of Maxwell's demon and to show a very direct connection between information and entropy. The thermodynamical entropy measures the lack of information about a certain physical system. Whenever an experiment is performed in the laboratory, it is paid for by an increase of entropy, and a generalized Carnot Principle states that the price paid in increase of entropy must always be larger than the amount of information gained. Information corresponds to negative entropy, a quantity for which the author coined the word negentropy. The generalized Carnot Principle may also be called the negentropy principle of information. This principle imposes a new limitation on physical experiments and is independent of the well-known uncertainty relations of quantum mechanics.
About the 2nd edition: Without changing the general structure of the book, a number of improvements, corrections, and explanations have been introduced in the first chapters. The last chapters, 21 and 22, are completely new and are concerned with the line of research followed by the author during the last few years. Some new problems are introduced. The role of errors in scientific observations is reexamined and leads to a critical discussion of the idea of "determinism." Scientists believe in determinism, but they are completely unable to prove it, because their experiments always lack in accuracy.
The lack in accuracy is also of very special importance in the definition of very small distances. Here, the experimenter and the mathematician violently disagree; the experimenter refuses to consider discussion of things he cannot measure. This operational viewpoint leads to some curious consequences which are discussed in the last chapter.
From the Author’s Introduction: A new territory was conquered for science when the theory of information was recently developed. This discovery opened a new field for investigation and immediately attracted pioneers and explorers. It is an interesting phenomenon to watch, in the history of science, and such a sudden expansion of the domain of scientific research deserves closer consideration. How did it happen? How far does it reach? And where can it still expand? Does it mean an invasion of science into a territory traditionally belonging to philosophy, or is it a discovery of a new country, of some "no man's land," which has escaped previous exploration? All of these questions should be examined and carefully answered.
First of all, what is "Information"? Let us look at Webster's dictionary: "Communication or reception of knowledge or intelligence. Facts, ready for communication, as distinguished from those incorporated in a body of thought or knowledge. Data, news, intelligence, knowledge obtained from study or observation ... " We may state that information is the raw material and consists of a mere collection of data, while knowledge supposes a certain amount of thinking and a discussion organizing the data by comparison and classification. Another further step leads to scientific knowledge and the formulation of scientific laws.
How is it possible to formulate a scientific theory of information? The first requirement is to start from a precise definition. Science begins when the meaning of the words is strictly delimited. Words may be selected from the existing vocabulary or new words may be coined, but they all are given a new definition, which prevents misunderstandings and ambiguities within the chapter of science where they are used. It may happen that the same word is used with different meanings in two different branches of scientific research: The word "root" has one clearly defined meaning for the student of algebra and another equally specific meaning for the botanist. There is, however, little danger of confusion in such widely separated fields. The "roots" of algebra do not grow, and the botanist's "roots" are never imaginary! This uniqueness of the meaning of words is characteristic of the scientific method. Since similar definitions have been introduced by scientists of every country, translation is made easy by a "one-to-one" correspondence between scientific vocabularies. If such a situation prevailed in everyday usage, international understanding would be very much easier!
The layman has an uneasy feeling when common words are used with a new scientific definition, and he may be inclined to call such usage "scientific jargon."
But "jargons" are the rule in every specialized field — in theology or even in philosophy — as well as in engineering. The lay reader cannot understand the language of the specialists because he does not know enough about the matters under discussion.
The precise definition of words in the scientific language is usually based on two distinct methods: In mathematics, definitions start with a certain number of carefully selected and stated postulates, and more complex entities are derived from, and defined in terms of, these initial postulates. The new definitions amount to a verbal translation of formulas given symbolically and based on postulates. Experimental sciences have introduced another type of definition, often called "operational." Force, mass, velocity, etc., are defined by a short description of the type of experiment required for the measurement of these quantities. The operational point of view in the experimental sciences has been strongly recommended by many prominent scientists, and the name of P. W. Bridgman is often quoted in this connection. As a rule it has been found advisable to introduce into the scientific language only those quantities which can be defined operationally. Words not susceptible of an operational definition have, usually, eventually been found untrustworthy, and have been eliminated from the scientific vocabulary. For example, remember the "ether," and how relativity theory rendered the term meaningless.
Returning to information theory, we must start with a precise definition of the word "information." We consider a problem involving a certain number of possible answers, if we have no special information on the actual situation. When we happen to be in possession of some information on the problem, the number of possible answers is reduced, and complete information may even leave us with only one possible answer. Information is a function of the ratio of the number of possible answers before and after, and we choose a logarithmic law in order to insure additivity of the information contained in independent situations. These problems and definitions are discussed in Chapter I, and constitute the basis of the new theory.
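The logarithmic definition described above can be sketched in a few lines. This is a minimal illustration, not Brillouin's own notation: it assumes equally likely answers and measures information in bits (base-2 logarithm), and the function name and the card and coin numbers are hypothetical.

```python
import math


def information_bits(answers_before: int, answers_after: int) -> float:
    """Information gained when the number of equally likely answers shrinks.

    Assumption: all answers are equally probable and the unit is the bit.
    """
    if answers_after < 1 or answers_before < answers_after:
        raise ValueError("need answers_before >= answers_after >= 1")
    return math.log2(answers_before / answers_after)


# Hypothetical example: identifying one card out of 32, and one side of a coin.
card = information_bits(32, 1)
coin = information_bits(2, 1)

# Two independent situations have 32 * 2 joint possibilities; the logarithm
# turns that product into a sum, which is the additivity mentioned in the text.
joint = information_bits(32 * 2, 1)
print(card, coin, joint)
```

Complete information leaves one possible answer, so the gain is the logarithm of the initial count; no information at all leaves the count unchanged and gives zero.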
The methods of this theory can be successfully applied to all technical problems concerning information: coding, telecommunication, mechanical computers, etc. In all of these problems we are actually processing information or transmitting it from one place to another, and the present theory is extremely useful in setting up rules and stating exact limits for what can and cannot be done. But we are in no position to investigate the process of thought, and we cannot, for the moment, introduce into our theory any element involving the human value of the information. This elimination of the human element is a very serious limitation, but this is the price we have so far had to pay for being able to set up this body of scientific knowledge. The restrictions that we have introduced enable us to give a quantitative definition of information and to treat information as a physically measurable quantity. This definition cannot distinguish between information of great importance and a piece of news of no great value for the person who receives it.
The definition may look artificial at first sight, but it is actually practical and scientific. It is based on a collection of statistical data on each problem to be discussed, and these data, once available, are the same for all observers. Hence our definition of information is an absolute objective definition, independent of the observer. The "value" of the information, on the other hand, is obviously a subjective element, relative to the observer. The information contained in a sentence may be very important to me and completely irrelevant for my neighbor. An item from a newspaper may be read with some interest by many readers, but a theorem of Einstein is of no value to the layman, while it will attract a great deal of attention from a physicist.
All these elements of human value are ignored by the present theory. This does not mean that they will have to be ignored forever, but, for the moment, they have not yet been carefully investigated and classified. These problems will probably be next on the program of scientific investigation, and it is to be hoped that they can be discussed along scientific lines.
The present theory extends over the "no man's land" of absolute information, problems that neither scientists nor philosophers ever discussed before. If we reach into the problems of value, we shall begin to invade a territory reserved for philosophy. Shall we ever be able to cross this border and push the limits of science in this direction? This is for the future to answer.
The definition of absolute information is of great practical importance. The elimination of the human element is just the way to answer a variety of questions. The engineer who designs a telephone system does not care whether this link is going to be used for transmission of gossip, for stock exchange quotations, or for diplomatic messages. The technical problem is always the same: to transmit the information accurately and correctly, whatever it may be. The designer of a calculating machine does not know whether it will be used for astronomical tables or for the computation of pay checks. Ignoring the human value of the information is just the way to discuss it scientifically, without being influenced by prejudices and emotional considerations.
Physics enters the picture when we discover a remarkable likeness between information and entropy. This similarity was noticed long ago by L. Szilard, in an old paper of 1929, which was the forerunner of the present theory. In this paper, Szilard was really pioneering in the unknown territory which we are now exploring in all directions. He investigated the problem of Maxwell's demon, and this is one of the important subjects discussed in this book. The connection between information and entropy was rediscovered by C. Shannon in a different class of problems, and we devote many chapters to this comparison. We prove that information must be considered as a negative term in the entropy of a system; in short, information is negentropy. The entropy of a physical system has often been described as a measure of randomness in the structure of the system. We can now state this result in a slightly different way:
Every physical system is incompletely defined. We only know the values of some macroscopic variables, and we are unable to specify the exact positions and velocities of all the molecules contained in a system. We have only scanty, partial information on the system, and most of the information on the detailed structure is missing. Entropy measures the lack of information; it gives us the total amount of missing information on the ultramicroscopic structure of the system.
This point of view is defined as the negentropy principle of information, and it leads directly to a generalization of the second principle of thermodynamics, since entropy and information must be discussed together and cannot be treated separately. This negentropy principle of information will be justified by a variety of examples ranging from theoretical physics to everyday life. The essential point is to show that any observation or experiment made on a physical system automatically results in an increase of the entropy of the laboratory. It is then possible to compare the loss of negentropy (increase of entropy) with the amount of information obtained. The efficiency of an experiment can be defined as the ratio of information obtained to the associated increase in entropy. This efficiency is always smaller than unity, according to the generalized Carnot principle. Examples show that the efficiency can be nearly unity in some special examples, but may also be extremely low in other cases.
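The efficiency ratio defined above can be illustrated numerically. The sketch below is an assumption-laden example rather than a calculation from the book: it expresses one bit of information as k ln 2 in entropy units (the usual conversion, with k the Boltzmann constant), and the function names and the entropy cost used in the example are hypothetical.

```python
import math

BOLTZMANN_K = 1.380649e-23  # J/K


def bits_to_entropy_units(bits: float) -> float:
    """Express information in thermodynamic units, assuming k ln 2 per bit."""
    return bits * BOLTZMANN_K * math.log(2)


def experiment_efficiency(bits_obtained: float, entropy_increase: float) -> float:
    """Ratio of information obtained to the entropy increase that paid for it."""
    if entropy_increase <= 0:
        raise ValueError("an observation must cost a positive entropy increase")
    return bits_to_entropy_units(bits_obtained) / entropy_increase


# Hypothetical measurement: one bit gained at an entropy cost of 2 k ln 2.
cost = 2 * BOLTZMANN_K * math.log(2)
print(experiment_efficiency(1, cost))  # 0.5
```

Under the generalized Carnot principle quoted above, a physically realizable experiment would never yield a ratio above one from this function.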
This line of discussion is very useful in a comparison of fundamental experiments used in science, more particularly in physics. It leads to a new investigation of the efficiency of different methods of observation, as well as their accuracy and reliability.
An interesting outcome of this discussion is the conclusion that the measurement of extremely small distances is physically impossible. The mathematician defines the infinitely small, but the physicist is absolutely unable to measure it, and it represents a pure abstraction with no physical meaning. If we adopt the operational viewpoint, we should decide to eliminate the infinitely small from physical theories, but, unfortunately, we have no idea how to achieve such a program.
Discovering Knowledge in Data: An Introduction to Data Mining by Daniel T. Larose (Wiley-Interscience) Data mining can be revolutionary—but only when it's done right. The powerful black box data mining software now available can produce disastrously misleading results unless applied by a skilled and knowledgeable analyst. Discovering Knowledge in Data: An Introduction to Data Mining provides both the practical experience and the theoretical insight needed to reveal valuable information hidden in large data sets.
Employing a "white box" methodology and with real-world case studies, this step-by-step guide walks readers through the various algorithms and statistical structures that underlie the software and presents examples of their operation on actual large data sets. Principal topics include:
Data preprocessing and classification
Neural and Kohonen networks
Hierarchical and k-means clustering
Model evaluation techniques
Complete with scores of screenshots and diagrams to encourage graphical learning, Discovering Knowledge in Data: An Introduction to Data Mining gives students in Business, Computer Science, and Statistics as well as professionals in the field the power to turn any data warehouse into actionable knowledge.
Excerpt: Data mining is predicted to be "one of the most revolutionary developments of the next decade," according to the online technology magazine ZDNET News (February 8, 2001). In fact, the MIT Technology Review chose data mining as one of ten emerging technologies that will change the world. According to the Gartner Group, "Data mining is the process of discovering meaningful new correlations, patterns and trends by sifting through large amounts of data stored in repositories, using pattern recognition technologies as well as statistical and mathematical techniques."
Because data mining represents such an important field, Wiley-Interscience and Dr. Daniel T. Larose have teamed up to publish a series of volumes on data mining, consisting initially of three volumes. The first volume in the series, Discovering Knowledge in Data: An Introduction to Data Mining, introduces the reader to this rapidly growing field of data mining.
Human beings are inundated with data in most fields. Unfortunately, these valuable data, which cost firms millions to collect and collate, are languishing in warehouses and repositories. The problem is that not enough trained human analysts are available who are skilled at translating all of the data into knowledge, and thence up the taxonomy tree into wisdom. This is why this book is needed; it provides readers with:
Models and techniques to uncover hidden nuggets of information
Insight into how data mining algorithms work
The experience of actually performing data mining on large data sets
Data mining is becoming more widespread every day, because it empowers companies to uncover profitable patterns and trends from their existing databases. Companies and institutions have spent millions of dollars to collect megabytes and terabytes of data but are not taking advantage of the valuable and actionable information hidden deep within their data repositories. However, as the practice of data mining becomes more widespread, companies that do not apply these techniques are in danger of falling behind and losing market share, because their competitors are using data mining and are thereby gaining the competitive edge. In Discovering Knowledge in Data, the step-by-step hands-on solutions of real-world business problems using widely available data mining techniques applied to real-world data sets will appeal to managers, CIOs, CEOs, CFOs, and others who need to keep abreast of the latest methods for enhancing return on investment.
DANGER! DATA MINING IS EASY TO DO BADLY
The plethora of new off-the-shelf software platforms for performing data mining has kindled a new kind of danger. The ease with which these GUI-based applications can manipulate data, combined with the power of the formidable data mining algorithms embedded in the black-box software currently available, makes their misuse proportionally more hazardous.
Just as with any new information technology, data mining is easy to do badly. A little knowledge is especially dangerous when it comes to applying powerful models based on large data sets. For example, analyses carried out on unpreprocessed data can lead to erroneous conclusions, or inappropriate analysis may be applied to data sets that call for a completely different approach, or models may be derived that are built upon wholly specious assumptions. If deployed, these errors in analysis can lead to very expensive failures.
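To make the point about unpreprocessed data concrete, here is a minimal sketch of two common preprocessing steps, min-max normalization and a z-score outlier check. It is not taken from the book; the function names, the threshold, and the sample values are assumptions chosen for illustration.

```python
import statistics


def min_max_normalize(values):
    """Rescale values linearly onto the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to spread out
    return [(v - lo) / (hi - lo) for v in values]


def zscore_outliers(values, threshold=3.0):
    """Return the values lying more than `threshold` sample standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return []
    return [v for v in values if abs(v - mean) / sd > threshold]


# Hypothetical field with one suspicious entry (95).
minutes = [10, 12, 11, 13, 12, 11, 10, 12, 11, 95]
print(zscore_outliers(minutes, threshold=2.5))
print(min_max_normalize([10, 20, 30]))
```

A lower threshold is used in the example because, with only ten records, no single value can sit three sample standard deviations from the mean. Normalizing before removing the suspicious entry would squeeze the nine ordinary values into a narrow band near zero, which is one way raw data can distort a later analysis.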
"WHITE BOX" APPROACH: UNDERSTANDING THE UNDERLYING ALGORITHMIC AND MODEL STRUCTURES
The best way to avoid these costly errors, which stem from a blind black-box approach to data mining, is to apply instead a "white-box" methodology, which emphasizes an understanding of the algorithmic and statistical model structures underlying the software. Discovering Knowledge in Data applies this white-box approach by:
Walking the reader through the various algorithms
Providing examples of the operation of the algorithm on actual large data sets
Testing the reader's level of understanding of the concepts and algorithms
Providing an opportunity for the reader to do some real data mining on large data sets
Algorithm Walk Throughs
Discovering Knowledge in Data walks the reader through the operations and nuances of the various algorithms, using small-sample data sets, so that the reader gets a true appreciation of what is really going on inside the algorithm. For example, in Chapter 8, we see the cluster centers being updated, moving toward the center of their respective clusters. Also, in Chapter 9 we see just which type of network weights will result in a particular network node "winning" a particular record.
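The cluster-center movement mentioned for Chapter 8 can be reproduced on a toy data set. The sketch below assumes the standard assign-then-average form of k-means on two-dimensional points; it is not the book's own walk-through, and the points and starting centers are made up.

```python
def kmeans(points, centers, max_iterations=10):
    """Plain k-means on 2-D points; returns final centers and their history."""
    centers = list(centers)
    history = [list(centers)]
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for x, y in points:
            nearest = min(
                range(len(centers)),
                key=lambda i: (x - centers[i][0]) ** 2 + (y - centers[i][1]) ** 2,
            )
            clusters[nearest].append((x, y))
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for old, members in zip(centers, clusters):
            if members:
                mean_x = sum(px for px, _ in members) / len(members)
                mean_y = sum(py for _, py in members) / len(members)
                new_centers.append((mean_x, mean_y))
            else:
                new_centers.append(old)  # empty cluster keeps its center
        if new_centers == centers:
            break  # centers stopped moving
        centers = new_centers
        history.append(list(centers))
    return centers, history


# Made-up data: two visibly separate groups and deliberately poor starting centers.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
final_centers, history = kmeans(points, [(0, 0), (5, 5)])
for step, snapshot in enumerate(history):
    print(step, snapshot)
```

Printing the history shows each center jumping from its starting position to the middle of the group of points assigned to it.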
Applications of the Algorithms to Large Data Sets
Discovering Knowledge in Data provides examples of the application of various algorithms on actual large data sets. For example, in Chapter 7 a classification problem is attacked using a neural network model on a real-world data set. The resulting neural network topology is examined along with the network connection weights, as reported by the software. These data sets are included at the book series Web site, so that readers may follow the analytical steps on their own, using data mining software of their choice.
Chapter Exercises: Checking to Make Sure That You Understand It
Discovering Knowledge in Data includes over 90 chapter exercises, which allow readers to assess their depth of understanding of the material, as well as to have a little fun playing with numbers and data. These include conceptual exercises, which help to clarify some of the more challenging concepts in data mining, and "tiny data set" exercises, which challenge the reader to apply the particular data mining algorithm to a small data set and, step by step, to arrive at a computationally sound solution. For example, in Chapter 6 readers are provided with a small data set and asked to construct by hand, using the methods shown in the chapter, a C4.5 decision tree model, as well as a classification and regression tree model, and to compare the benefits and drawbacks of each.
Hands-on Analysis: Learn Data Mining by Doing Data Mining
Chapters 2 to 4 and 6 to 11 provide the reader with hands-on analysis problems, representing an opportunity for the reader to apply his or her newly acquired data mining expertise to solving real problems using large data sets. Many people learn by doing. Discovering Knowledge in Data provides a framework by which the reader can learn data mining by doing data mining. The intention is to mirror the real-world data mining scenario. In the real world, dirty data sets need cleaning; raw data needs to be normalized; outliers need to be checked. So it is with Discovering Knowledge in Data, where over 70 hands-on analysis problems are provided. In this way, the reader can "ramp up" quickly and be "up and running" his or her own data mining analyses relatively shortly.
For example, in Chapter 10 readers are challenged to uncover high-confidence, high-support rules for predicting which customer will be leaving a company's service. In Chapter 11 readers are asked to produce lift charts and gains charts for a set of classification models using a large data set, so that the best model may be identified.
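The terms "support" and "confidence" in the Chapter 10 exercise have standard definitions that are easy to compute by hand. The sketch below uses those standard definitions; the customer records and attribute names are hypothetical and do not come from the book's data set.

```python
def rule_metrics(records, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent.

    support    = share of all records containing both sides of the rule
    confidence = share of antecedent records that also contain the consequent
    """
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = [r for r in records if antecedent <= set(r)]
    with_both = [r for r in with_antecedent if consequent <= set(r)]
    support = len(with_both) / len(records) if records else 0.0
    confidence = len(with_both) / len(with_antecedent) if with_antecedent else 0.0
    return support, confidence


# Hypothetical customer records, each a set of observed attributes.
records = [
    {"intl_plan", "high_day_minutes", "churn"},
    {"intl_plan", "churn"},
    {"intl_plan"},
    {"high_day_minutes"},
    {"voicemail"},
]
support, confidence = rule_metrics(records, {"intl_plan"}, {"churn"})
print(support, confidence)
```

Here the rule covers two of five records, and two of the three customers with the antecedent attribute also show the consequent. A rule would count as "high-confidence, high-support" only relative to thresholds the analyst chooses.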
DATA MINING AS A PROCESS
One of the fallacies associated with data mining implementation is that data mining somehow represents an isolated set of tools, to be applied by some aloof analysis department, and is related only inconsequentially to the mainstream business or research endeavor. Organizations that attempt to implement data mining in this way will see their chances of success greatly reduced. This is because data mining should be viewed as a process.
Discovering Knowledge in Data presents data mining as a well-structured standard process, intimately connected with managers, decision makers, and those involved in deploying the results. Thus, this book is not only for analysts but also for managers, who need to be able to communicate in the language of data mining. The particular standard process used is the CRISP–DM framework: the Cross-Industry Standard Process for Data Mining. CRISP–DM demands that data mining be seen as an entire process, from communication of the business problem through data collection and management, data preprocessing, model building, model evaluation, and finally, model deployment. Therefore, this book is not only for analysts and managers but also for data management professionals, database analysts, and decision makers.
GRAPHICAL APPROACH, EMPHASIZING EXPLORATORY DATA ANALYSIS
Discovering Knowledge in Data emphasizes a graphical approach to data analysis. There are more than 80 screen shots of actual computer output throughout the book, and over 30 other figures. Exploratory data analysis (EDA) represents an interesting and exciting way to "feel your way" through large data sets. Using graphical and numerical summaries, the analyst gradually sheds light on the complex relationships hidden within the data. Discovering Knowledge in Data emphasizes an EDA approach to data mining, which goes hand in hand with the overall graphical approach.
HOW THE BOOK IS STRUCTURED
Discovering Knowledge in Data provides a comprehensive introduction to the field. Case studies are provided showing how data mining has been utilized successfully (and not so successfully). Common myths about data mining are debunked, and common pitfalls are flagged, so that new data miners do not have to learn these lessons themselves.
The first three chapters introduce and follow the CRISP–DM standard process, especially the data preparation phase and data understanding phase. The next seven chapters represent the heart of the book and are associated with the CRISP–DM modeling phase. Each chapter presents data mining methods and techniques for a specific data mining task.
Chapters 5, 6, and 7 relate to the classification task, examining the k-nearest neighbor (Chapter 5), decision tree (Chapter 6), and neural network (Chapter 7) algorithms.
Chapters 8 and 9 investigate the clustering task, with hierarchical and k-means clustering (Chapter 8) and Kohonen network (Chapter 9) algorithms.
Chapter 10 handles the association task, examining association rules through the a priori and GRI algorithms.
Finally, Chapter 11 covers model evaluation techniques, which belong to the CRISP–DM evaluation phase.
DISCOVERING KNOWLEDGE IN DATA AS A TEXTBOOK
Discovering Knowledge in Data naturally fits the role of textbook for an introductory course in data mining. Instructors may appreciate:
The presentation of data mining as a process
The "white-box" approach, emphasizing an understanding of the underlying algorithmic structures, with application of the algorithms to large data sets
The graphical approach, emphasizing exploratory data analysis
The logical presentation, flowing naturally from the CRISP–DM standard process and the set of data mining tasks
Discovering Knowledge in Data is appropriate for advanced undergraduate or graduate courses. Except for one section in Chapter 7, no calculus is required. An introductory statistics course would be nice but is not required. No computer programming or database expertise is required.
New Developments in Categorical Data Analysis for the Social and Behavioral Sciences by L. Andries Van Der Ark, Marcel A. Croon, Klaas Sijtsma (Quantitative Methodology Series: Lawrence Erlbaum Associates) Almost all research in the social and behavioral sciences, economics, marketing, criminology, and medicine deals with the analysis of categorical data. Categorical data are quantified as either nominal variables (distinguishing different groups, for example, based on socio-economic status, education, and political persuasion) or ordinal variables (distinguishing levels of interest, such as the preferred politician for President or the preferred type of punishment for committing burglary). New Developments in Categorical Data Analysis for the Social and Behavioral Sciences is a collection of up-to-date studies on modern categorical data analysis methods, emphasizing their application to relevant and interesting data sets.
A prominent breakthrough in categorical data analysis is the development of latent variable models. This volume concentrates on two such classes of models: latent class analysis and item response theory. These methods use latent variables to explain the relationships among observed categorical variables. Latent class analysis yields the classification of a group of respondents according to their pattern of scores on the categorical variables. This provides insight into the mechanisms producing the data and allows the estimation of factor structures and regression models conditional on the latent class structure. Item response theory leads to the identification of one or more ordinal or interval scales. In psychological and educational testing these scales are used for individual measurement of abilities and personality traits. Item response theory has been extended to also deal with, for example, hierarchical data structures and cognitive theories explaining performance on tests.
Excerpt: The focus of this volume is applied. After a method is explained, the potential of the method for analyzing categorical data is illustrated by means of a real data example to show how it can be used effectively for solving a real data problem. These methods are accessible to researchers not trained explicitly in applied statistics. This volume appeals to researchers and advanced students in the social and behavioral sciences, including social, developmental, organizational, clinical and health psychologists, sociologists, educational and marketing researchers, and political scientists. In addition, it is of interest to those who collect data on categorical variables and are faced with the problem of how to analyze such variables—among themselves or in relation to metric variables.
Almost all research in the social and behavioral sciences, and also in economic and marketing research, criminological research, and social medical research deals with the analysis of categorical data. Categorical data are quantified as either nominal or ordinal variables. This volume is a collection of up-to-date studies on modern categorical data analysis methods, emphasizing their application to relevant and interesting data sets.
Different scores on nominal variables distinguish groups. Examples known to everyone are gender, socioeconomic status, education, religion, and political persuasion. Other examples, perhaps less well known, are the type of solution strategy used by a child to solve a mental problem in an intelligence test and different educational training programs used to teach language skills to eight-year-old pupils. Because nominal scores only identify groups, calculations must use this information but no more; thus, addition and multiplication of such scores lead to meaningless results.
Different scores on ordinal variables distinguish levels of interest, but differences between such numbers hold no additional information. Such scores are rank numbers or transformations of rank numbers. Examples are the ordering of types of education according to level of sophistication, the choice of most preferred politician to run for president, the preference for type of punishment in response to burglary without using violence, and the degree in which someone who recently underwent surgery rates his or her daily quality of life as expressed on an ordered rating scale.
Originally, the analysis of categorical data was restricted to counting frequencies, collecting them in cross tables, and determining the strength of the relationship between variables. Nowadays, a powerful collection of statistical methods is available that enables the researcher to exhaust his or her categorical data in ways that seemed illusory only one or two decades ago.
A prominent breakthrough in categorical data analysis is the development and use of latent variable models. This volume concentrates on two such classes of models, latent class analysis and item response theory. These methods assume latent variables to explain the relationships among observed categorical variables. Roughly, if the latent variable is also categorical the method is called latent class analysis and if it is continuous the method is called item response theory.
Latent class analysis basically yields the classification of a group of respondents according to their most likely pattern of scores on the categorical variables. Not only does this provide insight into the mechanisms producing the data, but modern latent class analysis also allows for the estimation of, for example, factor structures and regression models conditional on the latent class structure. Item response theory leads to the identification of one or more ordinal or interval scales. In psychological and educational testing these scales are used for individual measurement of abilities and personality traits. Item response theory has been extended to also deal with, for example, hierarchical data structures and cognitive theories explaining performance on tests.
These developments are truly exciting because they enable us to get so much more out of our data than was ever dreamt of before. In fact, when realizing the potential of modern-day statistical machinery one is tempted to dig up all those data sets collected not-so-long ago and re-analyze them with the latent class analysis and item response theory methods we now have at our disposal. To give the reader some flavor of these methods, the focus of most contributions in this volume has been kept applied; that is, after a method is explained, the potential of the method for analyzing categorical data is illustrated by means of a real data example. The purpose is to explain methods at a level that is accessible to researchers not trained explicitly in applied statistics and then show how they can be used effectively for solving a real data problem.