I am going to write this up "formally"; please offer input as you see fit. This is only the first of many sections.
Information and Entropy in Mathematical and Physical Systems
Abstract: We consider the relationship between entropy in physical systems and entropy in mathematical systems. The basic concepts of information theory and statistical mechanics are frequently misunderstood or conflated, and clarifying their relationship allows further insight into the information content of physical systems. We demonstrate the relationship between physical entropy and mathematical entropy, and we consider the probability of a physical system as having meaning independent of that system's entropy. This result has potential application in understanding the evolution of physical systems at specific points in time.
The representation of information was first formalized by H. Nyquist, who introduced the notion of the information content of a symbol space as the logarithm of the size of the symbol set [1]. He termed this the “maximum speed of transmission of intelligence” for transmitting knowledge from a sender to a receiver. In this formulation, a set of 64 symbols has a log base 2 equal to 6, that is, 6 bits of information. The choice of the base of the logarithm reflects the transmission method: a signal supporting 3 levels would use the log base 3 and would require fewer independent signals to represent the same symbol space (4 signals rather than 6 for a set of 64 symbols).
(* This last sentence is strictly true only for larger symbol sets. For example, a set of 4 symbols requires 2 signals whether each signal supports 2 levels or 3.)
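The signal counts above can be checked with a short sketch. The function name `signals_needed` is hypothetical and not part of Nyquist's formulation. It assumes that each signal independently takes one of a fixed number of levels and that every symbol must map to a distinct sequence of signals. It uses integer arithmetic rather than a floating-point logarithm to avoid rounding problems at exact powers.

```python
def signals_needed(num_symbols: int, levels: int) -> int:
    """Return the smallest k such that levels**k >= num_symbols.

    This is the ceiling of log base `levels` of `num_symbols`,
    computed with integers only (illustrative helper, assumed name).
    """
    if num_symbols < 1:
        raise ValueError("num_symbols must be at least 1")
    if levels < 2:
        raise ValueError("levels must be at least 2")

    k = 0          # number of signals used so far
    capacity = 1   # number of distinct sequences k signals can represent
    while capacity < num_symbols:
        capacity *= levels
        k += 1
    return k


if __name__ == "__main__":
    # Compare two-level and three-level signalling for a few symbol-set sizes.
    for n in (2, 3, 4, 5, 27, 64):
        print(n, signals_needed(n, 2), signals_needed(n, 3))
```

Under these assumptions, three-level signalling never needs more signals than two-level signalling, and the two coincide only for very small symbol sets such as 2 or 4 symbols.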
C. Shannon further developed this work to define the information content of a source transmitting to a receiver. Shannon defined the information contained within a message to be equal to the logarithm of the inverse of the probability of the message being generated.
$I_{SR} = \log(1/p)$, or equivalently $I_{SR} = -\log p$
This representation of information will be denoted $I_{SR}$, the “Symbol Represented message information” value. Following Nyquist, the logarithm is taken to transform the size of the message space into the number of signals required to represent a message. If we are not going to transmit the information across a channel, we can instead define the information to be the inverse of the probability of the message, or
$I_{I} = 1/p$
This representation of information will be denoted $I_{I}$, the “Improbability of message information”, which mathematically represents a less probable message as carrying more information.
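The two measures can be compared side by side with a minimal sketch. The function names `symbol_information` and `improbability` are hypothetical, and the default of base 2 (bits) is an assumption carried over from the 64-symbol example. The draft itself leaves the base of the logarithm open.

```python
import math


def symbol_information(p: float, base: float = 2.0) -> float:
    """Symbol-represented information: -log(p) in the given base.

    Assumed helper name; base 2 gives the result in bits.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must satisfy 0 < p <= 1")
    return -math.log(p, base)


def improbability(p: float) -> float:
    """Improbability-of-message information: 1/p, with no logarithm taken."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must satisfy 0 < p <= 1")
    return 1.0 / p


if __name__ == "__main__":
    # One message out of 64 equally likely messages.
    p = 1.0 / 64.0
    print("symbol-represented information (bits):", symbol_information(p))
    print("improbability:", improbability(p))
```

For one message out of 64 equally likely messages, the first measure is about 6 bits and the second is 64. The two are related by a logarithm and rank messages in the same order.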
In the same paper, Shannon further defined the average amount of information per message, which determines the channel capacity needed to transmit the output of the source. The average amount of information of a message is derived to be:
$H = -\sum_{i} p(m_i) \log p(m_i)$
where $p(m_i)$ is the probability of message $m_i$ and the sum runs over all possible messages.
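As a sketch of how this average is computed, the following function evaluates H for a finite list of message probabilities. The name `entropy` and the default of base 2 are assumptions made for illustration. The code also adopts the usual convention that a message with probability zero contributes nothing to the sum, which the draft does not state explicitly.

```python
import math
from typing import Sequence


def entropy(probabilities: Sequence[float], base: float = 2.0) -> float:
    """Average information per message: -sum(p_i * log(p_i)).

    Assumed helper name. Terms with p_i == 0 are skipped, following
    the convention that 0 * log(0) is taken to be 0.
    """
    if any(p < 0.0 for p in probabilities):
        raise ValueError("probabilities must be non-negative")
    if not math.isclose(sum(probabilities), 1.0):
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log(p, base) for p in probabilities if p > 0.0)


if __name__ == "__main__":
    # 64 equally likely messages.
    uniform = [1.0 / 64.0] * 64
    # Three messages with unequal probabilities.
    skewed = [0.5, 0.25, 0.25]
    print("uniform source:", entropy(uniform))
    print("skewed source:", entropy(skewed))
```

For 64 equally likely messages the average is about 6 bits, matching the Nyquist value for a 64-symbol set. Unequal probabilities give a smaller average than the uniform case over the same number of messages.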
Unfortunately for history and the understanding of physical systems, Shannon chose the term “Entropy” to represent this average information content of a message.