Information

■ Expressing explanatory variables and information in the smallest unit of division

Example: I want to guess the sweetness from the color, size, weight, and variety name of strawberries.
The objective variable is sweetness, and the explanatory variables are color, size, weight, and variety name.
So far, we have discussed inference when only the "sweetness" data was available.
From now on, we also consider explanatory variables.
First, consider how a single strawberry is associated with specific color and sweetness information.
Information can be expressed in different ways even if the content is the same.
Example: "00001111" and "0000" "1111" "former followed by latter"
The two above show the same information.
Now consider the case where the information is divided as finely as possible.
Example: "1st bit is 0", "2nd bit is 0", "3rd bit is 0", "4th bit is 0"...
It can be divided like this.
The part that expresses the "0" is abbreviated here, but a unit is needed to explain what that value indicates.
As an example, consider the case of video information.
(1 [mm vertical]) and (1 [mm horizontal]) and (1 [sec]) → 1 [R]
(1 [mm vertical]) and (1 [mm horizontal]) and (1 [sec]) → 1 [G]
(1 [mm vertical]) and (1 [mm horizontal]) and (1 [sec]) → 1 [B]
In this way, the minimum division unit of information is expressed in the form of A→B.
A is a set of numbers and units.
B is a set of numbers and units.
A form of this kind (a sort of disjunctive normal form) is the normal form of information.
Multiple pieces of information are linked according to whether or not their A parts match.
A numerical unit may itself carry various kinds of information, but all we need to know is whether two units match.
A unit may simply be a unique value used to distinguish it from other units (for example, a long bit string chosen at random so that it does not collide with any other).
The meaning of a unit and its relationships to other units are expressed by combining several such relations.
Example: 1[qwerty]=1[asdfgh], 1[qwerty]=1000[zxcvbn]
The numeric unit names don't have to be meaningful; [qwerty] is simply known to be 1000 times the unit [zxcvbn].
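
A minimal sketch of such unit relations in Python (the table layout and the convert function are assumptions for illustration, not something defined above):

    # Hypothetical table of relations between units: (src, dst) -> factor,
    # meaning 1 [src] = factor [dst]. The unit names carry no meaning of their own.
    relations = {
        ("qwerty", "asdfgh"): 1.0,
        ("qwerty", "zxcvbn"): 1000.0,
    }

    def convert(value, src, dst):
        # Rewrites the same information using a different unit.
        return value * relations[(src, dst)]

    print(convert(2.0, "qwerty", "zxcvbn"))  # -> 2000.0
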
The value part of each piece may be a single binary bit, or it may be a real number.
By changing the unit of the value part, it is possible to change the expression while keeping the same contents.
For example, all values can be expressed uniformly as real numbers between 0 and 1.
Standard form: (0 to 1 (value), unique value (unit)) and (0 to 1 (value), unique value (unit)) and … → (0 to 1 (value), unique value (unit))
Any information can be expressed with "0 to 1" and "unique value".
In the brain, "0-1" might correspond to one state of a neuron, and "unique value" might correspond to which neuron.
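
A rough Python sketch of this normal form (the tuple layout, the way unique unit values are generated, and the sample intensity 0.5 are assumptions for illustration):

    import random

    def make_unit():
        # A "unique value" acting as a unit: a long random bit string chosen
        # so that collisions are practically impossible.
        return random.getrandbits(128)

    MM_V, MM_H, SEC, R = make_unit(), make_unit(), make_unit(), make_unit()

    # One smallest unit of information in the form A -> B:
    # A is a set of (value in 0..1, unit) pairs, B is a single (value, unit) pair.
    info = (
        frozenset({(1.0, MM_V), (1.0, MM_H), (1.0, SEC)}),  # A: position and time
        (0.5, R),                                            # B: red intensity (hypothetical)
    )

    def same_a(info1, info2):
        # Pieces of information are linked only by whether their A parts match.
        return info1[0] == info2[0]
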

■ "branch information", "through information" and "invalid information"

Calculating the amount of information may result in ∞.
For example, suppose you want to measure and compare the lengths of two rods.
Assume the use of an ideal measurement instrument with ∞ resolution.
Since the number of significant digits is ∞, the amount of information is ∞.
The lengths of the two rods might be identical up to the 5000-trillionth digit and differ only at the digit after that.
This is not invalid information, since it could still be put to good use.
The usefulness of information depends on how it is used.
Example: if a rod's length is 1 m or more, report its length in units of 0.000001 m
Example: x=1.114514; if(x>=1){return x}
The only information needed for program branching was the integer part.
However, the digits after the decimal point are also needed in the returned answer, so they are not unnecessary information.
Therefore, the information used for branching is referred to as "branch information".
Information that is not used for branching but is necessary for answering is referred to as "through information".
When solving a problem, the amount of "through information" does not matter.
Even if the amount of "through information" is ∞, there is no adverse effect on the theory of computation.
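
A minimal Python sketch of the distinction, using the example above (the return value for the other branch is an assumption):

    def answer(x):
        # Branch information: the branch depends only on whether x >= 1,
        # i.e., essentially on the integer part of x.
        if x >= 1:
            # Through information: the digits after the decimal point never
            # affect any branch, but they must still appear in the answer.
            return x
        return 0.0  # hypothetical answer for the other branch

    print(answer(1.114514))  # the fractional part passes straight through
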
For example, in the Traveling Salesman Problem, suppose you are asked to answer city names in order.
Suppose the city names are extremely long, say 5000 trillion characters each.
Computational complexity theory goes wrong unless we ignore the information in the city names as "through information".
Even information that is used in the calculation is "through information" as long as the program never branches on it.
Information that is not used for calculation at all is called "invalid information" and is distinguished from "through information".
If you just want to calculate the ratio of the lengths of two rods, the length information is through information.
Even if you apply the four arithmetic operations to it, that only changes how the information is expressed; the essential information itself does not change.
Even though 1÷3 becomes 0.33333333… and the number of digits increases, the amount of information does not increase.
Calculations other than conditional branching can be eliminated by optimizing the algorithm.
All you have to do is create a dictionary of the answers to be returned for each possible input.
To search that dictionary, you need to branch roughly as many times as the base-2 logarithm of the number of possible inputs.
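
For instance, a sketch with a sorted dictionary of precomputed answers (the table contents here are made up); binary search leaves only about log2(number of entries) branches:

    from bisect import bisect_right
    import math

    # Hypothetical dictionary: thresholds on the input paired with precomputed answers.
    keys = [0.0, 1.0, 2.0, 3.0]
    answers = ["below 1", "1 to 2", "2 to 3", "3 or more"]

    def lookup(x):
        # bisect performs a binary search: roughly log2(len(keys)) branches.
        i = bisect_right(keys, x) - 1
        return answers[max(i, 0)]

    print(lookup(1.114514))                  # -> "1 to 2"
    print(math.ceil(math.log2(len(keys))))   # branches needed: 2
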

■ Two-step inference using "through information"

The amount of information (entropy) of a continuous value cannot be calculated in the same way as for discrete values.
If we treat a continuous value as a discrete value divided infinitely finely, the amount of information becomes ∞.
However, I do not think that information divided infinitely finely can actually be used effectively.
We can expect the information in the extremely finely divided parts to be "through information".
For example, suppose we have a continuous probability distribution.
The probability distribution is assumed to be estimated from a finite number of data points.
For example, it may be a normal distribution.
However, there is a "nonparametric method" that considers only the order of data without assuming the distribution.
In that method, information other than ordering is ignored.
That is, the order of data is "branch information".
The distances between the data points are "through information".
If there are 100 data points, they divide the range from -∞ to +∞ into roughly 100 intervals, and the "nonparametric method" can estimate which interval the estimation target is most likely to fall in.
However, because the "nonparametric method" considers only the ordering, its predictive ability is said to be inferior.
Therefore, after estimating with the "nonparametric method", a second stage of estimation is performed.
By using the information that was set aside as "through information", the position within the interval can be narrowed down further.
This makes it possible to combine the advantages of the "nonparametric method" and the "parametric method".
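
A rough Python sketch of the two-step idea (the data, the choice of the median interval in step 1, and the normal distribution in step 2 are all assumptions for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    data = np.sort(rng.normal(loc=5.0, scale=2.0, size=100))  # 100 observations

    # Step 1 (nonparametric, "branch information" = ordering only):
    # the sorted data split the real line into intervals; as an illustration we
    # take the interval around the sample median as the estimated region.
    lo, hi = data[49], data[50]

    # Step 2 (parametric, reusing the distances that step 1 ignored as
    # "through information"): fit a normal distribution and narrow the estimate
    # down to a single point inside the chosen interval.
    mu = data.mean()
    estimate = min(max(mu, lo), hi)  # clip the fitted mean into the interval

    print((lo, hi), estimate)
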