Complex Inference

■ Non-overlapping repeating units

For example, when inferring the color of a still image, an infinitesimal area is the repeating unit.
If "white, black" repeats and is expected to repeat next time, one cycle is the repeat unit.
In the case of moving images, "area x time" can be used as a repeat unit.
You can also ignore the "time" and compare the color of the "area" at different times in the video.
The repeating unit can be freely determined as needed.
Consider the conditions that the repeating unit must satisfy.
In the first place, it can be said that the role of the repeating unit is to allow "noise" to be inferred statistically.
When making inductive inferences about the sweetness of strawberries, it is best to collect samples of strawberries that are as similar in variety, size, and color as possible.
By examining such samples, it becomes possible to predict the component of sweetness not explained by "variety, size, and color", that is, the component due to "noise".
Here, assume that the same individual strawberry has exactly the same sweetness no matter which part is eaten.
At that time, it is not permissible to cut one strawberry in two and assume that there are two specimens.
This is because it is assumed that the same individual has the same sweetness, that is, the same "noise".
Now consider the case where it is not assumed that sweetness is uniform within the same individual.
Even though "one whole strawberry" and "the center of that same strawberry" differ in sweetness, they still must not be counted as two specimens.
If determining the noise of one sample even partially determines the noise of another, the two cannot be counted as independent samples.
For the same reason, values of a moving average computed over overlapping intervals cannot be counted as independent samples, as the sketch below illustrates.
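As a minimal sketch of this point in Python (assuming NumPy; the window length of 10 and the sample count are arbitrary), the following compares means taken over overlapping and non-overlapping windows of pure noise. Adjacent overlapping windows share most of their data, so the resulting "samples" are strongly correlated and cannot be counted at face value.

```python
import numpy as np

rng = np.random.default_rng(0)
noise = rng.normal(size=10_000)

# Overlapping windows: each 10-point mean shares 9 points with its neighbor,
# so consecutive "samples" are strongly correlated, not independent.
overlapping = np.convolve(noise, np.ones(10) / 10, mode="valid")

# Non-overlapping windows: each 10-point mean uses disjoint data,
# so each value can be counted as an independent repeating unit.
non_overlapping = noise[: len(noise) // 10 * 10].reshape(-1, 10).mean(axis=1)

def lag1_corr(x):
    return np.corrcoef(x[:-1], x[1:])[0, 1]

print(f"lag-1 correlation, overlapping:     {lag1_corr(overlapping):+.2f}")  # ~ +0.9
print(f"lag-1 correlation, non-overlapping: {lag1_corr(non_overlapping):+.2f}")  # ~ 0.0
```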

■Repeating unit size

A repeating unit must always have a "size".
Consider the case of predicting Y for a given X from a plot on the XY plane.
For example, define an "infinitesimally small" size for each point.
In that case, the size of the prediction target is also "infinitesimally small".
The size must not be exactly 0, because the probability distribution is inferred from a ratio of "sizes".
Consider the case of using images from the optic nerve as data.
It should not be treated as the color of a single (X, Y) coordinate at a single instant.
Make sure there is a "size", such as the average color over a certain area from the previous instant to the current one.
Consider the high-level case, such as one strawberry, rather than the low-level one, such as one pixel.
Both "large strawberry" and "small strawberry" can be interpreted as the same "size (one piece)".
As long as the image regions of the strawberries do not overlap, each is counted as "one".
Even if their image areas differ, each strawberry is still counted as the same unit of "one piece".
The image area is used only to check for overlap.
As another example, consider a case where a certain still image is divided into bright portions and dark portions by a certain threshold. Suppose we have two bright circles.
If the two circles are far apart, each can be counted as one sample.
As long as the samples can be distinguished in some way, the image area used for the distinction is irrelevant.
Even high-level concepts can be assumed to be one sample, as long as they can be distinguished.
Conditions such as "has a torso, a head, two arms, and two legs" can be used to define the repeating unit "one human".
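Returning to the two bright circles: the following is a minimal sketch of counting non-overlapping regions as samples, assuming NumPy and SciPy (the image, threshold, and circle sizes are all invented for illustration).

```python
import numpy as np
from scipy import ndimage

# Toy "still image": two bright circles of different areas on a dark background.
yy, xx = np.mgrid[0:64, 0:64]
image = np.zeros((64, 64))
image[(yy - 16) ** 2 + (xx - 16) ** 2 < 36] = 1.0   # small bright circle
image[(yy - 44) ** 2 + (xx - 44) ** 2 < 100] = 1.0  # larger bright circle

# Divide into bright and dark by a threshold, then count connected bright regions.
bright = image > 0.5
labels, n_samples = ndimage.label(bright)

# Each connected region is one sample, regardless of its pixel area;
# the area is used only to confirm the regions do not overlap.
print(f"number of repeating units: {n_samples}")              # 2
print(f"areas in pixels: {np.bincount(labels.ravel())[1:]}")  # different, yet each counts as one
```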

■ Repeating unit of objective variable and repeating unit of explanatory variable

Assume that all input information is assigned a "value" for some "size".
For example, "size" is the area of ​​the visual XY coordinates, and the value is "color".
Not only objective variables, but all explanatory variables are "values" assigned to "sizes."
For example, suppose the sweetness of strawberries is the objective variable and the color of the stem of the strawberry is the explanatory variable.
In the image, the stem occupies only part of the strawberry's area.
It can be said that the strawberry has the features "sweetness" and "color of the stem".
However, strictly speaking, the "strawberry" and the "strawberry stem" occupy different areas.
In other words, the objective variable and the explanatory variable need not have the same properties.
For example, it can be hypothesized that the sweetness of the nth strawberry eaten is related to the nth numerical value written in a book.
It is only necessary to assume a correspondence between the "repeating unit of the objective variable" and the "repeating unit of the explanatory variable".
However, if that correspondence is not injective, sufficient inference cannot be made, as the sketch below suggests.
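As a minimal sketch of this requirement (the values are invented), pair each strawberry eaten with a number written in a book and check that distinct units map to distinct keys:

```python
# Hypothetical data: the n-th strawberry eaten (objective repeating unit)
# is matched to the n-th number written in a book (explanatory repeating unit).
sweetness = [7.1, 6.4, 8.0]   # objective variable, one value per strawberry
book_numbers = [3, 9, 9]      # explanatory variable, one value per strawberry

# The correspondence must be injective: distinct strawberries must map to
# distinct explanatory units. Here two strawberries share the book number 9,
# so "book number" alone cannot tell their samples apart.
def is_injective(keys):
    return len(keys) == len(set(keys))

print(is_injective(book_numbers))  # False: sufficient inference is impossible
```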

■Conversion of information by deduction

Consider the information processing of the visual cortex of the brain as a reference.
Information representing contrast can be obtained from the difference in color between two points.
A continuous portion of contrast is recognized as a line.
The positional relationships between lines, and the color of the regions enclosed by lines, are then recognized.
Inferring the contrast from the color difference between two points is deductive rather than inductive.
Even if the purpose is to perform inductive inference, deductive inference is necessary as preprocessing.
To acquire a higher-level concept, it must first be constructed by deduction.
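A minimal sketch of such deductive preprocessing, assuming NumPy (the toy image is invented): the contrast follows from the definition of a difference between neighboring values, with no inductive step involved.

```python
import numpy as np

# Toy grayscale image: dark left half, bright right half.
image = np.zeros((8, 8))
image[:, 4:] = 1.0

# Deductive step: contrast is *defined* as the difference in value between
# two neighboring points; no statistics over samples are needed.
contrast_x = np.abs(np.diff(image, axis=1))

# A contiguous run of high contrast can then be treated as a "line",
# a higher-level concept available to later inductive inference.
edge_cols = np.where(contrast_x.max(axis=0) > 0.5)[0]
print(f"vertical line between columns {edge_cols[0]} and {edge_cols[0] + 1}")
```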

■Multiple prediction targets

In the visual cortex of the brain, the upper layers learn to explain the information in the lower layers.
For example, the lower layer learns the color of each pixel.
Once the upper layer learns that the interior of a rectangle is a certain color, the lower layer no longer needs to learn pixel by pixel.
As much of the information as possible is expressed in the upper layers, so that as little as possible remains to be expressed in the lower layers.
It can be said that the information is compressed.
Information compression also serves to reduce the number of combinations and make it easier to find similar patterns.
Even an unfamiliar image can still be partly recognized in the upper layers.
Irregular parts of the image are unavoidably recognized as per-pixel colors in the lower layer.
Even when making hypotheses with inductive inference, it is not necessary to make an inference for each pixel.
If possible, you can make a hypothesis that predicts a certain range collectively.
However, for each pixel, we should adopt the best possible inference result.
As an example, consider the case of recognizing the color of a chocolate chip cookie.
First, an inference such as "brown: 90%, black: 10%" can be made for the cookie as a whole.
However, wherever the position of a chocolate chip can be specified, the inference "black: 100%" is preferentially adopted there.
Furthermore, the remaining parts are overwritten with the inference "brown: 100%", as the sketch below shows.
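A minimal sketch of this layered overwriting, assuming NumPy (the grid size, masks, and labels are invented):

```python
import numpy as np

# Hypothetical cookie recognizer: a coarse hypothesis covers the whole cookie,
# and more specific hypotheses overwrite it wherever they apply.
H, W = 16, 16
prediction = np.full((H, W), "", dtype=object)

# Upper-layer hypothesis for the entire cookie.
cookie_mask = np.ones((H, W), dtype=bool)
prediction[cookie_mask] = "brown: 90%, black: 10%"

# Where chip positions can be specified, the specific inference takes precedence...
chip_mask = np.zeros((H, W), dtype=bool)
chip_mask[4:6, 4:6] = True
chip_mask[10:12, 7:9] = True
prediction[chip_mask] = "black: 100%"

# ...and the remaining parts are overwritten with the complementary inference.
prediction[cookie_mask & ~chip_mask] = "brown: 100%"

print(prediction[5, 5])  # black: 100%
print(prediction[0, 0])  # brown: 100%
```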

■Number of parameters

The brain behaves as if it compresses information.
Information compression can be interpreted as representing information with fewer parameters.
However, even if the number of parameters cannot be reduced, it is not necessarily worthless.
For example, consider the case of inferring by drawing a straight line between two points on the XY coordinates.
At this time, "Δx, Δy" and "slope, intercept" are both valid two-parameter representations.
Only the parameterization differs.
However, once the line is drawn, both interpolation and extrapolation become possible, and inductive inferences can be made about the prediction target.
It can be said that "points" are used as repeating units and are arranged continuously to form "lines".
It can be used for inductive inference as long as a hypothesis is made so that the prediction target corresponds to the repeating unit.
However, the number of parameters cannot be ignored.
The more parameters there are, the more the "bias" of matching by chance must be taken into account.
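Returning to the line through two points: a minimal sketch of the two equivalent two-parameter forms (the coordinates are invented, and the first point is assumed known in both forms).

```python
# Hypothetical points; (x0, y0) is treated as known in both parameterizations.
x0, y0 = 1.0, 2.0
x1, y1 = 3.0, 6.0

# Form 1: displacement (dx, dy) from the first point.
dx, dy = x1 - x0, y1 - y0

# Form 2: slope and intercept.
slope = dy / dx
intercept = y0 - slope * x0

# Both forms give the same predictions, by interpolation or extrapolation.
def predict_displacement(x):
    return y0 + dy * (x - x0) / dx

def predict_slope(x):
    return slope * x + intercept

for x in (2.0, 10.0):  # interpolation, then extrapolation
    assert abs(predict_displacement(x) - predict_slope(x)) < 1e-9
    print(x, predict_slope(x))
```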

■Inference of known information

When the brain guesses what it will look like in the next moment, it is "inductive" because it infers unknown information.
However, when recognizing what we see now, we focus on known information.
In that case, although the information is known, try to think of it as performing the same inductive inference as in the unknown case.
For example, suppose you see a black straight line.
Along the extension of that line, it can be inductively inferred that the probability of black is high.
Since black was inferred to be highly probable and the observation is indeed black, the selection entropy received can be said to be small.
Consider a case where a straight line cannot be recognized.
Suppose that we could only infer "unknown" for all minute areas.
In that case, the selection entropy is large.
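A minimal sketch of this contrast, reading "selection entropy" as the surprisal -log2(p) of the outcome actually observed (an interpretation of the author's term, not a definition given in the text):

```python
import math

# Surprisal of the observed outcome, read here as "selection entropy".
def selection_entropy(p_observed):
    return -math.log2(p_observed)

# A recognized line predicts "black" with high probability along its extension,
# so actually observing black yields little selection entropy.
print(selection_entropy(0.95))   # ~0.07 bits

# With no recognized structure, every minute area is "unknown"
# (say, uniform over 8 candidate colors), so each observation costs 3 bits.
print(selection_entropy(1 / 8))  # 3.0 bits
```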
"Recognition" interprets known information as a more inevitable result rather than a random result.
If the purpose given to AI is to maximize future rewards, past and present "recognition" is not directly necessary.
However, if the repeating unit includes the future as a target, it can be used to speculate on the future.
The value of "recognition" lies in serving as a reference for selecting the hypotheses to be used in future inductive inference.
If the computation can only be done with a single thread, it would be efficient to determine the "prediction target" and then perform only the "recognition" that can be used for that prediction.
However, if it can be calculated in parallel like the brain, it is enough to "recognize" as much as possible without worrying about whether it can be used in the future.
The brain as a whole is unlikely to be the most efficient sequential process that maximizes some reward.
Both "known information" and "unknown information" can be interpreted in the same way: as attempts to explain observations as more inevitable than chance would allow.
This can be interpreted as trying to minimize the selection entropy that will be received in the future, that is, the expected selection entropy.
However, in order to calculate entropy, we need to be able to determine whether predictions and observations "matched".
When there is a slight difference between the predicted value and the measured value, it is difficult to regard them as "matching".
It is necessary to substitute a degree of "match", such as the Wasserstein distance, for the binary question of whether or not there was a "match", as the sketch below illustrates.
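A minimal sketch of this substitution, assuming NumPy and SciPy (the distributions are invented):

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)

# Predicted and measured values that are close but almost never exactly equal.
predicted = rng.normal(loc=0.0, scale=1.0, size=1000)
measured = rng.normal(loc=0.1, scale=1.0, size=1000)

# An exact-match criterion judges the prediction a near-total failure...
exact_match_rate = np.mean(predicted == measured)

# ...while the Wasserstein distance reports a graded degree of agreement.
distance = wasserstein_distance(predicted, measured)

print(f"exact match rate:     {exact_match_rate:.3f}")  # ~0.000
print(f"Wasserstein distance: {distance:.3f}")          # small: distributions are close
```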
Also, in order to make a correct guess, we have to consider "bias" in addition to probability and entropy.
Even if the predicted and actual values are estimated to be close and the average entropy is calculated to be small, a guess made at random will still receive a large selection entropy.
The "average entropy" and the "expected value of the selection entropy actually received" match when there is no bias and the inference is correct.
Since the "bias" of thinking performed by the brain is not 0, simply imitating the brain does not guarantee the optimal solution.

■How to determine repeating units by clustering

In order to predict a certain feature that "cat" has, the repeating unit "cat" should be used as a sample.
In order to determine the repeating unit "cat", we must define what characteristics are necessary.
A precise definition requires knowing the nature of what people commonly call a "cat".
Even if you don't know the definition, you can distinguish "cat-like things" by clustering similar things together.
For clustering, it is only necessary to know the "distance".
However, it is not obvious which features should make two animals count as close to each other, nor which distance corresponds to closeness of species.
The optimum "distance" must be determined by learning or the like.
If you want to distinguish "species", you can learn a distance such that the farther apart two species are, the farther apart their features are.
If you want to know not the species but some other characteristic, learn the distance for that purpose instead:
that is, the smaller the distance between values of the objective variable, the smaller the distance between values of the explanatory variable should be.
Moreover, it is strange to treat the vicinity of the boundary of the cluster and the vicinity of the center of the cluster exactly the same.
Near the boundary, it is more robust to assume 50% membership in each of the two adjacent clusters.
A "cluster" corresponds to a "set of repeating units", and the degree of certainty of belonging corresponds to a weighted "unknown" ratio, as the sketch below shows.

■Scope and properties

A person can recognize a "human being" by looking at an image of a "person in clothes" cut out from a photograph.
However, since "clothes" are not human, they recognize that they are "almost human" to be exact.
On the other hand, when I try to cut out "pure human beings" from photographs, I wonder if "clothes" should be included.
Here, "human" is a "property" that a certain "area" of an image has.
If the "range" is determined precisely, it becomes difficult to determine the "property" precisely.
Conversely, if the "property" is determined precisely, it becomes difficult to determine the "range" precisely.
This "property" is a condition for determining the repeating unit.
For example, when we want to make inductive reasoning about "human", we take the "range" with the property "human" as a sample.
The sample size need not equal the area of the image.
The conditions that the property "human" must satisfy in order to count as one person can be set freely.
Until the recognition of "human beings" is solidified, it is impossible to cut out the range of "human beings" from an image.
Therefore, it is difficult to determine the definition of the repeating unit "human".
Therefore, instead of collecting "repeating units", first, appropriately cut out "ranges" are collected.
Then, the collected "ranges" are evaluated to see if they are repetitive.
Each "range" carries an "unknown" value indicating the degree to which it is not valid as a repeated sample.
Tweak the "range" to improve the accuracy of inductive reasoning.
There is a trade-off between "bias", "variance" and "explainable range in the image".
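A minimal sketch of weighting collected "ranges" by their "unknown" values, assuming NumPy (all numbers are invented):

```python
import numpy as np

# Hypothetical collected "ranges", each with a value and an "unknown" degree
# expressing how far it falls short of being a valid repeated sample.
values = np.array([7.0, 6.5, 9.0, 7.2])    # e.g. a feature measured on each range
unknown = np.array([0.1, 0.2, 0.9, 0.15])  # near 1: barely usable as a sample

# Weight each range by how valid it is as a repeated sample.
validity = 1.0 - unknown
estimate = np.average(values, weights=validity)
effective_n = validity.sum()  # fewer effective samples than raw ranges

print(f"weighted estimate: {estimate:.2f}, effective sample count: {effective_n:.2f}")
```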