Consider the simplest example of inductive inference.

Example 1. Predict the result of the next coin toss. No information about past results is available.

"Unknown" is the correct answer, because the coin may land neither heads nor tails (on its edge, for instance).

Example 2. Predict the result of the next coin toss. Suppose the coin can only show heads or tails.

"Heads or tails" is the correct answer.

Although there are zero data points of past coin tosses, the information about the possible values narrows the answer down from "unknown".

Answering 50% for "heads" and 50% for "tails" is wrong,

because that is the answer to Example 3. The information you hold is different depending on whether or not you know the distribution is uniform.

Example 3. Predict the result of the next coin toss. Suppose the coin can only show heads or tails, and assume heads and tails are uniformly distributed.

It is a mistake to use the "principle of indifference" in this inference. Its correct use is shown below.

Example 4. Predict the result of the next coin toss. Suppose the coin can only show heads or tails. You must answer the probability of heads as a number.

"Solving a problem" is defined here as identifying the correct answer, regardless of how it is reported to the questioner.

If the answerer understands that the correct answer is "heads or tails", the problem is solved even if the answer format forces them to give the wrong answer of "50%".

Example 5. Answer the value of X.

Example 5 is one of the simplest examples of inductive inference.

Expressed as a probability distribution, the answer is "unknown" = 100%.

Merely allowing the state "unknown" does not require any "axiom of induction" here.

With no information the answer is "unknown", but once there is information about the range of possible values, the answer becomes the result of inference.

For example, the answer narrows from completely "unknown" to "one of all real numbers".

Consider a case in which only one data point that can be used for inductive inference is given.

We call this piece of information "perfect quality", in the sense that it is safe to use it for this inductive inference.

Example 1. A lottery ticket with a number written on it is drawn. The last number was "5"; predict the next number.

It is too crude to conclude that the next ticket is 100% "5" just because 1 out of 1 past draws, that is, 100%, were "5".

The sample variance is 0, but the unbiased variance is undefined, since it divides by n − 1 = 0.
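This can be checked directly with Python's standard statistics module; a minimal sketch for the single past draw:

```python
import statistics

draws = [5]  # a single past lottery result

# Sample (population) variance: the one deviation from the mean is zero.
print(statistics.pvariance(draws))  # 0

# The unbiased variance divides by n - 1 = 0, so it cannot be computed
# from a single data point.
try:
    statistics.variance(draws)
except statistics.StatisticsError as e:
    print(e)  # variance requires at least two data points
```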

However, answering "unknown" would not use the information "5" at all.

Intuitively, it is also possible that every ticket has the same number "5" written on it.

Also, the probability of an unrelated number like "114514" should be relatively low.

Intuitively, it seems reasonable that the closer a number is to "5", the more likely it is.

However, we can only say that "6" is more probable than "7"; we cannot say what percentage each has.

"5" is the most likely value, but again no percentage can be given.

Since there are only two cases, the same number or a different one, it is tempting to simply declare "5" = 50%.

It is also necessary to consider the probability that numbers can repeat at all.

Assuming the numbers never repeat, the probability is 50% for values less than "5" and 50% for values greater than "5".

If there are only two past results, "1" and "3", and numbers never repeat, you will intuitively feel that the next value is most likely around "2".

However, with "0" and "10000", some people may feel that the probability is higher near "0" and "10000" than near "5000".

In that case, dividing the line into "<0", "0-10000", and ">10000", we can at least say that "0-10000" has the highest probability.

Moreover, no matter what the two numbers are, the probabilities of the three intervals do not change.

If the distance to the second point is measured relative to the first, there is only one such distance, and a single value in a given unit has nothing to be compared against. "1-3 kg" and "0-10000 m" cannot be compared, so the result does not change.

There are two ways to compute the probability of the middle interval "0-10000".

In the first method, the one-point inference ("<x" = 50%, ">x" = 50%) is performed for each of the two points, and the two results are averaged with a weight of 50% each.

From the first point: "<0" = 25%, ">0" = 25%.

From the second point: "<10000" = 25%, ">10000" = 25%.

Here, the second point's "<10000" (= 25%) must be divided between "<0" and "0-10000".

"0-10000" is closer to "10000" than "<0" is, so its share should be larger, but the exact percentage is unknown.

Ranging from the even split to the most biased case, "<0" gets 12.5% down to 0% and "0-10000" gets 12.5% up to 25%.

Similarly, the first point's ">0" (= 25%) is divided between "0-10000" and ">10000".

Result: "<0" = 25-37.5%, "0-10000" = 25-50%, ">10000" = 25-37.5%.

The second method considers the frequency of points falling within the interval.

If the first point is taken as the reference point, the only usable point is the second.

The frequency of points in "0-10000" is 1/1 = 100%, borders included.

Even so, it cannot be concluded that the third point, drawn next, will also fall in "0-10000" 100% of the time.

The probability that the third point falls in "0-10000" is somewhere between 0% and 100%.

Counting the second and third points together, the frequency of falling in "0-10000" is 1/2 to 2/2, i.e. 50-100%.

Result: "<0" = 0-25%, "0-10000" = 50-100%, ">10000" = 0-25%.

Only one result satisfies both calculations.

Result: "<0" = 25%, "0-10000" = 50%, ">10000" = 25%.
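The two calculations and their intersection can be sketched as follows; the interval ranges are taken straight from the text above.

```python
# Method 1: weighted average of the one-point inferences, with the
# undetermined splits pushed to their extremes (ranges as (low, high)).
method1 = {
    "<0":      (0.25, 0.375),
    "0-10000": (0.25, 0.50),
    ">10000":  (0.25, 0.375),
}

# Method 2: frequency of points inside the interval (1/2 to 2/2), with
# the outer mass split symmetrically between the two ends.
method2 = {
    "<0":      (0.0, 0.25),
    "0-10000": (0.50, 1.0),
    ">10000":  (0.0, 0.25),
}

def intersect(a, b):
    """Intersection of two closed ranges; both methods must be satisfied."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    assert lo <= hi, "ranges do not overlap"
    return (lo, hi)

for interval in method1:
    print(interval, intersect(method1[interval], method2[interval]))
# The ends pin down to exactly 0.25 each, "0-10000" to exactly 0.50.
```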

Now let us integrate the calculation methods for 0, 1, and 2 points.
"Perfect quality" now refers to the assumption that all n data points are equally valid and may be used for the inductive inference.

For example, if data from different people tossing a coin are mixed together, all of the data are assumed usable, without weighing whose data to prefer.

When n points can be used for induction and they divide the number line into intervals, the probability of the next point is as follows.

-∞ to minimum: (1/n)/2

Maximum to +∞: (1/n)/2

Each of the other n-1 intervals: 1/n

If +∞ and -∞ are thought of as connected, the n points divide the resulting circle into n intervals, over which the next point is evenly distributed; the two unbounded ends are the two halves of the wrap-around interval.

Within an interval, the closer a value is to a boundary, the higher its probability.
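The rule can be sketched as follows; `interval_probs` is a hypothetical helper name, and the probabilities follow the worked examples (1/(2n) for each unbounded end, 1/n for each inner interval).

```python
from fractions import Fraction

def interval_probs(points):
    """Intervals formed by n sorted data points, with the probability
    that the next value falls in each: 1/n for each of the n-1 inner
    intervals, (1/n)/2 for each of the two unbounded ends."""
    pts = sorted(points)
    n = len(pts)
    result = [("-inf", pts[0], Fraction(1, 2 * n))]
    for a, b in zip(pts, pts[1:]):
        result.append((a, b, Fraction(1, n)))
    result.append((pts[-1], "+inf", Fraction(1, 2 * n)))
    return result

for lo, hi, p in interval_probs([0, 10000]):
    print(f"{lo} to {hi}: {p}")  # 1/4, 1/2, 1/4 as derived above
```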

Example 1: Past data is "1" "2" "2" "3"

Let's try to calculate "inductive inference of perfect quality n-point non-overlapping numbers".

Result: "-∞ to 1" = 1/8, "1 to 2" = 1/4, "2 to 2" = 1/4, "2 to 3" = 1/4, "3 to +∞" = 1/8

The degenerate interval "2 to 2" = 1/4 expresses, with pinpoint precision, the probability that the duplicated value "2" appears again.

Even if only the size relation (order) of the data is known, "inductive inference of perfect quality n-point non-overlapping numbers" can be applied without any problem,

since this method arranges the values in ascending order; all that is required is that they can be arranged.

For example, when guessing the three values R, G, and B, each should be treated as a separate inference and calculated independently with this method.

Example 1: Past data is "1" "2" "2" "3". Constrain the values to integers only.

Let's try to calculate "inductive inference of perfect quality n-point non-overlapping numbers".

Result: "-∞ to 1" = 1/8, "1 to 2" = 1/4, "2 to 2" = 1/4, "2 to 3" = 1/4, "3 to +∞" = 1/8

Here, "1 to 2" = 1/4 does not claim a uniform distribution over the interval.

However, "1 to 2" can only take the values "1" or "2", and by symmetry its mass can be split into 1/8 each.

Result: "-∞ to 1" = 1/8, "1" = 1/8, "2" = 1/2, "3" = 1/8, "3 to +∞" = 1/8
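One way to mechanize this split (a sketch; `discretize_integers` is a made-up name) is to spread each finite interval's mass evenly over the integers it contains, leaving the unbounded ends symbolic since they contain infinitely many integers.

```python
from fractions import Fraction

def discretize_integers(points):
    """Spread each finite interval's 1/n mass evenly over the integers
    it contains (endpoints included); the two unbounded ends keep their
    (1/n)/2 mass as intervals."""
    pts = sorted(points)
    n = len(pts)
    result = {f"-inf to {pts[0]}": Fraction(1, 2 * n)}
    for a, b in zip(pts, pts[1:]):
        members = range(a, b + 1)
        for k in members:
            result[k] = result.get(k, Fraction(0)) + Fraction(1, n * len(members))
    result[f"{pts[-1]} to +inf"] = Fraction(1, 2 * n)
    return result

for k, v in discretize_integers([1, 2, 2, 3]).items():
    print(k, v)
# "-inf to 1" = 1/8, "1" = 1/8, "2" = 1/2, "3" = 1/8, "3 to +inf" = 1/8
```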

Example 2: Past data is "1" and "3". Constrain to integers only.

Result: "-∞ to 1" = 1/4, "1 to 3" = 1/2, "3 to +∞" = 1/4

Here, we do not know whether "2" is more or less likely than "1" or "3".

If every possible distribution shape is assumed and averaged, a uniform distribution results.

In practice, without knowledge of the shape of the distribution, some assumption is unavoidable.

However, we must be able to distinguish what is determined by inductive inference from what is determined by assumption.

Example 1: Past data is "1" "2" "2" "3". Constrain the values to integers between 1 and 3.

Let's try to calculate "inductive inference of perfect quality n-point non-overlapping numbers".

Result: "-∞ to 1" = 1/8, "1 to 2" = 1/4, "2 to 2" = 1/4, "2 to 3" = 1/4, "3 to +∞" = 1/8

Discretized within the valid range, this becomes the following.

Result: "1" = 1/4, "2" = 1/2, "3" = 1/4

In this case the result seems intuitively correct, because the guess equals the sampling distribution.

Example 2: Past data is "1". Constrain the values to the integers 0 and 1 only.

Assuming no constraints, it looks like this:

Result: "-∞ to 1" = 1/2, "1 to +∞" = 1/2

Discretized within the valid range, this becomes the following.

Result: "0" = 1/4, "1" = 3/4

In this case the guess does not equal the sampling distribution. This is because "0", although never observed, cannot be assigned a probability of 0%.

The result above is equivalent to a two-point distribution: the observed value "1" plus a second, unobserved point "unknown".

If "unknown" = 1/2 is converted assuming a uniform distribution over {0, 1}, it contributes 1/4 to "0" and 1/4 to "1", giving the result above.

Whenever possible, it is preferable to keep the value "unknown" as it is rather than convert it.
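Both constrained-range results above follow from a single rule: spread each interval's mass evenly over the valid integers it contains, endpoints included. A sketch, with a made-up function name:

```python
from fractions import Fraction

def discretize_bounded(points, lo, hi):
    """Spread each interval's mass evenly over the valid integers
    lo..hi that the interval contains (endpoints included)."""
    pts = sorted(points)
    n = len(pts)
    valid = range(lo, hi + 1)
    result = {k: Fraction(0) for k in valid}

    def spread(mass, members):
        for k in members:
            result[k] += mass / len(members)

    # Unbounded ends fold onto the valid integers they contain.
    spread(Fraction(1, 2 * n), [k for k in valid if k <= pts[0]])
    spread(Fraction(1, 2 * n), [k for k in valid if k >= pts[-1]])
    # Inner intervals.
    for a, b in zip(pts, pts[1:]):
        spread(Fraction(1, n), [k for k in valid if a <= k <= b])
    return result

print(discretize_bounded([1, 2, 2, 3], 1, 3))  # 1: 1/4, 2: 1/2, 3: 1/4
print(discretize_bounded([1], 0, 1))           # 0: 1/4, 1: 3/4
```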

Example: Past data is "dog" "cat" "cat" "pig"

Let's try to calculate "inductive inference of perfect quality n-point non-overlapping numbers".

Result: "-∞~dog" = 1/8, "dog~cat" = 1/4, "cat~cat" = 1/4, "cat~pig" = 1/4, "pig~+∞" = 1/8

If there are only three possible values, "dog", "cat", and "pig", the result is the same as Example 1 of the valid range.

Result: "dog" = 1/4, "cat" = 1/2, "pig" = 1/4

This is the same as the sampling distribution.

Here too, as in Example 2 of the valid range, the result can be interpreted as the sampling distribution with "unknown" added.

In this case "unknown" is itself distributed like the sample, so sampling distribution + "unknown" = sampling distribution.

"unknown" is interpreted as uniform over the observations "dog" "cat" "cat" "pig" (duplicates counted), not over the distinct values "dog" "cat" "pig".
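That identity can be checked directly; a sketch with the four observations:

```python
from collections import Counter
from fractions import Fraction

data = ["dog", "cat", "cat", "pig"]
n = len(data)

# Sampling distribution: dog 1/4, cat 1/2, pig 1/4.
sampling = {v: Fraction(c, n) for v, c in Counter(data).items()}

# n+1 pieces including "unknown", each worth 1/(n+1); the "unknown"
# share is then spread uniformly over the observations (duplicates
# counted), i.e. in proportion to the sampling distribution itself.
pieces = {v: Fraction(c, n + 1) for v, c in Counter(data).items()}
unknown = Fraction(1, n + 1)
final = {v: p + unknown * sampling[v] for v, p in pieces.items()}

print(final == sampling)  # True: adding "unknown" changes nothing
```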

Do not worry about which observations share the same value; treat every observation as a separate data point.

For example, it would be strange if merely relabeling "dog" and "cat" as "pet" increased the probability of "pig".

Now consider the case where the effective range is not limited.

"dog ~ cat" = 1/4 covers dogs, cats, and everything in between; within it, the probability is higher toward both ends.

Since nominal values have no inherent order and were arranged arbitrarily rather than in ascending order, there should also be an interval such as "pig ~ dog".

Furthermore, there should be mixtures of "dog", "cat", and "pig".

For continuous values, the inner intervals contain the values that can be expressed by mixing the given values; the two unbounded ends "-∞~" and "~+∞" contain values that cannot be expressed without mixing in values that were not given.

The same can be said for the nominal scale.

Result: "dog, cat, pig and mixtures thereof" = 3/4, "mixtures of dog, cat, pig and other values" = 1/4

Now consider unique values, such as serial numbers, where the same value never appears twice.

A value that has never appeared in the past is denoted "new".

Example 1: Past data is "cat".

If the values are not unique, the guess is that "cat" is more likely than "new".

This agrees with the idea of the maximum likelihood method.

But for unique values, "cat" = 0% and "new" = 100%.

If no information is given as to whether the values are unique, that too has to be inferred.

Example: "dog" "cat" "cat" "pig"

Replacing each first occurrence of a value with "new" gives:

"new" "new" "cat" "new"

Using this as the sampling distribution, add "unknown" as a fifth point:

"new" "new" "cat" "new" "unknown"

In this way, the probability that a never-before-seen value appears next can be predicted.
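The replacement scheme above can be sketched as:

```python
from collections import Counter
from fractions import Fraction

data = ["dog", "cat", "cat", "pig"]

# Replace each first occurrence of a value with "new".
seen = set()
relabeled = []
for v in data:
    relabeled.append(v if v in seen else "new")
    seen.add(v)
print(relabeled)  # ['new', 'new', 'cat', 'new']

# Add "unknown" as a fifth point and read off the distribution.
relabeled.append("unknown")
probs = {v: Fraction(c, len(relabeled)) for v, c in Counter(relabeled).items()}
print(probs)  # new: 3/5, cat: 1/5, unknown: 1/5
```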

However, this is only the simplest possible approach.

The more data there are, the more chances there are of values coinciding by accident.

The first data point is always "new" trivially, so arguably one "new" should be excluded from the calculation.

Uniqueness, whether or not a value coincides with the others, is a property that spans multiple values.

Other properties spanning multiple values are possible.

For example, if the values are arranged at regular intervals, predictions can be based on that.

An infinite number of such regularities can be hypothesized.

For example, numbers can be assigned by arranging the values in ascending order.

There may then be a rule "guessed value = f(rank in ascending order)" for some function f.

The rank acts as an explanatory variable.

Here, "perfect quality" covers the case in which there are no explanatory variables; the case with explanatory variables, considered later, handles such assumptions.

Regularities can also concern individual data points; for example, even numbers may be especially likely to appear.

There are countless regularities of this kind as well.

A value satisfying "even" can be interpreted as having an explanatory variable "even" = True.

First, consider the situation in which there is no data usable for induction at all: there is only one piece of data, "unknown".

Now add data usable for induction one piece at a time.

Example: adding a discrete value "1" gives "1" = 1 and "unknown" = 1.

Example: adding the nominal value "cat" gives "cat" = 1 and "unknown" = 1.

Example: adding a continuous value "1.0" gives "-∞ to 1.0" = 1 and "1.0 to +∞" = 1.
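The additions above can be sketched with a single counting helper (a hypothetical name); each piece of data, including "unknown", simply carries a count of 1.

```python
from fractions import Fraction

def with_unknown(data):
    """n observed pieces of data plus one "unknown": n+1 pieces in all,
    each carrying probability 1/(n+1)."""
    pieces = list(data) + ["unknown"]
    probs = {}
    for v in pieces:
        probs[v] = probs.get(v, Fraction(0)) + Fraction(1, len(pieces))
    return probs

print(with_unknown(["1"]))    # "1": 1/2, "unknown": 1/2
print(with_unknown(["cat"]))  # "cat": 1/2, "unknown": 1/2
```

For a continuous value the two pieces are the intervals "-∞ to 1.0" and "1.0 to +∞" rather than point values.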

For continuous values, interpret "unknown" as "-∞ to +∞" and split it before and after each added value, increasing the number of pieces.

For n usable data points there are n+1 pieces of data, counting "unknown".

Each of the n+1 pieces is assigned a probability of 1/(n+1).

However, within the individual pieces, values closer to a boundary are more probable; this applies only where the values have an order (a notion of size).

In practice "unknown" may have to be converted by assuming a uniform distribution, but it should be kept as "unknown" as long as possible.

Note that this inference method does not infer properties that span multiple values.

For example, it does not guess whether the values are unique (never coinciding) or whether they are arranged at regular intervals.

Even for a single value, it does not guess properties such as whether the value is "even".

Whether a value is "even" can be handled by introducing an explanatory variable "even".

Here, "perfect quality" means that only the objective (target) variable is given and no explanatory variables are given.

Since there are no explanatory variables, all data are assumed to be of perfect quality and safe to use.

Building on this inference method, the next step is to extend it to the case where explanatory variables are present.