As usual, Kostja has forgotten some details. During the conversation the statisticians make several assumptions that are not explicitly written out in the formula:
1) they model the binary variable with a Bernoulli distribution
2) they fix a uniform prior on theta
As a result, they actually compute a conditional distribution
P(x=1| we use Bernoulli distribution with uniform prior on parameters to model x)
The latter can be quite different from
P(x=1|their initial knowledge which they did analyse enough).
In particular, there are situations where
P(x=1|their initial knowledge which they did analyse enough) = 0.1
but you can still later use a Bernoulli model with a uniform prior, although
P(x=1|their initial knowledge which they did analyse enough) < P(x=1| we use Bernoulli distribution with uniform prior on parameters to model x)
This is an even cooler trick
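To make the inequality above concrete, here is a minimal sketch in Python. It assumes the "model" probability is the prior predictive, i.e. P(x=1 | theta) = theta averaged over a uniform prior on [0, 1]. The function name `prob_x1_uniform_prior` and the step count are made up for illustration; the 0.1 is the value from the comment.

```python
def prob_x1_uniform_prior(steps=1000):
    """Approximate the integral of theta * 1 over [0, 1] with the midpoint rule.

    theta is P(x=1 | theta) for a Bernoulli variable, and 1 is the density
    of the uniform prior on [0, 1].
    """
    width = 1.0 / steps
    return sum((i + 0.5) * width * width for i in range(steps))


# P(x=1 | we use a Bernoulli model with a uniform prior on theta)
p_model = prob_x1_uniform_prior()

# P(x=1 | initial knowledge), the illustrative value from the comment
p_informed = 0.1

print(round(p_model, 6))
print(p_informed < p_model)
```

Under these assumptions the model always answers one half, whatever the initial knowledge says, which is the gap the comment points at.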
Thank you, captain! 😉
Indeed, the whole trick here is just a (very cunning) abuse of notation. However, I do not like your attempt at explaining it.
For example, there really is no problem in assuming the Bernoulli distribution for a binary random variable - there simply are no other options. The problem occurs when you claim the uniform prior. Although it seems equivalent to the statement "We don't know anything about X", it is not. With "we don't know" we have an undefined distribution (essentially, an infinite set of candidate random variables X), whereas with the uniform prior we have a well-defined one (and thus a single well-specified random variable X). Types don't match.
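The "types don't match" point can be sketched in a few lines of Python. This is only an illustration under a loud assumption: a finite grid of eleven values stands in for the continuous interval [0, 1], and the names `candidate_thetas`, `p_if_unknown` and `p_if_uniform_prior` are made up here.

```python
from fractions import Fraction

# Assumption: a finite grid stands in for all theta in [0, 1].
candidate_thetas = [Fraction(i, 10) for i in range(11)]

# "We don't know anything about X": every candidate is still possible,
# so P(x=1) is a set of values, not a number.
p_if_unknown = set(candidate_thetas)

# "Uniform prior on theta": the candidates are averaged with equal weight,
# which collapses the set into one well-defined number.
p_if_uniform_prior = sum(candidate_thetas) / len(candidate_thetas)

print(len(p_if_unknown))     # 11 distinct candidate values
print(p_if_uniform_prior)    # 1/2
```

The first object is a set and the second is a single fraction, so treating one as the other is exactly the mismatch described above.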
And I didn't even understand your last part about "a cooler trick". Explain it in simple terms and we'll have an episode 2 out soon 😉
Nice one 🙂
Basically, what it comes down to is that there's no way of saying "I don't know the distribution of X" in the Bayesian world.
Yes, as long as being in a Bayesian world comes with the requirement that all distributions be specified. There is probably nothing wrong with that - after all, the uniform prior is also a reasonable way to formalize the lack of knowledge. But the fact that this formalization is subtly different from the straightforward "we don't know" can be very confusing once it surfaces, especially if it does so in a somewhat more involved context (as was the case in one discussion, which inspired this post).
It's nice to meet you here, I know your blog! 🙂
> Yes, as long as being in a Bayesian world comes with the requirement that all distributions be specified
Well, that's the impression I get. I remember reading some Bayesian resolution of the two-envelopes paradox, and the author wrote something along the lines of the paradox arising because proper priors are not specified (specifying priors resolves some versions of the paradox).
Ahh... the two-envelopes. I guess there's a version of this evil puzzle capable of confusing anyone, be it a Bayesian, a frequentist or just a logician.
I'm still collecting courage to blog on that 🙂