Consider the following excerpt from a recent article in the British Medical Journal:
Mike has only two children, and they are called Pat and Alex, which could equally be boys’ or girls’ names. In fact, Pat is a girl. What is the probability that Alex is a boy?
a 50%
b Slightly less than 50%
c Slightly more than 50%
d Between 60% and 70%
e Between 40% and 30%
Answer: d. Although this could be about the relative popularity of ambiguous names for boys and girls or about subtle imbalances in the sex ratio, it is not meant to be. The clue to the correct answer is in thinking about what we do not know about the family and what we do know already, and applying this to the expected probabilities of girls and boys.
We do not know if Pat was born first or second. We do know that there are only two children and that Pat is a girl. I am assuming that in the population, 50% of children are girls.
The birth order and relative frequency of two child families are: boy, boy (25%), girl, girl (25%), boy, girl (25%) girl, boy (25%). We know Mike’s family does not have two boys, since Pat is a girl, so we are only left with three choices for families with at least one girl. Two of these pairs have a boy and one does not. Hence the probability that Alex is a boy is two thirds or 66.7%.
If we had been told that Pat was born first then the probability that Alex is a boy drops to 50%.
The well-known "Boy or Girl" paradox referenced in the fragment above is probably as old as probability theory itself. It is therefore quite amusing to see an incorrect explanation for it presented in a serious journal article. You are welcome to figure out the mistake yourself.
For completeness' sake, here is my favourite way of presenting this puzzle:
In the following, let Mike be a randomly chosen father of two kids.
- Mike has two kids, one of them is a boy. What is the probability that the other one is a girl?
- Mike has two kids, the elder one is a boy. What is the probability that the other one is a girl?
- Mike has two kids. One of them is a boy named John. What is the probability that the other one is a girl?
- I came to visit Mike. One of his two kids, a boy, opened the door to me. What is the probability that Mike's other child is a girl?
- I have a brother. What is the probability that I am a girl?
- I have a brother named John. What is the probability that I am a girl?
You can assume that boys and girls are equiprobable, the births of two kids are independent events, a randomly chosen boy will be named John with probability p, and that a family may have two kids with the same name.
If you haven't tried solving these yet, give it a try. I'm pretty sure you won't get all 6 questions right on the first attempt.
> The birth order and relative frequency of two child families are:
> boy, boy (25%), girl, girl (25%), boy, girl (25%) girl, boy (25%).
> We know Mike’s family does not have two boys, since Pat is a girl,
> so we are only left with three choices for families with at least
> one girl. Two of these pairs have a boy and one does not. Hence
> the probability that Alex is a boy is two thirds or 66.7%.
Why is the event of Alex being a boy treated as dependent on the other event, that Pat is a girl? Or rather why is it incorrect to treat them as independent, and therefore having 50/50 for Alex's gender? Or rather why is the order of Alex's and Pat's birth treated as important?
Well, because the author of the article knows that boy-girl families are more frequent than girl-girl families, and he wanted to show that knowledge off.
Thus, the answer to your question "why is the order of Alex's and Pat's birth treated as important" is "but why not consider the order?". There is no problem in that.
Now you go figure where the actual mistake is. 🙂
The odds are 50% (assuming 50% of all children are girls).
First off, the problem is misstated from what was intended. By saying "Pat is a girl," the ordering (most people choose to use age for ordering, but any random factor will do) became irrelevant. The odds that Alex is a girl are equal to the odds that any random child is a girl.
Ordering is usually used just to count the various cases properly, but you have to consider all possibilities when you do. The author didn't. Here is the correct solution to the problem as stated, by the ordering method: There are eight equally-likely family compositions:
(1) Pat is the older boy, Alex is the younger boy.
(2) Pat is the older boy, Alex is the younger girl.
(3) Pat is the older girl, Alex is the younger boy.
(4) Pat is the older girl, Alex is the younger girl.
(5) Alex is the older boy, Pat is the younger boy.
(6) Alex is the older boy, Pat is the younger girl.
(7) Alex is the older girl, Pat is the younger boy.
(8) Alex is the older girl, Pat is the younger girl.
We can eliminate 1, 2, 5, and 7 since we know Pat is a girl. In the four remaining cases, there are two (4 and 8) where Alex is a girl, and two (3 and 6) where Alex is a boy. The odds are 50%.
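The eight-case count above can be checked mechanically. The following is a minimal Python sketch, assuming (as the comment does) that all eight combinations of "who is older" and the two sexes are equally likely; the helper name `sex_of` is made up for this illustration.

```python
from fractions import Fraction
from itertools import product

# Each case: (name of the older child, sex of the older child, sex of the younger child).
# All eight combinations are assumed to be equally likely.
cases = list(product(["Pat", "Alex"], "BG", "BG"))

def sex_of(name, case):
    """Return the sex of the child called `name` in the given case."""
    older_name, older_sex, younger_sex = case
    return older_sex if name == older_name else younger_sex

# Keep only the cases consistent with "Pat is a girl".
pat_is_girl = [c for c in cases if sex_of("Pat", c) == "G"]
# Among those, count the cases where Alex is a boy.
alex_is_boy = [c for c in pat_is_girl if sex_of("Alex", c) == "B"]

p_alex_boy = Fraction(len(alex_is_boy), len(pat_is_girl))
print(len(cases), len(pat_is_girl), p_alex_boy)  # 8 4 1/2
```

Naming the child ("Pat") fixes which of the two children the information is about, which is why the count comes out at one half rather than two thirds.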
The problem is usually stated "At least one of Pat and Alex is a girl." So it could be either Pat, or Alex, or both. But the correct answer is still 50%. The event that defines the sample space in this case is "you found out that at least one is a girl," not "there is at least one girl." The difference is that in half of the cases where one is a boy and one is a girl, you will (see note below) find out that at least one of Pat and Alex is a BOY. So while 50% of all families have one boy and one girl, in only half of those families will "you find out that at least one is a girl." And in all of the 25% where there are two girls, you will "find out that at least one is a girl." The answer is (50%/2)/(50%/2+25%) = 50%.
(Note: this presumes that you will find out "at least one is a girl" and "at least one is a boy" with equal likelihood. If the way you find out is biased one way or the other, you will find out one of those facts only when both children are the same gender. But the problem doesn't tell you which way the observation is biased; the answer is just as likely to be 0% as 67%. And if you weight those two possible answers by the 50% probability of getting them? The answer that Alex is a boy is again 50%.)
To test this out, and get away from preconceived notions, have a friend flip two coins and tell you either "at least one is heads" or "at least one is tails." What are the odds you got a heads and a tails? Regardless of what your friend actually says each time, this is the same problem. If the answer is always 2/3, regardless of whether your friend says heads or tails, then it means you get a heads and a tails 2/3 of the time.
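The two-coin experiment can be worked out exactly once the friend's rule is written down. The sketch below is only one way to model it: the parameter `q_heads` (a name invented here) is the assumed probability that the friend says "at least one is heads" when the coins show one head and one tail. With two heads or two tails there is only one true statement to make.

```python
from fractions import Fraction

def p_mixed_given_statement(statement, q_heads):
    """P(one head and one tail | the friend made `statement`).

    statement -- "heads" or "tails"
    q_heads   -- assumed probability that the friend says "heads"
                 when the two coins differ
    """
    quarter = Fraction(1, 4)  # each outcome of two fair coins
    # Probability that the friend says "heads" for each outcome.
    say_heads = {"HH": Fraction(1), "HT": q_heads, "TH": q_heads, "TT": Fraction(0)}
    joint = {}
    for outcome, p_h in say_heads.items():
        p_stmt = p_h if statement == "heads" else 1 - p_h
        joint[outcome] = quarter * p_stmt
    total = sum(joint.values())
    return (joint["HT"] + joint["TH"]) / total

unbiased = Fraction(1, 2)
print(p_mixed_given_statement("heads", unbiased))     # 1/2
print(p_mixed_given_statement("tails", unbiased))     # 1/2
always_heads = Fraction(1)
print(p_mixed_given_statement("heads", always_heads))  # 2/3
print(p_mixed_given_statement("tails", always_heads))  # 0
```

Under this model the 2/3 figure appears only when the friend always reports heads whenever possible, and in that case the "tails" statement implies two tails. The unbiased friend gives 1/2 for either statement.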
As for the first part of your comment, yes, you are completely right. I would explain the misconception in the same way.
As for the second part, I find it arguable, because what you basically present is an alternative, "tricky" interpretation of the phrase "you found out", where you formalize "found out" as "checked a random kid and detected".
Now the possibility of multiple interpretations for "found out" is quite insightful indeed. However, even if I would state the problem using the verb "found out" (which I, luckily, didn't, phew!), I would still not mean it as a linguistic puzzle. I would expect the reader to interpret "you found out" naturally, as "it is given that". Otherwise, as you yourself noted, the reader would have to make unnecessary additional assumptions about this "process of finding out". For example, what's wrong with the 100% biased version of a "found out" process, where you just call out: "hey, if one of you is a girl, could you please come here?"
The reasoning that you present would fit as a solution for the problem stated as follows: "Mike has two kids, Pat and Alex. You checked the sex of a randomly chosen kid (sounds creepy, doesn't it) and found out that it was a girl. What is the probability ...". And then, this is just a clumsier version of the formulation nr. 4 in the post.
Thanks for your comment, it's nice to see someone reads this, after all!
An "event" in probability theory is not a specific outcome, it is the set of possible outcomes under the same set of circumstances. It cannot be defined by a partial observation. If the facts do not completely describe the outcome, as is the case with "at least one is a girl," then you have to handle all of the possibilities that could have happened as well. In this case, that means how you decided to handle the case of a boy and a girl, when you will only mention one.
In order for "at least one is a girl" to be observed in 75% of all cases, you have to have decided before you said anything that you would always say "at least one girl" whenever there was a boy and a girl. If that is true, then saying "at least one is a boy" means it is 100% certain that both are boys.
What I am saying is that one of these two statements must be false:
(1) The answer to "At least one is a girl; what are the odds there is a boy and a girl?" and "At least one is a boy; what are the odds there is a boy and a girl?" are the same.
(2) The answer to "At least one is a girl; what are the odds there is a boy and a girl?" is 67%.
#2 is false when you answer randomly. Both answers are 50%. #1 is false when you always mention girls first. The answers are 2/3 and 0. And the problem is, that you can't know whether the 2/3 answer applies to the boy version or the girl version, so you can't say that either one is correct. 50% has to be the answer.
Let me still stand by the claim that you are just digging too deep here. The specification "at least one is a girl" is a perfectly well-defined event in terms of the sample space and you just compute the probability of another event given this information. It is not at all about "how you came to ask the question about one kid being a girl, and not about one kid being a boy?".
Consider the simple frequentist interpretation of the problem: Of all the families with two kids, where at least one of the kids is a girl, what is the proportion of families, where the other kid is a boy?
By casting all other possible formulations in this manner you should easily see that it is not about the "method of finding out" the information about the family. This information is just given. And then obviously, both (1) and (2) are true.
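The frequentist reading can be tried out directly by generating many two-child families and counting. This is a rough simulation sketch, assuming fair and independent births; the seed and the sample size are arbitrary choices made for the illustration.

```python
import random

rng = random.Random(0)  # fixed seed so that the run is repeatable
n_families = 200_000

with_girl = 0          # families with at least one girl
with_girl_and_boy = 0  # of those, families that also have a boy

for _ in range(n_families):
    kids = (rng.choice("BG"), rng.choice("BG"))
    if "G" in kids:
        with_girl += 1
        if "B" in kids:
            with_girl_and_boy += 1

share = with_girl_and_boy / with_girl
print(f"{share:.3f}")  # should come out close to 2/3
```

Note that this counts families, which is exactly the "it is given that" reading. It does not model anyone choosing which fact to report.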
It is not a "perfectly well defined event," because the condition that defines an event needs to be a necessary AND a sufficient condition. Saying "there is at least one girl" is just a necessary condition. You are assuming the sufficient part - that for every two-child family with a boy and a girl, you will know "at least one is a girl" but not "at least one is a boy." And again, the test is the two-coin experiment I described. If the answer is 67% regardless of whether I say "heads" or "tails," that means the answer is 67% independent of the information. And that is wrong, it is 50%.
Yes, the problem can be worded so that the answer is 67%. The wording for that problem is "I ASKED a father of two if he has at least one girl, and he said yes." This reverses the flow of information, and creates the sufficient condition I mentioned. Without that sufficient condition, the answer is 50%.
Once again, consider the frequentist example I gave you.
I'll repeat: "Of all the families with two kids, where at least one is a girl, what is the proportion of families, where the other kid is a boy?"
This is exactly what is meant in the original problem, and I hope you have no problem with this task.
What you are trying to argue is that "when you 'are told' that there is at least one girl in the family, then this will not cover all of the families with one girl, because for some of them you will be 'told' that there is at least one boy". It is a very complicated way to understand things. You are introducing unnecessary entities by modeling the way you obtained information about the family. You assume here, that, necessarily, someone told you the information, and moreover, this "someone" must have been choosing between either telling you only that "there is at least one girl" or only that "there is at least one boy", but nothing more. And now you must deal with the problems related to this choice that you introduced into your interpretation, such as "how likely would you be to find out about a boy rather than a girl", etc.
However, as you see, there are actually various ways you could have obtained the information. "Someone chose one of the two facts to tell you and told you" is far from the most obvious way to understand the given fact "at least one is a girl". Nothing states that there was any choice involved in you getting this information. You get this by default, as if "you ASKED", as you express it yourself.
It's like the family goes into a testing machine and the machine flashes a lightbulb if the family has at least one girl. The machine does not choose, it just gives you this information. There could also be another machine, tailored for the alternative problem, that would give you information on whether one of the kids is a boy. For some families, both machines would flash simultaneously.
You do have the right to interpret "I found out" the way you propose, but this turns the whole puzzle into a very arguable linguistic mess. The majority of people would prefer to see it as a mathematical problem, and thus interpret given information as given, without searching for hidden messengers that could choose what to say and what to conceal.
The wording you gave (btw, "frequentist" refers to a solution method, not to a way to phrase the problem), “Of all the families with two kids, where at least one is a girl, what is the proportion of families, where the other kid is a boy?” does have the answer 67%. I said that before (or at least implied it. I thought). HOWEVER, that wording does NOT correspond to "A father says that he has two children, and at least one is a girl." Which is the problem at hand. That phrasing is ambiguous at best.
It is ambiguous, because you don't know how the father would decide to tell about his family if there were a boy and a girl. What I am saying, is that if no process is given for that, you have to assume one. Your solution (67%) for this phrasing assumes that process is "in all cases where he has a boy and a girl, he will tell you about the girl." So whether or not you admit it, you are also assuming a "you found out." I'm just formalizing it.
Your "frequentist" phrasing specified that process by explicitly limiting the sample space to "all the families with two kids, where at least one is a girl." Its answer has no bearing on the question about the father. You do not know, as you do in your "frequentist" wording, that the father would always tell you about a daughter when he also had a son.
Now, will you address my coin problem? If I flip a pair of coins 100 times, and each time I tell you either "at least one is heads" or "at least one is tails," then what is the probability (each time) that I got a heads and a tails? What does your answer imply about the odds of getting heads and tails in general?
And finally, which alternate version, your "frequentist" one or my coin one, is a better analogy to the father wording?
1) "Frequentist" is not just a "solution method", but rather an interpretation of probability. One of many.
2) The problem is not phrased as "Father says that ..."! It is not the problem at hand. Re-read the post, there is no "telling" process involved. You are given information a-priori. You just know it.
And the problem is not formulated in "your" way precisely in order to avoid the interpretation ambiguity you are trying to promote.
3) If the problem were formulated as "father told you", you would have more grounds, but then it would still be a rather subjective choice of interpretation. The best argument you could use to defend your interpretation would be something like what you tried to use: "... But clearly this is better, don't you see? ...". Well, to some people it is better, to some - not.
4) Finally, in your coin reformulation you specifically claim that "a person throws a coin, and then decides what to tell me". This is a different problem altogether. If you would like to consider this as a proper well-defined mathematical problem you would need to specify the way the person makes a decision, e.g. "if coin falls once heads and once tails, the person decides randomly by flipping a third coin".
By the way, here is yet another version of your "coin" formulation without the decision aspect: A friend throws a coin twice. If at least once the coin falls heads, he says "The coin fell heads at least once". If at least once the coin falls tails, he says "The coin fell tails at least once". It can happen that he makes both statements.
In an experiment the friend said "The coin fell heads at least once". What is the probability that he also said "The coin fell tails at least once"?
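For this decision-free version the answer follows from a plain enumeration of the four outcomes of two throws. A small Python sketch, assuming a fair coin and independent throws:

```python
from fractions import Fraction
from itertools import product

# The four equally likely results of throwing a fair coin twice.
outcomes = list(product("HT", repeat=2))

# The friend says "heads at least once" whenever an H appears,
# and additionally says "tails at least once" whenever a T appears.
said_heads = [o for o in outcomes if "H" in o]
said_both = [o for o in said_heads if "T" in o]

p_also_tails = Fraction(len(said_both), len(said_heads))
print(p_also_tails)  # 2/3
```

Because the friend never has to choose between the two statements, no assumption about his preferences is needed here.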
5) It seems like you are trying to argue or prove something. I think it is somewhat meaningless, because I believe that we understand each other perfectly.
Let me just summarize:
There are (yet other) various possibilities to phrase the original problem:
1) One of the two kids is a girl. What is the probability that the other one is a boy?
7) Father told you that one of the kids is a girl. What is the probability ...
8) You asked a friend to throw two coins and tell you either "one of them is heads" or "one of them is tails", ...
9) You asked a friend ...., and you know that if a friend can choose any of the two, he will decide with 50/50% probability.
Version 1) is a classical problem with a unique non-ambiguous solution 67%.
Version 7) is ambiguous. The reader has the right to interpret it either as 1) or as 8) or as 9).
Version 8) is nonambiguous, but not well-posed, because the way your friend makes a decision is left unspecified. The answer can be anything between 50 and 67%.
Version 9) is a well-posed version of 8), the answer being 50%.
1) From your link, "frequentist" is an "interpretation of probability." As such, it applies to probability calculations, not to the description of the circumstances you use for them. I.e., it is a solution method, not a wording method.
2) The original problem was phrased "Mike has only two children..." All that was said was that "Pat is a girl." Yes, it could be somebody else that "told" you this fact, but who it was doesn't matter. The original problem statement is that we know about a specific child.
3) The 1/3 solution requires two unstated facts: that (A) however it was that you found out that "at least one is a girl" or "Pat is a girl," that the determination considered both children, and (B) the determination is intentionally withholding some information from you, since it knows about both children. If you are not told these two facts, they are unreasonable to assume.
4) The coin problem is not a different formulation FROM THE ONE I ORIGINALLY DESCRIBED. It is different from the "Mike" formulation, which you agreed was improperly formed. So I went back to the earliest formulation I know of for this problem, from Martin Gardner's Mathematical Games column in Scientific American. That was "Mr. Smith has two children. At least one of them is a boy. What is the probability that both children are boys?" which is functionally equivalent to what I used (and NOT to your "frequentist" wording). Gardner himself admitted it was ambiguous, and that 1/2 is a valid answer. My claim is that he was being generous - it can't be answered 1/2.
5) Yes, the critical point is how you find out "one is a girl." That's the whole point. The 1/3 answer requires some very specific facts (see #3 above) that you can't assume. Without specific knowledge otherwise, you have to assume that a BG family can, with equal probability, provide the information "at least one is a boy" or "at least one is a girl."
> 1) One of the two kids is a girl. What is the probability that the other one is a boy?
The answer can only be 1/2, because you can't assume the facts in #3.
Sorry, I didn't see all your summaries:
> Version 1) is a classical problem with a unique non-ambiguous solution 67%.
Version 1 is a classical problem that was labeled ambiguous by the person who proposed it.
> Version 7) is ambiguous. The reader has the right to interpret it either as 1) or as 8) or as 9).
Version 7 is unambiguous because Father must know both children's genders. To solve it, you must assume Father is being unbiased towards mentioning boys or girls, and so the answer is 50%.
> Version 8) is nonambiguous, but not well-posed, because the way your friend makes a decision is left unspecified. The answer can be anything between 50 and 67%.
I have no idea what you think "ambiguous" means, since you described what I would call an ambiguity. But, it doesn't matter how the decision was made, because it could (with equal likelihood, for all you know) have been made the other way. The answer is 50%.
You are trying to convince me that the following two formulations must be considered equivalent:
a) "Mr. Smith has two children. At least one of them is a boy."
b) "I asked Mr. Smith to tell me one fact about his children: is at least one of them a boy, or a girl? He said that at least one of them is a boy".
Sorry, but I shall stand by my opinion that these formulations are better understood differently. You have every right to believe the opposite, but there is no way you could convince me. The question of interpretation is not a mathematical problem and you can't use logical reasoning to support an opinion here.
I'm afraid there is also no way I could convince you, but I'll make my last two attempts now.
Attempt 1.
You are told, literally, "at least one is a boy".
You are interpreting it as follows:
The last sentence is an enormous assumption you are introducing. Why are you so sure that there was necessarily a choice between telling you "one of them is a boy" and, alternatively, "one of them is a girl"? Why should a phrase about a girl necessarily be an alternative to what is given. Why couldn't you reason as follows:
And this now makes a completely different problem with a different solution. Here's another way:
This, again, is yet another problem with a different answer.
Do you see that you have taken a given statement "at least one is a boy", artificially introduced a 'logical alternative' to it (the similar statement about a girl), and artificially introduced an agent that makes a choice between the two alternatives?
You said "without specific knowledge otherwise, you have to assume that a BG family can, with equal probability, provide the information "at least one is a boy" or "at least one is a girl."".
I say no, you do not have to assume anything. And if you want to assume something, why don't you assume that a BG family actually provided you with both statements simultaneously, you just chose to throw one of them away. But then simply assuming nothing about the way of obtaining information is much more natural.
Attempt 2.
You threw a coin three times. You got heads at least once. What is the probability that you got all three heads?
"Your" method of solution:
Do you see that such "solution" introduces artificial assumptions and that it would be much more natural to do without them and obtain the answer 1/7?
And you are trying to convince me that these two are equivalent:
a) "Mr. Smith has two children. At least one of them is a boy."
c) "I asked Mr. Smith if one of his two children was a boy. He said yes."
And there are many other possibilities. Here are two more:
d) You met just one of Mr. Smith's two children. It was a boy.
e) You met Mr. Smith at a meeting for parents at an all-boys school.
The point is that you can't conclude that either b) or c) is equivalent to a) from the text alone. So at best, the problem is ambiguous and WE CANNOT CONCLUDE THAT EITHER 50% OR 67% IS CORRECT. We have to assume something about the way the information was obtained in order to get either answer. Let me repeat that in a different way: the problem is unsolvable unless you make an assumption, and both you and I are making one.
What I am trying to tell you is NOT that any of b) thru d) must be the equivalent of a), but that you cannot make the assumptions you need to make for c) or e) to be that equivalent. Any assumption that leads to the 67% answer requires that you add information to the fact that you know "there is at least one boy." The added information is "the fact was obtained in a way that required an answer about boys." You are making that assumption. On the other hand, if you assume the information was obtained in an unbiased manner, the answer will always be 50%.
So don't get me wrong, I admit both are assumptions. So it is really a matter of applying Occam's Razor. Is it "simpler" to assume that the information was obtained with a bias toward finding out about boys, or to assume it was unbiased? All I'm saying, is the unbiased assumption is the better one.
-----
You said "Do you see that you have taken a given statement 'at least one is a boy,' ..." and the rest isn't important. In a probability text book, the word "given" implies more than it does in plain English. It implies a necessary and sufficient condition. A sample is in the sample space if the condition is true, and it is not in the sample space if the condition is false. In English, all that stating "Mike has at least one boy" means is that Mike must have at least one boy to be in the sample space. It does not mean that every father of at least one boy is in the sample space. This is the ultimate cause of the controversy that surrounds this question - mistaking the English use of the word "given" and the more strict Mathematical one.
-----
Anyway, I'll apply this discussion to your six versions, with an eye toward what you wanted to get for an answer, and what I think the answer is:
1. Mike has two kids, one of them is a boy. What is the probability that the other one is a girl?
You wanted 67%. I say 50%.
2. Mike has two kids, the elder one is a boy. What is the probability that the other one is a girl?
We agree on 50%.
3. Mike has two kids. One of them is a boy named John. What is the probability that the other one is a girl?
This one is the telling case. It absolutely can not matter what the name of the boy is; this answer cannot be different from question #1. Your approach leads to a little over 50% (exactly how much depends on how common the name is and whether you assume there can be two Johns in a family). Mine leads to exactly 50%.
Your approach must be wrong, because it changed the answer. The reason it is wrong is because the "added information" here is that you are assuming the creation of a sample space with all fathers of a son named John. That you made a list of all such fathers, and picked one. The name "John" became a requirement for you to knock on Mike's door, before you knocked on it.
And I have written Professor Mlodinow to tell him why he is wrong. He has not answered.
4. I came to visit Mike. One of his two kids, a boy, opened the door to me. What is the probability that Mike’s other child is a girl?
This is the same as #2. You observed a random child. It does not matter how you determined it - by age, or by who answered the door. I'm pretty sure you will agree.
5. I have a brother. What is the probability that I am a girl?
The correct answer is 50%. "You" are still a random person as far as this question establishes. I don’t know how you will answer it, because you are making invalid assumptions in other cases that you could make here as well.
6. I have a brother named John. What is the probability that I am a girl?
Same as question #5. Again, your brother's name cannot change anything.
-----
Oh, and your coin problem is also unanswerable without an assumption. The solution you think is correct assumes that the information was obtained by effectively asking "Yes or no, is there at least one heads?" I would solve it by first stating that I assumed the question was "Tell me about one coin." The odds for each case are:
HHH 1/8 * 3/3 = 1/8
HHT 3/8 * 2/3 = 2/8
HTT 3/8 * 1/3 = 1/8
TTT 1/8 * 0/3 = 0
The odds of HHH, given that you were told about one heads, is (1/8)/(4/8)=1/4. Which is the intuitive answer, given that you know about one coin.
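Both three-coin answers discussed here can be reproduced side by side, which makes the difference in assumptions explicit. The sketch below assumes fair, independent throws. "Reading 2" encodes the commenter's stated assumption that one of the three coins is picked uniformly at random and its face is reported; the variable names are made up for this illustration.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=3))  # 8 equally likely sequences
p_each = Fraction(1, len(outcomes))
all_heads = ("H", "H", "H")

# Reading 1: condition on the event "at least one heads".
at_least_one_h = [o for o in outcomes if "H" in o]
p_event_reading = Fraction(at_least_one_h.count(all_heads), len(at_least_one_h))

# Reading 2: one coin is chosen uniformly at random and reported;
# condition on the report being "heads".
p_report_heads = sum(p_each * Fraction(o.count("H"), 3) for o in outcomes)
p_hhh_and_report_heads = p_each * Fraction(3, 3)
p_report_reading = p_hhh_and_report_heads / p_report_heads

print(p_event_reading)   # 1/7
print(p_report_reading)  # 1/4
```

The arithmetic in both cases is uncontroversial; the disagreement in the thread is only about which of the two readings the sentence "you got heads at least once" should be taken to mean.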
Firstly, for the record, yes, I do regard a) and c) as effectively equivalent. You understand me absolutely correctly.
The version d) is (in my view) different. It is stated in the post under number 4.
Secondly, it is a problem from a probability textbook we are talking about here, and that's why you should not look for hidden meanings that you could "find in plain English". After all, English is not a formal language and you could find any hidden meaning you wish, if you dug deeply enough.
Thirdly, as for your solutions:
1. 67% is correct (according to "textbook interpretation", if you wish).
2. Correct.
3. There is nothing wrong with the approach. You agreed that answers to 1. and 2. could be different. Therefore, it actually does matter whether the statement "has a son named John" is equivalent to "at least one of the kids is a boy" (which would be the limit case if John were the only possible name for a boy, in which case knowing the name would provide no additional information), or rather to "he is a father of The One and Only John" (which would be the other limit case if there were only one John in the whole world, and then specifying the name "John" would uniquely identify a single kid).
4. Correct.
5. Correct.
6. Correct.
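The six verdicts above can be reproduced by exact enumeration under the "textbook" reading, where the stated fact simply restricts the sample space. This is a sketch under the post's stated assumptions (equiprobable sexes, independent births, each boy named John with probability p, two Johns allowed). In addition it assumes that the child who opens the door in question 4, and the speaker in questions 5 and 6, is one of the two children picked uniformly at random. All function and label names are invented for the illustration.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)

def worlds(p_john):
    """Yield (weight, kids, i) for every fully specified scenario.

    kids -- (elder, younger), each one of "G", "B-John", "B-other"
    i    -- index of a child picked uniformly at random (the child who
            opens the door, or the speaker's brother in questions 5 and 6)
    """
    kinds = {"G": HALF, "B-John": HALF * p_john, "B-other": HALF * (1 - p_john)}
    for (k0, w0), (k1, w1) in product(kinds.items(), repeat=2):
        for i in (0, 1):
            yield w0 * w1 * HALF, (k0, k1), i

def prob(event, given, p_john):
    """Conditional probability P(event | given) by summing scenario weights."""
    den = sum(w for w, kids, i in worlds(p_john) if given(kids, i))
    num = sum(w for w, kids, i in worlds(p_john)
              if given(kids, i) and event(kids, i))
    return num / den

def is_boy(kind):
    return kind != "G"

def answers(p_john):
    """Answers to the six questions for a given P(a boy is named John)."""
    girl_in_family = lambda kids, i: "G" in kids
    other_is_girl = lambda kids, i: kids[1 - i] == "G"
    return {
        1: prob(girl_in_family, lambda kids, i: any(map(is_boy, kids)), p_john),
        2: prob(lambda kids, i: kids[1] == "G",
                lambda kids, i: is_boy(kids[0]), p_john),
        3: prob(girl_in_family, lambda kids, i: "B-John" in kids, p_john),
        4: prob(other_is_girl, lambda kids, i: is_boy(kids[i]), p_john),
        5: prob(other_is_girl, lambda kids, i: is_boy(kids[i]), p_john),
        6: prob(other_is_girl, lambda kids, i: kids[i] == "B-John", p_john),
    }

for q, a in answers(Fraction(1, 100)).items():
    print(q, a)
```

With p = 1/100 this gives 2/3 for question 1, 200/399 (slightly above one half) for question 3, and 1/2 for the rest. For general p the answer to question 3 works out to 2/(4 - p), so it moves from 2/3 when every boy is called John (p = 1) towards 1/2 as the name becomes rare, which are the two limit cases described in point 3.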
Fourthly, your solution to the "three coin problem" is, well, highly unorthodox at least. Any probability textbook would tell you the answer is actually 1/7. You do have the right to your own opinion, but don't disrespect the "standard approach" so much. You need to at least understand it to claim knowledge of probability theory.
Could we stop the flood now, please?
A) and c) are not effectively the same. In a probability textbook, the word "given" will be (at least, it should be) defined. In an isolated question, it is not. The point is that you can't say 50% is a wrong answer unless you state what you want to be assumed.
I agreed that the answers to 1 and 2 could be different IF YOU ESTABLISH THE ASSUMPTION you want, that you include all families with at least one boy. But without being explicit in the problem statement, that is an assumption you cannot make. And the fact that assuming it changes the answer for 3 proves me right. The answer to 3 is not 67%, even by your approach. But knowing a boy's name can't contain any more information than just knowing he is a boy, when determining the probabilities for his sibling.
In my answer to the coin problem, I said I would state my assumption, so there is nothing unorthodox about it. The probability textbook you allude to would do the same, by saying "given you rolled at least one heads" and defining what that "given" meant. The wording you used would be unanswerable, even in that textbook. To quote Martin Gardner, "That the best of mathematicians can overlook such ambiguities is indicated by the fact that this problem, in unanswerable form, appeared in one of the best of recent college textbooks on modern mathematics."
Regarding answer 3. Knowing the boy's name provides you with more information than just knowing that he is a boy. I already gave you an example. Knowing that the boy's name is "R2D2", so that there is, hypothetically, perhaps just one boy in the world with such a name, means knowing way more than just his sex.
Your coin problem answer is unorthodox precisely because you are taking a straightforward formulation ("You got heads at least once"), and introducing "your assumptions" about it, which are radically different from the kind of assumptions most "other" people would introduce. You see, a mathematical problem is not expected to define the word "given" every time it is used. There is a standard interpretation for this word and it is normal to assume the person solving a mathematical task knows this interpretation.
For example, when I tell you, "let x < 2", I have the right to assume that you will just take this information as given, without questioning its origin and attempting to "introduce priors", by, for example, reasoning in the following manner:
"But why didn't you say 'let x < 1'? You would probably have said so if x were actually less than 1, but you didn't. Therefore it makes sense to assume that x is actually greater than 1".
You claim that the straightforward expression "one of the kids is a boy", is "not enough" for you to "establish the assumption that I am considering all families where one kid is a boy"? Well, then you are having a serious case of misunderstanding. Because when I say "one kid is a boy" I really mean that you must consider precisely those families where one kid is a boy, no more and no less. Any other interpretation is "unorthodox".
Another analogy: imagine a textbook problem stated as follows: "There are 10 mechanics in city X. Five of them are engineers, and the other five play the violin. When walking along the streets of city X, we met John, who turned out to be a mechanic. What is the probability that he plays the violin?".
I guess that instead of providing the straightforward answer of 0.5 you would argue here as follows: "well, the problem did not state clearly enough that we should consider the space of all mechanics, because some of the mechanics are actually engineers and for these you would not find out that John is a mechanic, because you would instead find out that John is an engineer. Hence the probability must be greater than 0.5".
This just doesn't make sense to me.
But then again, why are we arguing? You're stuck with your opinion, which I can't change, and I'll stick to mine; there is no way you could convert me.
So let's just close the flood, OK?
I'll "stop the flood" when you stop making the reasons fit the answer you want, rather than make the answer fit the reasons. Try an open mind, as though you didn't come in already "knowing" the answer. Finding out information about a boy does not, and cannot, provide any more information about his sibling than just finding out he is a boy. To prove it, use the same method you are trying to use for these two questions:
1) Mike has two children, one of whom is a boy named John. What is the probability that he has a sister?
2) Mike has two children, one of whom is a boy NOT named John. What is the probability that he has a sister?
These two questions divide your sample space in two. Using your methods, the "added information" in both cases reduced the odds from 67% to something less than 67%. But it can't reduce both. They have to average to 67%, so one must increase.
That absurd result means something was wrong in the solution. The "added information" you are assuming is more than just knowledge of the name. It is that you are eliminating ALL families with a boy named John, not just the boy who made you say "at least one is a boy."
The coin problem is not unorthodox as you describe, because an assumption is NECESSARY to decipher what "at least one" means. Does it mean you looked at one, and it was heads? Or that you looked at all three, and not all three were tails? You are making the tacit assumption of the last case, which is not justified from the problem statement. In other words, we are both making an assumption. My answer was more orthodox than yours, because I stated my assumption.
The problem statements we are discussing didn't use the word "given." That is a word you associated with them, based on your understanding of the English meaning for the word. It is your use of the word that is in conflict with the mathematical definition. They are not the same. And it only matters if ambiguous information is "given," unlike your inequality comparisons. If I tell you that a zebra has black stripes, it isn't "given" that it doesn't have white stripes.
If I meet a Mr. Smith in the street walking with a boy he introduces as his son, and he says he has two children, it is perfectly valid for me to state "Mr. Smith has two children, and at least one of them is a boy." But in this case, the odds are 50% that the boy has a sister, not 67%. So yes, it is straightforward to say "'one of the kids is a boy' is not enough to establish 'all families where at least one is a boy.'"
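The two readings can be compared by exact enumeration. What follows is only a sketch under stated assumptions: boys and girls are equally likely and independent, and in the street-meeting reading the father is equally likely to be walking with either child. The function names are made up for illustration.

```python
from fractions import Fraction
from itertools import product

# All two-child families in birth order, each with probability 1/4
# (assumption: sexes are equally likely and independent).
FAMILIES = list(product("BG", repeat=2))
QUARTER = Fraction(1, 4)


def p_sister_filter_model():
    """Reading 1: keep every family that has at least one boy."""
    with_boy = [f for f in FAMILIES if "B" in f]
    mixed = [f for f in with_boy if "G" in f]
    return Fraction(len(mixed), len(with_boy))


def p_sister_encounter_model():
    """Reading 2: we meet one child picked at random, and it is a boy."""
    saw_boy = Fraction(0)
    saw_boy_with_sister = Fraction(0)
    for family in FAMILIES:
        for seen in (0, 1):  # either child is equally likely to be the one met
            weight = QUARTER * Fraction(1, 2)
            if family[seen] == "B":
                saw_boy += weight
                if family[1 - seen] == "G":
                    saw_boy_with_sister += weight
    return saw_boy_with_sister / saw_boy


print(p_sister_filter_model())     # 2/3
print(p_sister_encounter_model())  # 1/2
```

Both numbers are correct for their own model; the disagreement is about which model the sentence "one of them is a boy" describes.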
These two questions divide your sample space in two. Using your methods, the “added information” in both cases reduced the odds from 67% to something less than 67%. But it can’t reduce both. They have to average to 67%, so one must increase.
The answer to the question 3 in the post is 1/(2-0.5p), where p is the probability for a boy to be named "John".
It translates into your two versions as follows:
1. If you know that one of the boys is named John, the probability that he has a sister is 1/(2-0.5p), which is, for a small p something close to 0.5.
2. If you know that one of the boys is not named John, the probability he has a sister is 1/(2-0.5(1-p)), which is, for a small p something close to 0.66.
I see no absurdity here and I see no reason why the two answers should "average to 0.67". Imagine that there are only two names for boys in the whole world - "John" and "Mike", given out with equal probability. In this case the amount of information contained in the statement "his name is John" is actually equivalent in bits to the amount of information contained in the statement "name is not John" (i.e. "name is Mike"). And the two answers would, naturally, be equal (4/7).
When the name "John" is more rare, the information "is John" is more specific and this drives the answer down towards 0.5. The information "is not John" in this case is less specific and drives the answer up towards 0.66.
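Both formulas can be checked by enumerating child types with exact fractions. This is a minimal sketch, assuming each child is a boy with probability 1/2 and each boy independently is named John with probability p, and reading "one of the boys is (not) named John" as "at least one boy in the family is (not) named John". The helper name p_sister is hypothetical.

```python
from fractions import Fraction


def p_sister(q):
    """P(family has a girl | at least one boy has the property),
    where each boy independently has the property with probability q > 0."""
    half = Fraction(1, 2)
    # Child types: girl, boy with the property, boy without it.
    child = {"G": half, "B+": half * q, "B-": half * (1 - q)}
    hit = Fraction(0)
    hit_with_girl = Fraction(0)
    for a, pa in child.items():
        for b, pb in child.items():
            if "B+" in (a, b):
                hit += pa * pb
                if "G" in (a, b):
                    hit_with_girl += pa * pb
    return hit_with_girl / hit


for p in (Fraction(1, 2), Fraction(1, 100)):
    named_john = p_sister(p)          # property: "is named John"
    not_named_john = p_sister(1 - p)  # property: "is not named John"
    assert named_john == 1 / (2 - p / 2)
    assert not_named_john == 1 / (2 - (1 - p) / 2)
    print(p, named_john, not_named_john)
```

With p = 1/2 both answers come out as 4/7; with p = 1/100 they are 200/399 (close to 0.5) and 200/301 (close to 0.66).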
Because an assumption is NECESSARY to decipher what “at least one” means.
Sorry to repeat it again, but in my world, "at least one" always means literally at least one. For example, here are some natural numbers that contain at least one numeral '2' in their decimal representation:
20, 12, 1982, 42, 5201, 2222, 671892.
According to your logic, the number 5201 might very well not be a number containing at least one "2", because, well, you didn't care to look at its second numeral. I guess that for you 5201 is in fact a number "containing at least one 2 with probability 1/4". Because you like to introduce "ASSUMPTIONS" regarding what "at least one" might mean in your world and how you should decide, for a given object, whether it belongs to the class or not. Your assumptions are fun, but I prefer to stick to basic set theory.
Well, this will be my last post in the "flood" since you simply can't seem to see what I am saying. The assumption is not about what "there is at least one" means in isolation, but about what it means about the sample space. "There is at least one" always means "there is at least one," but the issue is whether that is both a necessary and sufficient condition, or just a necessary condition.
I understand that you want "there is at least one boy" to be both necessary and sufficient. What I'm saying is that, in a story problem, it can't be sufficient unless you include a reason why in the story. I'm saying that by stating a fact, a story problem only tells you that the fact applies to all of the cases you need to consider (necessary). Not that it can't apply to cases you can dismiss (which would be sufficient). Any story problem that doesn't tell you what you can dismiss as well is ambiguous.
Example: If I tell you "All roses in my flower shop are red," then you can safely conclude "if a flower is a rose, then it is red." But you can't conclude "if a flower is red, then it is a rose." The knowledge that 75% of my flowers are red only tells you that AT MOST 75% are roses.
Similarly, if I tell you "Mike has at least one boy," then you can conclude "If a man is a possible Mike, he has at least one boy." But you can't conclude "If a man has at least one boy, he is a possible Mike." Yet that is exactly what you are doing. You are using the knowledge that 75% of all families have at least one boy to say that 75% are possible Mikes.
The best way to resolve that ambiguity is to assume Mike randomly told you (or however you found out the information, and you DID find it out so there has to be an "however") about one child's gender. It isn't the only way; you can also assume he will always tell you about girls. Both are assumptions, but one is biased (toward girls) and one is not. You can't assume a bias.
Again similarly, "according to my logic," 5201 is a number containing at least one 2. But "according to my logic," when I hear you say "I am thinking of a number between 0 and 9999 with at least one 2," I have to consider that if you were thinking of the number 1234, you could equally likely have said "at least one 1," "at least one 3," or "at least one 4." You only told me the number has a 2, not that you chose it from all numbers with a 2.
Finally, say you start off knowing that Mike has at least one boy, and also say that the probability he has a girl as well is P. If you ask him "Do you have at least one boy named John?" then the two possible answers you can get separate the previous sample space into two NON-INTERSECTING sets. The average of the adjusted probabilities (weighted appropriately by the probabilities for the two answers) has to be P. But it can't be P if both answers make it go down.
Good day.
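The weighted-average requirement for the yes/no question can be checked by exact enumeration. The sketch below assumes the starting sample is all two-child families with at least one boy, that each child is a boy with probability 1/2, and that each boy independently is named John with probability p; the function name answers is made up for illustration.

```python
from fractions import Fraction


def answers(p):
    """Start from 'at least one boy' and ask
    'Do you have at least one boy named John?'.
    Returns {answer: (P(answer), P(family has a girl | answer))}."""
    half = Fraction(1, 2)
    child = {"G": half, "John": half * p, "other boy": half * (1 - p)}
    total = {"yes": Fraction(0), "no": Fraction(0)}
    with_girl = {"yes": Fraction(0), "no": Fraction(0)}
    for a, pa in child.items():
        for b, pb in child.items():
            if a == "G" and b == "G":
                continue  # no boy at all: outside the starting sample
            answer = "yes" if "John" in (a, b) else "no"
            total[answer] += pa * pb
            if "G" in (a, b):
                with_girl[answer] += pa * pb
    norm = total["yes"] + total["no"]
    return {k: (total[k] / norm, with_girl[k] / total[k]) for k in total}


result = answers(Fraction(1, 10))
weighted = sum(prob * p_girl for prob, p_girl in result.values())
print(result["yes"], result["no"], weighted)
assert weighted == Fraction(2, 3)
```

Under these assumptions a "yes" pulls the probability of a girl below 2/3 and a "no" pushes it above 2/3, so the weighted average stays at 2/3 for the two disjoint answers.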
1) I don't understand what you mean by a "story problem" and how you separate it from a "non-story problem". But if a "story problem" for you means something that you should interpret differently from a "textbook problem", then just accept that this is not meant to be a story problem. There is no story here. Just re-read carefully:
"In the following, let Mike be a randomly chosen father of two kids. Mike has two kids, one of them is a boy." Where's the story? The situation is completely defined, no need to invent any "probability that the father of a boy would not be included in the sample". You see, in a mathematical "textbook problem" you must assume that all you need for a solution is already given to you.
2) You think that by proposing a "girl" alternative to a problem that explicitly contains the word "boy" you are "de-biasing it". But again, why do you think that you should "de-bias" specifically the word "boy"? Why not presume an alternative would be "has at least two boys"? Or why don't you argue with the initial phrase "Mike has two kids"? Could it not be that some Mikes with two kids would actually not be in the sample for some reason?
You accuse me (and, in fact, nearly the whole world) of being "biased" by "only considering boys" in an interpretation of a problem that explicitly asks to consider boys. The word "boy" is in the statement of the problem. The word "girl" is the one you try to artificially introduce.
And then what about "has at least one boy named John". What would your "alternative" be? 'Has at least one girl named John'?
3) "Has a kid named John" and "Has no kids named John" are indeed non-intersecting. But these are not the versions I discussed in my last comment. They were: "Has a boy named John" and "Has a boy not named John". These are intersecting (a family "Mike, son John, son Bill" falls into both sets).
The "Has no kids named John" version does indeed raise the probability of having a girl over 0.66 because it cuts away a whole lot of boys ("Johns") as potential brothers.
4) I understand your example of "I am thinking of a number between 0 and 9999 with at least one 2", but it's just not the case here. If a probability textbook tells you "LET x be a randomly chosen number with at least one 2" then you are expected to presume that it was indeed chosen from all possible numbers with at least one 2. Or, equivalently, that you "have asked" and "found out".
5) I would actually be happy to close the flood. Believe me, I do see what you mean. It's just that this specific problem is not meant to be solved the way you want it to. What essentially follows from your remarks is that the formulation in the post must be changed to:
"I asked Mike whether ...., and he said yes".
And despite considering your remark insightful, I still feel the freedom to disagree with such a change in formulation, because experience shows that most people would understand clearly what is actually meant if I just write "Mike has two kids and one of them is ...".