FeebleMinded Finally Preemie! 4472 Posts user info edit post |
So one of my favorite hobbies is playing Scrabble. Believe it or not there is a fairly large subculture of people who play this game very seriously. Yes, many of them are really dorky and have virtually no social skills, but there are some "normal" people too. Anyway, the game goes far beyond what most people have played casually. There are well over 100,000 words that serious players memorize, and many of them (I'd say roughly 75%) are words you would never know if you had not seen them listed and studied them. [/lame intro]
Seen above is a screen capture of a program called ZYZZYVA (which incidentally is a word) that unscrambles words, creates quizzes, etc. One of the functions it has is determining probability, which is a great way to study. In other words, you are much better off studying words like AILERON or DARIOLE made with common letters with higher probability than words like FILIBEG or FUMULUS, at least initially. So what I was wondering is how does this program go about determining the probability of playing a word? Here is the letter distribution:
At first I thought I could assign a probability to each letter (for instance E=12/100, G=3/100, Z=1/100) and then just multiply all these numbers together, take the inverse of the product, and whichever letter combination had the highest value would be the most probable. Well, just by using this method on a few different cases I found it to be an epic FAIL. I would personally think a word like BEEBEES would be near the top considering it has 4 E's, which appear at the highest probability, along with BBS, which are not totally uncommon. However BEEBEES is about 23,000 of 24,000 7-letter words.
Being able to calculate this during a game would be an immense help. So if anyone out there could give me some help, I would be greatly appreciative. 3/18/2009 6:46:52 AM |
Jrb599 All American 8846 Posts user info edit post |
I just woke up, but let me make one comment.
Take a word like ZEE (if it is a word, I only use it cause it's the two letters you give us).
One thing I noticed is the probability by your method would not be
(1/100)*(12/100)*(12/100)
it would be
(1/100)*(12/99)*(11/98)
Because once you get to your e (12/99), you've already pulled out one letter and when you get to your second e you've already pulled out 2 letters, one which is an e.
[Edited on March 18, 2009 at 8:11 AM. Reason : ] 3/18/2009 8:08:39 AM |
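The correction above can be sketched in a few lines of Python. This is only an illustration: the tile counts (1 Z, 12 E's, 100 tiles) are the ones quoted in the thread, and the function name `ordered_draw_probability` is made up for the example.

```python
from fractions import Fraction

def ordered_draw_probability(word, counts, total, replace=False):
    """Probability of drawing the letters of `word` in exactly that order."""
    remaining = dict(counts)  # copy so the caller's counts are untouched
    left = total
    p = Fraction(1)
    for letter in word:
        p *= Fraction(remaining.get(letter, 0), left)
        if not replace:
            # The tile stays out of the bag, so both counts shrink.
            remaining[letter] = remaining.get(letter, 0) - 1
            left -= 1
    return p

counts = {"Z": 1, "E": 12}
with_replacement = ordered_draw_probability("ZEE", counts, 100, replace=True)
without_replacement = ordered_draw_probability("ZEE", counts, 100)
print(float(with_replacement), float(without_replacement))
```

The two numbers are close for ZEE, but the gap grows for words that repeat a scarce letter, since each repeat removes a larger share of what is left.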
FeebleMinded Finally Preemie! 4472 Posts user info edit post |
I agree that I didn't take into account the smaller number of remaining tiles, but I don't believe that would affect the results because it's a common error across the board. 3/19/2009 1:04:21 AM |
ncsu919 All American 1067 Posts user info edit post |
you'd be surprised... 3/19/2009 10:06:09 PM |
1985 All American 2175 Posts user info edit post |
it especially matters when letters already have low probabilities. For instance, if b was 2/100, well, once you use the first, the probability of getting the second is nearly cut in half. 3/20/2009 11:36:06 PM |
BigHitSunday Dick Danger 51059 Posts user info edit post |
this is why scrabble is bullshit and i can never win
haha
god i hate this game, but i love it
and no, i dont have a clue what u talkin bout...but im feelin it man 3/22/2009 1:53:41 AM |
A Tanzarian drip drip boom 10995 Posts user info edit post |
If you alone were to draw letters from an initially full set, the probability of drawing BEEBEES, in that order, is:
(2/100)*(12/99)*(11/98)*(1/97)*(10/96)*(9/95)*(4/94) = 1.2E-9
However, there are multiple orders in which the letters can be drawn. Also, you can draw 'different' letters; e.g., a different set of 4 E's than you drew the previous time. Each individual scenario that gives the necessary letters must be taken into account to get a 'true' probability of a particular word.
I'm sure ZYZZYVA is making assumptions about how and when the letters are drawn. If you want to duplicate ZYZZYVA probabilities, you're going to need to know those assumptions.
[Edited on March 22, 2009 at 1:04 PM. Reason : ] 3/22/2009 12:45:44 PM |
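One way to account for the multiple draw orders, sketched in Python under the same assumptions as the post (full 100-tile bag, 2 B's, 12 E's, 4 S's, blanks ignored): every ordering of the same seven letters has the same probability, so the in-order figure can be multiplied by the number of distinct orderings. The helper names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from math import factorial

def ordered_probability(word, counts, total=100):
    """Chance of drawing exactly these letters in exactly this order."""
    remaining = Counter(counts)
    p = Fraction(1)
    for i, letter in enumerate(word):
        p *= Fraction(remaining[letter], total - i)
        remaining[letter] -= 1
    return p

def distinct_orderings(word):
    """Number of distinguishable arrangements of the word's letters."""
    n = factorial(len(word))
    for k in Counter(word).values():
        n //= factorial(k)
    return n

counts = {"B": 2, "E": 12, "S": 4}
p_ordered = ordered_probability("BEEBEES", counts)   # about 1.2E-9
orders = distinct_orderings("BEEBEES")               # 7! / (2! * 4! * 1!)
p_any_order = p_ordered * orders
print(float(p_ordered), orders, float(p_any_order))
```

Drawing the same letters as SEEEEBB gives the same product of fractions, just with the numerators shuffled, which is why a single multiplication is enough.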
aaronian All American 3299 Posts user info edit post |
I'll stick to playing dumbass slutbags on lexulous on facebook.. 3/24/2009 12:06:35 AM |
wolfpackgrrr All American 39759 Posts user info edit post |
The official Scrabble dictionary is filled with such bs words. 3/24/2009 4:41:59 AM |
Jrb599 All American 8846 Posts user info edit post |
Quote : | "Also, you can draw 'different' letters; e.g., a different set of 4 E's than you drew the previous time." |
You can only draw a set of 4 Es one way.
EEEE is the same as EEEE 3/24/2009 11:07:28 AM |
A Tanzarian drip drip boom 10995 Posts user info edit post |
...but you're choosing 4 out of 12 E's. There are 495 ways to do that.
Ignoring the blank tiles and assuming you're simply pulling tiles from a bag:
[ C(2, 2) * C(12,4) * C(4,1) ] / C(100,7)
[ 1 * 495 * 4 ] / 1.60E10
1980 / 1.60E10
1.24E-7
which is about 1 in 8.1 million. 1 in 7 million if you drop the two blank tiles.
We need FeebleMinded to tell us what the probability is according to ZYZZYVA. 3/24/2009 8:41:12 PM |
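The arithmetic in the post above can be checked with Python's `math.comb`. This is a sketch of that one calculation only. The first figure leaves the two blanks in the bag but never lets them stand in for a letter, and the second removes them, leaving 98 tiles.

```python
from math import comb

# Ways to pick both B's, 4 of the 12 E's, and 1 of the 4 S's.
ways = comb(2, 2) * comb(12, 4) * comb(4, 1)

p_with_blanks_in_bag = ways / comb(100, 7)  # blanks present but unusable
p_no_blanks_in_bag = ways / comb(98, 7)     # blanks removed from the bag

print(f"1 in {1 / p_with_blanks_in_bag:,.0f}")
print(f"1 in {1 / p_no_blanks_in_bag:,.0f}")
```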
FeebleMinded Finally Preemie! 4472 Posts user info edit post |
It doesn't say what the probability is, it simply ranks the words in order of most to least probable. If anyone is a computer programmer type person, the source code is on the website. I couldn't even begin to comprehend it though.
http://www.zyzzyva.net/ 3/26/2009 12:31:48 AM |
aaronian All American 3299 Posts user info edit post |
whats the probability of me starting with 8 vowels in back to back games? because it happened. 3/26/2009 9:40:29 AM |
ncsu919 All American 1067 Posts user info edit post |
(1/whatever chances of starting with 8 vowels)*(1/whatever chances of starting with 8 vowels)
..overall it's probably a pretty high chance respectively. 3/26/2009 3:40:35 PM |
Jrb599 All American 8846 Posts user info edit post |
^^^^ So take your BEEBEES example.
The probability drawing that in that order is
(2/100)*(12/99)*(11/98)*(1/97)*(10/96)*(9/95)*(4/94) = 1.2E-9
but if you take a different set of E's it's still the same scenario, because your (12/99) captures all the ways you can get an E, not an individual E. What you need to factor in is the different orders in which you can draw the letters.
[Edited on March 26, 2009 at 3:58 PM. Reason : ] 3/26/2009 3:54:26 PM |
FeebleMinded Finally Preemie! 4472 Posts user info edit post |
Quote : | "whats the probability of me starting with 8 vowels in back to back games? because it happened." |
If you are playing by the rules, the probability is zero because you only ever have 7 tiles on your rack at once. 3/26/2009 4:49:30 PM |
A Tanzarian drip drip boom 10995 Posts user info edit post |
^^ Take a look at http://svn.pietdepsi.com/repos/projects/zyzzyva/trunk/src/libzyzzyva/LetterBag.cpp.
It looks like he's using combinations to calculate probabilities.
^ For each word he's calculating the probability based on drawing the number of tiles in the word from an initially full bag; i.e. he's determining the probability of spelling a three letter word after drawing 3 tiles, not the probability of being able to spell a particular 3 letter word after drawing 7 tiles. He includes the blank tiles (which I didn't do above). 3/26/2009 6:30:43 PM |
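For anyone who wants to experiment, here is a rough Python sketch of the counting described above: draw as many tiles as the word has letters from a full bag, and let blanks stand in for letters. It is not a transcription of LetterBag.cpp, and ZYZZYVA's real handling of blanks may differ. `TILES` is the standard English Scrabble distribution with `?` for the blank; the function names are invented for the example.

```python
from collections import Counter
from itertools import combinations
from math import comb

TILES = {
    "A": 9, "B": 2, "C": 2, "D": 4, "E": 12, "F": 2, "G": 3, "H": 2, "I": 9,
    "J": 1, "K": 1, "L": 4, "M": 2, "N": 6, "O": 8, "P": 2, "Q": 1, "R": 6,
    "S": 4, "T": 6, "U": 4, "V": 2, "W": 2, "X": 1, "Y": 2, "Z": 1, "?": 2,
}

def ways_to_spell(word, tiles=TILES):
    """Count the len(word)-tile draws that can spell `word`, blanks allowed."""
    need = Counter(word.upper())
    letters = sorted(need.elements())
    blanks = tiles["?"]
    total = 0
    for used in range(min(blanks, len(letters)) + 1):
        # Each distinct multiset of letters the blanks could stand in for.
        for covered in set(combinations(letters, used)):
            natural = need - Counter(covered)  # letters drawn as real tiles
            ways = comb(blanks, used)
            for letter, k in natural.items():
                ways *= comb(tiles[letter], k)
            total += ways
    return total

def word_probability(word, tiles=TILES):
    """Ways to spell the word divided by all draws of that many tiles."""
    return ways_to_spell(word, tiles) / comb(sum(tiles.values()), len(word))

print(ways_to_spell("ZEE"), word_probability("ZEE"))
```

Since the program only ranks words, the numerator alone would be enough to sort words of the same length against each other.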
Jrb599 All American 8846 Posts user info edit post |
I'll explain it better when I get outta class, but a combination would say that EEEE=EEEE
Essentially, what you are saying is that each E is unique. If that is the case, then the probability of pulling an E is 1/100, not 12/100.
Another thing you're saying is that the order of E's matters, but then you use C(100,7) as your denominator, which doesn't care about order. The denominator counting all the ways you can draw 7 letters in order will be
100!/93!
100 choices for the first letter, 99 for the second, and so on.
[Edited on March 26, 2009 at 6:53 PM. Reason : ] 3/26/2009 6:44:37 PM |
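The two denominators being argued about are easy to compare in Python. Seven tiles means seven factors, 100 down to 94, which is 100!/93!, and the ordered count is exactly 7! times the unordered one. A small check using only `math`:

```python
from math import comb, factorial, perm

ordered = perm(100, 7)    # 100 * 99 * 98 * 97 * 96 * 95 * 94
unordered = comb(100, 7)  # same draws with the order thrown away

# Both identities hold for any draw of 7 distinct tiles.
assert ordered == factorial(100) // factorial(93)
assert ordered == unordered * factorial(7)

print(ordered, unordered)
```

Either denominator works as long as the numerator counts the same kind of thing: ordered draws over ordered draws, or unordered over unordered.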
A Tanzarian drip drip boom 10995 Posts user info edit post |
That's why I started using combinations, because order doesn't matter.
Quote : | "[ C(2, 2) * C(12,4) * C(4,1) ] / C(100,7)" |
The order you select E's doesn't matter, but how many ways you can select 4 E's from 12 E's does matter.
[Edited on March 26, 2009 at 7:00 PM. Reason : I'll be back in awhile] 3/26/2009 6:59:56 PM |
Jrb599 All American 8846 Posts user info edit post |
you can only select 4Es from 12 one way. It's because they aren't unique.
You get EEEE, you're trying to make the E's unique.
I'll come back with a much longer explanation tomorrow. 3/26/2009 7:07:33 PM |
aaronian All American 3299 Posts user info edit post |
Quote : | "If you are playing by the rules, the probability is zero because you only ever have 7 tiles on your rack at once." |
true. but I forgot to mention I play lexulous on facebook which gives you 8 tiles. 3/26/2009 8:45:34 PM |
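A back-of-the-envelope version of the 8-vowel question, in Python. The assumptions are loud ones: it uses the standard Scrabble bag (42 vowels out of 100 tiles, blanks not counted as vowels), it assumes the opening rack comes from a full bag, and it treats the two games as independent. Lexulous's real tile distribution may well differ, so this is an estimate rather than an answer.

```python
from fractions import Fraction
from math import comb

VOWELS = 9 + 12 + 9 + 8 + 4  # A, E, I, O, U in the standard Scrabble bag
TOTAL = 100                  # assumed bag size
RACK = 8                     # rack size mentioned for Lexulous

# Hypergeometric: all 8 tiles come from the vowel pile.
p_one_game = Fraction(comb(VOWELS, RACK), comb(TOTAL, RACK))
p_back_to_back = p_one_game ** 2  # independent games multiply

print(f"one game: 1 in {float(1 / p_one_game):,.0f}")
print(f"two in a row: 1 in {float(1 / p_back_to_back):,.0f}")
```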
Cabbage All American 2086 Posts user info edit post |
It's a pretty straightforward probability problem to determine the probability of picking any particular 7 letters:
Let C(n,r) be the binomial coefficient: n! / [r!(n-r)!], or 0 if r > n.
The method is easiest to illustrate by example:
There are C(100,7) ways of selecting 7 tiles at the beginning of the game.
If you want to calculate the probability of, say, "aaeeejt": Count the number of ways you can get this combination:
C(9,2) * C(12,3) * C(1,1) * C(6,1)
There's exactly one binomial coefficient for each distinct letter:
Count how many ways to select 2 of the 9 a's
times
Count how many ways to select 3 of the 12 e's
times
Count how many ways to select 1 of the 1 j's
times
Count how many ways to select 1 of the 6 t's
That's how to get the numerator.
Then divide by C(100,7) to get the actual probability.
With this formula, it's more or less straightforward to program a computer to calculate the probability of any 7 letter combination, then order them from most to least likely. 3/31/2009 9:55:32 PM |
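A minimal Python sketch of the method just described, assuming the standard English tile distribution and ignoring blanks as playable letters (they stay in the bag, so the denominator is still C(100,7)). `TILES` and `rack_probability` are names made up for the example.

```python
from collections import Counter
from math import comb

TILES = {
    "A": 9, "B": 2, "C": 2, "D": 4, "E": 12, "F": 2, "G": 3, "H": 2, "I": 9,
    "J": 1, "K": 1, "L": 4, "M": 2, "N": 6, "O": 8, "P": 2, "Q": 1, "R": 6,
    "S": 4, "T": 6, "U": 4, "V": 2, "W": 2, "X": 1, "Y": 2, "Z": 1, "?": 2,
}

def rack_probability(rack, tiles=TILES):
    """One binomial coefficient per distinct letter, over C(total, len)."""
    ways = 1
    for letter, k in Counter(rack.upper()).items():
        ways *= comb(tiles[letter], k)
    return ways / comb(sum(tiles.values()), len(rack))

# Rank a few of the words mentioned earlier in the thread.
words = ["AILERON", "DARIOLE", "FILIBEG", "FUMULUS", "BEEBEES"]
ranked = sorted(words, key=rack_probability, reverse=True)
for w in ranked:
    print(w, f"{rack_probability(w):.3e}")
```

Because every 7-letter word shares the same denominator, sorting by the numerator alone would give the same ranking.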
Jrb599 All American 8846 Posts user info edit post |
^Sorry, error there.
First, you use C(100,7); you need a permutation. Your numerator is off too.
I forgot about this thread, I'll write something up in a bit.
[Edited on March 31, 2009 at 10:29 PM. Reason : ] 3/31/2009 10:26:12 PM |
Cabbage All American 2086 Posts user info edit post |
You're going to need to tell me why I'm wrong. Just telling me I'm wrong doesn't make it so.
Edited to add: For that matter, I think it's clear that this is a combination problem, not a permutation problem. If you draw a,e,e,e,e,e,e, how's that any different from e,e,a,e,e,e,e? You still have one a and 6 e's--that's all that matters.
[Edited on March 31, 2009 at 10:35 PM. Reason : adding stuff] 3/31/2009 10:33:33 PM |
Jrb599 All American 8846 Posts user info edit post |
For aaeeejt
You have (9/100)*(8/99)*(12/98)*(11/97)*(10/96)*(1/95)*(6/94)
that is the probability you draw aaeeejt. However, what if you draw jtaaeee? I mean you can still play aaeeejt. So we need to factor that in too. So there are c(7,3) ways to place the E's, c(4,2) ways to place the a's, and 2 ways to place the j and t.
(9/100)*(8/99)*(12/98)*(11/97)*(10/96)*(1/95)*(6/94)*c(7,3)*c(4,2)*2;
that's your probability.
Also notice, the denominator is 100!/93!, a permutation not a combination.
Quote : | "If you draw a,e,e,e,e,e,e, how's that any different from e,e,a,e,e,e,e?" |
They're different. That's simple probability. Suppose you flip a coin twice and you want to know how many ways you can get heads once and tails once. You can get HT and TH, they are different. Your first term has an a in the first spot and the second has the a in the third spot.
[Edited on March 31, 2009 at 10:53 PM. Reason : ] 3/31/2009 10:37:27 PM |
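The permutation-style calculation in this post can be written out directly in Python. This sketch just transcribes the arithmetic with exact fractions; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial

# Probability of drawing a, a, e, e, e, j, t in exactly that order.
in_order = (Fraction(9, 100) * Fraction(8, 99) * Fraction(12, 98)
            * Fraction(11, 97) * Fraction(10, 96) * Fraction(1, 95)
            * Fraction(6, 94))

# Choose 3 of 7 positions for the e's, 2 of the remaining 4 for the a's,
# then 2 ways to place j and t in the last two positions.
placements = comb(7, 3) * comb(4, 2) * 2

# Same count as the multinomial 7! / (2! * 3! * 1! * 1!).
assert placements == factorial(7) // (factorial(2) * factorial(3))

p = in_order * placements
print(float(p))
```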
Cabbage All American 2086 Posts user info edit post |
Quote : | "For aaeeejt
You have (9/100)*(8/99)*(12/98)*(11/97)*(10/96)*(1/95)*(6/94)
that is the probability you draw aaeeejt. However, what if you draw jtaaeee? I mean you can still play aaeeejt. So we need to factor that in too. So there are c(7,3) ways to place the E's, c(4,2) ways to place the a's, and 2 ways to place the j and t.
(9/100)*(8/99)*(12/98)*(11/97)*(10/96)*(1/95)*(6/94)*c(7,3)*c(4,2)*2;
that's your probability." |
How's that any different from mine? I mean, obviously the method is different, but try calculating mine before you say my method is wrong; you may be surprised. If I'm wrong, then you're wrong, too. My method just seems more natural to me; I respect that the same may be true for you with your method.
Quote : | "They're different. That's simple probability. Suppose you flip a coin twice and you want to know how many ways you can get heads once and tails once. You can get HT and TH, they are different. Your first term has an a in the first spot and the second has the a in the third spot. " |
They're not different if you're interested in combinations instead of permutations. Playing Scrabble, you're drawing a combination of letters, not a permutation. I can make exactly the same plays on the board with a,e,e,e,e,e,e as I can with e,e,a,e,e,e,e. The order I pull the tiles out of the bag is irrelevant. 3/31/2009 11:33:01 PM |
ncsu919 All American 1067 Posts user info edit post |
the odds you draw any 1 combination that you are looking for is so low, it isnt worth trying to study the "most" likely 7 letter word combos. 4/1/2009 12:08:19 PM |
aaronian All American 3299 Posts user info edit post |
this takes me back to st311 4/1/2009 1:00:33 PM |
Jrb599 All American 8846 Posts user info edit post |
Quote : | "How's that any different from mine? I mean, obviously the method is different, but try calculating mine before you say my method is wrong; you may be surprised. If I'm wrong, then you're wrong, too. My method just seems more natural to me; I respect that the same may be true for you with your method." |
We get different numbers, I'm sorry but the solution I presented is right.
Quote : | "They're not different if you're interested in combinations instead of permutations. Playing Scrabble, you're drawing a combination of letters, not a permutation. I can make exactly the same plays on the board with a,e,e,e,e,e,e as I can with e,e,a,e,e,e,e. The order I pull the tiles out of the bag is irrelevant." |
You're interested in permutation.
Can I ask what probability classes you've taken?
[Edited on April 1, 2009 at 1:44 PM. Reason : ] 4/1/2009 1:43:50 PM |
Cabbage All American 2086 Posts user info edit post |
Quote : | "We get different numbers, I'm sorry but the solution I presented is right." |
Did you actually try calculating both? If you're not getting the same numbers then you've made a mistake somewhere in your calculations. You should get 2.968597189*10^(-6) for both.
Quote : | "You're interested in permutation." |
No I'm not. A permutation is when order matters. In Scrabble, order doesn't matter. If I pull out seven letters and can make the seven letter word "feature", it doesn't matter if I pulled them out in the order f-e-a-t-u-r-e or in the order a-e-e-f-r-t-u or in any other order--I still have the same combination of seven letters, and can still make exactly the same plays on the board. That's exactly what it means to be a combination as opposed to a permutation.
Quote : | "Can I ask what probability classes you've taken." |
Of course. At State I've taken MA 546. I've taken two or three other probability classes at VA Tech, but that was years ago and I forget the course numbers. 4/1/2009 3:15:41 PM |
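The claim that counting combinations and counting permutations lead to the same probability can be brute-forced on a bag small enough to enumerate. The 9-tile bag and the target rack below are invented purely for the demonstration. Tiles are tracked by index so that identical letters still count as separate physical tiles.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, permutations

bag = list("AAAEEEJTT")   # hypothetical toy bag, not a real Scrabble set
target = Counter("AEEJ")  # the letters we hope to end up holding

def hit_rate(draws):
    """Fraction of the enumerated draws whose letters match the target."""
    draws = list(draws)
    good = sum(1 for d in draws if Counter(bag[i] for i in d) == target)
    return Fraction(good, len(draws))

indices = range(len(bag))
p_combinations = hit_rate(combinations(indices, 4))  # order ignored
p_permutations = hit_rate(permutations(indices, 4))  # order counted

print(p_combinations, p_permutations)
```

Each unordered draw of 4 tiles corresponds to exactly 4! ordered draws, so the extra factor appears in both the numerator and the denominator and cancels.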
Jrb599 All American 8846 Posts user info edit post |
If you got the same number as me, rock on. I must have messed up calculating one of the numbers. All I know is that mine is right.
Quote : | "No I'm not. A permutation is when order matters. In Scrabble, order doesn't matter. If I pull out seven letters and can make the seven letter word "feature", it doesn't matter if I pulled them out in the order f-e-a-t-u-r-e or in the order a-e-e-f-r-t-u or in any other order--I still have the same combination of seven letters, and can still make exactly the same plays on the board. That's exactly what it means to be a combination as opposed to a permutation." |
I know the difference, it can be tackled both ways. I was thinking your combination way was wrong, but I guess not. I was wrong when I thought you got a different number than me, which led me to believe you did it wrong with combinations. So I thought I would explain it with permutations. Whoops.
So I guess we've posted two different ways to do it.
[Edited on April 1, 2009 at 5:21 PM. Reason : ] 4/1/2009 5:03:21 PM |
Cabbage All American 2086 Posts user info edit post |
By the way, I was curious how many different seven letter combinations you can get in Scrabble, so I got a CAS to expand the generating function for me:
1 + 27*x + 373*x**2 + 3509*x**3 + 25254*x**4 + 148150*x**5 + 737311*x**6 + 3199724*x**7 + 12353822*x**8 + 43088473*x**9 + 137412392*x**10 + 404600079*x**11 + 1108793943*x**12 + 2847262062*x**13 + 6890404765*x**14 + 15792242064*x**15 + 34425824044*x**16 + 71646518736*x**17 + 142827698985*x**18 + 273533670283*x**19 + 504576050285*x**20 + 898623709228*x**21 + 1548387401915*x**22 + 2586170833356*x**23 + 4194275182613*x**24 + 6615385384601*x**25 + 10161692700549*x**26 + 15221174189579*x**27 + 22259221214607*x**28 + 31813753798288*x**29 + 44482134367066*x**30 + 60898641337468*x**31 + 81701986711369*x**32 + 107493329723951*x**33 + 138786376090493*x**34 + 175952346689553*x**35 + 219163709706077*x**36 + 268341443489446*x**37 + 323111088944227*x**38 + 382772844896252*x**39 + 446290391042394*x**40 + 512301987174498*x**41 + 579155760119564*x**42 + 644969083769945*x**43 + 707709770134396*x**44 + 765294643135632*x**45 + 815699194394498*x**46 + 857070636209692*x**47 + 887835941961195*x**48 + 906796502925404*x**49 + 913201857455724*x**50 + 906796502925404*x**51 + 887835941961195*x**52 + 857070636209692*x**53 + 815699194394498*x**54 + 765294643135632*x**55 + 707709770134396*x**56 + 644969083769945*x**57 + 579155760119564*x**58 + 512301987174498*x**59 + 446290391042394*x**60 + 382772844896252*x**61 + 323111088944227*x**62 + 268341443489446*x**63 + 219163709706077*x**64 + 175952346689553*x**65 + 138786376090493*x**66 + 107493329723951*x**67 + 81701986711369*x**68 + 60898641337468*x**69 + 44482134367066*x**70 + 31813753798288*x**71 + 22259221214607*x**72 + 15221174189579*x**73 + 10161692700549*x**74 + 6615385384601*x**75 + 4194275182613*x**76 + 2586170833356*x**77 + 1548387401915*x**78 + 898623709228*x**79 + 504576050285*x**80 + 273533670283*x**81 + 142827698985*x**82 + 71646518736*x**83 + 34425824044*x**84 + 15792242064*x**85 + 6890404765*x**86 + 2847262062*x**87 + 1108793943*x**88 + 404600079*x**89 + 137412392*x**90 + 43088473*x**91 + 12353822*x**92 + 3199724*x**93 + 737311*x**94 + 
148150*x**95 + 25254*x**96 + 3509*x**97 + 373*x**98 + 27*x**99 + x**100
The exponent corresponds to how many tiles you draw, and the corresponding coefficient counts the number of different combinations. So if you draw seven tiles (like in the regular rules) there are 3,199,724 different letter combinations you could possibly get. 4/1/2009 11:50:00 PM |
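The expansion does not strictly need a CAS. Here is a short Python sketch that multiplies out one factor (1 + x + ... + x^c) per tile type, where c is how many copies of that tile are in the bag. The counts are the standard English distribution, A through Z followed by the blank, and the function name is made up for the example.

```python
# Copies of each tile: A..Z in order, then the blank (27 tile types).
TILE_COUNTS = [9, 2, 2, 4, 12, 2, 3, 2, 9, 1, 1, 4, 2, 6, 8, 2, 1, 6, 4, 6,
               4, 2, 2, 1, 2, 1, 2]

def rack_counts(tile_counts):
    """Coefficients of the product of (1 + x + ... + x**c) over all types."""
    coeffs = [1]
    for c in tile_counts:
        new = [0] * (len(coeffs) + c)
        for i, a in enumerate(coeffs):
            for k in range(c + 1):  # take k copies of this tile type
                new[i + k] += a
        coeffs = new
    return coeffs

coeffs = rack_counts(TILE_COUNTS)
print(coeffs[1], coeffs[2], coeffs[7])
```

The coefficient list reads the same forwards and backwards because choosing which n tiles to draw is the same as choosing which 100 - n tiles to leave behind.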
Jrb599 All American 8846 Posts user info edit post |
^Generating functions are really helpful, though you almost always need a computer to expand them.
[Edited on April 2, 2009 at 11:08 AM. Reason : ] 4/2/2009 11:07:41 AM |
Jrb599 All American 8846 Posts user info edit post |
How can you have 27 1-letter combinations with only 26 letters in the alphabet? I guess it's including the empty tile? 4/2/2009 7:06:23 PM |
Cabbage All American 2086 Posts user info edit post |
Yes, I included the blank tile. 4/2/2009 10:42:40 PM |
FeebleMinded Finally Preemie! 4472 Posts user info edit post |
Quote : | "the odds you draw any 1 combination that you are looking for is so low, it isnt worth trying to study the "most" likely 7 letter word combos." |
This is false on so many different levels.
Yes, the odds of simply drawing the 7 tiles on your first turn are not that great, however, the strategy is to "play off" bad bingo-ing tiles (like Z, Q, etc.) knowing that you will more than likely draw a high probability tile. So the whole idea is you are learning words that contain either 6 very high probability tiles and one outlier or 7 very high probability tiles. Trust me it works, as I have played/seen played lots and lots of high probability words and very few low probability words. 4/5/2009 12:48:59 AM |
David0603 All American 12764 Posts user info edit post |
What language did they use to code it? 4/5/2009 3:29:35 PM |
A Tanzarian drip drip boom 10995 Posts user info edit post |
C++
and
Yay, combinations! 4/6/2009 4:52:15 PM |