ThatGoodLock All American 5697 Posts
If I have a document with 7,500 different words and I want to know how many possible "phrases" I can make out of that document, to include single words so at least 7,500 base phrases...how would I go about doing that?
There's no set number of words for each phrase, it could be as short as 1 word (1 base phrase) or as long as 7,500 words long (also 1 phrase)
you can't repeat the same word when counting
you can make a phrase out of words that don't run next to each other (word 1, word 3, word 1114 = 1 phrase) but again, you can't repeat so there's no other combination of those exact 3 words that would amount to another phrase
god i hate math, this isn't for anything school-related, just trying to wrap my head around something... 11/22/2012 6:37:41 PM |
bbehe Burn it all down. 18402 Posts
7500! 11/22/2012 6:39:42 PM |
ThatGoodLock All American 5697 Posts
ok now calculate using more than just the number of base phrases... 11/22/2012 6:41:33 PM |
Noen All American 31346 Posts
Should be a factorial minus the difference of unique words from total words.
So (7500 !) - (7500 - unique words) 11/22/2012 6:44:41 PM |
bbehe Burn it all down. 18402 Posts
nm
[Edited on November 22, 2012 at 6:46 PM. Reason : a] 11/22/2012 6:45:46 PM |
Krallum 56A0D3 15294 Posts
Lets do something large
I'm Krallum and I approved this message. 11/22/2012 6:46:51 PM |
bbehe Burn it all down. 18402 Posts
summation (7500!)/(7500-n)! for n=0..7500 11/22/2012 6:47:36 PM |
GrayFox33 TX R. Snake 10566 Posts
42
[Edited on November 22, 2012 at 6:55 PM. Reason : Crallum] 11/22/2012 6:53:15 PM |
bbehe Burn it all down. 18402 Posts
3.3249187468*10^25808 11/22/2012 7:02:11 PM |
ThatGoodLock All American 5697 Posts
how is it not 5.211710597023 E+25850 11/22/2012 7:03:05 PM |
ThatGoodLock All American 5697 Posts
whoops, i calculated 7,511! 11/22/2012 7:03:45 PM |
bbehe Burn it all down. 18402 Posts
How did you get that? 11/22/2012 7:03:46 PM |
GingaNinja All American 7177 Posts
(7500 P 1)+ (7500 P 2) + (7500 P 3) + ..... + (7500 P 7500) ?? 11/22/2012 7:25:08 PM |
bbehe Burn it all down. 18402 Posts
^ Which is what summation (7500!)/(7500-n)! for n=0..7500 means. 11/22/2012 7:44:43 PM |
lewisje All American 9196 Posts
tree fiddy
thou 11/22/2012 8:58:30 PM |
ndmetcal All American 9012 Posts
11/22/2012 9:01:26 PM |
qntmfred retired 40721 Posts
Hey! 11/22/2012 11:12:28 PM |
GeniuSxBoY Suspended 16786 Posts
Quote : | "you can't repeat the same word when counting " |
If "is" is written 20 times in the document, are we only counting it once or is each "is" an individual word like is1, is2, is3, ... is20?
11/22/2012 11:30:58 PM |
Dentaldamn All American 9974 Posts
50/50 11/22/2012 11:42:45 PM |
ThatGoodLock All American 5697 Posts
^^ the second thing, word 40 can only be used once but if it's "the" it can appear several times 11/23/2012 12:12:06 AM |
modlin All American 2642 Posts
Do the phrases have to make sense? 11/23/2012 7:38:29 AM |
NCSUStinger Duh, Winning 62447 Posts
smath needs to get in on this 11/23/2012 8:00:12 AM |
ThatGoodLock All American 5697 Posts
^^ i have a specific document in mind, but for purposes of this you can imagine that for whatever reason you'd want to combine random words into phrases 11/23/2012 12:01:40 PM |
GrayFox33 TX R. Snake 10566 Posts
What is the prize for this? 11/23/2012 12:03:41 PM |
GeniuSxBoY Suspended 16786 Posts
I think smath just teaches elementary math. 11/23/2012 12:04:18 PM |
JeffreyBSG All American 10165 Posts
Very close to 7500!*e, which is about 3.324918748*10^25808, according to Maple.
First consider: how many ways can you make a phrase out of exactly k distinct words? Well, there are
binomial(7500,k)=7500!/((7500-k)!k!) ways of choosing k words. And there are k! ways of arranging each subcollection of k words, so there are
7500!/(7500-k)! phrases you can make out of exactly k words. The number of words your phrase contains can be any k from k=0..7500, so your answer is
sum( 7500!/(7500-k)!, k=0..7500) = 7500! * sum(1/k!, k=0..7500).
Since sum(1/k!, k=0..infinity)=e, summing up to 7500 will get us pretty darned close to e. So our answer is very close to 7500! * e (although the vastness of the 7500! may make the difference between sum and limit significant; I don't know, offhand.)
oh wait, this is exactly what bbehe said
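A minimal Python sketch of the sum above, for anyone who wants to check it with exact integers. The function name count_phrases and the include_empty flag are made up for illustration. It also speaks to the "sum vs. limit" worry: 7500! times the leftover tail of the series (k > 7500) is less than 1/7500, so the k=0..7500 sum is exactly floor(7500! * e), and leaving out the empty k=0 phrase just subtracts 1.

```python
import math


def count_phrases(n: int, include_empty: bool = False) -> int:
    """Count ordered phrases built from n word positions, each used at most once.

    Adds up n!/(n-k)! for k = 1..n (and k = 0 when include_empty is set).
    """
    total = 1 if include_empty else 0
    term = 1  # running value of n!/(n-k)!, starting from k = 0
    for k in range(1, n + 1):
        term *= n - k + 1
        total += term
    return total


if __name__ == "__main__":
    print(count_phrases(3))  # 3 + 6 + 6 phrases from three words
    big = count_phrases(7500, include_empty=True)
    # Only the order of magnitude is printed; the full integer is enormous.
    print(math.floor(math.log10(big)))
```

Python integers are arbitrary precision, so no rounding is involved; the order of magnitude it reports for 7500 should line up with the 10^25808 figure posted above.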
[Edited on November 23, 2012 at 4:28 PM. Reason : gtjeoi] 11/23/2012 4:24:13 PM |
GrayFox33 TX R. Snake 10566 Posts
Well, he does wear glasses. 11/23/2012 4:59:26 PM |
Arab13 Art Vandelay 45180 Posts
If you want the phrases to make sense then that dramatically reduces the number of combinations. 11/23/2012 7:20:51 PM |
bbehe Burn it all down. 18402 Posts
^^ no pocket protector though. 11/23/2012 7:57:46 PM |
bronco All American 3942 Posts
11/23/2012 8:22:16 PM |
moron All American 34141 Posts
This was easy...
Calculate the closed form. 11/23/2012 9:07:27 PM |
modlin All American 2642 Posts
If words 29 and 378 are both "ass" and words 54 and 5012 are both "deep"
Can we make two valid phrases that are identical? i.e. "ass deep." and "ass deep."? 11/25/2012 11:58:00 AM |
paerabol All American 17118 Posts
^
Quote : | "If I have a document with 7,500 different words and I want to know how many possible "phrases" I can make out of that document, to include single words so at least 7,500 base phrases...how would I go about doing that?
There's no set number of words for each phrase, it could be as short as 1 word (1 base phrase) or as long as 7,500 words long (also 1 phrase)
you can't repeat the same word when counting
you can make a phrase out of words that don't run next to each other (word 1, word 3, word 1114 = 1 phrase) but again, you can't repeat so there's no other combination of those exact 3 words that would amount to another phrase
god i hate math, this isn't for anything school-related, just trying to wrap my head around something..." |
Quote : | "If I have a document with 7,500 different words and I want to know how many possible "phrases" I can make out of that document, to include single words so at least 7,500 base phrases...how would I go about doing that?" |
Quote : | "If I have a document with 7,500 different words" |
Quote : | "7,500 different words" |
11/25/2012 1:45:16 PM |
oneshot 1183 Posts
The answer is and has always been 42. 11/25/2012 2:43:08 PM |
modlin All American 2642 Posts
^^
Quote : | "^^ the second thing, word 40 can only be used once but if it's "the" it can appear several times" |
11/25/2012 5:04:21 PM |
paerabol All American 17118 Posts
Then that would answer modlin's question 11/25/2012 5:15:37 PM |
BigEgo Not suspended 24374 Posts
12 11/25/2012 6:01:55 PM |
modlin All American 2642 Posts
That part doesn't, but this:
Quote : | "you can't repeat so there's no other combination of those exact 3 words that would amount to another phrase " |
part does. I shoulda read closer.
So the way I understand it, you can make a three-word phrase of words:
1,2,3 1,2,4 1,2,n 1,2,n+1
as long as words 3,4,n, and n+1 are all unique.
So, you can't just write an equation to add up all the possible phrases you could make. You'd have to enter them all into a database and then have a computer go through and sequentially make each possible combination, and compare them to all previously made phrases to check validity. 11/25/2012 6:02:37 PM |
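For a tiny document, the check described above can be sketched directly in Python. This is a brute-force illustration only: it walks through every ordering, which is hopeless at 7,500 words. The four-word sample and both function names are invented. It contrasts counting by word position (the reading ThatGoodLock confirmed earlier) with counting by the text that results.

```python
from itertools import permutations


def phrases_by_position(words):
    """Count phrases where each word position may be used at most once."""
    n = len(words)
    return sum(1 for k in range(1, n + 1) for _ in permutations(range(n), k))


def phrases_by_text(words):
    """Count distinct word sequences, merging phrases that read the same."""
    seen = set()
    for k in range(1, len(words) + 1):
        for phrase in permutations(words, k):
            seen.add(phrase)  # identical text from different positions collapses here
    return len(seen)


sample = ["to", "be", "to", "be"]  # hypothetical document with repeated words
print(phrases_by_position(sample), phrases_by_text(sample))
```

When every word in the document is different, the two counts are the same; they only drift apart once words repeat.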
paerabol All American 17118 Posts
I'm sorry for being an asshole modlin. I would that we continue our mutual lack of preconception. 11/25/2012 8:52:26 PM |
ThatGoodLock All American 5697 Posts
holy shit you people are still talking about this...
I was being coy at first because I thought people wouldn't care much but you people really are nerds so I'll explain further and see where it leads me...
So I'm trying to budget for hiring a programmer that can do the following:
Create a website where legislation can be uploaded by the user by pasting the full official text into one "catch all" box (or alternatively, several boxes based on pasting section by section, subsection by subsection, etc... and then labeling it all together in order)
So for example, this is the preamble to the US Constitution (ignore the tabbed format, it was before I realized you could make a database delimited by space alone)
I can write a basic program that can copy/paste this text into excel and then delimit it by space to create
Normally this would be displayed all on row 1 but for ease of explaining, i've fit all the text on the same screen in different rows
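A rough Python sketch of the space-delimiting step, assuming plain whitespace splitting and numbering that starts at 1 (the name to_objects is invented, and the preamble text is typed in here rather than taken from the spreadsheet). Punctuation stays attached to its word, so "Welfare," keeps its comma.

```python
PREAMBLE = (
    "We the People of the United States, in Order to form a more perfect Union, "
    "establish Justice, insure domestic Tranquility, provide for the common defence, "
    "promote the general Welfare, and secure the Blessings of Liberty to ourselves "
    "and our Posterity, do ordain and establish this Constitution for the "
    "United States of America."
)


def to_objects(text):
    """Split text on whitespace and number the resulting words from 1."""
    return {number: word for number, word in enumerate(text.split(), start=1)}


objects = to_objects(PREAMBLE)
print(len(objects))
print(" ".join(objects[i] for i in range(26, 30)))  # Objects 26-29
```

With this numbering, Objects 26-29 come out as the "promote the general Welfare," span used in the examples that follow.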
Now what I don't know how to do, and want to hire someone to do, is take this dataset and redisplay it on a webpage as "Document 1" (or whatever order it was uploaded in; in reality this would only be part of Document 1, the full Constitution) so that it looks like normal text to the user (as it was originally pasted in full, skip to the last image if you're confused) except that each word is now a selectable object (invisible boxes around each word?)
after it displays what appears to be normal text, the user can then select words on an individual basis to build what I'll term "blobs" (or "phrases" in my original question) like so (ignoring that my picture is not displayed like normal text would and uses the spreadsheet still)
So here, the words "promote the general welfare" which is made up of Objects 26-29, once selected will create Blob 1 (there needs to be a button for "create Blob" since you can select other than all at once) - colors reflect that a blob has been created and different colors represent different blobs
This image shows what would happen if each object just named were its own blob, so Object 26 = Blob 1, Object 27 = Blob 2, etc...
This image shows what would happen if every word was used as part of exactly one blob, so Objects 1-7 = Blob 1, Objects 8-15 = Blob 2, etc... (this is assuming the blobs were created in the same order in which they appear)
This is only one blob. It's just using a ton of word objects from all over the place in order to be created.
Similarly, this is also one blob. It's just that it is using two nonconsecutive groups of consecutive word objects.
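One way to picture a blob in code, as a sketch under assumptions: treat a blob as a plain set of word-object numbers, with order always taken from the document. The helper blob_runs and the example blobs are hypothetical. Under that assumption a document of n objects allows 2^n - 1 different non-empty blobs, which is a different (smaller) count than the ordered-phrase sum worked out earlier in the thread.

```python
def blob_runs(indices):
    """Group a blob's word numbers into runs of consecutive numbers."""
    runs = []
    for i in sorted(indices):
        if runs and i == runs[-1][-1] + 1:
            runs[-1].append(i)  # extends the current consecutive run
        else:
            runs.append([i])    # starts a new run
    return runs


blob_a = frozenset(range(26, 30))               # Objects 26-29, one consecutive group
blob_b = frozenset({1, 2, 3, 26, 27, 28, 29})   # two nonconsecutive groups, still one blob

print(blob_runs(blob_a))
print(blob_runs(blob_b))
print(2 ** 52 - 1)  # possible non-empty blobs if the document has 52 word objects
```

Storing the runs rather than the raw set could also make it easier to draw one colored highlight per consecutive group.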
Now let's go back to the original blob I showed, where we have created Blob 1, "promote the general welfare". I want everything I just described (the upload of text and display back to the user to select) to occur in Window 1. What I'm going to describe next should occur in Window 2 (it's all on the same screen, not separate programs, again skip to the last image real quick).
Every time the "create blob" button is pushed, and after that blob is created, I want Window 2 to display a chat window with a "post comment" box at the bottom. The user can leave a comment there and it shows up at the top of Window 2; in theory the comment is supposed to be about "promote the general welfare" and no other part. Each blob created has its own separate chat thread, and by clicking from blob to blob in Window 1, you can switch from one conversation to another in Window 2.
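The blob-to-thread bookkeeping could look something like this in Python. It is an in-memory sketch for a single user, not a design for the real site; the class BlobThreads and its method names are invented for illustration.

```python
class BlobThreads:
    """Keeps one comment thread per blob, in memory, for a single user."""

    def __init__(self):
        self._blobs = {}      # blob id -> frozenset of word-object numbers
        self._comments = {}   # blob id -> list of (author, text) tuples
        self._next_id = 1

    def create_blob(self, word_numbers):
        """Register the selected words as a new blob and open its empty thread."""
        blob_id = self._next_id
        self._next_id += 1
        self._blobs[blob_id] = frozenset(word_numbers)
        self._comments[blob_id] = []
        return blob_id

    def post_comment(self, blob_id, author, text):
        """Append a comment to the thread that belongs to one blob."""
        if blob_id not in self._blobs:
            raise KeyError(f"no such blob: {blob_id}")
        self._comments[blob_id].append((author, text))

    def thread(self, blob_id):
        """Return the comments to show in Window 2 when this blob is clicked."""
        return list(self._comments[blob_id])


threads = BlobThreads()
welfare = threads.create_blob(range(26, 30))
threads.post_comment(welfare, "user1", "What counts as general welfare?")
print(threads.thread(welfare))
```

Clicking a blob in Window 1 would then amount to calling thread() with that blob's id and redrawing Window 2.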
This is a display of all the different Windows I've designed, numbered from 1-5. Ignore all the other stuff I haven't talked about yet.
So just based only on the activities that I've described so far, what's a fair flat fee programming price? or a fair estimation of hours at an hourly rate?
ps - im so hopped up on mountain dew right now, this is probably not the clearest explanation at all 11/25/2012 10:16:47 PM |
ThatGoodLock All American 5697 Posts
oh and in case anyone is wondering why the original question mentioned ~ 7,500
[Edited on November 25, 2012 at 10:20 PM. Reason : f] 11/25/2012 10:20:06 PM |
paerabol All American 17118 Posts
man couple that up with an image-to-text converter on the front end for scanned or .pdf text sources and a direct interface to social media and RSS, with a lightweight document editor/exporter on the back and you've got a handy tool there
[Edited on November 25, 2012 at 11:05 PM. Reason : asdf] 11/25/2012 11:03:11 PM |
ThatGoodLock All American 5697 Posts
ive got plans for all that jazz in future iterations, but right now i'm just trying to get away with a minimum viable product 11/26/2012 12:58:04 AM |
moron All American 34141 Posts
That doesn't seem too difficult, shouldn't take too much time, and could be done using existing open source libraries (functionally it's just forum software).
Nailing down the GUI is the hard, time-consuming part, but there are plenty of existing libraries to handle that type of text selection. 11/26/2012 1:35:27 AM |
ThatGoodLock All American 5697 Posts
so when you say not too difficult can you attach some guess as to cost to hire? or hours to complete? 11/26/2012 1:31:39 PM |
wdprice3 BinaryBuffonary 45912 Posts
working on genassem, eh? 11/26/2012 1:40:30 PM |
ThatGoodLock All American 5697 Posts
you know it!
if anyone wants to contribute to further funding or see a funky video I made http://search.voltcrowd.com/campaign/detail/436 11/26/2012 4:47:54 PM |
David0603 All American 12764 Posts
Can the blobs overlap? Is existing db infrastructure for this already set up? Maybe I'm underestimating "existing open source libraries ( functionally just forum software)" but this seems far from "not too difficult" 11/26/2012 5:41:04 PM |
ThatGoodLock All American 5697 Posts
yes, they can overlap and you can also have blobs within existing blobs or vice versa
nothing is set up for this. again, i'm looking for someone to code just a minimum product I can show off in order to get further funding - so even if it doesn't actually work in a multi-user environment or even online so it's standalone at first, if it can display the single user experience that i've described above i'll be happy for now 11/26/2012 6:10:31 PM |
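Since blobs can overlap and nest, a single word may belong to several of them at once. A small Python sketch of the lookups a standalone demo might need, assuming blobs are stored as sets of word-object numbers keyed by a blob id (the function names and the sample data are hypothetical):

```python
def blobs_containing(blobs, word_number):
    """Return the ids of every blob that includes the given word object."""
    return sorted(bid for bid, words in blobs.items() if word_number in words)


def is_nested(inner, outer):
    """True when every word of inner is also in outer and outer has more words."""
    return inner < outer  # proper-subset comparison on sets


blobs = {
    1: frozenset(range(26, 30)),  # "promote the general welfare"
    2: frozenset(range(21, 30)),  # a longer blob that contains blob 1
    3: frozenset({1, 2, 3}),      # unrelated blob
}

print(blobs_containing(blobs, 27))
print(is_nested(blobs[1], blobs[2]))
```

Clicking a word that sits in more than one blob would need some rule for which thread to open; listing every match and letting the user pick is one option.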