tnezami All American 8972 Posts user info edit post |
I have a list of about 30 words that I need to count the frequency of in about 50 different documents.
I'm sure I'll have to go one document at a time, but is there a faster way than CTRL+F EACH WORD one at a time?
Can I put the entire list of these 30 keywords in and have WORD or some free downloadable program find them in the document? 4/1/2007 3:28:07 PM |
drunknloaded Suspended 147487 Posts user info edit post |
i dont know if this will help but one time i downloaded google desktop
like a few days later when i searched something on google i noticed it had like 3 of my documents listed with the keywords from my google search...so like couldnt you get google desktop, search for the terms and it would bring up which documents have said words?
[Edited on April 1, 2007 at 3:32 PM. Reason : \/ prolly not the best one though ] 4/1/2007 3:30:36 PM |
tnezami All American 8972 Posts user info edit post |
hmm...that's an option... 4/1/2007 3:31:26 PM |
benz240 All American 4476 Posts user info edit post |
copernic desktop search has a much more powerful client 4/1/2007 3:33:38 PM |
tnezami All American 8972 Posts user info edit post |
is it relatively easy to use/set up for this type of thing? 4/1/2007 3:34:19 PM |
benz240 All American 4476 Posts user info edit post |
it will work, just might get pretty tedious if you are talking about hundreds of instances of the word. Basically the copernic client (i just tried this out) will allow you to search for instances of a word (or words) within certain types of files within one folder, for instance. Then when you click on one of the results, it will display the contents in the preview pane below, with a button for each search term...click on that button and it will cycle through all the instances in that particular doc. It doesn't have a function (that I'm aware of) to summarize the findings...so basically this will be a little easier than Ctrl+F. 4/1/2007 3:44:32 PM |
tnezami All American 8972 Posts user info edit post |
sweeet...i just downloaded it...now I just need to figure out how to get it to search within a specific folder
[Edited on April 1, 2007 at 3:49 PM. Reason : got it.] 4/1/2007 3:45:25 PM |
CSC4EVER Starting Lineup 63 Posts user info edit post |
Why not just write a perl script to parse the documents and build a hash for each one, incrementing a counter each time one of the words occurs? 4/1/2007 4:22:32 PM |
Perlith All American 7620 Posts user info edit post |
^ Overengineering the problem. And, perl isn't something that could be passed on easily from Person A to Person B. (Not saying not a good solution, but may not be the right solution).
If this is something you'll be doing frequently, I would encourage you to find a solution that indexes and updates what you are looking for automatically. Not sure what's out there, but think in the long-term if you are going to do this more than once. 4/1/2007 5:15:59 PM |
CSC4EVER Starting Lineup 63 Posts user info edit post |
A program that does something similar could be whipped up fairly quickly in java, with a swing gui slapped on it, and it would be just as portable. If he wants results fairly quickly, a perl script would probably take no more than 10-15 lines, maybe even less. I don't necessarily consider that overengineering the problem. 4/1/2007 5:19:06 PM |
CSC4EVER Starting Lineup 63 Posts user info edit post |
Hmm, here is some code I whipped up real quick in perl. It sorta does what I think you would be looking for. With some modification you should be able to get your results fairly quickly...
@files = <*.txt>;
foreach $file (@files) {
    open myfile, $file;
    %words = ();    # reset the counts so each document is tallied separately
    print $file."\n--------------------------\n";
    while ($line = <myfile>) {
        chomp $line;
        @words = split / /, $line;
        for ($i = 0; $i < scalar(@words); $i++) {
            $words{$words[$i]}++;
        }
    }
    close myfile;
    foreach $word (sort keys %words) {
        print $word." ".$words{$word}."\n";
    }
    print "\n";
} 4/1/2007 5:31:48 PM |
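The snippet above tallies every word in each file, but the original question only needs counts for a fixed list of 30 keywords per document. A minimal sketch of that narrower version follows; the @keywords list is a placeholder (swap in the real 30 words), and the whole-word, case-insensitive matching is an assumption about what "frequency" should mean here, not something stated in the thread:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Placeholder keyword list -- replace with the actual 30 words.
my @keywords = qw(apple banana cherry);

# Count how often each keyword appears in a string of text.
# Matching is case-insensitive and on whole words only (an assumption).
sub count_keywords {
    my ($text, @words) = @_;
    my %counts = map { $_ => 0 } @words;
    for my $w (@words) {
        # \b anchors keep "cat" from matching inside "catalog";
        # \Q...\E escapes any regex metacharacters in the keyword
        my @hits = $text =~ /\b\Q$w\E\b/gi;
        $counts{$w} = scalar @hits;
    }
    return \%counts;
}

# Walk every .txt file in the current directory and print a
# per-document frequency table for just the keyword list.
for my $file (glob '*.txt') {
    open my $fh, '<', $file or die "can't open $file: $!";
    local $/;    # slurp mode: read the whole file in one go
    my $counts = count_keywords(scalar <$fh>, @keywords);
    close $fh;

    print "$file\n--------------------------\n";
    print "$_ $counts->{$_}\n" for sort keys %$counts;
    print "\n";
}
```

Dropping the output into a tab-separated format instead of the dashed header would make it easy to paste the 50 documents' counts straight into a spreadsheet.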
WolfAce All American 6458 Posts user info edit post |
Haha, stupid tdub putting a smiley face in teh perl
Quote : | "while ($line = <myfile>" |
And yeah the first thing I thought of when I read the first post was perl script, but really you could do one nearly as easily in any language.
[Edited on April 1, 2007 at 7:45 PM. Reason : ]4/1/2007 7:43:19 PM |
FenderFreek All American 2805 Posts user info edit post |
Seems we all think alike. I had something like ^^ in mind as well, so I'd go that route. It's quick, easy, and cross-platform. (Though I'm no Perl god, so mine wouldn't have been as neat and concise.)
Can't go wrong with some Perl, B. 4/1/2007 8:25:37 PM |