Shivan Bird Football time 11094 Posts
I know some websites like this where you can search for good football data: http://www.pro-football-reference.com/play-index/pgl_finder.cgi
Clearly the web page queries some database when you run that. Is there any other way to grab a bunch of data from that database?
I'm sure that I could write a script in SAS or Excel but that could be long, tedious, and have mistakes. 12/30/2010 11:26:33 AM |
EuroTitToss All American 4790 Posts
Hmmm. A web page that uses a database backend. What magics is this you speak of? 12/30/2010 11:54:36 AM |
Noen All American 31346 Posts
Think about what you're asking for. No, you can't do this, at least not legally and not for free. You're talking about an enormous amount of data that is the website's entire point of existence. 12/30/2010 12:30:17 PM |
El Nachó special helper 16370 Posts
Obviously you guys have never seen that movie The Social Network.
This is how Facebook got started, people! 12/30/2010 12:52:35 PM |
FroshKiller All American 51911 Posts
Well, really, the trick is establishing a connection to the database, which you're not gonna be able to do unless some newbie hard-coded the credentials in the source of the page there.
I guess you could write a script to spoof form submissions and collect all the results, but my God, just e-mail the webmaster and ask for whatever it is you want. 12/30/2010 1:24:48 PM |
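A minimal sketch of what "spoofing form submissions" amounts to: a search form usually just turns its fields into a query string on a GET request, so a script can build the same URLs itself. The endpoint and the field names (`pos`, `year`) below are hypothetical placeholders, not the real site's parameters, and the sketch only builds the URLs without sending anything.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only (not the real site's form target).
BASE_URL = "http://example.com/search.cgi"


def build_query_urls(base_url, fixed_fields, vary_field, values):
    """Return one GET URL per value of vary_field, shaped like a form submission."""
    urls = []
    for value in values:
        fields = dict(fixed_fields)   # copy so the caller's dict is not mutated
        fields[vary_field] = value    # the one field that changes per request
        urls.append(base_url + "?" + urlencode(fields))
    return urls


# Hypothetical field names: "pos" stays fixed, "year" varies.
urls = build_query_urls(BASE_URL, {"pos": "QB"}, "year", [2008, 2009])
for url in urls:
    print(url)
```

Each URL could then be fetched and its result page saved, which is the "collect all the results" half of the idea.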
synapse play so hard 60939 Posts
Quote : | "I'm sure that I could write a script..." |
ok george.
also:
Quote : | "However, our company spends a lot of time, effort, and money producing and checking the data we publish, and as such we can not freely give away large amounts of data that we produce." |
http://www.sports-reference.com/data_use.shtml 12/30/2010 1:33:00 PM |
robster All American 3545 Posts
The way to get their data is to find a place where they themselves have dumped it all in html form ...
Chances are that they have organized it into a hierarchy for display that is somewhat similar to the hierarchy used in their database ... so while you cannot query the whole database, you can scrape the entire website (unless they shut down your IP when they find out what you are doing).
I did this with IMDB a few years back (but never used the data) ... scraping websites is not some new thing. There are plenty of tools out there to help you (DOM is your friend) ... but you'll just have to look for patterns in the html that you can use to parse out the data you need, and then insert that data into your own database. 12/30/2010 1:39:51 PM |
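A rough sketch of the "find patterns in the HTML, parse out the data, insert it into your own database" workflow described above. The sample markup, table name, and column names are made up for illustration, and the sketch uses Python's event-based `html.parser` and an in-memory SQLite database rather than a DOM library; a real page would need its own parsing rules.

```python
import sqlite3
from html.parser import HTMLParser

# Made-up sample page; real markup will differ.
SAMPLE_HTML = """
<table id="stats">
  <tr><th>Player</th><th>Yards</th></tr>
  <tr><td>Player A</td><td>312</td></tr>
  <tr><td>Player B</td><td>287</td></tr>
</table>
"""


class TableRowParser(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr":
            if self._row:  # header rows contain only <th>, so they stay empty
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


def scrape_to_db(html, conn):
    """Parse table rows out of html and insert them into a local table."""
    parser = TableRowParser()
    parser.feed(html)
    conn.execute("CREATE TABLE IF NOT EXISTS stats (player TEXT, yards INTEGER)")
    conn.executemany(
        "INSERT INTO stats (player, yards) VALUES (?, ?)",
        [(player, int(yards)) for player, yards in parser.rows],
    )
    conn.commit()
    return len(parser.rows)


conn = sqlite3.connect(":memory:")
inserted = scrape_to_db(SAMPLE_HTML, conn)
rows = conn.execute("SELECT player, yards FROM stats ORDER BY yards DESC").fetchall()
print(inserted, rows)
```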
Noen All American 31346 Posts
^this is fine for personal use, but is going to quickly get you in legal trouble for anything publicly available, commercially or not. 12/30/2010 3:16:59 PM |
qntmfred retired 40726 Posts
as a person who has also scraped a few websites in my day, i have not heard of anybody actually getting in real legal trouble for this kind of activity, and would be interested if you could cite any examples 12/30/2010 3:22:25 PM |
Noen All American 31346 Posts
It's data theft. A database is protected intellectual property just like anything else.
There are plenty of legal precedents for this. Going back to the beginnings of Google Maps, before they released a public access API, there were hundreds of cease and desist orders sent to mashups and commercial sites leveraging their data without permission. 12/30/2010 3:37:10 PM |
Shivan Bird Football time 11094 Posts
I figured it couldn't be done but it's for personal use and was worth a shot to ask.
Of course I plan to comply with all terms of use. 12/30/2010 4:11:06 PM |
qntmfred retired 40726 Posts
^^ a c&d is no big deal if you comply with it. i'm talking about a scenario where pro-football-reference.com sues Shivan Bird for $1,000,000
web scraping happens every day, and it would be a pity imo if the legal ramifications were so prevalent that nobody dared try to mash up data from some other site
of course, in situations like this, where the site has made a good faith effort (based on synapse's link) to convince people not to try to hammer their site, and also to provide access to the data on a commercial basis, it would be polite to respect their wishes
iow, i think "quickly get you in legal trouble" is a little more FUD than is realistic 12/30/2010 4:41:46 PM |
Ernie All American 45943 Posts
Here's a better idea: click the damn About link
http://www.pro-football-reference.com/download/ 12/30/2010 4:43:33 PM |
robster All American 3545 Posts
^ well done ...
and of course, in terms of scraping others' data from sites ... you put yourself in front of legal issues if you use it for commercial purposes
Don't do it (or just don't get caught doing it) ... all the same
12/30/2010 4:47:45 PM |
1985 All American 2175 Posts
I know some websites like this where you can search for good customer data: http://www.BankOfAmerica.com
Clearly the web page queries some database when you run that. Is there any other way to grab a bunch of data from that database?
I'm sure that I could write a script in SAS or Excel but that could be long, tedious, and have mistakes. 12/30/2010 5:59:17 PM |
evan All American 27701 Posts
^^ 12/30/2010 8:27:28 PM |
lewisje All American 9196 Posts
^^
12/30/2010 10:03:37 PM |