Message Boards » Tech Talk » Accessing database behind web query?
Shivan Bird
Football time
11094 Posts

I know some websites like this where you can search for good football data: http://www.pro-football-reference.com/play-index/pgl_finder.cgi

Clearly the web page queries some database when you run that. Is there any other way to grab a bunch of data from that database?

I'm sure that I could write a script in SAS or Excel but that could be long, tedious, and have mistakes.

12/30/2010 11:26:33 AM

EuroTitToss
All American
4790 Posts

Hmmm. A web page that uses a database backend. What magics is this you speak of?

12/30/2010 11:54:36 AM

Noen
All American
31346 Posts

Think about what you're asking for. No, you can't do this, at least not legally and not for free. You're talking about an enormous amount of data that is the website's entire point of existence.

12/30/2010 12:30:17 PM

El Nachó
special helper
16370 Posts

Obviously you guys have never seen that movie The Social Network.

This is how Facebook got started, people!

12/30/2010 12:52:35 PM

FroshKiller
All American
51898 Posts

Well, really, the trick is establishing a connection to the database, which you're not gonna be able to do unless some newbie hard-coded the credentials in the source of the page there.

I guess you could write a script to spoof form submissions and collect all the results, but my God, just e-mail the webmaster and ask for whatever it is you want.

12/30/2010 1:24:48 PM

synapse
play so hard
60921 Posts

Quote :
"I'm sure that I could write a script..."


ok george.


also:

Quote :
"However, our company spends a lot of time, effort, and money producing and checking the data we publish, and as such we can not freely give away large amounts of data that we produce."


http://www.sports-reference.com/data_use.shtml

12/30/2010 1:33:00 PM

robster
All American
3545 Posts

The way to get their data is to find a place where they themselves have dumped it all in HTML form ...

Chances are that they have organized it into a hierarchy for display that is somewhat similar to the hierarchy used in their database ... so while you cannot query the whole database, you can scrape the entire website (unless they block your IP when they find out what you are doing).

I did this with IMDb a few years back (but never used the data) ... scraping websites is not some new thing. There are plenty of tools out there to help you (DOM is your friend) ... but you'll just have to look for patterns in the HTML that you can use to parse out the data you need, and then insert that data into your own database.
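
For what it's worth, here is a minimal sketch of that "find the pattern, parse it out, insert it into your own database" idea, using only the Python standard library. Everything specific in it is an assumption for illustration: the sample markup, the `stats` table, the column names, and the helper names `TableRowParser`, `scrape_rows`, and `store_rows` are all made up, and a real page's HTML will look different. It parses a string rather than fetching anything, so check the site's terms of use before pointing something like this at a live page.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical sample markup; a real page's structure will differ.
SAMPLE_HTML = """
<table id="stats">
  <tr><th>Player</th><th>Year</th><th>Yards</th></tr>
  <tr><td>Player A</td><td>2009</td><td>1200</td></tr>
  <tr><td>Player B</td><td>2010</td><td>950</td></tr>
</table>
"""


class TableRowParser(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, one tuple per <tr>
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a <td>.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            # Header rows contain only <th>, so they end up empty and are skipped.
            if self._row:
                self.rows.append(tuple(self._row))
            self._row = None


def scrape_rows(html):
    """Return the table's data rows as a list of tuples of strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def store_rows(conn, rows):
    """Insert the scraped rows into a local (hypothetical) stats table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stats (player TEXT, year INTEGER, yards INTEGER)"
    )
    conn.executemany("INSERT INTO stats VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
rows = scrape_rows(SAMPLE_HTML)
store_rows(conn, rows)
```

Once the rows are in your own database you can query them however you like, e.g. `conn.execute("SELECT player, yards FROM stats ORDER BY yards DESC")`. A dedicated HTML parsing library would handle messy markup better than this; the standard-library parser is used here only to keep the sketch self-contained.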

12/30/2010 1:39:51 PM

Noen
All American
31346 Posts

^ This is fine for personal use, but it will quickly get you in legal trouble if you make anything publicly available, commercial or not.

12/30/2010 3:16:59 PM

qntmfred
retired
40439 Posts

as a person who has also scraped a few websites in my day, i have not heard of anybody actually getting in real legal trouble for this kind of activity, and would be interested if you could cite any examples

12/30/2010 3:22:25 PM

Noen
All American
31346 Posts

It's data theft. A database is protected intellectual property just like anything else.

There are plenty of legal precedents for this. Going back to the early days of Google Maps, before they released a public API, hundreds of cease and desist orders went out to mashups and commercial sites leveraging their data without permission.

12/30/2010 3:37:10 PM

Shivan Bird
Football time
11094 Posts

I figured it couldn't be done but it's for personal use and was worth a shot to ask.

Of course I plan to comply with all terms of use.

12/30/2010 4:11:06 PM

qntmfred
retired
40439 Posts

^^ a c&d is no big deal if you comply with it. i'm talking about a scenario where pro-football-reference.com sues Shivan Bird for $1,000,000

web scraping happens every day, and it would be a pity imo if the legal ramifications were so prevalent that nobody dared try to mash up data from some other site

of course, in situations like this, where the site has made a good faith effort (based on synapse's link) to convince people not to try to hammer their site, and also to provide access to the data on a commercial basis, it would be polite to respect their wishes

iow, i think "quickly get you in legal trouble" is a little more FUD than is realistic

12/30/2010 4:41:46 PM

Ernie
All American
45943 Posts

Here's a better idea: click the damn about link

http://www.pro-football-reference.com/download/

12/30/2010 4:43:33 PM

robster
All American
3545 Posts

^ well done ...


and of course, in terms of scraping others' data from sites ... you expose yourself to legal issues if you use it for commercial purposes

Don't do it (or just don't get caught doing it) ... all the same

[Edited on December 30, 2010 at 4:49 PM. Reason : .]

12/30/2010 4:47:45 PM

1985
All American
2175 Posts

I know some websites like this where you can search for good customer data: http://www.BankOfAmerica.com

Clearly the web page queries some database when you run that. Is there any other way to grab a bunch of data from that database?

I'm sure that I could write a script in SAS or Excel but that could be long, tedious, and have mistakes.

12/30/2010 5:59:17 PM

evan
All American
27701 Posts

^^

12/30/2010 8:27:28 PM

lewisje
All American
9196 Posts

^^

12/30/2010 10:03:37 PM

© 2024 by The Wolf Web - All Rights Reserved.