Message Boards » Tech Talk » Hadoop anyone?
neodata686
All American
11577 Posts

So my company is moving from a Postgres/Greenplum infrastructure to Hadoop over the next 6-8 months. Very excited about this but it's a learning process for everyone in the company.

Saw there are no threads on the topic. Anyone have any exposure to HDFS, MapReduce, etc.? We've chosen Hortonworks as our vendor and I've been going through some of the tutorials on Hive, Pig, etc.
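
For anyone new to the term, here is a rough sketch of the MapReduce model in plain Python. It does not use Hadoop at all; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are made up for illustration, and a real cluster would run the map and reduce steps in parallel across nodes.

```python
from collections import defaultdict


def map_phase(line):
    # Mapper: emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle: group emitted values by key, the step a framework
    # performs between the map and reduce phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Reducer: collapse all values for one key into a single result.
    return key, sum(values)


def word_count(lines):
    # Run map over every line, shuffle by word, then reduce each group.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input, not taken from any real data set.
    print(word_count(["Hive and Pig", "pig and hdfs"]))
```

The point of the split is that each mapper only needs its own slice of the input and each reducer only needs the values for its own keys, which is what lets the work spread across many machines.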

1/17/2014 3:45:41 PM

0EPII1
All American
42525 Posts

Quite a few MSA graduates here. They have done it in their program.

1/17/2014 3:49:13 PM

neodata686
All American
11577 Posts

Neato, yeah. My company provides an analytics software platform with a lot of big hosted data, so it's a big leap for us in terms of processing power.

I've been interested in going back to school. Unfortunately I moved to Denver, so it would have to be something local.

1/17/2014 3:52:12 PM

smoothcrim
Universal Magnetic!
18914 Posts

you should look into EMR (Amazon Elastic MapReduce) if you want to do this cost effectively. hadoop on your own hardware in this day is hard to justify unless you're doing steady state jobs 24x7. even then, it's still going to be hard to justify without significant scale
http://aws.amazon.com/elasticmapreduce/
http://www.bigdatahpc.com
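
The "steady state 24x7" argument comes down to utilization, and a back-of-the-envelope sketch may make it concrete. Everything below is an assumption for illustration: the function names are hypothetical and the hourly costs are invented numbers, not real pricing for EMR or any hardware. The model is also deliberately crude: owned hardware is paid for every hour, busy or idle, while on-demand capacity is paid for only while jobs run.

```python
def breakeven_utilization(owned_cost_per_hour, on_demand_cost_per_hour):
    # Fraction of hours the cluster must be busy before owning is cheaper.
    # Owned cost accrues every hour; on-demand cost accrues only when busy.
    if on_demand_cost_per_hour <= 0:
        raise ValueError("on-demand cost must be positive")
    return min(1.0, owned_cost_per_hour / on_demand_cost_per_hour)


def cheaper_option(busy_hours_per_day, owned_cost_per_hour, on_demand_cost_per_hour):
    # Compare actual utilization against the break-even point.
    utilization = busy_hours_per_day / 24
    threshold = breakeven_utilization(owned_cost_per_hour, on_demand_cost_per_hour)
    return "own hardware" if utilization > threshold else "on-demand"


if __name__ == "__main__":
    # Invented costs: 6.0/hour amortized for owned nodes, 10.0/hour on demand.
    print(breakeven_utilization(6.0, 10.0))
    print(cheaper_option(4, 6.0, 10.0))   # a nightly batch window
    print(cheaper_option(24, 6.0, 10.0))  # steady state, 24x7
```

Under these made-up numbers the owned cluster only wins once it is busy more than 60% of the time, which is why bursty batch workloads tend to favor renting and round-the-clock workloads favor owning.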

1/17/2014 4:13:34 PM

neodata686
All American
11577 Posts

Quote :
"hadoop on your own hardware in this day is hard to justify unless you're doing steady state jobs 24x7."


We already have the hardware (nodes, etc.), because Hadoop's underlying infrastructure is very similar to Greenplum's in terms of nodes, data distribution, etc. Not to mention we have many of the top Fortune 50 companies as clients, and data integrity and security are always a concern. It's much simpler to host our own data or have our clients host it themselves.

1/17/2014 4:20:57 PM

smoothcrim
Universal Magnetic!
18914 Posts

I was suggesting you host in Amazon.

1/17/2014 4:50:49 PM

neodata686
All American
11577 Posts

That's my point. We already have the infrastructure and it's harder to get contracts where data is hosted elsewhere. From what I understand it makes more sense to host it ourselves.

1/17/2014 6:02:38 PM

Noen
All American
31346 Posts

^Definitely, and if he WERE going to go with a public cloud infrastructure for hadoop, he would be using http://www.windowsazure.com/en-us/solutions/big-data/ anyway

back to the OP: yes, the product I design (http://blogs.msdn.com/b/visualstudioalm/archive/2013/11/13/announcing-application-insights-preview.aspx) has been built on hadoop. The biggest continual problem is finding the happy medium between data latency and compute costs. We want to deliver data as close to realtime as possible, but that starts costing insane $$$ once you hit certain thresholds.

[Edited on January 17, 2014 at 8:47 PM. Reason : .]

1/17/2014 8:43:50 PM

Tarun
almost
11687 Posts

I am interested in learning about hadoop too. Any good tutorials/online courses out there? I looked into the MSA program a few years ago but cannot afford to go back to school full time.

1/21/2014 9:21:45 AM

Tarun
almost
11687 Posts

I know not a lot of you are in the DC area, but I'm posting this in case anyone is interested:

IBM Big Data Developer Day

https://www-950.ibm.com/events/wwe/grp/grp004.nsf/v17_agenda?openform&seminar=FDDQVFES&locale=en_US

1/27/2014 8:48:09 AM

neodata686
All American
11577 Posts

^^I've been going through these:

http://hortonworks.com/tutorials/

They're pretty good for a basic understanding of the different components of Hadoop.

1/27/2014 11:01:25 AM

y0willy0
All American
7863 Posts

i read this as hard poop

1/28/2014 10:59:12 AM

neodata686
All American
11577 Posts

Update here. So we're advancing with both Cloudera and Hortonworks. Once we complete our Hadoop lake and fully convert our software over to Hadoop, we're told ours will be the largest Hadoop lake that either distributor is helping deploy/support. Pretty neato!

3/20/2015 4:44:57 PM

neodata686
All American
11577 Posts

In Cloudera training this week! Woohoo.

4/27/2015 12:51:33 PM

CaelNCSU
All American
6883 Posts

We had a sales pitch/training for Amazon Kinesis, Redshift, and EMR. They have ways to run Hive and/or Pig directly on data in S3 or DynamoDB.

I'm rewriting some of our analytics ranking algorithms and the last step will be to use one of those platforms.

4/30/2015 10:29:09 AM

smoothcrim
Universal Magnetic!
18914 Posts

kinesis + spark is the new realtime hotness imo
https://spark.apache.org/docs/latest/streaming-kinesis-integration.html

then pass the data to s3 for later redshift ingest or EMR processing

4/30/2015 12:51:17 PM

© 2024 by The Wolf Web - All Rights Reserved.
The material located at this site is not endorsed, sponsored or provided by or on behalf of North Carolina State University.