Introduction to Apache Hadoop

Roman Shaposhnik, LinuxFoundationX

Unlock the power of big data with an overview of Apache Hadoop and get hands-on practice setting up your own Hadoop instance.

Everywhere you look today, enterprises are embracing big data-driven customer relationships and building innovative solutions based on insights gained from data. According to IBM, every day we create 2.5 quintillion bytes of data — so much that 90% of the data in the world today has been created in the last two years. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, just to name a few. This data is big data.

The demand for storing this unprecedented amount of information is enough of a challenge, but when you add the need for analytics, the technology requirements truly start pushing the envelope on state-of-the-art IT infrastructures. Fortunately, the open source community has stepped up to this challenge and developed a storage and processing layer called Apache Hadoop. Add the dozens of other projects integrating with Apache Hadoop and you have the whole Hadoop ecosystem.

The Hadoop ecosystem, along with the data management architectures it enables, is growing at an unprecedented rate, with 73% of Hadoop cluster deployments now in production — a number which continues to rise.

The demand for individuals who have experience managing this platform is also accelerating. According to the IT Skills and Certifications Pay Index research from Foote Partners, “the need for big data skills also continues to lead to pay increases — about 8% over the last year.” Now is exactly the right time to build an exciting and rewarding career managing big data with Apache Hadoop.

This introductory course is taught by Hadoop experts from The Linux Foundation’s ODPi collaborative project. As host to some of the world's leading open source projects, The Linux Foundation provides training and networking opportunities to help you advance your career.

This course is perfect for IT professionals who seek a high-level overview of Hadoop and want to find out whether a Hadoop-driven big data strategy is the right solution to meet their data retention and analytics needs. This course will also help anyone who wants to set up a small-scale Hadoop test environment to gain experience working with this exciting open source technology.

What you will learn

  • The origins of Apache Hadoop and its big data ecosystem
  • Deploying Hadoop in the clustered environment of a modern-day enterprise IT organization
  • Building data lake management architectures around Apache Hadoop
  • Leveraging the YARN framework to effectively enable heterogeneous analytical workloads on Hadoop clusters
  • Leveraging Apache Hive for an SQL-centric view into the enterprise data lake
  • An introduction to managing key Hadoop components (HDFS, YARN and Hive) from the command line
  • Securing and scaling your data lakes in multi-tenant enterprise environments

Dates:
  • 8 June 2017, 15 weeks
Course properties:
  • Language: English
