New Features in Hadoop 2.0 summary

I live-tweeted yesterday’s New Features in Hadoop 2.0 session at the Toronto Hadoop User Group. I think it’s a pretty good summary. If nothing else, it helped me absorb the information.

Spoiler: The iPad raffle was won by yours truly.

Hadoop 2.0

  • At the Hadoop 2.0 session ‪#TorontoHUG
  • Got here just before they figured out how to get the projector to emerge. It’s shy and hides in the ceiling. ‪#ToHUG‬
  • Also, just after pizza got here. Maybe I shouldn’t have grabbed that emergency bagel. ‪#ToHUG‬
  • Hadoop 0.20 -> Hadoop 0.20.2 -> Hadoop 2.0 ‪#ToHUG‬
  • Hive, which is SQL on Hadoop w/ JDBC drivers, now has Binary, Timestamp data types and bitmap indexes ‪#ToHUG‬
  • Pig gets Javascript UDFs and can be embedded in JS or Python ‪#ToHUG‬
  • Cool, Sqoop getting an IBM DB2 connector is a bullet point in a Cloudera presentation ‪#ToHUG‬

MapReduce v2

  • MapReduce v2=YARN=Yet Another Resource Negotiator #ToHUG‬
  • MRv2 can support other processing frameworks. Eg graph processing, Santa Fe Institute simulations, etc ‪#ToHUG‬
  • MRv2 is not about old API/new API. Unrelated ‪#ToHUG‬
  • MRv2 good for research but not ready for production, even by startup standards ‪#ToHUG‬
  • MRv2 allows Hamster MPI on Hadoop, Hama bulk synchronous processing, Giraph graph processing — none are MapReduce ‪#ToHUG‬
  • In MRv1, JobTracker ran on Master, TaskTrackers ran on child nodes ‪#ToHUG‬
  • In MRv1, JobTracker managed resouurces, scheduled, monitored
  • Hadoop 2.0 can run either MRv1 or MRv2, but v2 not recommended for production ‪#ToHUG‬
  • MRv2 has 1 Resource Manager, many Node Managers instead ‪#ToHUG‬
  • But unlike JT, RM not a single point of failure because it now delegates App Managers to Node Managers per job ‪#ToHUG‬
  • Resource management still central, but job management now decentralized ‪#ToHUG‬
  • App Manager is like a library injected by RM into Node Managers ‪#ToHUG‬
  • RM can die with low risk of losing jobs ‪#ToHUG‬
  • App Master manages app lifecycle, negotiates resource containers with Resource Manager, monitors tasks on other nodes ‪#ToHUG‬
  • In principle, no issue with running MRv2 on either HDFS or GPFS ‪#ToHUG‬

HDFS Federation

  • The old Secondary NameNode is a terrible misnomer — not a backup NN ‪#ToHUG‬
  • NameNode keeps track of all the data on all the DataNodes ‪#ToHUG‬
  • HDFS Federation now allows for multiple NameNodes ‪#ToHUG‬
  • In federation, each NN manages a namespace volume ‪#ToHUG‬
  • HDFS Federation is not High Availability, is not Disaster Recovery ‪#ToHUG‬
  • NNs in Federation do not communicate ‪#ToHUG‬
  • Federation improves scalability, perf, isolation ‪#ToHUG‬
  • A fed namespace volume consists of metadata, block pool (corresponding to files)‪#ToHUG‬
  • All Data Nodes are used by all the federated NNs ‪#ToHUG‬

HDFS High Availability

  • NameNode High Availability is a new feature different from NN Fed ‪#ToHUG‬
  • Two NNs: one active, one standy. Standby takes over on failure. Fencing to prevent split brain by killing old one on takeover. ‪#ToHUG‬
  • Non-HA NN can fail via crash or planned maintenance. ‪#ToHUG‬
  • Clients and DNs only talk to active NN. Standy maintains a copy of active’s state. Purpose of NN unchanged. ‪#ToHUG‬
  • Active NN writes state to shared filesystem — NFS. ‪#ToHUG‬
  • HDFS fences to kill splitbrain via SSH or shell scripts. ‪#ToHUG‬
  • NFS is the new single point of failure — yay! But NFS has proven HA solutions as well. ‪#ToHUG‬
  • Failover can be auto or manual. Use hdfs haadmiin -failover nn01 nn02 command ‪#ToHUG‬
  • HA is in Hadoop 2.0 regardless of MRv1 or v2. ‪#ToHUG‬
  • NameNodes runs on your beefiest machine. Upwards of 16gb of ram typical. Limit is JVM memory management, Java garbage collector. ‪#ToHUG‬
  • OMGWTFBBQ I just won the iPad3 raffle ‪#ToHUG‬
  • The IBM guy got it. “I swear they don’t pay me very much”. Heh. ‪#embarrassed‬ ‪#woohoo‬ ‪#ToHUG‬

HBase

  • HBase is a distributed, versioned, column-oriented, denormalized database ‪#ToHUG‬
  • Horizontally scalable for fast random r/w. HBase is backend for FB Messages. Either source or sink for Hadoop jobs. ‪#ToHUG‬
  • HBase is good for locality when used with Hadoop because data is stored near where it’s processed. ‪#ToHUG‬
  • HB table consists of regions which consist of 1+ column family. Regions are the storage unit. Really good for sparse dbs. ‪#ToHUG‬
  • Sparse=lots of empty fields=columns mostly empty=varying numbers of columns. ‪#ToHUG‬
  • @ianhakes Ha! How about I drop the iPad off in your old office (aka mine)? ;-) ‪#ToHUG‬
  • 1 column family is stored as 1 HFile. ‪#ToHUG‬
  • HBase CRUD=Put Get Scan Delete ‪#ToHUG‬
  • Google BigTable uses com-google, com-google-images, com-ibm, etc as keys for efficient scantables. Similar to what you want in Hbase ‪#ToHUG‬
  • HMaster talks to HRegionServers contains HLog and HRegion contains MemStore contains HFile talks to DFS Client talks to DataNodes ‪#ToHUG‬
  • ZooKeeper(s) stores config, logs, determines HMaster for HMaster, clients ‪#ToHUG‬
  • HBase compaction=force write to disk. Relates to locality. ‪#ToHUG‬
  • If you’re smart, don’t integrate HBase and Hive. Hive is map-reduce SQL job, HBase is a database. Hive best for regular HDFS data ‪#ToHUG‬
  • HBase replication assumes column family exists in both clusters. No config required in child cluster. ‪#ToHUG‬
  • BTW, I’m subbing adjective “child” for adj “slave” as a stylistic preference. ‪#ToHUG‬
  • ZooKeeper quorum doesn’t have to be the same for HB replication. ZK quorum is an odd number of ZKs that agree on something. ‪#ToHUG‬
  • HBase Coprocessor Observers=database triggers ‪#ToHUG‬
  • HBase Co-pro Endpoints=stored procedures. In Java. Custom RPC protocol. Invoked by client on row or rowset. ‪#ToHUG‬
  • HBase security has auth per table, per column family, per column qualifier. Stored in_acl_ table. ‪#ToHUG‬