Hadoop: When to use HCatalog and what are its benefits?
I'm new to HCatalog (HCat). I would like to know in which use cases or scenarios to use HCat, what the benefits of using HCat are, and whether there is any performance improvement that can be gained from HCatalog. Can anyone provide information on when to use HCatalog?
Apache HCatalog is a table and storage management layer for Hadoop that enables users of different data processing tools (Apache Pig, Apache MapReduce, and Apache Hive) to more easily read and write data on the grid.
HCatalog creates a table abstraction layer over data stored on an HDFS cluster. This table abstraction layer presents the data in a familiar relational format and makes it easier to read and write data using familiar query language concepts.
HCatalog data structures are defined using Hive's data definition language (DDL), and the Hive metastore stores the HCatalog data structures. Using the command-line interface (CLI), users can create, alter, and drop tables. Tables are organized into databases, or placed in the default database if no database is defined for the table. Once tables are created, you can explore their metadata using commands such as SHOW TABLES and DESCRIBE. HCatalog commands are the same as Hive's DDL commands.
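Because HCatalog reuses Hive's DDL, the same statements can be run through the `hcat` CLI with its `-e` option. The sketch below only builds the argument lists; the table `web_logs` and its columns are hypothetical, and `run_hcat` assumes an installed, configured `hcat` client on the PATH.

```python
import shlex
import subprocess

# Hypothetical table used only for illustration.
CREATE_DDL = (
    "CREATE TABLE IF NOT EXISTS web_logs ("
    "ip STRING, url STRING, bytes INT) "
    "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t' "
    "STORED AS TEXTFILE"
)


def hcat_command(ddl: str) -> list[str]:
    """Build an argv list that runs one DDL statement through the hcat CLI (-e)."""
    return ["hcat", "-e", ddl]


def run_hcat(ddl: str) -> str:
    """Execute the statement; only works where an hcat client is installed and configured."""
    result = subprocess.run(hcat_command(ddl), capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    for ddl in (CREATE_DDL, "SHOW TABLES", "DESCRIBE web_logs"):
        # Print the shell-quoted command; on a real cluster call run_hcat(ddl) instead.
        print(shlex.join(hcat_command(ddl)))
```

Passing an argument list (rather than one shell string) to `subprocess.run` avoids quoting problems with the single quotes inside the DDL.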
HCatalog ensures that users need not worry about where or in what format their data is stored. HCatalog displays data in RCFile format, text files, or sequence files in a tabular view. It also provides REST APIs so that external systems can access these tables' metadata.
HCatalog opens up the Hive metadata to other MapReduce tools. Every MapReduce tool has its own notion of HDFS data (for example, Pig sees HDFS data as a set of files, while Hive sees it as tables). With HCatalog, the supported MapReduce tools do not need to care where the data is stored, in which format, or at which storage location.
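The REST access mentioned above is provided by WebHCat (formerly Templeton). As a minimal sketch, assuming a WebHCat server on its default port 50111 and an unsecured cluster that identifies the caller through the `user.name` parameter, a client can fetch a table's description from the `ddl/database/:db/table/:table` resource. The host, database, table, and user names here are hypothetical.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def webhcat_table_url(host: str, db: str, table: str, user: str, port: int = 50111) -> str:
    """Build the WebHCat 'describe table' URL: GET /templeton/v1/ddl/database/:db/table/:table."""
    path = f"/templeton/v1/ddl/database/{quote(db)}/table/{quote(table)}"
    return f"http://{host}:{port}{path}?{urlencode({'user.name': user})}"


def describe_table(host: str, db: str, table: str, user: str) -> dict:
    """Fetch and parse the table metadata; requires a reachable WebHCat server."""
    with urlopen(webhcat_table_url(host, db, table, user)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Print the URL only; call describe_table(...) against a real server.
    print(webhcat_table_url("node1", "default", "web_logs", "hdfs"))
```

Any HTTP client works here; the point is that an external system needs only the table name, not the HDFS path or file format.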
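The idea of one shared table definition serving several tools can be illustrated with a toy, in-memory model. This is not HCatalog's API: `CATALOG`, `FILES`, the path `/data/web_logs`, and the two "tool" functions are all hypothetical stand-ins for the metastore, HDFS, and Pig/Hive.

```python
import csv
import io

# Toy stand-in for the metastore: table name -> location, format details, and schema.
CATALOG = {
    "web_logs": {
        "location": "/data/web_logs",  # hypothetical HDFS path
        "delimiter": "\t",
        "schema": [("ip", str), ("url", str), ("bytes", int)],
    }
}

# Toy stand-in for the files stored at the table's location.
FILES = {"/data/web_logs": "10.0.0.1\t/index.html\t512\n10.0.0.2\t/about.html\t128\n"}


def read_table(name):
    """Resolve a table through the catalog and yield typed rows; callers never see path or format."""
    meta = CATALOG[name]
    reader = csv.reader(io.StringIO(FILES[meta["location"]]), delimiter=meta["delimiter"])
    for fields in reader:
        yield {col: typ(val) for (col, typ), val in zip(meta["schema"], fields)}


# Two "tools" that share the single schema definition instead of each redeclaring it.
def pig_like_filter(table, min_bytes):
    return [row["url"] for row in read_table(table) if row["bytes"] >= min_bytes]


def hive_like_sum(table):
    return sum(row["bytes"] for row in read_table(table))


if __name__ == "__main__":
    print(pig_like_filter("web_logs", 200))  # ['/index.html']
    print(hive_like_sum("web_logs"))  # 640
```

If the table's location or delimiter changes, only the catalog entry changes; neither "tool" has to be edited.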
- It assists integration with other tools and supplies read and write interfaces for Pig, Hive, and MapReduce.
- It provides a shared schema and data types for Hadoop tools. You do not have to explicitly type the data structures in each program.
- It exposes this information through a REST interface for external data access.
- It integrates with Sqoop, a tool designed to transfer data back and forth between Hadoop and relational databases such as SQL Server and Oracle.
- It provides APIs and a web service wrapper for accessing metadata in the Hive metastore.
- Because HCatalog exposes a REST interface, you can create custom tools and applications that interact with Hadoop data structures.
This allows you to use the right tool for the right job. For example, you can load data into Hadoop using HCatalog, perform ETL on the data using Pig, and aggregate the data using Hive. After processing, you can send the data to your data warehouse housed in SQL Server using Sqoop. You can automate the whole process using Oozie.
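The workflow described above can be sketched as an ordered list of commands. The script names `etl.pig` and `aggregate.hql`, the JDBC URL, and the table names are hypothetical; `pig -useHCatalog` and Sqoop's `--hcatalog-database`/`--hcatalog-table` export options are real flags (the Sqoop ones need Sqoop 1.4.4 or later). In practice each step would usually become an Oozie action rather than being run by hand.

```python
import shlex


def build_pipeline(jdbc_url: str, db: str, hcat_table: str, rdbms_table: str) -> list[list[str]]:
    """Return the commands of a hypothetical ETL -> aggregate -> export workflow, in order."""
    return [
        # ETL in Pig; -useHCatalog adds the HCatalog jars to Pig's classpath.
        ["pig", "-useHCatalog", "-f", "etl.pig"],
        # Aggregation in Hive, which reads the same metastore directly.
        ["hive", "-f", "aggregate.hql"],
        # Export the aggregated HCatalog table to a relational table with Sqoop.
        ["sqoop", "export", "--connect", jdbc_url,
         "--table", rdbms_table,
         "--hcatalog-database", db, "--hcatalog-table", hcat_table],
    ]


if __name__ == "__main__":
    steps = build_pipeline("jdbc:sqlserver://dwhost:1433;databaseName=dw",
                           "default", "daily_totals", "DailyTotals")
    for argv in steps:
        # shlex.join quotes the semicolons in the JDBC URL for a POSIX shell.
        print(shlex.join(argv))
```

Note that no step mentions an HDFS path or file format: every tool refers to the data by its HCatalog table name.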
How it works:
- Pig: HCatLoader and HCatStorer interfaces
- MapReduce: HCatInputFormat and HCatOutputFormat interfaces
- Hive: no interface necessary; it has direct access to the metadata
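On the Pig side, a script loads and stores tables by name through HCatLoader and HCatStorer, without declaring a schema or a path. The sketch below generates such a script; the table names and the `bytes` column are hypothetical, the package `org.apache.hive.hcatalog.pig` applies to Hive 0.12 and later (older releases used `org.apache.hcatalog.pig`), and HCatStorer expects the target table to exist already.

```python
HCAT_PKG = "org.apache.hive.hcatalog.pig"  # older releases: org.apache.hcatalog.pig


def pig_filter_script(src: str, dst: str, min_bytes: int) -> str:
    """Generate Pig Latin that reads src and writes dst through HCatalog; no schema is declared."""
    return "\n".join([
        f"raw = LOAD '{src}' USING {HCAT_PKG}.HCatLoader();",
        f"big = FILTER raw BY bytes >= {min_bytes};",
        f"STORE big INTO '{dst}' USING {HCAT_PKG}.HCatStorer();",
    ])


if __name__ == "__main__":
    # Save the output as a .pig file and run it with: pig -useHCatalog -f <file>
    print(pig_filter_script("default.web_logs", "default.big_requests", 200))
```

The `FILTER` can refer to `bytes` by name only because HCatLoader supplies the column names and types from the metastore.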
References:
http://hortonworks.com/hadoop/hcatalog/
To answer your question:
As described earlier, HCatalog provides a shared schema and data types for Hadoop tools, which simplifies work during data processing. If you have created a table using HCatalog, you can access it directly from Pig or MapReduce as well as from Hive; without HCatalog, Pig and MapReduce have no direct view of a Hive table and must work with its underlying files instead. You don't need to re-create the schema for every tool.
If you are working with shared data that is used by multiple users (some teams using Hive, some using Pig, some using MapReduce), HCatalog is useful because each of them only needs the table name to access the data for processing.
It is not a replacement for any tool; rather, it is a facility that provides a single access point for many tools.
Performance depends on your Hadoop cluster. You should do performance benchmarking on your own Hadoop cluster to measure it.