From classic RDBMS (MySQL) to Hadoop – BI tutorial using Talend and Cloudera

What products do our customers like to buy?

In this scenario, we will try to answer the question: What products do our customers like to buy? To answer this question, the first thought might be to look at the transaction data, which should indicate what customers actually do buy and like to buy, right?

This is probably something you can do in your regular RDBMS environment, but a benefit of using Talend with Cloudera’s big data platform (Hadoop) is that you can do it at greater scale and lower cost, on the same system that you may also use for many other types of analysis.

To analyze the transaction data in the new platform, we need to ingest it into the Hadoop Distributed File System (HDFS). We need a tool that easily transfers structured data from an RDBMS to HDFS while preserving its structure. That lets us query the data without interfering with or breaking any regular workload on the source database.

First method: import the data the traditional way 😉 by copying it file by file or table by table from MySQL to HDFS with Talend's tHDFSPut component

This approach can be useful if you need to apply some transformations before loading the data into HDFS.
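
Outside Talend, the same idea (export rows, transform them, then push the file to HDFS) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample rows, the `clean` transformation, and both paths are hypothetical, and the rows stand in for a MySQL result set rather than being read from a real database. The function only builds the `hdfs dfs -put` command; it does not run it.

```python
import csv
import io


def rows_to_csv(rows, transform=None):
    """Serialize rows to CSV text, applying an optional per-row transformation."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        writer.writerow(transform(row) if transform else row)
    return buffer.getvalue()


def hdfs_put_command(local_path, hdfs_dir):
    """Build the `hdfs dfs -put` argument list that copies a local file into HDFS."""
    # -f overwrites the destination file if it already exists
    return ["hdfs", "dfs", "-put", "-f", local_path, hdfs_dir]


def clean(row):
    """Example transformation (hypothetical): trim whitespace around the product name."""
    return (row[0], row[1].strip(), row[2])


# Hypothetical rows standing in for a MySQL result set: (id, name, price)
sample_rows = [(1, " widget ", 9.5), (2, "gadget", 12.0)]

csv_text = rows_to_csv(sample_rows, transform=clean)
command = hdfs_put_command("/tmp/products.csv", "/user/cloudera/products")
```

On a machine with the Hadoop client installed, you would write `csv_text` to the local path and pass `command` to `subprocess.run` to perform the actual upload.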

[Figure: MySQL to HDFS]
Second, and better, method: use Talend with the Sqoop component.
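
For orientation, here is a rough sketch of the kind of Sqoop import a job like this runs, written as a small Python helper that assembles the command line. The JDBC URL, user name, password file, table name, and target directory are all hypothetical placeholders; the flags themselves (`--connect`, `--username`, `--password-file`, `--table`, `--target-dir`, `--num-mappers`) are standard `sqoop import` options. How the Talend component maps its settings onto these options is not shown in this post, so treat this as an assumption rather than the component's actual behavior.

```python
def sqoop_import_command(jdbc_url, username, password_file, table, target_dir, mappers=1):
    """Build the argument list for a `sqoop import` of one table into HDFS."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,              # JDBC connection string of the source database
        "--username", username,
        "--password-file", password_file,   # file holding the password, kept off the command line
        "--table", table,                   # source table to import
        "--target-dir", target_dir,         # HDFS directory that receives the data
        "--num-mappers", str(mappers),      # number of parallel map tasks
    ]


# Hypothetical values, for illustration only
sqoop_command = sqoop_import_command(
    jdbc_url="jdbc:mysql://localhost:3306/retail_db",
    username="retail_user",
    password_file="/user/cloudera/.mysql-password",
    table="products",
    target_dir="/user/cloudera/products",
    mappers=4,
)
```

Unlike the first method, Sqoop reads the table in parallel and writes straight into HDFS, so no intermediate local file is needed.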

 
