Big data | Santhosh's Professional Learning


Definition
Big data is a term applied to data sets whose size is beyond the ability of commonly used tools to capture, manage and process within a tolerable elapsed time.
A data warehouse, by contrast, is a collection of data marts representing historical data from the company's different operations.

Big data came into the picture when data warehouses and data mining tools were no longer able to process the volumes of data involved.

Data collection
Big data is a collection of large amounts of data gathered in a particular manner, whereas a data warehouse collects data from the different departments of an organization.
A data warehouse, however, requires efficient data management techniques.

Input: raw data of any kind for big data (anything from web logs to a data warehouse), versus the operational data of an organization for a data warehouse (structured data must already be in place).

Processing: distributed processing (the Map and Reduce technique) for big data, whereas several BI tools are available for a data warehouse.

Output: meaningful data and visualizations of the data (both produce the same kind of output).
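The Map and Reduce technique listed under Processing can be illustrated with a minimal, single-process sketch. It counts words in a few hypothetical web-log lines: the map step emits (key, 1) pairs, a shuffle step groups the pairs by key, and the reduce step sums each group. The function names (`map_phase`, `shuffle`, `reduce_phase`, `map_reduce`) are hypothetical and are not the Hadoop API; in Hadoop the map and reduce tasks would run on different nodes of the cluster rather than in one Python process.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key (done by the framework in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    """Run map over every record, shuffle the pairs, then reduce each key group."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    logs = ["GET /home", "GET /cart", "POST /cart"]  # hypothetical log lines
    print(map_reduce(logs))
```

Because each map call sees only one record and each reduce call sees only one key, both phases can be split across machines without changing the result, which is what makes the technique a good fit for distributed processing.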

Similarity
Conceptually, the two are alike in only one respect: both collect large amounts of information.

Economic (storage) scalability
Since you can use commodity hardware, you are not limited to the size (and up-front cost) of a traditional data warehouse, and you can expand incrementally instead of replacing the entire box with a larger one when it fills up. In the case of Hadoop, raw data is loaded directly onto low-cost commodity servers in a distributed manner.
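A small sketch can show why this kind of storage scales incrementally: a file is split into fixed-size blocks whose copies are spread across commodity nodes, and adding one more node adds its disk to the pool instead of requiring a bigger box. The 128 MB block size and three-way replication are assumptions chosen to match common HDFS defaults, and the round-robin placement in the hypothetical `place_blocks` function is a simplification, not HDFS's actual rack-aware placement policy.

```python
import math

BLOCK_SIZE_MB = 128  # assumption: illustrative block size (matches the HDFS default)
REPLICATION = 3      # assumption: number of copies kept of every block


def place_blocks(file_size_mb, nodes):
    """Split a file into fixed-size blocks and assign each block's replicas
    to distinct nodes round-robin (a simplification of real placement)."""
    if len(nodes) < REPLICATION:
        raise ValueError("need at least as many nodes as replicas")
    num_blocks = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(REPLICATION)]
        for block in range(num_blocks)
    }


def usable_capacity_mb(num_nodes, disk_per_node_mb):
    """Usable capacity grows linearly with the node count (raw disk / replicas)."""
    return num_nodes * disk_per_node_mb // REPLICATION


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3"]
    print(place_blocks(1000, nodes))  # 8 blocks, 3 replicas each
    print(usable_capacity_mb(len(nodes), 3000))  # 3000
    nodes.append("node4")  # scale out: add one more commodity server
    print(usable_capacity_mb(len(nodes), 3000))  # 4000
```

Growing the cluster here is a matter of appending a node; nothing already stored has to move to a larger machine first.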

Hadoop is not an Extract-Transform-Load (ETL) tool. It is a platform that supports running ETL processes in parallel. With big data volumes, moving all the data to one storage area network (SAN) or ETL server becomes infeasible. Even if you can move the data, processing it is slow, limited by SAN bandwidth, and often fails to fit within batch processing windows. With Hadoop, raw data is loaded directly onto low-cost commodity servers once, and only the higher-value refined results are passed to other systems. ETL processing runs in parallel across the entire cluster, resulting in much faster operations than can be achieved by pulling data from a SAN into a collection of ETL servers.
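Here is a minimal sketch of the parallel ETL idea, assuming hypothetical log lines of the form `<service> <level> <message>`. Each partition is parsed and reduced to a small summary (error counts per service) where it sits, and only those refined summaries are merged and passed on. Threads from Python's `concurrent.futures.ThreadPoolExecutor` stand in for cluster nodes; on a real Hadoop cluster each partition would be processed by a task on the server that stores it, so this shows the data flow rather than real distributed speed-up.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def extract_transform(partition):
    """Runs against one partition: parse raw log lines and keep only a small
    refined summary (count of ERROR lines per service)."""
    summary = Counter()
    for line in partition:
        parts = line.split()
        if len(parts) >= 2 and parts[1] == "ERROR":
            summary[parts[0]] += 1
    return summary


def parallel_etl(partitions, workers=4):
    """Transform every partition in parallel, then merge only the summaries;
    the raw lines themselves are never gathered in one place."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        summaries = list(pool.map(extract_transform, partitions))
    total = Counter()
    for summary in summaries:
        total.update(summary)  # add counts from each partition's summary
    return total


if __name__ == "__main__":
    partitions = [  # hypothetical raw data already spread across two "nodes"
        ["auth ERROR timeout", "cart INFO ok"],
        ["auth ERROR bad token", "cart ERROR db down"],
    ]
    print(parallel_etl(partitions))
```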

Hadoop retains the raw data so that it can be reprocessed later. A data warehouse, by contrast, extracts only the required data and stores it in a specific format, which may result in data loss.
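The difference can be sketched with a few hypothetical click events. The warehouse-style load keeps only the columns the current report needs (an approach often called schema-on-write), so the dropped `referrer` field is gone for good. Because the raw data is retained, a later question about referrers can still be answered by re-parsing it (schema-on-read). The data and function names are illustrative assumptions, not part of any Hadoop or warehouse API.

```python
import csv
import io
from collections import Counter

# Hypothetical raw click events, kept exactly as they arrived.
RAW_EVENTS = (
    "user,page,referrer\n"
    "alice,/home,google\n"
    "bob,/cart,newsletter\n"
    "alice,/cart,google\n"
)


def load_into_warehouse(raw):
    """Schema-on-write: extract only the columns the current report needs."""
    rows = csv.DictReader(io.StringIO(raw))
    return [(r["user"], r["page"]) for r in rows]  # 'referrer' is dropped here


def reprocess_raw(raw, field):
    """Schema-on-read: re-parse the retained raw data for any field later."""
    rows = csv.DictReader(io.StringIO(raw))
    return Counter(r[field] for r in rows)


if __name__ == "__main__":
    print(load_into_warehouse(RAW_EVENTS))        # only (user, page) pairs survive
    print(reprocess_raw(RAW_EVENTS, "referrer"))  # new question, answered from raw data
```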

A big data platform can process the data held in a data warehouse, but not vice versa (in terms of the data each can handle).

In short, big data and a data warehouse can be used for the same problem, depending on the situation and the data available. If we need more distributed and in-memory analysis of the data, big data solutions are the better fit. A data warehouse is the more suitable solution when organized data is already in place and highly interactive BI tools are required for the analysis.

Reference: http://assets.teradata.com/resourceCenter/downloads/WhitePapers/EB-6448.pdf?processed=1


Big data (Hadoop) vs Data Warehouse