Data Mart and Data Warehouse

As I mentioned in the previous post, Data warehouses are for data analysis, which is becoming a hot topic these days. So, in this post, let me explain about Data Marts which are part of data warehouse.

A data warehouse can have many data marts, and each data mart can answer one general analytics question. Data from various sources are cleaned, and prepared in well-structured data warehouse. However, these cleaned yet raw data cannot explain the business questions. Therefore, transforming data into information is required and this is where raw data from data warehouse is converted into meaningful information which will be stored in the data mart.

Usually, each data mart has one data schema. In each schema, there is one fact table and multiple dimension tables. The fact table contains the measures for business question and the dimension tables are the dimensions for the measure from the fact table.

There are 2 types of data schemas: Star schema and Snowflake Schema.

Star Schema
Screen Shot 2015-01-26 at 7.20.06 pm

Star schema is simple. so it is fast in loading data. However, it is impossible to use hierarchical dimensions. Since the hierarchy is not allowed, the data contains duplicates.

For example, compare these 2 data of sales at a departmental store.
order001 | $23.9 | Branch A | New York | USA
order002 | $99.99 | Branch B | New York | USA

The location data is duplicated. If there are a lot of branches in New York, and as well as in the whole USA, there will be a lot of New Yorks… and USA, which takes a certain amount of space in each data line.

Snowflake Schema

The snowflake schema is like its name says, it is in snowflake shape with branches for the hierarchies.

snowflake

It basically saves the space of data like New York and USA in the previous example. So, it will save space. However, it has primary keys to link to hierarchical dimensions. This will delays the response time because of joints but there are less data to go through.

So, which schema to use and which is better?

My only ansewer is depends.

it is based on the business needs and nature of your business question that this fact table has to answer. So, if the requirement is only the general questions, star scheme is a better choice but again it just depends on how you will use this data mart.

Hope this post explains you well. If you have any question, please email me or comment below.

 

 

Data Warehouse

What is Data warehouse?

Data warehouse

is a long term storage of data, which is ready for business analysis(BI).

Actually, data warehousing is the basic of Business Analysis, since it has done the low level tasks such as data storing, managing, organizing and maintaining. The data in the data warehouse is ready to use in business Analysis and get meaningful trends to improve the business.

Basically, data warehouse is like a warehouse that stores all the enterprise data systematically so that it is easier in retrieval. Data warehouse is different from the databases used for regular daily transactions of a business process. It is at the enterprise level that integrates data from different data sources: can be structured databases of different applications, or unstructured like social media sources. Data warehouses handles a huge amount of data from long term while regular application databases clears in every 6 months or 2 years.

Data warehouses are used for analysis purpose, so the structure and design is very important in maintaining the large amount of data.

Indexing in Data Warehouse

Indexing is almost a must in regular app databases because of its ability to improve the performance of the system. However, in Data Warehousing, whether we should use indexing or not and when to use are very difficult and dubious questions. Unlike the day-to-day recording databases, regular update is not necessary, joining is not required. In such cases, indexing can hinder the process because of loading the unnecessary indexing process to a large amount of data. So it depends on the business requirement where to use indexing and where not. Therefore, heavily indexing like in application databases is something to avoid in data warehousing.

Generally, Data warehouses are structured in relational design.