Tuesday, February 28, 2023

9 Types of Dimensions with examples in Datawarehouse.

 There are mainly 9 types of dimensions as follows:

  1. Conformed Dimension: A conformed dimension is a dimension that is consistent and identical across multiple data marts or data warehouses. It is used to ensure that the same dimension is used across different data marts or data warehouses, ensuring consistency in the data. Examples of conformed dimensions include time, geography, and product dimensions.

  2. Outrigger Dimension: An outrigger dimension is a dimension table that is attached to another dimension table to provide additional details about a specific attribute in the parent dimension. For example, a customer dimension may have an outrigger dimension that provides additional information about the customer's address.

  3. Shrunken Dimension: A shrunken dimension is a subset of a larger dimension that is used in a specific part of the data warehouse. For example, a time dimension may be shrunk to only include the year and quarter attributes when used in a sales data mart.

  4. Role-Playing Dimension: A role-playing dimension is a dimension that is used in multiple ways within a single fact table. For example, a date dimension may be used to represent both an order date and a ship date within a sales fact table.

  5. Dimension-to-Dimension Table: A dimension-to-dimension table is a table that contains the relationships between two dimensions. For example, a product hierarchy table may contain the relationships between different product categories and subcategories.

  6. Junk Dimension: A junk dimension is a dimension that is created to store low-cardinality attributes that do not fit well into any other dimension. For example, a "promotion flag" dimension may be created to store the different types of promotions that apply to sales transactions.

  7. Degenerate Dimension: A degenerate dimension is a dimension that is derived from a fact table and does not have a separate dimension table. For example, an order number may be used as a degenerate dimension in a sales fact table.

  8. Swappable Dimension: A swappable dimension is a dimension that can be swapped with another dimension without affecting the fact table. For example, a customer dimension and a product dimension may be swappable within a sales fact table.

  9. Step Dimension: A step dimension is a dimension that is used to represent a process or workflow. For example, a manufacturing process may be represented as a step dimension in a manufacturing data warehouse.

Note: These types of dimensions are not mutually exclusive, and a single dimension can exhibit characteristics of multiple types of dimensions.

Sunday, February 26, 2023

Performance tuning of SQL

How to perform performance tuning of SQL?

 Performance tuning of SQL is the process of optimizing the performance of SQL queries to improve the speed of retrieving and manipulating data from a database. The goal of performance tuning is to reduce the execution time of SQL queries while minimizing the use of system resources such as CPU, memory, and disk I/O.

Here are some best practices for performance tuning of SQL:

  1. Optimize database design: A well-designed database can improve query performance by reducing the amount of data that needs to be scanned.

  2. Use indexes wisely: Indexes can speed up query performance by allowing the database engine to locate data more quickly. However, too many indexes can slow down data modifications.

  3. Use the appropriate data types: Using appropriate data types can improve query performance by reducing storage requirements and I/O operations.

  4. Minimize data retrieval: Retrieving only the necessary data can improve query performance by reducing the amount of data transferred over the network.

  5. Use efficient SQL statements: Efficient SQL statements can improve query performance by reducing the number of times the database needs to access data.

  6. Optimize SQL queries: SQL queries can be optimized by using appropriate joins, reducing subqueries, and grouping data efficiently.

  7. Monitor database performance: Regular monitoring of database performance can identify performance issues and help in tuning SQL queries.

  8. Use query optimization tools: Query optimization tools can help identify inefficient queries and suggest improvements.

By following these best practices, it is possible to significantly improve the performance of SQL queries and optimize the performance of a database.

Here are some examples of SQL performance tuning techniques:

Use Indexes: Indexes are used to improve the performance of SQL queries. They help to quickly locate the rows that match the query conditions. Indexes can be created on columns that are frequently used in WHERE, JOIN, or ORDER BY clauses.

For example, if you have a table called "users" with columns "id", "name", and "email", and you frequently search for users by their name, you can create an index on the "name" column like this:

CREATE INDEX idx_name ON users (name); Optimize Query Design: The design of SQL queries can have a significant impact on their performance. Complex queries with multiple joins and subqueries can be slow to execute. You can optimize your queries by:

  • Using simple join conditions
  • Using EXISTS instead of IN for subqueries
  • Using UNION ALL instead of UNION to avoid duplicate elimination
  • Avoiding the use of wildcard characters at the beginning of LIKE clauses
For example, instead of using a subquery in the WHERE clause like this:

SELECT *
FROM orders WHERE customer_id IN ( SELECT id FROM customers WHERE country = 'USA' );
You can rewrite it using EXISTS like this: SELECT * FROM orders o WHERE EXISTS ( SELECT 1 FROM customers c WHERE c.id = o.customer_id AND c.country = 'USA' );
Use Stored Procedures: Stored procedures are precompiled SQL statements that are stored in the database. They can be executed repeatedly without the need to recompile the SQL code each time. Stored procedures can improve the performance of database operations by reducing the amount of data transferred between the database server and client.
For example, you can create a stored procedure to insert a new record into the "users" table like this:
CREATE PROCEDURE insert_user @name varchar(50), @email varchar(50) AS BEGIN INSERT INTO users (name, email) VALUES (@name, @email); END
Optimize Database Configuration: The configuration settings of the database server can also affect its performance. You can optimize the configuration by:
  • Allocating enough memory for the database server
  • Configuring the database cache size
  • Adjusting the database server parameters, such as max_connections and max_allowed_packet.
For example, you can set the database cache size to 2GB like this:
SET GLOBAL innodb_buffer_pool_size = 2G;
These are just a few examples of SQL performance tuning techniques. There are many other techniques that you can use depending on your specific database system and workload.

Structured Query Language(SQL -Basics)

 What is SQL?

SQL (Structured Query Language) is a standard programming language used to manage and manipulate relational databases. Here are some basic concepts and commands of SQL:

  • Databases and Tables: A database is a collection of tables, and each table consists of columns and rows.
  • Data Types: Each column in a table has a data type, which defines the type of data that can be stored in that column. Common data types include integers, strings, dates, and booleans.
  • SELECT statement: The SELECT statement is used to query data from a database. It allows you to specify which columns you want to retrieve from a table and can be combined with other statements like WHERE, GROUP BY, and ORDER BY to filter, aggregate, and sort the data.
  • INSERT statement: The INSERT statement is used to add new data to a table. It specifies which columns you want to add data to and the values to be inserted.
  • UPDATE statement: The UPDATE statement is used to modify existing data in a table. It specifies which columns you want to update and the new values to be set.
  • DELETE statement: The DELETE statement is used to remove data from a table. It specifies which rows you want to delete.
  • JOINs: JOINs are used to combine data from multiple tables into a single result set based on a common column. There are several types of JOINs, including INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN.
  • GROUP BY statement: The GROUP BY statement is used to group data in a table based on one or more columns. It is often used with aggregate functions like SUM, COUNT, AVG, and MAX to calculate summary statistics for each group.
  • ORDER BY statement: The ORDER BY statement is used to sort the result set based on one or more columns. It can sort the data in ascending or descending order.

These are some of the basic concepts and commands of SQL. With these commands, you can retrieve, insert, update, and delete data from a database, as well as join tables and calculate summary statistics.


SQL Basic Syntax:

SQL commands are written as statements that start with a keyword like SELECT, INSERT, UPDATE, or DELETE, followed by one or more clauses that define the action to be taken. For example:

sql:

SELECT column_name1, column_name2

FROM table_name

WHERE condition;

This statement selects specific columns from a table based on a given condition.

Creating and modifying tables:

To create a table, use the CREATE TABLE statement followed by the table name and column definitions, like so:

sql:

CREATE TABLE table_name (

    column1 datatype,

    column2 datatype,

    ...

);

To add/remove a column to/from an existing table, use the ALTER TABLE statement:

sql:

ALTER TABLE table_name

ADD/DROP column_name datatype;

Retrieving data:

The SELECT statement is used to retrieve data from a database. It can be used with various clauses, including WHERE, ORDER BY, GROUP BY, and JOIN. For example:

sql:

SELECT column_name1, column_name2

FROM table_name

WHERE condition

ORDER BY column_name ASC/DESC;

This statement selects specific columns from a table based on a given condition and orders them by a specified column in ascending or descending order.


Updating data:

The UPDATE statement is used to modify data in a table. For example:

sql:

UPDATE table_name

SET column_name = value

WHERE condition;

This statement updates the value of a specified column in a table based on a given condition.


Deleting data:

The DELETE statement is used to remove data from a table. For example:

sql:

DELETE FROM table_name

WHERE condition;

This statement removes rows from a table based on a given condition.


These are just the basics of SQL, but it should give you a good foundation to start exploring the language and building your own queries.

Star and Snowflake Schema

 What is a Star Schema?

Star Schema in data warehouse, in which the center of the star can have one fact table and a number of associated dimension tables. It is known as star schema as its structure resembles a star. This Schema is well suited for OLAP Systems


In the following Star Schema example, the fact table is at the center which contains keys to every dimension table like Dealer_ID, Model ID, Date_ID, Product_ID, Branch_ID & other attributes like Units sold and revenue.


Example of Star Schema
Example of Star Schema Diagram



What is a Snowflake Schema?

Snowflake Schema in data warehouse is a logical arrangement of tables in a multidimensional database such that the ER diagram resembles a snowflake shape. A Snowflake Schema is an extension of a Star Schema, and it adds additional dimensions. The dimension tables are normalized which splits data into additional tables. This Schema is well suited for OLTP systems.


.

In the following Snowflake Schema example, Country is further normalized into an individual table.


Example of Snowflake Schema

Example of Snowflake Schema


Difference between Star Schema and Snowflake Schema

Following is a key difference between Snowflake schema vs Star schema:

Star SchemaSnowflake Schema
Hierarchies for the dimensions are stored in the dimensional table.Hierarchies are divided into separate tables.
It contains a fact table surrounded by dimension tables.One fact table surrounded by dimension table which are in turn surrounded by dimension table.
In a star schema, only single join creates the relationship between the fact table and any dimension tables.A snowflake schema requires many joins to fetch the data.
Simple DB Design.Very Complex DB Design.
Denormalized Data structure and query also run faster.Normalized Data Structure.
High level of Data redundancyVery low-level data redundancy
Single Dimension table contains aggregated data.Data Split into different Dimension Tables.
Cube processing is faster.Cube processing might be slow because of the complex join.
Offers higher performing queries using Star Join Query Optimization.
Tables may be connected with multiple dimensions.
The Snowflake schema is represented by centralized fact table which unlikely connected with multiple dimensions.

Dimensional Modeling and Elements of Dimensional Data Model

 

What is Dimension Modelling?

Dimensional Modeling (DM) is a data structure technique optimized for data storage in a Data warehouse. The Purpose is to optimize the database for faster retrieval of data. The concept of Dimensional Modelling was developed by Ralph Kimball and consists of “fact” and “dimension” tables.


Elements of Dimensional Data Model

Fact

Facts are the measurements/metrics or facts from your business process. For a sales business process, a measurement would be quarterly sales number.

There are three types of facts.

  • Additive --total sales of a Company.
  • Non-additive --averages and percentages and ratios.
  • Semi- additive --the number of items in the warehouse for each day.


Dimension

Dimension provides the context surrounding a business process event. In simple terms, they give who, what, where of a fact. In the Sales business process, for the fact quarterly sales number, dimensions would be.


  • Who – Customer Names
  • Where – Location
  • What – Product Names

In other words, a dimension is a window to view information in the facts.

Types of Dimensions are. 

  • Conformed, 
  • Outrigger, 
  • Shrunken, 
  • Role-playing, 
  • Dimension to Dimension Table, 
  • Junk, 
  • Degenerate, 
  • Swappable and 
  • Step Dimensions.


Attributes

The Attributes are the various characteristics of the dimension in dimensional data modeling.

In the Location dimension, the attributes can be.

  • State
  • Country
  • Zip Code etc.

Attributes are used to search, filter, or classify facts. Dimension Tables contain Attributes.


Fact Table

A fact table is a primary table in dimension modelling.


A Fact Table contains.

  1. Measurements/facts
  2. Foreign key to dimension table

Dimension Table

  • A dimension table contains dimensions of a fact.
  • They are joined to fact table via a foreign key.
  • Dimension tables are de-normalized tables.
  • The Dimension Attributes are the various columns in a dimension table.
  • Dimensions offers descriptive characteristics of the facts with the help of their attributes.
  • No set limit set for given for number of dimensions.
  • The dimension can also contain one or more hierarchical relationships.



Distinguish between OLAP vs OLTP

 

What is OLAP?

Online Analytical Processing, a category of software tools which provide analysis of data for business decisions. OLAP Systems allow users to analyze database information from multiple database systems at one time.


Note: The primary objective is data analysis and not data processing.


What is OLTP?

Online transaction processing shortly known as OLTP supports transaction-oriented applications in a 3-tier architecture. OLTP administers day to day transaction of an organization.


Note: The primary objective is data processing and not data analysis.


The Key difference between OLTP AND OLAP are as follows.


ParametersOLTPOLAP
ProcessIt is an online transactional system. It manages database modification.OLAP is an online analysis and data retrieving process.
CharacteristicIt is characterized by large numbers of short online transactions.It is characterized by a large volume of data.
FunctionalityOLTP is an online database modifying system.OLAP is an online database query management system.
MethodOLTP uses traditional DBMS.OLAP uses the data warehouse.
QueryInsert, Update, and Delete information from the database.Mostly select operations.
TableTables in OLTP database are normalized.Tables in OLAP database are not normalized.
SourceOLTP and its transactions are the sources of data.Different OLTP databases become the source of data for OLAP.
Data IntegrityOLTP database must maintain data integrity constraint.OLAP database does not get frequently modified. Hence, data integrity is not an issue.
Response timeIts response time is in millisecond.Response time in seconds to minutes.
Data qualitythe data in the OLTP database is always detailed and organized.The data in OLAP process might not be organized.
UsefulnessIt helps to control and run fundamental business tasks.It helps with planning, problem-solving, and decision support.
OperationAllow read/write operations.Only read and rarely write.
AudienceIt is a market orientated process.It is a customer orientated process.
Query TypeQueries in this process are standardized and simple.Complex queries involving aggregations.
Back-upComplete backup of the data combined with incremental backups.OLAP only need a backup from time to time. Backup is not important compared to OLTP.
DesignDB design is application oriented. Example: Database design changes with industry like Retail, Airline, Banking, etc.DB design is subject oriented. Example: Database design changes with subjects like sales, marketing, purchasing, etc.
User typeIt is used by Data critical users like clerk, DBA & Data Base professionals.Used by Data knowledge users like workers, managers, and CEO.
PurposeDesigned for real time business operations.Designed for analysis of business measures by category and attributes.
Performance metricTransaction throughput is the performance metricQuery throughput is the performance metric.
Number of usersThis kind of Database users allows thousands of users.This kind of DB allows only hundreds of users.
ProductivityIt helps to Increase user’s self-service and productivity.Help to Increase productivity of the business analysts.
ChallengeData Warehouses historically have been a development project which may prove costly to build.An OLAP cube is not an open SQL server data warehouse. Therefore, technical knowledge and experience is essential to manage the OLAP server.
ProcessIt provides fast result for daily used data.It ensures that response to the query is quicker consistently.
CharacteristicIt is easy to create and maintain.It lets the user create a view with the help of a spreadsheet.
StyleOLTP is designed to have fast response time, low data redundancy and is normalized.A data warehouse is created uniquely so that it can integrate different data sources for building a consolidated DB




Introduction to Database Management System

 A Database Management System (DBMS) is a collection of programs that enables user to create and maintain a database. The DBMS is a general-...