8 Week SQL Challenge - Case Study 4 (Customer Nodes Exploration)
Case Study #4 - Data Bank We are on the fourth (4th) out of 8 case studies: 🎊🥂. We are halfway through the marathon.
This time around Danny has a company which is an intersection between “AWS Bucket and your favorite online bank”. Danny does have a lot of ideas when it comes to establishing companies. My job here is to answer questions he has so here we go with the first focus area: A. Customer Nodes Exploration
How many unique nodes are there on the Data Bank system?
SELECT
COUNT(DISTINCT node_id) AS total_nodes
FROM customer_nodes;
What is the number of nodes per region?
SELECT
region_id,
COUNT(DISTINCT node_id) AS total_nodes_per_region
FROM customer_nodes
GROUP BY region_id
ORDER BY region_id;
How many customers are allocated to each region?
SELECT
region_id,
COUNT(DISTINCT customer_id) AS total_customers_per_region
FROM customer_nodes
GROUP BY region_id
ORDER BY region_id;
How many days on average are customers reallocated to a different node?
SELECT
AVG(DATEDIFF(end_date, start_date)) AS avg_days_per_node
FROM customer_nodes
WHERE YEAR(start_date) <> 9999 AND YEAR(end_date) <> 9999;
What is the median, 80th and 95th percentile for this same reallocation days metric for each region?
WITH percent_rank_days AS (
SELECT
region_id,
DATEDIFF(end_date, start_date) AS diff,
PERCENT_RANK() OVER (PARTITION BY region_id ORDER BY DATEDIFF(end_date, start_date)) AS p_rank
FROM customer_nodes
WHERE YEAR(end_date) <> 9999
)
SELECT
region_id,
MIN(CASE WHEN p_rank >= 0.5 THEN diff END) AS median_days,
MIN(CASE WHEN p_rank >= 0.8 THEN diff END) AS percentile_80,
MIN(CASE WHEN p_rank >= 0.95 THEN diff END) AS percentile_95
FROM percent_rank_days
GROUP BY region_id
ORDER BY region_id;