Skip to main content

Command Palette

Search for a command to run...

8 Week SQL Challenge - Case Study 4 (Customer Nodes Exploration)

Updated
2 min read
M
Mary has been writing about Data Science on her blog (http://maryjonah.me) for years. In her free time, she likes to read books and have a good rest.

Case Study #4 - Data Bank We are on the fourth (4th) out of 8 case studies: 🎊🥂. We are halfway through the marathon.
This time around Danny has a company which is an intersection between “AWS Bucket and your favorite online bank”. Danny does have a lot of ideas when it comes to establishing companies. My job here is to answer questions he has so here we go with the first focus area: A. Customer Nodes Exploration

How many unique nodes are there on the Data Bank system?

SELECT
    COUNT(DISTINCT node_id) AS total_nodes
FROM customer_nodes;

What is the number of nodes per region?

SELECT 
    region_id,
    COUNT(DISTINCT node_id) AS total_nodes_per_region
FROM customer_nodes
GROUP BY region_id
ORDER BY region_id;

How many customers are allocated to each region?

SELECT
    region_id,
    COUNT(DISTINCT customer_id) AS total_customers_per_region
FROM customer_nodes
GROUP BY region_id
ORDER BY region_id;

How many days on average are customers reallocated to a different node?

SELECT
    AVG(DATEDIFF(end_date, start_date)) AS avg_days_per_node
FROM customer_nodes
WHERE YEAR(start_date) <> 9999 AND YEAR(end_date) <> 9999;

What is the median, 80th and 95th percentile for this same reallocation days metric for each region?

WITH percent_rank_days AS (
    SELECT
        region_id,
        DATEDIFF(end_date, start_date) AS diff,
        PERCENT_RANK() OVER (PARTITION BY region_id ORDER BY DATEDIFF(end_date, start_date)) AS p_rank
    FROM customer_nodes
    WHERE YEAR(end_date) <> 9999
)
SELECT 
    region_id,
    MIN(CASE WHEN p_rank >= 0.5 THEN diff END) AS median_days,
    MIN(CASE WHEN p_rank >= 0.8 THEN diff END) AS percentile_80,
    MIN(CASE WHEN p_rank >= 0.95 THEN diff END) AS percentile_95
FROM percent_rank_days
GROUP BY region_id
ORDER BY region_id;