8 Week SQL Challenge - Case Study 5 (Data Cleansing Steps)
Welcome to Case Study #5 - Data Mart and the section 1. Data Cleansing Steps
In a single query, perform the following operations and generate a new table in the data_mart schema named clean_weekly_sales:
Convert the
week_dateto aDATEformatAdd a
week_numberas the second column for eachweek_datevalue, for example any value from the 1st of January to 7th of January will be 1, 8th to 14th will be 2 etcAdd a
month_numberwith the calendar month for eachweek_datevalue as the 3rd columnAdd a
calendar_yearcolumn as the 4th column containing either 2018, 2019 or 2020 valuesAdd a new column called
age_bandafter the originalsegmentcolumn using the following mapping on the number inside thesegmentvalue
segment | age_band |
|---|---|
1 | Young Adults |
2 | Middle Aged |
3 or 4 | Retirees |
- Add a new
demographiccolumn using the following mapping for the first letter in thesegmentvalues:
segment | demographic |
C | Couples |
F | Families |
Ensure all
nullstring values with an"unknown"string value in the originalsegmentcolumn as well as the newage_bandanddemographiccolumnsGenerate a new
avg_transactioncolumn as thesalesvalue divided bytransactionsrounded to 2 decimal places for each record
CREATE TABLE clean_weekly_sales AS
SELECT
STR_TO_DATE(week_date, '%d/%m/%y') AS week_date,
CEIL(DAY(STR_TO_DATE(week_date, '%d/%m/%y')) / 7) AS week_number,
MONTH(STR_TO_DATE(week_date, '%d/%m/%y')) AS month_number,
YEAR(STR_TO_DATE(week_date, '%d/%m/%y')) AS calendar_year,
CASE
WHEN RIGHT(segment, 1) = '1' THEN 'Young Adults'
WHEN RIGHT(segment, 1) = '2' THEN 'Middle Aged'
WHEN RIGHT(segment, 1) = '3' THEN 'Retirees'
ELSE 'unknown'
END AS age_band,
CASE
WHEN UPPER(LEFT(segment, '1')) = 'C' THEN 'Couples'
WHEN UPPER(LEFT(segment, '1')) = 'F' THEN 'Families'
ELSE 'unknown'
END AS demographic,
ROUND(CAST(sales AS UNSIGNED) / CAST(transactions AS UNSIGNED), 2) AS avg_transaction
FROM weekly_sales;