The below image shows how call_info table and sales table have a left join (the below tables are just an example to explain the concept though in reality these tables would be quite big). It also shows how you extract 1st day of the week from a given date at the end of the image.
The final table that is created at the end is the main table from where all questions are answered except the 1st question
percent of sales from calls
(this can not be solved using the simple left join of these 2 tables) so for now you can assume that question is not solved here.
Coming to understanding the table one important note is that whenever there is 0 in order_id for call_info table it means that there is no purchase happening for that call. (this is true for phone_call_id in sales table as well - check the note in image)
The first doubt on split by week can be clearly understood if you understand the reasoning given for extracting 1st day of week for a given date. In that DAYOFWEEK
is a new function used which just gives day# for a given date, this function usually works in mysql and not postgresql. (but you can also extract day of week # in postgresql as well as shown in this link - sql - Extract day of week from date field in PostgreSQL assuming weeks start on Monday - Stack Overflow). Another functions is DATE_SUB()
which just subtract the given number of days from the date (this is simply shown as `date - INTERVAL DOW# - 1 day in image).
Once 1st day of week is known you will observe that three of the above rows have the same 1st day of week (please also check the original calendar of 2019 to get a complete feeling of these dates), if you groupby on this then these rows will be aggregated into 1 row based on aggregation function applied.
For rest of the question please use the final table in above image and think out how the below queries work (below queries are the actual answers for the given question)
SELECT
-- reformatting date to the 1st day of the week
DATE_SUB(b.date, INTERVAL DAYOFWEEK(b.date)-1 DAY) AS week,
-- b.date - INTERVAL DATE_TRUNCK('week',b.date)-1
-- average call time across all calls
AVG(b.call_length_sec) AS average_call_time,
-- average call time for customers which purchased
AVG(CASE WHEN b.order_id > 0 THEN b.call_length_sec END)
AS average_call_time_purchase,
-- total number of calls which purchased
COUNT(DISTINCT CASE WHEN b.order_id >0 THEN
b.phone_call_id END) AS num_call_purchase
-- pct of phone calls which purcahsed
COUNT(DISTINCT CASE WHEN b.order_id >0 THEN
b.phone_call_id END)*100 /COUNT(b.phone_call_id) AS pct_phone_call_purchase
FROM call_info AS a
LEFT JOIN all_sales AS b
ON a.order_id = b.order_id
AND a.phone_call_id = b.phone_call_id
GROUP BY 1
As an example for the question number of call that purchase
we first did DISTINCT CASE WHEN b.order_id >0 THEN b.phone_call_id END
this gives all those phone call ids where an order or purchase was made (remember order id is 0 when no purchase was made) and then we do count on it. This gives number of phone calls where purchase was made. Finally we divide it by total phone calls to get the percentage.
NOTE: whenever you face confusion with complex join and queries and it is not getting clear very easily. It would be great if you take out pen and paper and draw a toy table to understand how it works.