SQL practice course call conversion question doubt

3. Call conversions


You are given the following tables, containing information about phone calls that a company receives as well as the company’s sales.

all_sales

date order_id customer_id order_source phone_call_id order_total num_items
2019-12-20 00001 0001 online 0 50 1
2019-12-20 00002 0002 phone 4282 100 2
2019-12-20 00003 0003 affiliate 0 200 1
2019-12-20 00004 0004 online 0 100 2
2019-12-20 00005 0004 phone 4285 350 2

call_info

date employee_id phone_call_id order_id call_length_sec
2019-12-20 31331 4282 00002 303
2019-12-20 34611 4285 00005 202
2019-12-20 49760 4286 0 155
2019-12-20 26583 4287 0 130
2019-12-20 20267 4288 0 190

Using the tables above, write a SQL query to calculate the following metrics, split by week:

  • percent of sales from calls

  • percent of calls that made a purchase

  • number of calls that purchase

  • average call time for calls that purchase

  • average call time for the week

Note that you should be able to write a full SQL query for this question just given the table schemas above.

DOUBT: I didnt understand the solution which is provided to these question. and also their is doubt in how to split by week.

The below image shows how call_info table and sales table have a left join (the below tables are just an example to explain the concept though in reality these tables would be quite big). It also shows how you extract 1st day of the week from a given date at the end of the image.


The final table that is created at the end is the main table from where all questions are answered except the 1st question percent of sales from calls (this can not be solved using the simple left join of these 2 tables) so for now you can assume that question is not solved here.

Coming to understanding the table one important note is that whenever there is 0 in order_id for call_info table it means that there is no purchase happening for that call. (this is true for phone_call_id in sales table as well - check the note in image)

The first doubt on split by week can be clearly understood if you understand the reasoning given for extracting 1st day of week for a given date. In that DAYOFWEEK is a new function used which just gives day# for a given date, this function usually works in mysql and not postgresql. (but you can also extract day of week # in postgresql as well as shown in this link - sql - Extract day of week from date field in PostgreSQL assuming weeks start on Monday - Stack Overflow). Another functions is DATE_SUB() which just subtract the given number of days from the date (this is simply shown as `date - INTERVAL DOW# - 1 day in image).

Once 1st day of week is known you will observe that three of the above rows have the same 1st day of week (please also check the original calendar of 2019 to get a complete feeling of these dates), if you groupby on this then these rows will be aggregated into 1 row based on aggregation function applied.

For rest of the question please use the final table in above image and think out how the below queries work (below queries are the actual answers for the given question)

SELECT  
    -- reformatting date to the 1st day of the week
    DATE_SUB(b.date, INTERVAL DAYOFWEEK(b.date)-1 DAY) AS week,
       -- b.date - INTERVAL DATE_TRUNCK('week',b.date)-1
    -- average call time across all calls
    AVG(b.call_length_sec) AS average_call_time,
    -- average call time for customers which purchased
    AVG(CASE  WHEN b.order_id > 0 THEN b.call_length_sec END) 
             AS average_call_time_purchase,
    -- total number of calls which purchased
    COUNT(DISTINCT CASE  WHEN b.order_id >0 THEN 
             b.phone_call_id END) AS num_call_purchase
    -- pct of phone calls which purcahsed
    COUNT(DISTINCT CASE  WHEN b.order_id >0 THEN 
             b.phone_call_id END)*100 /COUNT(b.phone_call_id) AS pct_phone_call_purchase
FROM call_info AS a
LEFT JOIN all_sales AS b
    ON a.order_id = b.order_id
    AND a.phone_call_id = b.phone_call_id

GROUP BY 1

As an example for the question number of call that purchase
we first did DISTINCT CASE WHEN b.order_id >0 THEN b.phone_call_id END
this gives all those phone call ids where an order or purchase was made (remember order id is 0 when no purchase was made) and then we do count on it. This gives number of phone calls where purchase was made. Finally we divide it by total phone calls to get the percentage.

NOTE: whenever you face confusion with complex join and queries and it is not getting clear very easily. It would be great if you take out pen and paper and draw a toy table to understand how it works.

Sir…their is one more doubt why we are taking join condition on phone call id and order id both.
As we are considering left join then whole table of call info is already their…

and one one more thing why we are using Distinct function here??

so using 2 column for joining is like a standard fail-safe practice in cases when you know that there 2 common column in between 2 table. This ensures any edge cases are not left out. You are right we might not require it in this case but usually you should apply it even then as it ensures that practically everywhere phone_id and order_id should match while joining.

Now on the taking distinct phone_id part, i think we should have used distinct everywhere as it might be possible that phone_id is repeated multiple time in call_info table.

Ohk sir…Thanks!!

Sorry sir…their is another doubt why their is need to join both the table as call_info table do all the task…for us…

the same reason that i have given for joining on 2 columns applies here. You need to ensure that only the correct order_id from sales table are matched. May be there are some wrong ids as well. But again this you will be able to check only if you have the complete table so for now we assume that joining ensures this effect. (if there are wrong ids that are only available in call_info table then after joining you will have to ensure that they are not counted towards number of purchases, this you are not doing in current question but that’s ok as you do not have the complete table)

Anyways the major point is joining ensures any issues that might have been occuring with call_info table is handled properly. That’s why joining is preferred here.

Ohk…sir …that clear my all doubts related to these question.