Lesson 44
Occupations | SQL Advanced Select | HackerRank Solution
Summary
SQL Occupations Challenge Summary
The challenge involves processing an occupations
table with name
and occupation
columns to output a table where each column lists the names of individuals sharing the same profession. Here’s how we can achieve that:
Step-by-Step Breakdown
-
Understanding the Input Table:
- We have a table
occupations
that contains names associated with their respective occupations (e.g., doctors, professors, singers, and actors).
- We have a table
-
Desired Output Structure:
- Each column in the output table represents a different occupation, with names listed under the appropriate occupation.
- For example, the first column might show all names with the occupation "Doctor".
-
SQL Techniques Used:
- Window Functions: To count occupations and facilitate ordered outputs.
- Derived Tables: To manage intermediate results.
- Aggregate Functions: To consolidate results into the final table.
-
Using Window Functions:
- Start with basic queries to understand the number of rows and specific counts for each occupation.
- Utilize the
COUNT
function alongsideOVER
andPARTITION BY
clauses to compute counts within each occupation.
-
Creating Columns for Each Occupation:
- Use
IF
statements orCASE
expressions to create individual columns for each occupation, displaying names where the occupation matches, or "no" otherwise.
- Use
-
Removing Unnecessary Rows:
- After creating columns for each occupation, filter the output to remove "no" entries and consolidate results into a clearer table.
-
Finalizing the Output Table:
- Use a derived table to group by the generated occupation-specific columns and use aggregate functions (like
MIN
) to consolidate rows properly.
- Use a derived table to group by the generated occupation-specific columns and use aggregate functions (like
Example SQL Code Structure
SELECT
MIN(CASE WHEN occupation = 'Doctor' THEN name END) AS Doctor,
MIN(CASE WHEN occupation = 'Professor' THEN name END) AS Professor,
MIN(CASE WHEN occupation = 'Singer' THEN name END) AS Singer,
MIN(CASE WHEN occupation = 'Actor' THEN name END) AS Actor
FROM
(SELECT name, occupation, ROW_NUMBER() OVER (PARTITION BY occupation ORDER BY name) as rn
FROM occupations) AS derived_table
GROUP BY rn
Conclusion
- The final output table successfully consolidates names into the respective occupation columns, formatted as required. This process illustrates the powerful use of SQL functions to manipulate and present data efficiently.
Video Transcript
Hey, I will do the challenge occupations from the subsection advanced select of SQL and
hacker rank.
So we're given this table occupations that has the columns name and occupation and we're
asked to produce the following.
The simple output should be a table.
Each column is a list of names that follow the same profession.
For example, in this case here, we see that we got Jenny and Samantha in the output for
the first column.
What is that column about?
Now, if you look at the input, Samantha is a doctor and Jenny is also a doctor.
So we take all the names whose occupation is doctor and place them as the first column
here.
And then we do the same for every other occupation.
For Ashley, that's a professor, Christine and Kedi, we place them in the second column
because they're all with the occupation professor.
Do the same for the actor and the singer one.
I will solve this problem using window functions, derived tables and aggregate functions.
And now we use my SQL.
So to help you understand how much solution is going to go, I want to first talk about
window function.
Let's say we're going to just select name occupation from occupations first so we can
understand what the data looks like.
So as you can see, we got the name there corresponding occupation.
Now let's say I want to count how many rows I have.
So I can do select count star from occupations, right?
And click run code.
As you can see, there are 18 rows.
But what happened here was the 18 rows collapsed into a single row with a number in the output.
What if I wanted to have that 18 for every single row and I wanted to print the original
rows but with the 18 as the third column?
Now to do that, I can do a window function like this.
If I do select count star over like so, and I can even add the name and occupation back
here so you can better visualize it.
So when I say the over there with the parentheses, I can now get that count for every row so
you can see 18 is in every row.
So it's important we understand this concept because we're going to use it to produce the
output for this problem.
And we can go even further.
What if I want to count on a partition discount?
Say I want to count just a number of people in a certain profession.
So I can go here inside the parentheses for over and I can say partition by and I'm going
to give occupation.
So if I do that, it's going to count by occupation.
So we can see Samantha is an actor and now it's no longer 18 rows.
It's four because in total, it found four rows that all have occupation actor.
So we can see every actor now have four.
For doctor, they'll have three because they're in total three doctors and for professor seven.
Okay.
Now we can go further.
What if we want to take these and order by name?
Say I want Eve to come before Samantha.
I can go inside partition by occupation and say order by and I can say name.
So as you can see, it orders by name.
Eve comes before Jennifer, Ked and Samantha.
So we're going to keep these concepts in mind because you're going to use for the solution.
So now I want to create a column for each of these professions.
So I can let's start with just the one, the first one that's going to be doctor.
So I'm going to go here.
We have one, two, three, and then going to create the fourth column.
And for this column, I want to display the value for the name only if it matches the
profession or occupation that I want.
In this case, I'm going to create the column for doctor.
So it should match occupation equals doctor.
So to do that, I'm going to use an if statement.
You can also use case or whatever other thing.
So if the occupation is equal to doctor, what we're going to do, we're going to display
the name here.
Otherwise just going to say no.
Okay.
So what's going to happen if I run this code?
So as you can see, we have the last column that's before and I have Amina, Julia and
Priya.
And these are actually what's going to be in the final results.
We just need to get rid of all these unnecessary nodes before.
Okay.
So there will be some nodes that will appear in case that one of the columns has more rows
than the others.
For example, this one, Jenny, Samantha, no, and then Ashley, Christine and Katie.
So but we don't have to have so many no.
So we're going to get rid of them later.
But for now, this column is for doctor.
Then you repeat the same pattern for all the three other professions.
Okay.
So the same pattern if you can do with one, you can do with all of them.
So we're going to do here, I'm going to create another column.
And this one, if the occupation is equal to professor, we're going to display the name.
Otherwise, we're going to say no.
If I click round code and verify what it looks like, as you can see, it now appeared right
here.
And this is because you can see Ashley is selected because that's the row for Ashley.
Velvet selected, that's because the S row for Velvet.
And the Darault professors, as you can see.
So I left the name and the occupation here, just so we can visualize what's happening,
how the data is being shaped.
But in the end, the final result, you don't need to display or have these two in your
query.
Okay, the name and the occupation.
Let's do the other ones.
So after professor, you're going to say if occupation is equal to singer, we're going
to display the name.
Otherwise, else case, we're going to do no.
Okay.
So the argument is the if condition.
The second is what happens if that is true.
And the third argument is the else case, what happens if that condition is false?
So we return no.
Comma, let's do if occupation is equal to actor.
Then we're going to name.
Otherwise, else we do no.
Okay.
So we got all four of those.
Okay.
So let's run the code and see what we get.
As you can see, now we have the columns for actor here in those rows for doctor in those
rows for professor in those rows and for singer on in those rows.
Now we got a shape this so it's better to the final result here, right?
Got rid of all these no's and remove the first three columns.
So we can get rid of name occupation here is not necessary.
And what I'm going to do is I have to do a derived table here.
So I'm going to select something.
Okay.
I'll do it in a second from this whole result, the stable that we created.
Okay.
Now, because my SQL when you do derive tables as for an A list, we have to say as some name,
whatever I'm going to say derived table.
Now what I'm going to do here is I want to select an any an aggregate function.
It could be mean, for example, I just need an aggregate.
That's the thing because I'm going to do here group by and I need to group by this value
here value of the remember that number the 123 4 based on the actor or 123 based on doctor,
I need that number here.
So I need I'm going to create an alias for this so I can reference in this group by outside.
So I'm going to say here, we can say maybe row numbers and like that.
Okay.
And here outside me is the same name row number.
So I'm going to group by them.
That means I when I have the one here, all the ones you can see one one, I'm going to
group them meaning I collapse them into a single row.
So this one will be combined with this one will be combined with that one will be combined
with that one.
Right.
So if I take them here so you can better visualize and I'll paste them here so you can see this
one.
Do I do actually?
Yeah.
So we're going to take all these combined based on this number because the same number
they all become one.
So you're going to have let me remove those because we don't need them.
So essentially you're going to have a Mina here.
We got Christine here.
We got Ashley here.
So something like this.
Okay.
So that they are each in their proper column without all those extra stuff.
Okay.
So that's what it's doing when you do a group by that.
So to do this I need to do a select here and I'll select using an aggregate.
So I'll use min.
Okay.
I need to reference the value of the column for the doctors.
I need a name is here to make it easy for me to reference there.
So I can see as whatever name you want, you can get colleges doctor and you can say doctor
here and then you do the same thing for every other column.
So I'm going to do a common min professor and at this name professor has to be the alias
for this one as professor.
And then I'm going to do min of singer and this name singer has to match the alias for
this one right here as singer.
And finally I have to do min of actor and this name actor here has to match the alias
for this column here.
So I'm going to say as actor.
Now let's run the code.
Now we get an error and that's because I used row number here and that's probably a keyword.
So I have to say another name that's not exactly row number.
I remove the E and just number.
Let's see if that works.
And there you go.
Now we can see we got those columns consolidated into one and you can see the first column
for doctor, the second for professor and the third for singer, the fourth for actor.
No comments yet (loading...)
No comments yet (loading...)
Did you like the lesson? 😆👍
Consider a donation to support our work: