Loading
Lesson 20
Courses / HackerRank SQL Problem Solving Solution Challenges
Weather Observation Station 20 | SQL Aggregation | HackerRank Solution

Summary

Summary of SQL Lesson on Hacker Rank's Observation Station 20

In this lesson, we explored a SQL task from Hacker Rank focusing on calculating the median latitude value (lat_n) from a table named station, which consists of five columns: ID, city, state, lat_n, and long_w. The primary goal is to determine the median of the lat_n values by sorting them and identifying the midpoint value.

Steps to Find the Median

  1. Count Rows: First, determine the total number of rows in the station table using:

    SELECT COUNT(*) FROM station;
    
  2. Calculate Midpoint: To find the median, calculate the midpoint based on the total row count. If the number of rows is odd, the median is the middle value. If even, it’s the lower of the two middle values. To achieve this, use the FLOOR function to avoid decimal results when dividing by two.

  3. Use Offset: Leverage the SQL OFFSET function to skip all rows before the calculated midpoint.

  4. Limit to One Row: After offsetting, use LIMIT 1 to get just the median value.

  5. Prepare Statement: Since SQL does not allow direct use of variables for the offset in some cases, a prepared statement is used. This involves:

    • Preparing the query with PREPARE.
    • Using placeholders (?) for variables.
    • Executing the prepared statement with the actual variable.
  6. Casting to Integer: Cast the row count to an integer to ensure SQL understands it properly for the OFFSET parameter.

Example of Final Query

Here’s a consolidated example of the SQL query:

SET @row_count = (SELECT COUNT(*) FROM station);
SET @mid_index = FLOOR(@row_count / 2);

PREPARE stmt FROM 'SELECT ROUND(lat_n, 4) FROM station ORDER BY lat_n LIMIT 1 OFFSET ?';
EXECUTE stmt USING @mid_index;

Result

After executing the code, the final result returned is the median latitude value, specifically rounded to four decimal places, yielding an answer of 83.8913.

This lesson effectively demonstrates how to calculate the median using SQL functions and techniques involving counting, offsets, and preparation of statements.


Thus concludes our lesson on the weather observation station problem.

Video Transcript

Hey everybody, welcome to another lesson on whether Observation Station 20 from the subsection aggregation of the SQL section from Hacker Rank. So let's get started. But given this question, it has the table station with five columns ID, city, state, lat, underscore, and longer underscore w. So basically for the purposes of this question, we only care about lat underscore n. So we want to find out what the median value is for all the lat underscore n values. Basically you take all the values for lat underscore n, sort them, ascending, and you got to take the midpoint. If it's odd, for example, if you had like three values, right, the midpoint would be the second one. Okay. So if it's even two values, that's a little bit weird, but we can take the lower one, which for example, if you have two elements, you can take the first one. Okay. So how can we do this? Well, I don't think there is a medium function we can leverage conveniently to use. So we got to do some a little bit more complicated stuff to get this working. So what my line of thinking about this is I'm going to first count how many rows there are. If I know how many rows, I can get the midpoint, right, the mid number. What's the index, right, of the mid row? If I order them by lat underscore n. And then what I do is I leverage a SQL offset function so I can offset all the records before that midpoint so I can eliminate them, right? So instead of querying all these first records before the midpoint, I eliminate them all with the offset parameter. And then after that, I use limit one to just take that very first row after eliminating all the previous ones because I don't want anything after the midpoints, after the midrow, right? So how can we accomplish that? Well, to do that dynamic number for the offset, I cannot just do it in plain SQL. I got to do a prepared statement and put a variable there that references the number of rows, the total number of rows in this table. So let's go one by one. I'm going to use my SQL here. Now first I want to find out how many rows there are. So that's easy, right? You can just do select count star from station, right? Now, that's nice. Is it possible if I could get like the midpoint of that row? Well, if I divide by two, right? If I take that and divide by two, I can get kind of a midpoint. But if it's a not number, it's going to be a decimal, right? So I don't want that. I can either take the floor, which is the number down, or I can round it up, right? With the ceiling. So for the purposes of this, because I'm using offset, and on offset everything before the midpoint, I'm going to use the floor, okay? So it's a little detail. So you got to, if you do it, you would have to play around to see if it's floor or ceiling as you do the problem. But I already know the offset, but I need to take the floor, okay? So that way, for example, if I have three elements, and if I divide three, divide by two is 1.5. If I do floor, that means I drop the five. If it were ceiling, it would go from 1.5 to two. I don't want that in this case, okay? Because I want to offset only one in this case. Because if I offset one, and there's three elements, right? A, B, C. I offset one, meaning I remove only the ones before the B. There's just one, right? And I get B and C, and then it can limit to remove C. Okay? So that's it for that part. Now that's nice. How can we place that in a variable? So we can do set, add, and the name of the variable, I'll call it row underscore count equals this. But because syntax, I have to add a parenthesis around the select, okay? And if I run the code, that should be fine. If the rank answer, we don't get any errors. That means the syntax is correct. Now we got the row count there, okay? We still got to do some stuff in that row count line, but I'll leave that for later, because you can see it's going to error out. Now we got the row count. Now we got to use that. So for the query, the query I want to do is this, select. Let's just start for now from station, order by, lat, underscore, and so I'll take all the rows, order by lat, underscore, and, okay? Now I want to offset, so remove everything from the midpoint upwards, right? I'm going to offset by the row count here. Obviously I cannot use it like this, okay? Because of SQL syntax, but I'm just showing you what my thought process is. Now I want to limit, before I say that I want to limit just one, that means I remove all the rows after the mid row, because I want to only get one, that one with the median value. And finally, I want to get the value for lat, underscore, and, but because it's rounded to four decimal places, I got to do round, lat, underscore, and here, comma four. So that would be the query, but if I do this query, it's not going to work. We got an error in syntax. Now to make this work, you got to do what's called a prepare statement. So to do prepare statement, we do like this. We call prepare SDMT, and then from any quotes, and you got to put this whole query there inside the quotes, and remove the semicolon put outside the quote. Now the, where you see the variable here, you got to take that out, replace it with a question mark, so that in the next line, you're going to say execute SDMT, okay, using that variable. So what this is doing is, okay, I want to execute SDMT. Which is this prepare statement, this query, and using these values. So this first value will replace the very first question mark that appears in the query there. Okay, so that's the value of row count is going to go into the question mark there. That's from the code. It's still going to get an error. Okay, so it's saying there's incorrect arguments to execute line three. So the problem here is the variable value, it's not properly being understood when you put it in the offset here. So you have to explicitly tell my SQL, this is integer value, right? So we have to cast the value, meaning convert it to another type. So where you have the row count definition here, you have to go select whatever this part here, you got to cast it to an integer. So I'll use the cast, whatever this is, as, okay, and sign sign means an integer that has a sign. So I'm going to run code now. And now you can see we got the answer 83.8913. That's the mid the point where you sort all the roles by Latin score and take the very mid row, take the value of the Latin score and round into four, that's what you get. I'm going to submit. And that's weather observation station 20.
No comments yet (loading...)
No comments yet (loading...)
Did you like the lesson? 😆👍
Consider a donation to support our work: