Intro to SQL - Bootcamp Day 11 (2024-09-17)

Lesson 11

Courses / Full Stack Web Dev Bootcamp (August 26 - September 26, 2024)

The lecture introduces Structured Query Language (SQL) and teaches you how to make some basic queries for a relational database management system (RDBMS).

You get to learn how to retrieve data, including with a filter.

You also learn how to eliminate duplicates and count.

The lecture also goes over the insertion of new data into the database tables.

You also learn how to delete data.

The lecture also teaches you how to update existing database table rows.

You also learn about PostgreSQL and pgAdmin. The lecture briefly goes over PSQL and the Query Tool, as well as creating a table using the graphical user interface (GUI).

01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16

Video Transcript

Today I will talk about SQL and I will use this website here to demonstrate the examples. So SQL Structure Acquiry Language is a language that we learn to be able to interface with a relational database. You might hear relational database management systems, rdbms, and it's the most popular way today to store data. So when you access any website or web app or application in your mobile devices, it will pull the data through the back end from a database and probably it's likely it's using SQL. And SQL has many flavors. For example, there's Microsoft SQL Server, there's Oracle Database SQL, there's MySQL, popular one, and Postgres, which is the one I usually use, and many others. So it's like you have cars, right? Cars have different makes. In the end of all cars, they kind of behave the same. But we have kind of different flavors or make, right? Toyota makes cars, Ford makes cars. So those are like the specific implementations of SQL, whereas the entity of the car itself, the blueprint for the car is SQL. Anyway, this website here, you can write your SQL queries, as we call them, in this text box here. The left hand side is the definition for each table. And the right hand side are the actual table and their data. So in SQL, we operate with tables and each table has columns. And we have rows. It's like a spreadsheet, like if you ever use Excel, or Google Sheets, same thing. So for every column has a different piece of information. For example, we have a table to store information about the customers. And you would have like a column for the first name, for the last name, for the age for the country the customer is from, and so on. Whereas each row is corresponds to a specific customer. So row, this row with customer ID number one is for John Doe, he's 31 from the USA. And then we got another row for another customer, Robert Luna, 22 from USA, and so on. Okay, so database has many tables, each table has columns, and we got the rows. Okay, so we're going to type some query here, which is what we use to communicate our interface with the database. And the language uses structured query language or SQL. The most basic query is called a select query. If you want to ask information about a specific table, and you're going to say select. And I use capital all caps, because that's the usual convention we use whenever there's a keyword that's specific or reserved to the language of SQL, we capitalize, but it's not really necessary. And I'm going to use star after select space. And usually we select specific columns. So star means all the columns, and then space from space, the name of the table, in this case, customers, and then you finish off the semicolon. And how do I know it's customers because Britain here left hand side and also see on the right hand side. And I click run, I should be able to see all the rows for the table of customers and all the columns appear. So this is a small data set. In general, it would be a lot more, maybe millions of rows, if not more. So obviously you can do by eye, but usually that's not the case. So if you want, you can limit your query to just a few rows. For example, I want to limit for example, I want to limit to three. So add that after the from space limit three, which means I want only want three results, three rows, at most. And in that case, it just captures three. This is useful if you're dealing, you're looking at data in your job. And there's some millions of them you don't want to do without the limit because if I was going to probably freeze and crash, because just too many records. So you want to limit when you're dealing with actual production data, and you're just verifying things. Anyway, I'm going to remove that limit. Let's say I want to capture just people's last names. So I can do that instead of a star, I would write the column name, which is last underscore name. It's important that every character is written with the correct casing, and with the underscore that there's no space there. So you can see last name, we expect to see Del Luna Robinson Reinhardt and Doe. Yeah. If you want to include first name as well, what do you do is separate the column names of a comma. So you can say either before or after last name, you could put first underscore name comma, then round the SQL again, and you should see both columns. People like to add a space after a comma just for the readability. That's not important. Okay. So if I were to know, like, usually, from the point of view, there's different perspectives in SQL, right? You have the perspective of being able to store the data for a specific application. And there's also the perspective of people who are looking to understand the data from the customers. Maybe these people are making SQL queries to find out and answer questions that maybe the business side wants to know. For example, maybe your business wants to know, where are our customers from? So somebody would come and make a SQL query and produce a report of the results. So how would we do that? Well, we got a table of customers. So we can look up the country, a column there, so we can select the country from customers. And if I run the SQL, I should be able to see a list of all the customers countries. Of course, there'll be more than one row with the same country because we've got multiple customers from specific country. And if you're dealing with like actual data sets, going to be millions of pros, that's really hard to kind of count by eye. So what we do is we really can consolidate these records and eliminate duplicates. One way of doing that is if you have the column name here, proceed that with distinct keywords space, you should be able to kind of remove all the duplicate country names. And that will be a lot better. Now you can see there's three countries there. So by eye, you can see USA, UK, UAE. So you can produce this report and send it to whoever asks for that information. And there's that, you know, there's at most 195 or something countries in the world. So that would be okay. If you want to count how many countries there are, you can also use the count function, if you put count parentheses like this. And you I would put the distinct inside if you want to make it not duplicate. You can do that if you want to count there's three, we are operating right now in three countries, if you care about the number of them. So if you want to count anything to put account parentheses in close that there. I usually people do count star, that's okay as well to count how many rows in the table. But this one we want distinct country. Okay. Let's talk about the filtering. Sometimes you might need to find some rows by a specific column value. For example, maybe a customer comes in and you need to look up their account information and for they for some reason forget their account number, which is the customer ID. So how would I do that? Well, let's do select all the column star from customers, but they give us the last name. So the filter keyword is where after from here. And last underscore name the column name equals some value under quotes here. Let's say they come in and say their last name is dough. By the way, it's important I put the capital letter D there. So this would narrow down to specific rows whose last name is dough. But keep in mind in this world, there can be multiple people with the same names. So it's not a unique match. And that's the reason we have this column here customer ID that uniquely identifies each row. So if you go back to this right hand side, we always have for the tables an ID column that we call the primary key. And that primary key is unique for each row. There can be no other row the same primary key. So that that makes us be able to have multiple people the same name and know which one is which. Okay. So here we let's say the customer said my last name is dough. But then you okay, there's two people dough. Obviously, there'll be a lot more in an actual production data. So you ask them, okay, what's your first name? I can narrow it down. How do we add where last name is dough and the first name is maybe john. Well, you just use the end keyword here. And then you just add another column equal value fair. So first name equal. And then john. And then you can run that and you're going to be narrow down even more. Because this data that set is small, we only have one row matches. But I would record that an actual data production, you'd have a lot more. So that wouldn't still be enough. So you'd have to narrow down by the age by the country and many other attributes, which is really bad because usually you ask the customer for the ideal primary key. So that's what you usually filter by. So if they tell okay, I'm a customer ID, let's say I'm David now, David Robinson, customer ID is three. So you can right away look it up. Okay, where customer ID is three, by the way, when you have numbers like this integer, I don't need quotes, like run and shoot exact match, you know, this one is always exact match, there's only one row with that primary key. Now, each column is a data type. If you look at the right left hand side, you can see customer ID on the right hand side is this int. That means the data type for that, which is integers. And then first name, last name are both their care parentheses 100. What that means is the character varying string, a sequence of characters, that could vary in its length, it could be like one character could be two, and so on all the way up to 100 characters. And the other ones pretty much ints for age and via care 100 for country. Okay, so each column has a data type. All right, so one thing I meant, like I backtracking here for the first name, I told you to put a j capitalize, what happens if I put lowercase, it's not going to work, because it's trying to match exactly with the upper lowercase, but there's a way to kind of normalize that. One way is either use the upper or lower function. For example, if I were to lower case, the first name, meaning all the letters from first name are now become lowercase, so I can compare, it would always be lowercase. You can do that. That's one way. I can do upper. And if you do upper, basically, you got to put all uppercase here, because it will always take the first name, make all letters uppercase or capitalize, and then you have to compare the uppercase, kind of normalized version. So that's a technique you can use. Okay, let's see. Let's talk about inserting records now. So we learned how to retrieve data. Now we need to learn how to add new rows to the table. And that's insert into insert spacing to and then you give the table name customers. And the short version is you can just pass the values by saying values parentheses, semicolon, and then you write them in order as they're defined here in the table definition, and separate the values of comma. Now the first one at the top here from the top to the bottom is customer ID. In this case, we have to look what's the next customer ID because usually that's auto incremented. So it's 12345. And then the next one will be six, the new row. Usually that's taken care of. But whenever you try to insert, it takes like a default value that's always the next in the sequence. But since we're in this environment, I have to manually type it. So I'm going to put a six. If you try to put five there, it's not going to work. Let me just show you. I'll put five that it should be six. And then let's put Anna for the first name. And I'll put Del for the last name. And then age. By the way, I'm following the sequence from top to bottom here of the definition. Put 33 and output country, Canada. And if I try to run, you're going to see error, unique constraint failed, customers dot customer ID. So what's happening, like I told you, there can be no other row with the same unique primary key. So the primary key customer ID that belongs to the table customer. So whenever you see this dot, I just think of the column name on the right hand side belongs to the table on the left hand side, which is customers. And there's a constraint that failed, that's called a unique constraint. So if you can apply a unique constraint to a certain column, so that there can be no other value for that column. And usually that's always the case for primary keys, you want it to be unique. Otherwise, like you'd have two records, the same thing, how would you distinguish? Anyway, I'm going to change five to six, because that's the next one. And you can see the right hand side of the data has been inserted. Okay, so let's try. So we can add other people just change the put seven, John, though 47 from split a UAE. And you can see there's no unique constraint for name, because two or more people can have the same name. So this John Doe here is from UAE and he's for seven. Okay, let's let me teach you the explicit way of the longer version of the inserting to because this one is kind of implicit for the columns here. So before values, we're going to have to specify the name of the columns in any order we want. For example, let's do the original order customer ID type the column names, separate by comma, last underscore name, age, and then country. Okay, so here in this case, we got to match the order as they appear. And if you want to change the order, that's fine too. You can, for example, swap last with first name here. That way you would you could type the last name first and first name afterward. But to do that, you got to make sure your right hand side values here, you have to swap your dough with John. So dough has to come first. So you got to match the order as they appear, for example, customer ID has to match the first value on the right hand side and then the second on the left hand side has to match the second right hand side and so on. Okay, I got to change the unique constraint to eight. And let me put another name. There you go. And you can see the first name is still first name last name for last name, despite we having swapped the order here. So that's the more explicit version. If you don't know exactly the order of the columns, which usually you don't right away get like a nice interface like this showing right away usually have to ask the database can you describe the column for the tables. Okay, now like, can we delete records? I'm not teach you how to do that. All right. So delete from is the keyword. And then you're going to say the table name. But then you got to watch out. This by itself will delete all the rows, which is something you never ever want to do usually. So be careful using this. Always want to add a filter here where some column value and typically is always the primary key because we know for sure we're only deleting one thing. So if you try to delete by the last name, for example, you could unintentionally delete all the people whose last name was specified, despite you only wanting to delete one person. Right. If I were to delete, let's say, okay, Betty, Betty wants to be deleted from the system for some reason. Now, if you were to say last name is dough here, it's going to not only delete Betty, but everybody else who has the last name dough, which is unintentional, you never want to do that. I will do it just for the sake of example. You see they're all gone. People last name is dough. So never do that. Let me refresh. Usually you don't get this. So I refresh to reset. So what are you going to do is you want to use all as the primary key customer ID. In this case, Betty is five. So you're very safe to only delete one row. So you know for sure it's not going to have any tennis consequences. All right, so that's pretty much it. Delete from a table name where primary key equals some value. So you're very safe from deleting anything else. Okay, now let's talk about update. Update from, and then you got to give the table name, customer, for example, let's say David Robinson moved from the UK to Canada. So we need to update this country. How do we do that? Well, update from customers, we're going to set country, Canada. But if I do it just like this, everybody will go to Canada, which is not what I want. So like delete from, you got to add a where. Don't forget. I'm going to do it for the sake of example. If I do that, where I messed up the syntax, delete from customers. Let me see. Set country, Canada. Sometimes I forget the syntax. Oh, no, that's not. Oh, there's no from. Let's try that. Sorry about that. Always, we often like forget some details. Anyway, there you go. So everybody's going to Canada now, which is not what we want. So I'm going to reset just for the sake of this. And I want to always add the where here. And then we got it. Usually, it's the primary key to be safe. It's only all going to update one row, right? Customer ID. And was it David that I said? Yeah. It's David Robinson. And now changes to Canada there. Now this nice interface allows me to see the data right away here, obviously in an actual production. There'll be millions of rows, and I wouldn't have it on the side. So I'd have to actually query, right, select star from customers, where customer ID is equal to three to confirm that it was indeed changed. And I can see, okay, Canada is there. And now that now that I'm in that thing about me forgetting the syntax, let me give you a nice resource. This one is very nice. If you're learning SQL, and you want to do some exercises, I recommend you see this one, W3Schools.com. And it gives you all the syntaxes. And even if you've done this for a while, you might forget how to do things. And you can look it up. For example, the one I just forget update. Can I go there? Okay, update. Okay, there's no from set. And then you can see the syntax and where. And you can even try I think demo database somewhere here. So very nice resource. Of course, you can always make a search in your search engine. How do I do this in SQL? Or you can use the now artificial intelligence AI chatbots, and ask them as well. Okay, go back here. Great. Now let's talk about combining tables. So we might have noticed. So we have a costumers and costumers, they are or something that maybe they're right, this is the e-commerce star or something. I have that we have a table of orders here, so we can select star from orders to know what people have ordered. Okay, somebody bought a keyboard, mouse, monitor, and so on. Now I don't know exactly the information about the user who bought this. Well, I do have the customer ID here. But that doesn't tell me their name, possibly the address and the age and so on. Why isn't the customer information available in the orders table? Because when we do SQL and relational databases, we isolate each piece of resource in its own table. So there's only one single source of truth, and there's no like data being copied and pasted to another table. It would be a mess. Imagine if you change the customer data in one table, it would be out of sync of all the other tables that all had also copied the same data. So because of that, we always keep things like we have a table for customers, we have a table called orders, and separate every single resource. Now you might be asked, okay, how do I, I need a way to see a report with order information and customer information alongside. Do I have to do it by I and locate customer ID for, I got to do it to the customer table, look it up for, okay, John, whatever, and then I got, how do I make combine everything? That's called join. So we do joins, SQL joins to combine tables together to provide us a report for the information you're seeking. So let's say we got orders, we need to add information about the customer. But how do we initialize that? Well, first you got to, when you have tables that point to another resource in another table, we have these columns called foreign keys. And for example, the order for the keyboard here has customer ID for this customer ID column is a foreign key that points to the primary key of the customers table, which is customer ID just happens to have the same name, but they don't have to be. And you just look at up, okay, customer for this is the row that's supposedly going to match as we're going to join. Now this kind of join where we match the foreign key of the primary key of the other table, it's called the inner join. Okay, so we're going to do here select start from orders. And they're going to enter join, that's the keyword. And then the other table customers, now I've got to say on keyword, and you got to match the columns. Okay, on, let's say we want the customer ID of the customer stable to be equal to the customer ID of the order stable. But you might have noticed there have the same name, this is going to be ambiguous. If you click run, you can ever ambiguous column name, because they have the same name in the different tables, we have to specify that which table you're talking about. So you've got to say here, customers dot to say this is the customer ID column of the customer stable, I want to match with the orders table dot customer ID column. Okay, so that's how you get around that. You want to be explicit. Now you're going to see this here, which is the combined tables. And I'm going to annotate the way I think about this is I highlight this here, the left hand side, that's comes from orders. And the right hand side is the actual table for customers. So that's how I think about it, if you can see my annotation. So visually, I combine the table side by side, but but it's actually matching on the customer ID here. If you notice, they always match on the customer ID. And that's how you got information for customers and their orders. Great. Let me, you might be asked, how do I eliminate the customer ID that's twice appearing? Well, you have to actually manually type all the column names that you want to see here. So you would have to manually type or ID item amount customer ID, but it's going to be ambiguous. So you got to say either one of them maybe customers dot doesn't really matter, because it's the same. And then first name, last name and so on. I think there's age and country like that. And now you can see there's one less customer ID. And then you can like print this out or send this to your the person who asked for that information. Obviously, it's going to be a lot more than this in the actual data set. So you might want to limit or not depends how many orders there are. Okay, and you can do by date as well. If there's like a date, like you don't want to deal with past orders, you would add a where the date like is like last 30 days or something. It basically the the you put the dates is greater than some like I'm 30 days before the date of 30 days, something like that. Alright, so that's the inner join let me talk about secret van diagram because I always show this very useful. We not only have inner joins but left and right and other kind of joins. Let me copy image link and open a new tab face to the zoom chat. So this kind of join here that we did was the one in the middle here, which is called the inner join. If you look at this, it has a van diagram, which is basically a set A and set B. And think of the set a table A and set B is the table B in this case, like orders and customers. And if you want to match them like customers who have orders, it'll be the intersection, which is the highlight here. And then you read the code here, which is the query that would do that select and the table column names from the table A inner join table B on a dot key equals B dot key. So you can copy that. Mind you, this kind of inner join doesn't give you customers who didn't order anything, right? So if you're looking to generate a report of all the customers and their orders, there are some customers who might not have ordered yet. Since there is no matching ID, they'll be excluded from the report. But if you want to include them, you could maybe do this a left join depends on how you think about it could be this left or the right, right? If customers A is A, you can do the left join, right? If customers A and B is orders. So this would mean, okay, the one in the intersection would be the customers who have orders, right? Because they're both in the customer stable and in the order stable. But there's some customers might not have an order that will be exclusively in A. That's why the A is highlighted here. You include that as well. So that's a left join to just change the keyword from inner to left. So you got to think about your problem first in terms of set in order to select which if it's left, inner, or right, okay, and there's even full outer join and full outer join like this with everything and everything by the intersection and so on. Okay, but I would say the most common one is the one in the middle that people have to know at least. Okay, great. Now let's see. I told you how to count already. So let's talk about creating tables. You might know, okay, these tables have already been created for us. Is there a way we can create our own tables? Yes. So you can do create table statement like this and then give it a name. For example, let's keep track of cars. I can create a table called cars or lowercase and then parentheses. And typically we write it with readability. We could break a line like this semicolon at the end and then for each line, we write the column description. For example, we need to keep track of the car make. So we write the column make, and then we need a type. Car make is just a sequence of characters could be fair care of 255 characters. If you don't know how many characters at most, it's going to have just put 255 and you can change later if you need. Okay, then I can do model, car model as well, keep track of that's bear care as well. And you can go on there's so many attributes of a car that we can keep keep track of maybe the year year is an integer, right? And finally like that. And then you can click run SQL. And it should create a new table of cars here. Now if you want to comment out SQL statements, you can press control. Let me see, you can use the dash dash at the beginning of the line and that would be ignore. So if I do that, it's not going to try to create the table again is just ignoring the code is like a comment. I just leave it there for reference. Let's insert into cars, right? What are the columns we got make model year values. So we got let's say, Ford, focus 1992. And then run SQL, it will add that there. And then it can keep adding there's no unique constraints. So this if I repeat that's going to add more and more. And I forget, we always have a ID that's unique, right? So primary key would have to be added as well. But that's the gist of it. And then it can do whatever we did before you can select star from cars and it should give you all of those and it can delete if you want to delete the duplicates. Let me delete from cars. This will delete everything. So maybe it's kind of hard to delete because they're all the same. There's no unique primary key. But anyway, if I do that's going to delete everything. So that's why you always need a unique ID have something to delete for. Yeah, so if you want to alter a table and add new columns, you could also do that. Let's say I were to add a column for the color of the car. So I would say alter table cars, add column. And then it's called a color. And colors is care, very care to 55. And you can do that. And should see a color column there. So you can do inserting to cars. Let me do on the civic 2010 color red semicolon and can see it's there. Okay. Now let me show you how you can play around your own computer if you install postgres. It should also come up with PG admin a graphical user interface client. And let me show you what it looks like. So PG admin looks like this. So you can hear if you don't have a server yet, if you expand, you don't have one by default can right click register server, and give it a name and then connection should be local host here. And you put a password if you have set up a password when they installed postgres. And it should use the default role call or username called postgres and the postgres database. Anyway, after creating that, you should be able to see it here. And you can expand to see all your databases. So okay, so we got this databases, and you can go here, you can try stuffing the default one called postgres. This is the default one. You can expand. And you should go to scheme and expand public expand and then finally tables here. And you can right click and create a new table if you want, or you can run SQL commands like we did. Let me show you two ways of doing that. If you click tools, there's the P SQL tool, which is basically command line. So you can go here, for example, I already have a table cars, I can say select star from cars. And it should run that for me. I had made that before, by the way, if you want to create a new one, you can do it here, create table cars, and they would type the whole thing, let's say make their character 55 like that. And let's put a single simplicity, I'll call it cars three, because I have cars already. And I will create a table if you refresh here, it should see it here here, or you can describe it here if you want to describe you can say backslash D space cars three, and we'll give information about the column for the table. And if you do backslash D, it gives you all the tables and sequences. So this is the command line version, if you want the graphics, of course, you can look here in the side and refresh every time you make something new, and you can expand to see the columns here, and other stuff like constraints. Okay, now the other more graphical way of doing that if you click tools, query tool, that one is like more in line with what we've been doing, right, like in the graphical user interface, I can do select star from cars three, semicolon, press the button play or the keyboard key f five. And I'll see that there, there's no record. So you might want to insert into cars three. Oops values. What's the make Honda. And press play and I should insert. And then you can either select star again, star from car three here again, press play, and you should see Honda there. Or if you want to do the graphical way, you can right click cars three view slash added data, and you can choose, for example, all the rows, it will basically run select star from cars three. By the way, I always add this public dot, which is the public schema, but you don't have to type that explicitly. And it's Honda. So very nice thing. I think this one you can even write, edit on the spot. I think let me try. Maybe not this one. On the two. Yeah, this one doesn't let me but sometimes you can, if you make a query here, I think query tool, select star from cars three, semicolon. And I I like to edit that sometimes I don't know why it's locked. Strange. I wonder why I could click the other stuff here, click this. Don't save this X. Let me close the other ones. So if you do scripts here, you can maybe forget how to do a sequel. You can actually get all this stuff. For example, I forget how to do insert, I can right click scripts, insert script, and it will write everything for me here. It will write all the columns and the values that I just have to change the question mark with here Toyota, for example. And it should insert for me. And there you go. All right. What else? So creating tables, if you were to create a table, you can do it like a manually or you can use the graphical user interface, right click create table. For example, that's your car for and choose the owner and then columns and you can add them here, for example, make. And it's going to be in this case, a very care, but here it's called character varying, but it's the same thing. Character varying, don't put this square brackets without. And it can give the 255, which is basically the parentheses. And it can add more click plus and so on and so on. Like if you want to add like a primary key here, you would choose either serial if you want the primary key to be integer, or big serial if you want the primary key to be a big int. Let's do serial. And I'll just call it ID. And you can mark here this not know, which is a constraint meaning this value needs to always be defined. It cannot be empty or no. And then you can check it's a primary key. Yes. And if you do that, it's going to do all this job to create the table. And if you right click properties, click columns again, you can see it's become integer with this next valve of the default, which basically taking for a sequence that's created when you have primary keys. If you expand that, actually, it's outside here sequences, you have a sequence for car street. If you right click that, it's just keeping track of the increments for the primary key and the maximum value. Anyway, what else? Yeah, if you forget how to create a table, you can right click scripts, create script, and it will actually give you how the commands to create the table. Of course, there's some additional stuff you shouldn't really care about like collate and whatever here. And the sequence stuff, usually you don't need that. But you can see the definition of the table in manual SQL here. So very useful if you forget SQL, want to refresh your memory, what's doing. Okay, so basically PG admin is just a front throw for a calling SQL queries behind the scenes. So instead of you manually typing the queries that you just click, make some clicks, type some stuff, and it will ultimately generate the SQL and execute that for you. I didn't mention that but let me try creating a table again, cars five with a table column make here, just so I can show you it's actually doing that. If you go to the last tab and click SQL, it's actually showing you what's going to run when I click save. So it's basically going to run create table cars five with the make column as a character varying, which is basically bear care. So you can play around and change, okay, let me change 255 here, what happens in a SQL? Oh, it adds parentheses 255. Okay, if I add another column here, year, and I put integer, what's going to happen in the SQL part, click SQL, oh, it adds year here and integer. So you can play around, see what's actually going on behind the scenes and learn SQL along the way. So very powerful tool to use PG admin. Okay, so I hope that's interesting to you. And I'll call it a day.