Advanced MySQL help new administrators with manipulating data either through inserting records from other queries or CSV files. CSV files are flat, comma-delimited files that contain data that can be read across numerous different platforms. If you need to transfer data to or from a MySQL database, you can do so without worry of unsupported platforms using CSV files. You also need to know how to replace data or delete records across multiple tables. This article focuses on these advanced concepts.
Inserting Data Using Queries
Just to recap, let's take a look at an INSERT statement to remember its syntax.
INSERT INTO Customer (first_name, last_name) VALUES (‘John', ‘Smith');
If you remember, the above statement inserts a new record into the Customer table. The data inserted is a first and last name, and we assume that there is an auto-incrementing column that keeps a unique column ID for the record.
The above INSERT statement is a basic SQL statement to insert data, but you can also use other queries to add new records to your tables. These queries are SELECT statements that pull records from an external source and insert the records into your database. This often happens when you import data from another source. You first create a new table from this external source, and then you use a SELECT statement to import data to your production tables.
Let's take an example of importing new customers. Your company acquires data from an external MySQL database. The external database has a Customer table, but the structure isn't the same as your Customer table. The external database has a customer's first and last name in the Customer table, but it stores the address in a second table. You want to create a temporary table that contains the customer's first name, last name, and the address from the second table. You could then use your temporary table to filter data and import it into your current production Customer table.
The following SQL code is an example of creating a table from a SELECT query:
CREATE TABLE TempCustomer AS
SELECT first_name, last_name
FROM ExternalCustomer AS c
WHERE c.create_date >= ‘8/1/2015';
In the above query, a new table named TempCustomer is created from the ExternalCustomer table. Only customers created after August 2015 are included, but you can also import all customers from ExternalCustomer with no WHERE clause. We're creating a TempCustomer table, because we don't want to import data directly into a production table before we review the query data and sample the information.
We mentioned, though, that we also want a customer's address, which is in a separate table. In the above query, we only get the first_name and last_name columns. We want to join the ExternalAddress table in the query to build a temporary table that includes the customer address. You can include a JOIN statement in your SQL statements that create tables.
Take a look at the following query:
CREATE TABLE TempCustomer AS
SELECT c.first_name, c.last_name, a.address
FROM ExternalCustomer AS c
INNER JOIN ExternalAddress AS a
ON c.customer_id = a.customer_id
WHERE c.create_date >= ‘8/1/2015';
In the above query, we still transform only data that was created in August 2015, but we included a JOIN statement that joins the customer table to the address table. We used the INNER JOIN statement. This means any customers that don't have a recorded address will be excluded. We don't want that, so let's change the JOIN statement to a LEFT JOIN. This will export all customers in the customer table and use the default NULL value for any customers with no addresses.
CREATE TABLE TempCustomer AS
SELECT c.first_name, c.last_name, a.address
FROM ExternalCustomer AS c
LEFT JOIN ExternalAddress AS a
ON c.customer_id = a.customer_id
WHERE c.create_date >= ‘8/1/2015';
One issue with continually running the same CREATE TABLE query is that once the table is created, you must drop the table before you can create it again. If you are unhappy with a new table that's created, run the DROP TABLE command to delete the table and recreate it.
With the table created, you can slice and dice the data any way you want, and then import the data into your main Customer table. Since we've already created a temporary table with the required customer fields, we can now import the data from the temporary table to the main production table. To insert records into an existing table, the syntax is similar but you use the INSERT statement instead of the CREATE TABLE statement.
When you work with the INSERT and SELECT statements, you must have the same number of columns in your SELECT statement that you have in the INSERT statement. Let's look at an example.
INSERT INTO Customer (first_name, last_name, address)
SELECT first_name, last_name, address FROM TempCustomer;
If we want to only import August 2015 records, we don't need to worry about the WHERE clause since we already filtered records in the original CREATE TABLE statement. Notice that we use the same number of columns in the SELECT statement as we have in our INSERT column. They also need to be in the same order as what is in your SELECT statement. If you reverse the order of columns in the SELECT statement, the data inserted into your Customer table is incorrect, which ruins data integrity and accuracy.
Also notice that the VALUES keyword isn't used in the INSERT statement when it uses a SELECT statement to import data. You can also use a WHERE clause to further filter data when you import into your production database.
Updating Tables from a SELECT
You know how to insert new records into a table, but there are times when you'll need to update records from an external database. For instance, suppose your table is a list of old customer records. You find out that the database was not properly updating data, but you have an external table that was made from logs. These logs have the customer first and last name along with the customer's address. You need to update your current customer table with the new information. With an UPDATE statement, you need some kind of unique linking information. You can't use first and last name, because several people can have the same name. In this example, the external log file table has the customer ID. This is the unique ID that we can use to update the customer's data.
If you recall, we mentioned that using an UPDATE query can cause data corruption if you don't carefully craft it with the right WHERE clause. The same is true for using the UPDATE statement with a SELECT statement.
You have two options when using the UPDATE statement with the SELECT statement. You can either join two tables together and use the SET statement, or you can use a subquery. We'll show you both directions.
This first example uses the JOIN statement.
UPDATE Customer AS c
LEFT JOIN TempCustomer AS tc ON c.customer_id = tc.customer_id
SET c.first_name = tc.first_name, c.last_name = tc.last_name, c.address = tc.address
WHERE c.customer_id > 5;
In the above statement, we use a new WHERE clause to filter the updated records. In this example, we only update customers with an ID greater than 5 in the original Customer table. Without the WHERE clause, any customers found in the temporary table are updated in the main production table. Using an UPDATE statement without a WHERE clause is a dangerous resolution to production data problems.
Let's look at the second SQL option.
UPDATE Customer
SET first_name = (SELECT first_name FROM TempCustomer where TempCustomer.customer_id = Customer.customer_id);
In this solution, we've removed the WHERE clause to focus on the subquery used to update customer records. First, look at the SELECT statement. The SELECT statement is the first statement to run. If a customer is found in the TempCustomer table with a matching ID in the Customer table, the record is returned. With a list of customers returned, the UPDATE statement then runs and sets the customer's first name. This second option is a lot less efficient for performance, but it's an option for people who don't want to use the JOIN option.
Deleting Data with Constraints
From previous chapters, we mentioned constraints and foreign keys associated with the InnoDB storage engine. We also mentioned that deleting records from a parent table would throw an error, since the deletion would leave orphaned records. For instance, with the Customer table linking to an Order table, deleting the customer would create orphaned records in the Order table since orders would no longer be associated with a customer.
But sometimes you must be able to delete data including customers. While deleting records is usually frowned on in the database management world, there are times when you need to delete records. For instance, suppose you accidentally imported duplicate customer records from the temporary table discussed in the earlier sections.
You have two options when you need to delete records with foreign key constraints. The first option is to use a standard DELETE statement on child records. For instance, with the Customer and Order example, you first run a DELETE statement on the Order table, and then you can delete any customers associated with these records. This can be tedious since you normally have several constraints in a relational database.
The other option is the set the ON DELETE CASCADE option when you create your tables. With this option, when you delete a customer, all associated child records are also deleted. In this case, any Order records are deleted along with the customer record to avoid any orphaned records.
Since we already have a Customer table, we need to use the ALTER TABLE statement to add the constraint and set the ON DELETE CASCADE option.
Take a look at the following statement.
ALTER TABLE Order
ADD CONSTRAINT fk_order1
FOREIGN KEY (customer_id)
REFERENCES Customer (customer_id)
ON DELETE CASCADE;
In the above statement, we create an actual foreign key constraint named fk_order1. The foreign key is on the customer_id column located in the Order table. The Order table is linked to the main Customer table on its customer_id.
The ON DELETE CASCADE statement tells the MySQL database engine to delete any records linked to the record you're deleting. It makes it easier to delete records without leaving orphaned records. One issue with this option is that you can accidentally delete records that you don't want to delete. For instance, if you don't realize that your database has an Order record, and you just want to delete customers and no other records, you would accidentally delete orders when you run the DELETE statement.
Importing Data from a CSV File
We've mentioned before that CSV files are universally understood formats. You can transfer from several different database systems if you can export data into a CSV. The MySQL database natively works with CSV files, so you don't need any external plugins or programs to import data from the flat files.
You need a valid comma-delimited file. Any errors in structure could cause a problem during import either in the form of malformed data, errors thrown by MySQL, or incorrect data input into certain fields.
Let's assume you want to import customer data from a CSV file. The following is an example of a properly formatted CSV file.
customer_id, first_name, last_name
5, "Jennifer", "Smith"
6, "Paul", "Smith"
7, "Andrew", "Johnson"
Notice that each field is separated by a comma, but also strings are enclosed in quotes. The quotes are important with format in case you have commas within your data. For instance, the following would cause an invalid CSV structure.
customer_id, first_name, last_name, address
5, "Jennifer", "Smith", 111 Comman, Law Ave
6, "Paul", "Smith", Note Street
7, "Andrew", "Johnson", Uncommon Location
The above CSV file contains an address field. In the first record, the address has a comma. Since the address column isn't enclosed in quotes, the comma in the address would cause an error. The MySQL server will truncate the data or throw an error. This is why you need all string values enclosed in quotes.
Let's take a look at the import function for the customer CSV file.
LOAD DATA INFILE 'c:/customer/customer.csv'
INTO TABLE Customer
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;
The first SQL line gives the MySQL engine the location of your file. In this example, the CSV file is named "customer.csv." The next SQL line is the name of the table you want to insert into. In this case, we want to import a list of customers from the CSV file into the Customer table. The next line specifies what delimits each field. The comma character is common for delimited data, but you can have any number of characters set to delimit your data.
You'll recognize the ENCLOSED BY statement. We need to tell MySQL what encloses string value. In this case, it's the quote character. The "\n" termination means each record is terminated by a carriage return, which is created when you use the Enter key on your keyboard. This is also a common way to terminate records in a CSV file.
Because the first row in our file contains the names of each column field, you don't want to import the first line. The IGNORE 1 ROWS tells MySQL to ignore the first row and skip to line 2 to import the real data.
In most cases, your field names in the CSV file won't match up with the column names in your Customer table. Since CSV files are usually imports of an external data source, you can a various column names in a CSV file format. For instance, some column headers in a CSV file have spaces. You usually don't have spaces in a table column name, so this is an issue when you want to import the data. To overcome this hurdle, you can use the SET statement in your CSV import statement.
Let's assume you have different column names in your CSV file. The following is our new CSV file format.
customer_id, first name, last_name, address
5, "Jennifer", "Smith", "111 Comman, Law Ave"
6, "Paul", "Smith", "Note Street"
7, "Andrew", "Johnson", "Uncommon Location"
Notice that the "first name" column doesn't have the underscore just like our table column names. We need to specify the match between the "first_name" column in the Customer table and the "first name" column name in the CSV file.
The following is an example of using the SET statement with your CSV import files.
LOAD DATA INFILE 'c:/customer/customer.csv'
INTO TABLE Customer
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(customer_id, @firstname, last_name, address)
SET first_name = @firstname;
In the above SQL statement, everything from the previous import is the same except for the SET statement. Notice the column names in parenthesis are the same except for the @firstname user variable. Remember that the @ symbol indicates that the variable is a user variable that can contain any number of custom values.
The SET value then tells the MySQL server to use the field name contained in the @firstname variable as the value for the first_name column. Since we have a first_name column in the Customer table, MySQL knows to import the variable into the first_name column in our Customer table. If you have multiple CSV columns that don't match the target MySQL table, you must specify how to match the CSV column with the MySQL server tables.
Exporting Data to a CSV
You learned how to import data, but you will eventually need to export data into a CSV file. This happens when you need to send data to another department or a customer. You might need to send data to another platform such as Oracle or SQL Server. Since a CSV file is a universally understand format, you can export your data into a CSV file that can then be passed to another database server platform. MySQL has a native command that you can use to export data into a readable CSV format.
You use the SELECT SQL statement with the MySQL OUTFILE command. It's similar to the import statement, but it is slightly different since you need to send data to a flat file instead of importing it to your tables.
SELECT customer_id, first_name, last_name
FROM Customer
WHERE create_date >= ‘8/1/2015'
INTO OUTFILE 'C:/customers/customers.csv'
FIELDS ENCLOSED BY '"' ESCAPED BY '"'
LINES TERMINATED BY ';';
You'll recognize the statements that indicate format such as enclosed quotes and termination characters. Let's dissect this statement.
The first part of the MySQL export statement is the SELECT statement. You can use any SELECT statement including more advanced statements with joins and other clauses. In this example, we only want to export table data that has a create date greater than August 2015.
The OUTFILE statement tells MySQL where to store the file. In this case, we're making a copy of our data in the customer directory. The file's name is customer.csv. The rest of the SQL statement tells MySQL how to format the file. If you recall from the previous section, we needed to specify the characters that enclose strings. We use the quote character in this export statement as well.
There is one difference in our import statement compared to this export statement. In the import statement, we used the "\n" or carriage return as the termination character. In this export statement, we use the semicolon. This means that the flat file contains records that are terminated with a semicolon and won't have the standard carriage return or line feed to terminate records. This can sometimes be difficult if you need to read the data in the flat file before you send it to a third-party or review the data before you import it. However, it makes translation of formats between different platforms easier since carriage returns and line feed terminations are different in Windows and Linux.
After you run the file, you should get the following file format.
customer_id, first name, last_name, address
5, "Jennifer", "Smith", "111 Comman, Law Ave"; 6, "Paul", "Smith", "Note Street"; 7, "Andrew", "Johnson", "Uncommon Location"
This article showed you some more advanced MySQL concepts. You must be able to import and export data and use the SELECT query to filter records. You can export or important thousands of records, but a subset of your data is typical when you're working with large data sets. You can also use CSV import and export commands to build temporary tables that you then use to import into your production tables. For database administrators, these activities are common when you manage large databases of data across several different platforms.