It is one of the old commands implemented in MySQL. As such it is very fast, and it has been optimized for both MyISAM and InnoDB.
All is well when you are loading data into a standalone server. All is almost well when you are using replication. LOAD DATA used to be a problem in old versions of MYSQL, prior to 4.1. With recent versions of MySQL, it is replicated correctly, and sometimes efficiently.
The trouble starts when the data file is big. The definition of big and the amount of trouble that you can get depends on many factors. That's why users may not realize that this problem exists, even with fairly large files, and then being hit by this disaster when the file is only a little larger than the previous ones.
First, let me explain what happens when you replicate LOAD DATA.
- The LOAD DATA query runs in the master.
- When the query is finished, the master starts pumping data to the binary log.
- The slave receives the binary log, and it will create a copy of the data file in the default temporary directory.
- The slave executes the LOAD DATA query using the temporary file.
- When the slave is done loading the data, the temporary file is deleted
- The data from the relay log is deleted
At the end of the exercise, your data is only in the database table, both in the master and in the slaves. However, during the loading, each slave needs THREE TIMES THE STORAGE of the initial data file size, not counting the indexes.
If your data is 10 GB, then you will need 20 GB on the master (10 for the table, 10 for the binary log, and eventually 10 more for the indexes).
On the slave, you will need 30 GB: 10 for the table (+ indexes if requested), 10 for the relay logs, and 10 for the file in the temporary directory. The last part is the tricky one. The temporary directory is whatever is indicated in the
TMPDIRsystem variable. If that directory is in a partition with less than 10 GB free, your replication will break, even if your data directory has terabytes of free space.