22.4.13 ndb_import — Import CSV Data Into NDB

ndb_import imports CSV-formatted data, such as that produced by mysqldump --tab, directly into NDB using the NDB API. ndb_import requires a connection to an NDB management server (ndb_mgmd) to function; it does not require a connection to a MySQL Server.

Usage

ndb_import db_name file_name options

ndb_import requires two arguments: db_name is the name of the database containing the table into which the data is to be imported, and file_name is the name of the CSV file from which to read the data; this must include the path to the file if it is not in the current directory. The name of the file must match that of the table; the file's extension, if any, is not taken into consideration. Options supported by ndb_import include those for specifying field separators, escapes, and line terminators, and are described later in this section. ndb_import must be able to connect to an NDB Cluster management server; for this reason, there must be an unused [api] slot in the cluster config.ini file.
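
For example, assuming a management server listening on port 1186 on a host named mgmhost, and a file /tmp/t1.csv to be loaded into a table named t1 in the test database (the host name, path, table, and database names here are placeholders only), a minimal invocation might look like this, using the --ndb-connectstring option common to NDB Cluster programs:

    shell> ndb_import test /tmp/t1.csv --ndb-connectstring=mgmhost:1186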

To duplicate an existing table that uses a different storage engine, such as InnoDB, as an NDB table, use the mysql client to execute a SELECT ... INTO OUTFILE statement exporting the existing table to a CSV file, then a CREATE TABLE ... LIKE statement creating a new table with the same structure as the existing one, and finally an ALTER TABLE ... ENGINE=NDB statement converting the new table to NDB; after this, from the system shell, invoke ndb_import to load the data into the new NDB table. For example, an existing InnoDB table named myinnodb_table in a database named myinnodb can be exported into an NDB table named myndb_table in a database named myndb as shown here, assuming that you are already logged in as a MySQL user with the appropriate privileges:

  1. In the mysql client:

    mysql> USE myinnodb;
    
    mysql> SELECT * INTO OUTFILE '/tmp/myndb_table.csv'
         >  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '\\'
         >  LINES TERMINATED BY '\n'
         >  FROM myinnodb_table;
    
    mysql> CREATE DATABASE myndb;
    
    mysql> USE myndb;
    
    mysql> CREATE TABLE myndb_table LIKE myinnodb.myinnodb_table;
    
    mysql> ALTER TABLE myndb_table ENGINE=NDB;
    
    mysql> EXIT;
    Bye
    shell>

    Once the target database and table have been created, a running mysqld is no longer required. You can stop it using mysqladmin shutdown or another method before proceeding, if you wish.

  2. In the system shell:

    # if you are not already in the MySQL bin directory:
    shell> cd path-to-mysql-bin-dir
    
    shell> ndb_import myndb /tmp/myndb_table.csv --fields-optionally-enclosed-by='"' \
        --fields-terminated-by="," --fields-escaped-by='\\'

    The output should resemble what is shown here:

    job-1 import myndb.myndb_table from /tmp/myndb_table.csv
    job-1 [running] import myndb.myndb_table from /tmp/myndb_table.csv
    job-1 [success] import myndb.myndb_table from /tmp/myndb_table.csv
    job-1 imported 19984 rows in 0h0m9s at 2277 rows/s
    jobs summary: defined: 1 run: 1 with success: 1 with failure: 0
    shell>
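
To confirm the result, you can start mysqld again (if you stopped it) and check that the row count of the new NDB table matches the number of rows reported by ndb_import. A quick check such as the one shown here suffices; supply any needed connection options as usual:

    shell> mysql -e "SELECT COUNT(*) FROM myndb.myndb_table"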

The following table includes options that are specific to ndb_import. Additional descriptions follow the table. For options common to most NDB Cluster programs (including ndb_import), see Section 22.4.31, “Options Common to NDB Cluster Programs”.

Table 22.345 Command-line options for the ndb_import program

All of the options shown here are present in all NDB 8.0 releases.

--abort-on-error: Dump core on any fatal error; used for debugging
--ai-increment=#: For a table with a hidden PK, specify the autoincrement
    increment. See mysqld
--ai-offset=#: For a table with a hidden PK, specify the autoincrement offset.
    See mysqld
--ai-prefetch-sz=#: For a table with a hidden PK, specify the number of
    autoincrement values that are prefetched. See mysqld
--connections=#: Number of cluster connections to create
--continue: When a job fails, continue to the next job
--db-workers=#: Number of threads, per data node, executing database operations
--errins-type=name: Error insert type, for testing purposes; use "list" to
    obtain all possible values
--errins-delay=#: Error insert delay in milliseconds; random variation is added
--fields-enclosed-by=char: Same as the FIELDS ENCLOSED BY option for LOAD DATA
    statements. For CSV input this is the same as using
    --fields-optionally-enclosed-by
--fields-escaped-by=name: Same as the FIELDS ESCAPED BY option for LOAD DATA
    statements
--fields-optionally-enclosed-by=char: Same as the FIELDS OPTIONALLY ENCLOSED BY
    option for LOAD DATA statements
--fields-terminated-by=char: Same as the FIELDS TERMINATED BY option for LOAD
    DATA statements
--idlesleep=#: Number of milliseconds to sleep waiting for more to do
--idlespin=#: Number of times to retry before idlesleep
--ignore-lines=#: Ignore the first # lines in the input file; used to skip a
    non-data header
--input-type=name: Input type: random or csv
--input-workers=#: Number of threads processing input; must be 2 or more if
    --input-type is csv
--keep-state: Preserve state files
--lines-terminated-by=name: Same as the LINES TERMINATED BY option for LOAD
    DATA statements
--log-level=#: Set internal logging level; for debugging and development
--max-rows=#: Import only this number of input data rows; default is 0, which
    imports all rows
--monitor=#: Periodically print status of a running job if something has
    changed (status, rejected rows, temporary errors). Value 0 disables.
    Value 1 prints any change seen. Higher values reduce status printing
    exponentially up to some pre-defined limit
--no-asynch: Run database operations as batches, in single transactions
--no-hint: Do not use distribution key hint to select data node (TC)
--opbatch=#: A db execution batch is a set of transactions and operations sent
    to the NDB kernel. This option limits NDB operations (including blob
    operations), and thus the number of asynch transactions, in a db execution
    batch. Value 0 is not valid
--opbytes=#: Limit bytes in execution batch (default 0 = no limit)
--output-type=name: Output type: ndb is default, null is used for testing
--output-workers=#: Number of threads processing output or relaying database
    operations
--pagesize=#: Align I/O buffers to given size
--pagecnt=#: Size of I/O buffers as a multiple of page size; the CSV input
    worker allocates a double-sized buffer
--polltimeout=#: Timeout per poll for completed asynchronous transactions;
    polling continues until all polls are completed, or an error occurs
--rejects=#: Limit the number of rejected rows (rows with permanent errors) in
    the data load. Default is 0, which means that any rejected row causes a
    fatal error. The row exceeding the limit is also added to *.rej
--resume: If a job is aborted (temporary error, user interrupt), resume with
    rows not yet processed
--rowbatch=#: Limit rows in row queues (default 0 = no limit); must be 1 or
    more if --input-type is random
--rowbytes=#: Limit bytes in row queues (0 = no limit)
--state-dir=name: Where to write state files; the current directory is the
    default
--stats: Save performance and statistics information in *.sto and *.stt files
--tempdelay=#: Number of milliseconds to sleep between temporary errors
--temperrors=#: Number of times a transaction can fail due to a temporary
    error, per execution batch; 0 means any temporary error is fatal. Such
    errors do not cause any rows to be written to the .rej file
--verbose=#, -v: Enable verbose output


  • --abort-on-error

    Property Value
    Command-Line Format --abort-on-error
    Type Boolean
    Default Value FALSE

    Dump core on any fatal error; used for debugging only.

  • --ai-increment=#

    Property Value
    Command-Line Format --ai-increment=#
    Type Integer
    Default Value 1
    Minimum Value 1
    Maximum Value 4294967295

    For a table with a hidden primary key, specify the autoincrement increment, like the auto_increment_increment system variable does in the MySQL Server.

  • --ai-offset=#

    Property Value
    Command-Line Format --ai-offset=#
    Type Integer
    Default Value 1
    Minimum Value 1
    Maximum Value 4294967295

    For a table with a hidden primary key, specify the autoincrement offset, like the auto_increment_offset system variable does in the MySQL Server.

  • --ai-prefetch-sz=#

    Property Value
    Command-Line Format --ai-prefetch-sz=#
    Type Integer
    Default Value 1024
    Minimum Value 1
    Maximum Value 4294967295

    For a table with a hidden primary key, specify the number of autoincrement values that are prefetched. Behaves like the ndb_autoincrement_prefetch_sz system variable does in the MySQL Server.

  • --connections=#

    Property Value
    Command-Line Format --connections=#
    Type Integer
    Default Value 1
    Minimum Value 1
    Maximum Value 4294967295

    Number of cluster connections to create.

  • --continue

    Property Value
    Command-Line Format --continue
    Type Boolean
    Default Value FALSE

    When a job fails, continue to the next job.

  • --db-workers=#

    Property Value
    Command-Line Format --db-workers=#
    Type Integer
    Default Value 4
    Minimum Value 1
    Maximum Value 4294967295

    Number of threads, per data node, executing database operations.

  • --errins-type=name

    Property Value
    Command-Line Format --errins-type=name
    Type Enumeration
    Default Value [none]
    Valid Values stopjob, stopall, sighup, sigint, list

    Error insert type; use list as the name value to obtain all possible values. This option is used for testing purposes only.

  • --errins-delay=#

    Property Value
    Command-Line Format --errins-delay=#
    Type Integer
    Default Value 1000
    Minimum Value 0
    Maximum Value 4294967295

    Error insert delay in milliseconds; random variation is added. This option is used for testing purposes only.

  • --fields-enclosed-by=char

    Property Value
    Command-Line Format --fields-enclosed-by=char
    Type String
    Default Value [none]

    This works in the same way as the FIELDS ENCLOSED BY option does for the LOAD DATA statement, specifying a character to be interpreted as quoting field values. For CSV input, this is the same as --fields-optionally-enclosed-by.

  • --fields-escaped-by=name

    Property Value
    Command-Line Format --fields-escaped-by=name
    Type String
    Default Value \

    Specify an escape character in the same way as the FIELDS ESCAPED BY option does for the SQL LOAD DATA statement.

  • --fields-optionally-enclosed-by=char

    Property Value
    Command-Line Format --fields-optionally-enclosed-by=char
    Type String
    Default Value [none]

    This works in the same way as the FIELDS OPTIONALLY ENCLOSED BY option does for the LOAD DATA statement, specifying a character to be interpreted as optionally quoting field values. For CSV input, this is the same as --fields-enclosed-by.

  • --fields-terminated-by=char

    Property Value
    Command-Line Format --fields-terminated-by=char
    Type String
    Default Value \t

    This works in the same way as the FIELDS TERMINATED BY option does for the LOAD DATA statement, specifying a character to be interpreted as the field separator.

  • --idlesleep=#

    Property Value
    Command-Line Format --idlesleep=#
    Type Integer
    Default Value 1
    Minimum Value 1
    Maximum Value 4294967295

    Number of milliseconds to sleep waiting for more work to perform.

  • --idlespin=#

    Property Value
    Command-Line Format --idlespin=#
    Type Integer
    Default Value 0
    Minimum Value 0
    Maximum Value 4294967295

    Number of times to retry before sleeping.

  • --ignore-lines=#

    Property Value
    Command-Line Format --ignore-lines=#
    Type Integer
    Default Value 0
    Minimum Value 0
    Maximum Value 4294967295

    Cause ndb_import to ignore the first # lines of the input file. This can be employed to skip a file header that does not contain any data.

  • --input-type=name

    Property Value
    Command-Line Format --input-type=name
    Type Enumeration
    Default Value csv
    Valid Values random, csv

    Set the type of input. The default is csv; random is intended for testing purposes only.

  • --input-workers=#

    Property Value
    Command-Line Format --input-workers=#
    Type Integer
    Default Value 4
    Minimum Value 1
    Maximum Value 4294967295

    Set the number of threads processing input. This must be 2 or more if --input-type is csv.

  • --keep-state

    Property Value
    Command-Line Format --keep-state
    Type Boolean
    Default Value false

    By default, ndb_import removes all state files (except non-empty *.rej files) when it completes a job. Specify this option (no argument is required) to force the program to retain all state files instead.

  • --lines-terminated-by=name

    Property Value
    Command-Line Format --lines-terminated-by=name
    Type String
    Default Value \n

    This works in the same way as the LINES TERMINATED BY option does for the LOAD DATA statement, specifying a character to be interpreted as end-of-line.

  • --log-level=#

    Property Value
    Command-Line Format --log-level=#
    Type Integer
    Default Value 0
    Minimum Value 0
    Maximum Value 2

    Performs internal logging at the given level. This option is intended primarily for internal and development use.

    In debug builds of NDB only, the logging level can be set using this option to a maximum of 4.

  • --max-rows=#

    Property Value
    Command-Line Format --max-rows=#
    Type Integer
    Default Value 0
    Minimum Value 0
    Maximum Value 4294967295

    Import only this number of input data rows; the default is 0, which imports all rows.

  • --monitor=#

    Property Value
    Command-Line Format --monitor=#
    Type Integer
    Default Value 2
    Minimum Value 0
    Maximum Value 4294967295

    Periodically print the status of a running job if something has changed (status, rejected rows, temporary errors). Set to 0 to disable this reporting. Setting to 1 prints any change that is seen. Higher values reduce the frequency of this status reporting.

  • --no-asynch

    Property Value
    Command-Line Format --no-asynch
    Type Boolean
    Default Value FALSE

    Run database operations as batches, in single transactions.

  • --no-hint

    Property Value
    Command-Line Format --no-hint
    Type Boolean
    Default Value FALSE

    Do not use distribution key hinting to select a data node.

  • --opbatch=#

    Property Value
    Command-Line Format --opbatch=#
    Type Integer
    Default Value 256
    Minimum Value 1
    Maximum Value 4294967295

    Set a limit on the number of operations (including blob operations), and thus the number of asynchronous transactions, per execution batch.

  • --opbytes=#

    Property Value
    Command-Line Format --opbytes=#
    Type Integer
    Default Value 0
    Minimum Value 0
    Maximum Value 4294967295

    Set a limit on the number of bytes per execution batch. Use 0 for no limit.

  • --output-type=name

    Property Value
    Command-Line Format --output-type=name
    Type Enumeration
    Default Value ndb
    Valid Values null

    Set the output type. ndb is the default. null is used only for testing.

  • --output-workers=#

    Property Value
    Command-Line Format --output-workers=#
    Type Integer
    Default Value 2
    Minimum Value 1
    Maximum Value 4294967295

    Set the number of threads processing output or relaying database operations.

  • --pagesize=#

    Property Value
    Command-Line Format --pagesize=#
    Type Integer
    Default Value 4096
    Minimum Value 1
    Maximum Value 4294967295

    Align I/O buffers to the given size.

  • --pagecnt=#

    Property Value
    Command-Line Format --pagecnt=#
    Type Integer
    Default Value 64
    Minimum Value 1
    Maximum Value 4294967295

    Set the size of I/O buffers as a multiple of the page size. The CSV input worker allocates a buffer that is twice this size.

  • --polltimeout=#

    Property Value
    Command-Line Format --polltimeout=#
    Type Integer
    Default Value 1000
    Minimum Value 1
    Maximum Value 4294967295

    Set a timeout per poll for completed asynchronous transactions; polling continues until all polls are completed, or until an error occurs.

  • --rejects=#

    Property Value
    Command-Line Format --rejects=#
    Type Integer
    Default Value 0
    Minimum Value 0
    Maximum Value 4294967295

    Limit the number of rejected rows (rows with permanent errors) in the data load. The default is 0, which means that any rejected row causes a fatal error. Any rows causing the limit to be exceeded are added to the .rej file.

    The limit imposed by this option is effective for the duration of the current run. A run restarted using --resume is considered a new run for this purpose.

  • --resume

    Property Value
    Command-Line Format --resume
    Type Boolean
    Default Value FALSE

    If a job is aborted (due to a temporary db error or when interrupted by the user), resume with any rows not yet processed.

  • --rowbatch=#

    Property Value
    Command-Line Format --rowbatch=#
    Type Integer
    Default Value 0
    Minimum Value 0
    Maximum Value 4294967295

    Set a limit on the number of rows per row queue. Use 0 for no limit.

  • --rowbytes=#

    Property Value
    Command-Line Format --rowbytes=#
    Type Integer
    Default Value 262144
    Minimum Value 0
    Maximum Value 4294967295

    Set a limit on the number of bytes per row queue. Use 0 for no limit.

  • --stats

    Property Value
    Command-Line Format --stats
    Type Boolean
    Default Value false

    Save information about options related to performance and other internal statistics in files named *.sto and *.stt. These files are always kept on successful completion (even if --keep-state is not also specified).

  • --state-dir=name

    Property Value
    Command-Line Format --state-dir=name
    Type String
    Default Value .

    Where to write the state files (tbl_name.map, tbl_name.rej, tbl_name.res, and tbl_name.stt) produced by a run of the program; the default is the current directory.

  • --tempdelay=#

    Property Value
    Command-Line Format --tempdelay=#
    Type Integer
    Default Value 10
    Minimum Value 0
    Maximum Value 4294967295

    Number of milliseconds to sleep between temporary errors.

  • --temperrors=#

    Property Value
    Command-Line Format --temperrors=#
    Type Integer
    Default Value 0
    Minimum Value 0
    Maximum Value 4294967295

    Number of times a transaction can fail due to a temporary error, per execution batch. The default is 0, which means that any temporary error is fatal. Temporary errors do not cause any rows to be added to the .rej file.

  • --verbose, -v

    Property Value
    Command-Line Format --verbose
    Type Boolean
    Default Value false

    Enable verbose output.
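
The error-handling options just described can be combined when loading data that may contain bad rows, or when the cluster may return temporary errors. The following is a sketch only: the database name mydb, the file path, the state directory, and the option values shown are placeholders, and suitable values depend on your data and cluster configuration:

    # Allow up to 100 permanently rejected rows and up to 5 temporary errors
    # per execution batch, sleeping 100 ms between temporary errors, and keep
    # all state files for later inspection:
    shell> ndb_import mydb /path/to/t1.csv --rejects=100 --temperrors=5 \
        --tempdelay=100 --keep-state --state-dir=/var/tmp/import-state

    # If the job is aborted by a temporary error or a user interrupt, rerun it
    # with --resume to continue with the rows not yet processed:
    shell> ndb_import mydb /path/to/t1.csv --resume --state-dir=/var/tmp/import-state

Throughput-related options such as --connections, --db-workers, --opbatch, and --opbytes can be adjusted in the same fashion; the defaults given in the descriptions above are reasonable starting points.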

As with LOAD DATA, options for field and line formatting must match those used to create the CSV file, whether this was done using SELECT ... INTO OUTFILE, or by some other means. There is no equivalent to the LOAD DATA statement's STARTING WITH option.
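
Note that ndb_import's defaults for --fields-terminated-by (\t), --fields-escaped-by (\), and --lines-terminated-by (\n) match the default output format of SELECT ... INTO OUTFILE and thus of mysqldump --tab, so a file written with those defaults can usually be imported without specifying any formatting options. In the following sketch, the database name and file paths are placeholders, and the file t1.txt is assumed to be named after the target table t1:

    # t1.txt was produced by mysqldump --tab or SELECT ... INTO OUTFILE with
    # default formatting; ndb_import's own defaults match, so no field or line
    # formatting options are required:
    shell> ndb_import mydb /path/to/dumpdir/t1.txt

    # If an input file begins with a header line that is not data, skip it:
    shell> ndb_import mydb /path/to/t1.csv --ignore-lines=1 \
        --fields-terminated-by="," --fields-optionally-enclosed-by='"'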

ndb_import was added in NDB 7.6.2.