
TiDB Service

2022-02-21 09:14:13

Processing Large Amounts of Data

Import Data

If there is a unique key and the business side can ensure that the data is conflict-free, you can enable this switch in the session:


SET @@session.tidb_skip_constraint_check=1;
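The switch is session-scoped, so it must be issued on the same connection that performs the bulk insert. A minimal sketch (the helper name and statement list are hypothetical, for illustration only):

```python
# Hypothetical helper: build the ordered statement list for one session.
# The SET statement is session-scoped, so it must run on the same
# connection as the inserts that follow it.

def bulk_load_statements(insert_statements, skip_constraint_check=True):
    """Prepend the session switch to the insert statements when enabled."""
    stmts = []
    if skip_constraint_check:
        stmts.append("SET @@session.tidb_skip_constraint_check=1;")
    stmts.extend(insert_statements)
    return stmts
```

Each statement would then be executed in order on a single connection.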

In addition, to improve write performance, you can tune TiKV's parameters. Pay special attention to this one:


[raftstore]
# Defaults to true, meaning data is forcibly synced to disk. For business
# scenarios that do not require finance-level data safety, it is recommended
# to set this to false to obtain higher performance.
sync-log = true

Write Data

As mentioned above, TiDB has a limit on the size of a single transaction. This limit applies at the KV level. Reflected at the SQL level, simply put, a row of data is mapped to one KV entry, and each additional index adds another KV entry. Therefore, at the SQL level the limit is:

  • A single row of data is no larger than 6MB
  • Total row count × (1 + index count) < 300,000
  • All data committed at one time is less than 100MB

In addition, note that for both the size limit and the row count limit, the overhead of TiDB's encoding and the transaction's extra keys must also be considered. In practice, it is recommended that each transaction stay under 10,000 rows; otherwise it may exceed the limits or perform poorly.
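A minimal sketch of a pre-flight check against these limits (the helper name is hypothetical; the thresholds are the ones listed above, ignoring encoding overhead):

```python
# Hypothetical helper: estimate whether a batch stays within TiDB's
# single-transaction limits described above. Real usage should leave
# headroom for TiDB's encoding and extra transaction keys.

MAX_ROW_BYTES = 6 * 1024 * 1024      # a single row no larger than 6MB
MAX_KV_ENTRIES = 300_000             # row count * (1 + index count) < 300,000
MAX_TXN_BYTES = 100 * 1024 * 1024    # all data in one commit under 100MB
RECOMMENDED_ROWS = 10_000            # suggested per-transaction row cap

def check_batch(row_count, index_count, avg_row_bytes, max_row_bytes):
    """Return a list of limit violations (empty means the batch looks safe)."""
    problems = []
    if max_row_bytes > MAX_ROW_BYTES:
        problems.append("single row exceeds 6MB")
    if row_count * (1 + index_count) >= MAX_KV_ENTRIES:
        problems.append("KV entry count reaches 300,000")
    if row_count * avg_row_bytes >= MAX_TXN_BYTES:
        problems.append("total transaction size reaches 100MB")
    if row_count > RECOMMENDED_ROWS:
        problems.append("row count above the recommended 10,000")
    return problems
```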

It is recommended to batch Insert, Update and Delete statements, or to restrict them with a LIMIT clause.
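For inserts, batching can be sketched as splitting the rows into fixed-size chunks, one statement per chunk (hypothetical helpers; a real application should use parameterized queries rather than inlined literals):

```python
# Hypothetical sketch: split a large INSERT into fixed-size batches so
# each statement commits well under the transaction limits above.

def chunked(rows, batch_size=5000):
    """Yield successive slices of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]

def build_insert_batches(table, columns, rows, batch_size=5000):
    """Build one multi-value INSERT statement per batch.

    Values are inlined with repr() for illustration only; real code
    should pass parameters to the driver instead.
    """
    col_list = ", ".join(columns)
    statements = []
    for batch in chunked(rows, batch_size):
        values = ", ".join(
            "(" + ", ".join(repr(v) for v in row) + ")" for row in batch
        )
        statements.append(f"INSERT INTO {table} ({col_list}) VALUES {values};")
    return statements
```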

When deleting a large amount of data, it is recommended to run Delete from t where xx limit 5000 in a loop, using Affected Rows == 0 as the loop's end condition, so as to avoid hitting the transaction size limit.
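The loop can be sketched as follows. Here `execute` is a hypothetical callable that runs one SQL statement and returns the affected row count (for example, a driver's cursor.rowcount after executing):

```python
# Sketch of the batched delete loop described above. `execute` runs one
# SQL statement and returns the number of affected rows.

def delete_in_batches(execute, condition, table="t", batch_size=5000):
    """Delete matching rows in batches until no rows are affected."""
    total = 0
    while True:
        affected = execute(
            f"DELETE FROM {table} WHERE {condition} LIMIT {batch_size};"
        )
        total += affected
        if affected == 0:
            break
    return total
```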

If a large amount of data is deleted at one time, this loop method becomes slower and slower, because each deletion traverses the table from front to back. After the earlier deletions, many delete marks remain for a short time (they are garbage-collected later), which slows down the subsequent Delete statements. If possible, it is recommended to refine the Where condition. For example, to delete all data on May 26, 2017, you can do this:


for i from 0 to 23:
    affected_rows = 1
    while affected_rows > 0:
        delete from t where insert_time >= i:00:00 and insert_time < (i+1):00:00 limit 5000;
        affected_rows = select affected_rows();

The above is pseudo code. The idea is to split the large chunk of data into small pieces and delete them piece by piece, so that earlier Delete statements do not slow down later ones.
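A runnable version of that pseudo code might look like this, reusing the same hypothetical `execute` statement runner (the date format and table name are assumptions for illustration):

```python
# Runnable sketch of the pseudo code above: narrow the WHERE condition
# to one-hour windows so earlier deletions do not slow down later ones.
# `execute` runs one SQL statement and returns the affected row count.
from datetime import datetime, timedelta

def delete_day_by_hour(execute, table, day, batch_size=5000):
    """Delete one day's rows, one hour window at a time, in small batches."""
    start = datetime.strptime(day, "%Y-%m-%d")
    total = 0
    for hour in range(24):
        lo = start + timedelta(hours=hour)
        hi = lo + timedelta(hours=1)
        affected = batch_size  # enter the inner loop at least once
        while affected > 0:
            affected = execute(
                f"DELETE FROM {table} WHERE insert_time >= '{lo}' "
                f"AND insert_time < '{hi}' LIMIT {batch_size};"
            )
            total += affected
    return total
```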
