Data Migration Tool

2022-02-18 08:33:08

Data Migration Tool Osstransfer

Function description

Osstransfer migrates data stored locally or in other object storage services to JD Cloud OSS. Its features are:

  • Supports a rich set of data sources:
    • Local data: migrates locally stored data to OSS;
    • Other object storage services: currently migrates AWS S3, Alibaba Cloud OSS, Tencent Cloud COS, Baidu BOS, Huawei OBS and Qiniu Cloud Storage to JD Cloud OSS; more sources will be added over time;
    • URL list: downloads the objects named in a designated URL list and migrates them to JD Cloud OSS;
    • Mutual bucket replication: replicates data between JD Cloud OSS buckets, across regions, across accounts, or within the same region.
  • Supports resumable (breakpoint) upload;
  • Supports traffic control;
  • Supports migrating only files with a specific prefix;
  • Supports parallel download and upload;
  • Migration verification: objects are verified after migration.

Operating Environment

System Environment

  • Linux

Software Dependencies

  • JDK 1.8

Usage

1. Get the tool

Download link: transfer-tools

2. Get the configuration file

Create or download the configuration file application.yml in the same directory as the tool. Download link: [Example yml File](https://downloads.oss.cn-north-1.jcloudcs.com/application.yml)

3. Modify the application.yml configuration file

Before running the migration start script, modify the application.yml configuration file to suit your needs.

3.1 The fields of the Osstransfer configuration file are described below:

| Name | Description | Default Value |
| --- | --- | --- |
| jobType | Type of job: listObject or transfer. | listObject |
| sourceType | Type of data source: urlfile, diskfile, s3file (AWS S3, Tencent Cloud COS, Baidu BOS, Huawei OBS, JD Cloud OSS), aliyunfile, or disklistfile (list of local files). | s3file |
| urlType | When sourceType is urlfile and the file list was not generated by the migration tool and contains only URLs, set urlType to onlyUrl. | Null |
| filePath | Path of the file to read. Required when sourceType is urlfile or diskfile. | Null |
| urlFilePrefix | When the file list consists of URLs, the object key is taken from part of the URL, and this value is the number of leading characters to cut from the URL. If configured, a value of at least 7 (the length of http://) is recommended. | Null |
| ContentDispositionTooLongContinue | JD Cloud OSS limits Content-Disposition to 100 bytes. When the Content-Disposition of a link exceeds this limit, set this item to true to continue the migration without that header value; otherwise the URL is printed in the error log so that the user can modify its header value. | false |
| task.limit.threadCount | Maximum number of files the task reads simultaneously. | 20 |
| task.limit.qps | QPS limit of the task (PUT requests use relatively little bandwidth). The total bandwidth is partsize * qps. | 50 |
| transfer.coverFile | Whether the migration overwrites existing files. Files are overwritten by default. | true |
| transfer.put.maxsize | Size threshold, in bytes, between PUT upload and multipart upload. If modified, a multiple of 4M is recommended. | 33554432 |
| transfer.multipart.partsize | Size of each part, in bytes, when multipart (block) replication is used. The default is 32M. | 33554432 |
| transfer.multipart.threads | Maximum number of concurrent part replications. | 5 |
| src.access.id | accessKeyId of the source account. | Null |
| src.secret.key | accessKeySecret of the source account. | Null |
| src.endpoint | Source endpoint. Alibaba Cloud: https://help.aliyun.com/document_detail/31837.html?spm=a2c4g.11186623.6.572.6a537f5ewpHZJZ ; Tencent Cloud: https://cloud.tencent.com/document/product/436/6224 ; Baidu Cloud: https://cloud.baidu.com/doc/BOS/S3.html#.E6.9C.8D.E5.8A.A1.E5.9F.9F.E5.90.8D ; Huawei Cloud: https://support.huaweicloud.com/api-obs/zh-cn_topic_0136050628.html | None |
| src.bucket | Source bucket name. | Null |
| src.prefix | Configure a prefix to migrate only part of the files. If the prefix is a number starting with 0, enclose it in single or double quotation marks. | None |
| des.access.id | accessKeyId of the JD Cloud account. | Null |
| des.secret.key | accessKeySecret of the JD Cloud account. | Null |
| des.endpoint | Service domain name of JD Cloud OSS; refer to Server Domain Name. | Null |
| des.bucket | Destination bucket. | Null |
| des.prefix | Configure des.prefix to migrate the files into a specific directory. If the prefix is a number starting with 0, enclose it in single or double quotation marks. | None |
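
As a rough illustration of how the size and rate settings interact, the sketch below applies the partsize * qps formula from the table and chooses between PUT and multipart upload. It assumes that objects no larger than transfer.put.maxsize go through PUT and that larger objects are split into parts of transfer.multipart.partsize bytes; the source does not say which side of the threshold an object of exactly that size falls on. The function names are hypothetical and are not part of Osstransfer.

```python
from typing import Tuple

PUT_MAXSIZE = 33554432  # default transfer.put.maxsize, in bytes
PART_SIZE = 33554432    # default transfer.multipart.partsize, in bytes
QPS = 50                # default task.limit.qps


def total_bandwidth_bytes(partsize: int = PART_SIZE, qps: int = QPS) -> int:
    """Bytes per second according to the documented partsize * qps formula."""
    return partsize * qps


def plan_upload(object_size: int,
                put_maxsize: int = PUT_MAXSIZE,
                partsize: int = PART_SIZE) -> Tuple[str, int]:
    """Return (method, number of requests) for one object.

    ASSUMPTION: objects up to and including put_maxsize use PUT.
    """
    if object_size <= put_maxsize:
        return ("PUT", 1)
    # Integer ceiling division: the last part may be smaller than partsize.
    parts = (object_size + partsize - 1) // partsize
    return ("MULTIPART", parts)
```

With the default values the formula gives 33554432 * 50 bytes per second, so lowering task.limit.qps or the part size lowers that ceiling.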

3.2 application.yml example

3.3.1 Get the file list (jobType: listObject)

3.3.1.1 listS3: list files in AWS S3, Tencent Cloud COS, Baidu BOS, Huawei OBS or JD Cloud OSS

jobType: listObject
sourceType: s3file
src.access.id : XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
src.secret.key: YYYYYYYYYYYYYYYYYYYYYYYYYYYYY
src.endpoint : https://s3.cn-north-1.jdcloud-oss.com
src.bucket : yourbucket
src.prefix :

3.3.1.2 listAliyun: list files in Alibaba Cloud OSS

jobType: listObject
sourceType: aliyunfile
src.access.id : AAAAAAAAAAAAAAAAAAAAAAAAA
src.secret.key: BBBBBBBBBBBBBBBBBBBBBBBBB
src.endpoint : https://oss-cn-beijing.aliyuncs.com
src.bucket : yourbucket
src.prefix :

3.3.1.3 listdiskfile: list files in the local file system

jobType: listObject
sourceType: diskfile
filePath: /yourpath

3.3.2 Configure a migration task (jobType: transfer)

3.3.2.1 Migrate from AWS S3, Tencent Cloud COS, Baidu BOS, Huawei OBS to JD Cloud OSS

jobType: transfer
sourceType: s3file

src.access.id : XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
src.secret.key: YYYYYYYYYYYYYYYYYYYYYYYYYYYYY
src.endpoint : https://s3.cn-north-1.jdcloud-oss.com
src.bucket : yourbucket
src.prefix :

des.access.id : AAAAAAAAAAAAAAAAAAAAAAAAAAAA
des.secret.key: BBBBBBBBBBBBBBBBBBBBBBBBBBBB
des.endpoint : https://s3.cn-north-1.jdcloud-oss.com
des.bucket : yourbucket
des.prefix:

# Optional fields

task.limit.threadCount: 20
task.limit.qps: 50

transfer.coverFile: true
transfer.put.maxsize: 33554432
transfer.multipart.partsize: 33554432
transfer.multipart.threads: 5

3.3.2.2 Migrate from Alibaba Cloud OSS to JD Cloud OSS

jobType: transfer
sourceType: aliyunfile

src.access.id : XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
src.secret.key: YYYYYYYYYYYYYYYYYYYYYYYYYYYYY
src.endpoint : https://oss-cn-beijing.aliyuncs.com
src.bucket : yourbucket
src.prefix :

des.access.id : AAAAAAAAAAAAAAAAAAAAAAAAAAA
des.secret.key: BBBBBBBBBBBBBBBBBBBBBBBBBBB
des.endpoint : https://s3.cn-north-1.jdcloud-oss.com
des.bucket : yourbucket
des.prefix:

# Optional fields

task.limit.threadCount: 20
task.limit.qps: 50

transfer.coverFile: true
transfer.put.maxsize: 33554432
transfer.multipart.partsize: 33554432
transfer.multipart.threads: 5

3.3.2.3 Migrate local files to JD Cloud OSS

jobType: transfer
sourceType: diskfile

filePath: /yourpath

des.access.id : AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
des.secret.key: BBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
des.endpoint : https://s3.cn-north-1.jdcloud-oss.com
des.bucket : yourbucket
des.prefix:

urlFilePrefix: 1

# urlFilePrefix is set to 1 because local file paths start with "/", and JD Cloud OSS does not support object keys that start with "/".

3.3.2.4 Migrate from a URL list to JD Cloud OSS

jobType: transfer
sourceType: urlfile
filePath: /path/onlyurl.txt
urlType: onlyUrl
urlFilePrefix: 35

des.access.id : AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
des.secret.key: BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
des.endpoint : https://s3.cn-north-1.jdcloud-oss.com
des.bucket : yourbucket

| Configuration item | Description |
| --- | --- |
| urlType | When sourceType is urlfile and the file list was not generated by the migration tool and contains only URLs, set urlType to onlyUrl. |
| filePath | Path of the file to read. Required when migrating from a designated URL list to JD Cloud OSS. The URL list is a text file with one original URL per line (for example https://abc.abc.com/xxx/yyy.txt, with no quotation marks or other symbols added). The path of the URL list must be absolute; on Linux the delimiter is a single slash, for example /a/b/c.txt. Only a file can be specified, not a directory. |
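
To make the effect of urlFilePrefix concrete, here is a minimal sketch that assumes the tool simply drops the first urlFilePrefix characters of each URL (or local path) and uses the remainder as the object key. That cutting rule is an inference from the field description, and both function names are hypothetical helpers, not part of Osstransfer.

```python
def key_from_source(source: str, url_file_prefix: int) -> str:
    """Drop the first url_file_prefix characters; the rest is the object key."""
    if url_file_prefix < 0 or url_file_prefix >= len(source):
        raise ValueError("urlFilePrefix must leave a non-empty key")
    return source[url_file_prefix:]


def prefix_length_for_host(url: str) -> int:
    """Count the characters up to and including the '/' that follows the host."""
    host_start = url.index("://") + 3
    return url.index("/", host_start) + 1


url = "https://abc.abc.com/xxx/yyy.txt"
cut = prefix_length_for_host(url)              # scheme + host + "/"
print(cut, key_from_source(url, cut))
print(key_from_source("/yourpath/a.txt", 1))   # local path with urlFilePrefix: 1
```

Under this assumption, a value such as 35 in the example configuration would correspond to a URL list whose scheme, host and leading path together occupy 35 characters.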

3.3.2.5 Configure JD Cloud Bucket Mutual Replication

To migrate from one JD Cloud OSS bucket to another bucket, set sourceType to s3file.

jobType: transfer
sourceType: s3file

src.access.id : XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
src.secret.key: YYYYYYYYYYYYYYYYYYYYYYYYYYYYY
src.endpoint : https://s3.cn-north-1.jdcloud-oss.com
src.bucket : yourbucket
src.prefix :

des.access.id : AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
des.secret.key: BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
des.endpoint : https://s3.cn-north-1.jdcloud-oss.com
des.bucket : yourbucket
des.prefix:

4. Run the migration tool

Linux

java -jar transfer-tools-java-1.0.0.jar --Dspring.config.location=application.yml

Osstransfer Instructions

  1. The Osstransfer migration tool has two types of jobs: listObject and transfer. During migration, every file uploaded by PUT or by multipart upload is verified by MD5 comparison.

  2. A single migrated file can be at most 19.5 TB.

  3. When migrating data from another cloud vendor, the source bucket must have public-read permission; otherwise the migration fails.

  4. The maximum object key length in JD Cloud OSS is 1022 bytes, so files whose key is longer than 1022 bytes cannot be migrated.

  5. The AWS S3 endpoint only supports https.

  6. Objects whose key contains a line feed (char(10)) or a carriage return (char(13)) are not migrated.

  7. JD Cloud OSS limits Content-Disposition to 100 bytes. It is recommended to set the ContentDispositionTooLongContinue configuration item to true when using Osstransfer, so that files still migrate successfully when their Content-Disposition exceeds 100 bytes.
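
The key and header limits in the notes above can be checked before a migration starts. The following is a minimal sketch, assuming that both limits are counted in UTF-8 bytes and that the 1022-byte limit applies to the object key; the function name is hypothetical and the check is not something Osstransfer itself exposes.

```python
from typing import List

MAX_KEY_BYTES = 1022                  # key length limit from note 4
MAX_CONTENT_DISPOSITION_BYTES = 100   # header limit from note 7


def migration_problems(key: str, content_disposition: str = "") -> List[str]:
    """Return the reasons an object would be skipped or need special handling."""
    problems = []
    # ASSUMPTION: lengths are measured on the UTF-8 encoding.
    if len(key.encode("utf-8")) > MAX_KEY_BYTES:
        problems.append("key longer than 1022 bytes")
    if "\n" in key or "\r" in key:
        problems.append("key contains line feed or carriage return")
    if len(content_disposition.encode("utf-8")) > MAX_CONTENT_DISPOSITION_BYTES:
        problems.append("Content-Disposition longer than 100 bytes")
    return problems
```

An empty list means none of the three limits is hit; a Content-Disposition finding only matters when ContentDispositionTooLongContinue is left at false.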

Migration Principles and Process

Principle Description

The Osstransfer migration tool first uses the SDK to list each data source and obtain the list of objects. This way, object changes made at the source during the migration do not affect the migration tool.

Migration Process

  1. During migration, the migration log is written to the ./log directory by default.

All migrated files are recorded in audit-0.log, and the files that migrated successfully are recorded in the audit.success log. To screen out the files that failed to migrate, use the command:

grep "1$" audit-0.log*
This prints the log lines that end in 1, that is, the entries for files that failed to migrate.
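
The same screening can be scripted when the failed entries need further processing. This is a minimal sketch that mirrors the grep pattern (lines whose last character is 1) and assumes the logs are in ./log and named audit-0.log*, as described above; it does not parse individual fields, because the exact line layout is not specified here, and the function names are hypothetical.

```python
import glob
import os
from typing import Iterable, List


def failed_lines(lines: Iterable[str]) -> List[str]:
    """Keep the lines that end in 1, like grep "1$"."""
    stripped = (line.rstrip("\n") for line in lines)
    return [line for line in stripped if line.endswith("1")]


def failed_in_log_dir(log_dir: str = "./log") -> List[str]:
    """Collect failed entries from every audit-0.log* file in log_dir."""
    failed: List[str] = []
    for path in sorted(glob.glob(os.path.join(log_dir, "audit-0.log*"))):
        with open(path, encoding="utf-8", errors="replace") as handle:
            failed.extend(failed_lines(handle))
    return failed
```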
  2. Audit log description

| Name | Description |
| --- | --- |
| version | Version number of the audit log; currently 1. |
| message | If the migration fails, the reason for the failure. |
| readline | Content read from the object list. |
| time | Time of the migration. |
| url | Source URL of the migration. |
| key | Name of the migrated object. |
| messageFormat | 0 means formatting succeeded; 1 means it failed. |
| headHttpCode | Status code of the HEAD request to the URL. |
| objectSize | Object size. |
| jssMethod | Upload method used: PUT or MULTIPART. |
| getAmazonS3Client | Status of obtaining the s3client: 0 means success; 1 means failure. |
| getHttpCode | Status code of the GET request to the URL. |
| responseEntity | 0 means responseEntity is not null; 1 means it is null. |
| uploadStatus | 0 means the PUT upload succeeded; 1 means it failed. |
| checkStatus | 0 means the check after the PUT upload succeeded; 1 means the check failed. |
| retryCount | Number of upload retries. |
| abortMultipartUpload | 0 means the multipart upload succeeded; 1 means the multipart upload failed and was aborted. |
| checkMultipartUpload | 0 means the check of the multipart-uploaded file succeeded; 1 means the check failed. |
| responseTime | Time consumed by the migration. |
| result | 0 means the upload succeeded; 1 means it failed. |