Data Migration Tool Osstransfer

Function Description

The Osstransfer tool migrates locally stored data and data held in other object storage services to JD Cloud OSS. It has the following features:

  • Supports a wide range of data sources:
    • Local data: migrates locally stored data to JD Cloud OSS;
    • Other object storage services: currently supports migration from AWS S3, Alibaba Cloud OSS, Tencent Cloud COS, Baidu BOS, Huawei OBS, Qiniu Cloud Storage and others to JD Cloud OSS; more sources will be added over time;
    • URL list: downloads objects according to a specified URL list and migrates them to JD Cloud OSS;
    • Bucket-to-bucket replication: replicates data between JD Cloud OSS buckets, across regions, across accounts, or within the same region.
  • Supports resumable (breakpoint) upload;
  • Supports traffic control;
  • Supports migrating only files with a specific prefix;
  • Supports parallel download and upload;
  • Migration verification: each object is verified after it is migrated.

Operating Environment

System Environment

  • Linux

Software Dependencies

  • JDK 1.8

Usage

1. Get the tool

Download link: transfer-tools

2. Get the configuration file

Create or download the configuration file application.yml in the same directory as the tool. Download link: [Example yml File](https://downloads.oss.cn-north-1.jcloudcs.com/application.yml)

3. Modify the application.yml configuration file

Before running the migration, modify the application.yml configuration file to suit your needs.

3.1 The fields of the Osstransfer configuration file are described below:

| Name | Description | Default Value |
| --- | --- | --- |
| jobType | Job type: listObject or transfer. | listObject |
| sourceType | Data source type: urlfile, diskfile, s3file (AWS S3, Tencent Cloud COS, Baidu BOS, Huawei OBS, JD Cloud OSS), aliyunfile, or disklistfile (list of local files). | s3file |
| urlType | When sourceType is urlfile and the file list was not generated by the migration tool and contains only URLs, urlType must be set to onlyUrl. | Null |
| filePath | Path of the file to read. Required when sourceType is urlfile or diskfile. | Null |
| urlFilePrefix | When the file list consists of URLs, the object key is taken from part of the URL. This field sets the number of leading characters to cut from the URL. | Null. If configured, a value of at least 7 (the length of http://) is recommended. |
| ContentDispositionTooLongContinue | Applies when the Content-Disposition of a link exceeds the 100-byte limit of JD Cloud OSS. Set to true to continue the upload without that header value. Otherwise the URL is written to the error log and you must modify the header value of the URL yourself. | false |
| task.limit.threadCount | Maximum number of files the task reads simultaneously. | 20 |
| task.limit.qps | QPS limit of the task. PUT uploads use relatively little bandwidth; the total bandwidth is partsize * qps. | 50 |
| transfer.coverFile | Whether migration overwrites existing files. Files are overwritten by default. | true |
| transfer.put.maxsize | Size threshold, in bytes, between PUT upload and multipart upload. If you change it, a multiple of 4 MB is recommended. | 33554432 |
| transfer.multipart.partsize | Size of each part in a multipart upload, in bytes. The default is 32 MB. | 33554432 |
| transfer.multipart.threads | Maximum concurrency of a multipart upload. | 5 |
| src.access.id | accessKeyId of the source account. | Null |
| src.secret.key | accessKeySecret of the source account. | Null |
| src.endpoint | Source endpoint. Alibaba Cloud: https://help.aliyun.com/document_detail/31837.html?spm=a2c4g.11186623.6.572.6a537f5ewpHZJZ ; Tencent Cloud: https://cloud.tencent.com/document/product/436/6224 ; Baidu Cloud: https://cloud.baidu.com/doc/BOS/S3.html#.E6.9C.8D.E5.8A.A1.E5.9F.9F.E5.90.8D ; Huawei Cloud: https://support.huaweicloud.com/api-obs/zh-cn_topic_0136050628.html | Null |
| src.bucket | Source bucket name. | Null |
| src.prefix | Configure a prefix if only part of the files should be migrated. If the prefix is a number starting with 0, enclose it in single or double quotation marks. | Null |
| des.access.id | JD Cloud accessKeyId. | Null |
| des.secret.key | JD Cloud accessKeySecret. | Null |
| des.endpoint | Service domain name of JD Cloud OSS; see Server Domain. | Null |
| des.bucket | Target bucket. | Null |
| des.prefix | Configure des.prefix if the files should be migrated into a specific directory. If the prefix is a number starting with 0, enclose it in single or double quotation marks. | Null |
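How transfer.put.maxsize, transfer.multipart.partsize and task.limit.qps relate to each other can be sketched in Python. This is an illustration only, not the tool's code: the function names are hypothetical, and the sketch assumes that an object exactly equal to transfer.put.maxsize is still sent with a single PUT, which this document does not state.

```python
PUT_MAXSIZE = 33554432   # transfer.put.maxsize default, in bytes (32 MiB)
PART_SIZE = 33554432     # transfer.multipart.partsize default, in bytes
QPS = 50                 # task.limit.qps default


def choose_upload_method(object_size: int, put_maxsize: int = PUT_MAXSIZE) -> str:
    """Return "PUT" or "MULTIPART" for an object of the given size.

    Assumption: objects no larger than put_maxsize go out as a single PUT.
    """
    return "PUT" if object_size <= put_maxsize else "MULTIPART"


def part_count(object_size: int, part_size: int = PART_SIZE) -> int:
    """Number of parts needed to cover object_size (integer ceiling division)."""
    return max(1, (object_size + part_size - 1) // part_size)


def bandwidth_ceiling(part_size: int = PART_SIZE, qps: int = QPS) -> int:
    """Upper bound in bytes per second, using the partsize * qps rule."""
    return part_size * qps


if __name__ == "__main__":
    size = 100 * 1024 * 1024  # a 100 MiB object
    print(choose_upload_method(size), part_count(size), bandwidth_ceiling())
```

With the defaults, partsize * qps is 33554432 * 50 = 1677721600 bytes per second, so lowering task.limit.qps is the direct way to reduce the bandwidth ceiling.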

3.2 application.yml example

3.3.1 Get File List (jobType: listObject)

3.3.1.1 listS3: list objects in AWS S3, Tencent Cloud COS, Baidu BOS, Huawei OBS, or JD Cloud OSS

jobType: listObject
sourceType: s3file
src.access.id : XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
src.secret.key: YYYYYYYYYYYYYYYYYYYYYYYYYYYYY
src.endpoint : https://s3.cn-north-1.jdcloud-oss.com
src.bucket : yourbucket
src.prefix :

3.3.1.2 listAliyun: list objects in Alibaba Cloud OSS

jobType: listObject
sourceType: aliyunfile
src.access.id : AAAAAAAAAAAAAAAAAAAAAAAAA
src.secret.key: BBBBBBBBBBBBBBBBBBBBBBBBB
src.endpoint : https://oss-cn-beijing.aliyuncs.com
src.bucket : yourbucket
src.prefix :

3.3.1.3 listdiskfile: list files in the local file system

jobType: listObject
sourceType: diskfile
filePath: /yourpath

3.3.2 Configure migration task (jobType:transfer)

3.3.2.1 Migrate from AWS S3, Tencent Cloud COS, Baidu BOS, Huawei OBS to JD Cloud OSS

jobType: transfer
sourceType: s3file

src.access.id : XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
src.secret.key: YYYYYYYYYYYYYYYYYYYYYYYYYYYYY
src.endpoint : https://s3.cn-north-1.jdcloud-oss.com
src.bucket : yourbucket
src.prefix :

des.access.id : AAAAAAAAAAAAAAAAAAAAAAAAAAAA
des.secret.key: BBBBBBBBBBBBBBBBBBBBBBBBBBBB
des.endpoint : https://s3.cn-north-1.jdcloud-oss.com
des.bucket : yourbucket
des.prefix:

#Optional Field

task.limit.threadCount: 20
task.limit.qps: 50

transfer.coverFile: true
transfer.put.maxsize: 33554432
transfer.multipart.partsize: 33554432
transfer.multipart.threads: 5

3.3.2.2 Migrate from Alibaba Cloud OSS to JD Cloud OSS

jobType: transfer
sourceType: aliyunfile

src.access.id : XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
src.secret.key: YYYYYYYYYYYYYYYYYYYYYYYYYYYYY
src.endpoint : https://oss-cn-beijing.aliyuncs.com
src.bucket : yourbucket
src.prefix :

des.access.id : AAAAAAAAAAAAAAAAAAAAAAAAAAA
des.secret.key: BBBBBBBBBBBBBBBBBBBBBBBBBBB
des.endpoint : https://s3.cn-north-1.jdcloud-oss.com
des.bucket : yourbucket
des.prefix:

#Optional Field

task.limit.threadCount: 20
task.limit.qps: 50

transfer.coverFile: true
transfer.put.maxsize: 33554432
transfer.multipart.partsize: 33554432
transfer.multipart.threads: 5

3.3.2.3 Migrate from local storage to JD Cloud OSS

jobType: transfer
sourceType: diskfile

filePath: /yourpath

des.access.id : AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
des.secret.key: BBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
des.endpoint : https://s3.cn-north-1.jdcloud-oss.com
des.bucket : yourbucket
des.prefix:

# urlFilePrefix is set to 1 because local file paths start with "/", which JD Cloud OSS does not support at the start of an object key
urlFilePrefix: 1

3.3.2.4 Migrate from a URL list to JD Cloud OSS

jobType: transfer
sourceType: urlfile
filePath: /path/onlyurl.txt
urlType: onlyUrl
urlFilePrefix: 35

des.access.id : AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
des.secret.key: BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
des.endpoint : https://s3.cn-north-1.jdcloud-oss.com
des.bucket : yourbucket

| Configuration Item | Description |
| --- | --- |
| urlType | When sourceType is urlfile and the file list was not generated by the migration tool and contains only URLs, urlType must be set to onlyUrl. |
| filePath | Path of the file to read. Required when migrating from a specified URL list to JD Cloud OSS. The URL list is a text file with one original URL per line (for example https://abc.abc.com/xxx/yyy.txt, without quotation marks or other symbols). The path of the URL list must be absolute; on Linux the separator is a single slash, for example /a/b/c.txt. Only a file may be specified, not a directory. |
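How urlFilePrefix turns a source URL or a local path into an object key can be illustrated with a short Python sketch. It assumes, based on the field descriptions, that the tool simply drops the first urlFilePrefix characters of each entry; the function name and the sample values are hypothetical.

```python
def key_from_source(source: str, url_file_prefix: int) -> str:
    """Drop the first url_file_prefix characters and use the rest as the key.

    Assumption: the cut is a plain character count, as the field
    description suggests; the real tool may behave differently.
    """
    if url_file_prefix < 0:
        raise ValueError("urlFilePrefix must not be negative")
    return source[url_file_prefix:]


if __name__ == "__main__":
    # Cutting 7 characters removes exactly "http://".
    print(key_from_source("http://abc.abc.com/xxx/yyy.txt", 7))
    # Cutting 1 character removes the leading "/" of a local path.
    print(key_from_source("/yourpath/a/b.txt", 1))
    # A larger value can also remove the host name.
    url = "https://abc.abc.com/xxx/yyy.txt"
    print(key_from_source(url, len("https://abc.abc.com/")))
```

Under this assumption, the right value for a URL list is the length of the common leading part that should not appear in the object key.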

3.3.2.5 Configure replication between JD Cloud OSS buckets

To migrate from one JD Cloud OSS bucket to another, set sourceType to s3file.

jobType: transfer
sourceType: s3file

src.access.id : XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
src.secret.key: YYYYYYYYYYYYYYYYYYYYYYYYYYYYY
src.endpoint : https://s3.cn-north-1.jdcloud-oss.com
src.bucket : yourbucket
src.prefix :

 

des.access.id : AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
des.secret.key: BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
des.endpoint : https://s3.cn-north-1.jdcloud-oss.com
des.bucket : yourbucket
des.prefix:
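Before running the tool, the settings can be sanity-checked. The Python sketch below is a hypothetical pre-flight check and not part of Osstransfer. It takes the settings as a flat dictionary and reports fields that appear to be missing. Only the filePath rule for urlfile and diskfile is stated in the field descriptions; the src.* and des.* requirements are assumptions inferred from the examples above, and disklistfile is left unchecked because this document does not describe its fields.

```python
SRC_KEYS = ["src.access.id", "src.secret.key", "src.endpoint", "src.bucket"]
DES_KEYS = ["des.access.id", "des.secret.key", "des.endpoint", "des.bucket"]


def missing_fields(cfg: dict) -> list:
    """Return the names of fields that appear to be missing from cfg."""
    job_type = cfg.get("jobType", "listObject")     # documented default
    source_type = cfg.get("sourceType", "s3file")   # documented default

    required = []
    if source_type in ("urlfile", "diskfile"):
        required.append("filePath")                 # stated as compulsory
    elif source_type in ("s3file", "aliyunfile"):
        required.extend(SRC_KEYS)                   # assumption, from the examples
    if job_type == "transfer":
        required.extend(DES_KEYS)                   # assumption, from the examples

    # Empty strings and None count as missing.
    return [name for name in required if not cfg.get(name)]


if __name__ == "__main__":
    cfg = {"jobType": "transfer", "sourceType": "diskfile", "filePath": "/yourpath"}
    print(missing_fields(cfg))  # lists the des.* fields still to be filled in
```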

4. Run the migration tool

Linux

java -jar transfer-tools-java-1.0.0.jar --Dspring.config.location=application.yml


Osstransfer Instructions

  1. Osstransfer jobs are divided into two types: listObject and transfer. During migration, every file uploaded by PUT or by multipart upload is verified by MD5 comparison.

  2. The maximum size of a single migrated file is 19.5 TB.

  3. When migrating data from another cloud vendor, the source bucket must allow public read access; otherwise the migration fails.

  4. The maximum object key length in JD Cloud OSS is 1022 bytes, so objects whose keys are longer than 1022 bytes cannot be migrated.

  5. AWS S3 endpoints support HTTPS only.

  6. Objects whose keys contain a line feed (char(10)) or a carriage return (char(13)) are not migrated.

  7. JD Cloud OSS limits Content-Disposition to 100 bytes. It is recommended to set the ContentDispositionTooLongContinue configuration item to true when using Osstransfer, so that files are still migrated successfully when Content-Disposition exceeds 100 bytes.
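The key and header restrictions above can be checked ahead of a migration. The following Python sketch is a hypothetical helper, not part of Osstransfer. It assumes that the 1022-byte limit applies to the object key and that both limits are measured on the UTF-8 encoding, neither of which is spelled out in this document.

```python
MAX_KEY_BYTES = 1022            # limit from note 4, taken to apply to the key
MAX_CONTENT_DISPOSITION = 100   # header limit from note 7


def migration_problems(key: str, content_disposition: str = "") -> list:
    """Return reasons why an object may not migrate cleanly (empty if none)."""
    problems = []
    # Assumption: lengths are measured in UTF-8 bytes.
    if len(key.encode("utf-8")) > MAX_KEY_BYTES:
        problems.append("key longer than 1022 bytes")
    if "\n" in key or "\r" in key:
        problems.append("key contains a line feed or carriage return")
    if len(content_disposition.encode("utf-8")) > MAX_CONTENT_DISPOSITION:
        problems.append("Content-Disposition longer than 100 bytes")
    return problems


if __name__ == "__main__":
    print(migration_problems("photos/2019/cat.jpg"))
    print(migration_problems("bad\nkey"))
```

Running such a check over an object list produced by a listObject job would show in advance which entries are likely to be skipped or to fail.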

Migration Principles and Process

Principle Description

Osstransfer first uses the SDK to list each data source and obtain its object list. This way, objects that change during the migration do not affect the migration tool.
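The list-first approach can be illustrated with a small Python sketch. It uses a local directory as a stand-in for a data source and is not the tool's implementation; the function names are hypothetical. The point is that the transfer phase works from a listing taken once, so files added afterwards are not part of the run.

```python
import os
import tempfile


def snapshot_listing(root: str) -> list:
    """Walk root once and return a sorted list of relative file paths."""
    keys = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            keys.append(os.path.relpath(full, root).replace(os.sep, "/"))
    return sorted(keys)


def demo() -> tuple:
    """List a temporary directory, change it, and return both views."""
    with tempfile.TemporaryDirectory() as root:
        for name in ("a.txt", "b.txt"):
            with open(os.path.join(root, name), "w") as fh:
                fh.write("data")
        listing = snapshot_listing(root)        # phase 1: take the listing
        with open(os.path.join(root, "c.txt"), "w") as fh:
            fh.write("late arrival")            # change made after listing
        # Phase 2 would iterate over `listing`, which does not contain c.txt.
        return listing, snapshot_listing(root)


if __name__ == "__main__":
    before, after = demo()
    print(before)
    print(after)
```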

Migration Process

  1. During migration, logs are written to the ./log directory by default.

All migrated files are recorded in audit-0.log, and files that migrated successfully are recorded in the audit.success log. To filter out the files that failed to migrate, run:

grep "1$" audit-0.log*

  2. Audit Log Description
| Name | Description |
| --- | --- |
| version | Version number of the audit log; currently 1. |
| message | If the migration fails, the reason for the failure. |
| readline | Content read from the object list. |
| time | Time of the migration. |
| url | URL of the migration source. |
| key | Name of the migrated object. |
| messageFormat | 0 means formatting succeeded, 1 means it failed. |
| headHttpCode | Status code of the HEAD request for the URL. |
| objectSize | Size of the object. |
| jssMethod | Upload method used: PUT or MULTIPART. |
| getAmazonS3Client | Status of obtaining the S3 client: 0 means success, 1 means failure. |
| getHttpCode | Status code of the GET request for the URL. |
| responseEntity | 0 means responseEntity is not null, 1 means it is null. |
| uploadStatus | 0 means the PUT upload succeeded, 1 means it failed. |
| checkStatus | 0 means the check after the PUT upload succeeded, 1 means it failed. |
| retryCount | Number of upload retries. |
| abortMultipartUpload | 0 means the multipart upload succeeded; 1 means a part failed and the multipart upload was aborted. |
| checkMultipartUpload | 0 means the check of the multipart-uploaded file succeeded, 1 means it failed. |
| responseTime | Time taken by the migration. |
| result | 0 means the upload succeeded, 1 means it failed. |
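The grep filter for failed files can also be expressed in Python, for example when the result needs further processing. The sketch below keeps every line that ends in the character 1, which is all that grep "1$" checks; it makes no other assumption about the layout of an audit line. The function names are hypothetical, and the lines written in demo() are made-up placeholders, not real audit output.

```python
import glob
import os
import tempfile


def failed_lines(pattern: str) -> list:
    """Return lines ending in "1" from all files matching pattern."""
    failures = []
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8", errors="replace") as fh:
            for line in fh:
                line = line.rstrip("\n")
                if line.endswith("1"):
                    failures.append(line)
    return failures


def demo() -> list:
    """Write two placeholder lines to a temporary audit file and filter them."""
    with tempfile.TemporaryDirectory() as log_dir:
        path = os.path.join(log_dir, "audit-0.log")
        with open(path, "w", encoding="utf-8") as fh:
            # Placeholder content only; real audit lines carry the fields above.
            fh.write("placeholder-object-a 0\n")
            fh.write("placeholder-object-b 1\n")
        return failed_lines(os.path.join(log_dir, "audit-0.log*"))


if __name__ == "__main__":
    print(demo())
```

For real logs, pass a pattern such as ./log/audit-0.log* to failed_lines, assuming the default log directory is in use.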
Update Time:2019-10-17 18:47:35