The following describes the concepts used in the JCS for Elasticsearch product and how they are interpreted.
A cluster comprises one or more nodes and provides unified indexing and cross-node search capabilities. A JCS for Elasticsearch cluster composed of several nodes is redundant: overall service availability is preserved even when one or more nodes fail at the same time. A cluster is identified by its unique name, and a node uses that name to decide which JCS for Elasticsearch cluster to join; each node belongs to exactly one cluster. Apart from features such as redundancy, a JCS for Elasticsearch cluster with only one node still provides the full storage and search functionality. Viewed from the outside, a JCS for Elasticsearch cluster appears as a single logical entity, even though it is decentralized internally.
A running JCS for Elasticsearch instance is called a node. The nodes in a cluster handle data storage and query requests together. When a node joins the cluster or is removed from it, the cluster redistributes data evenly across all nodes. Like clusters, nodes are identified by name; by default, a node adopts a random Marvel character name generated automatically at startup. The cluster a node joins is determined by the JCS for Elasticsearch cluster name configured on that node.
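As an illustration, cluster membership and node identity are typically set in the node's elasticsearch.yml configuration file; the names below are placeholder values, not defaults of the product:

```
# elasticsearch.yml (illustrative values)
cluster.name: my-jcs-es-cluster   # nodes configured with the same cluster.name join the same cluster
node.name: node-1                 # overrides the randomly generated default node name
```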
JCS for Elasticsearch stores data in one or more indexes. An index is a collection of documents with similar characteristics, comparable to a database in SQL or to a data storage schema. Indexes are where related documents are stored. Once a document is stored in an index, it can be retrieved and searched; if a document with the same ID already exists, the new document replaces the old one. Indexes are identified by name, and documents are created, searched, updated, and deleted by referring to that name. Any number of indexes can be created in one JCS for Elasticsearch cluster as required.
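A minimal sketch of the replace-on-write behavior described above, using a plain Python dict as a stand-in for an index (this is a toy model, not the real ES API):

```python
# Toy model of an index: documents are keyed by ID, and writing an
# existing ID replaces the old document, as described above for ES.
index = {}

def put_document(doc_id, doc):
    """Store a document; a new document replaces any old one with the same ID."""
    replaced = doc_id in index
    index[doc_id] = doc
    return replaced

put_document("1", {"user": "alice"})               # first write: creates the document
was_replaced = put_document("1", {"user": "bob"})  # same ID: replaces the old document
```

In real ES, the same semantics apply when a document is indexed under an existing ID in the same index.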
An index can be split into several underlying physical Lucene indexes, which partition and store the index data. Each physical Lucene index is called a shard. Shards are the containers for data, and documents are kept in shards. A shard is a low-level working unit that holds only a part of all the data; internally, each shard is a fully functional, independent index that can be hosted on any node of the cluster, and shards are distributed across the nodes. When creating an index, users may specify the shard count; the default is 5. Shards are divided into primary shards and replica shards. Primary shards store the documents. Each new index automatically creates 5 primary shards, and the count can be customized in the configuration at index creation time; once creation is complete, the number of primary shards cannot be changed. A replica shard is a copy of a primary shard, used for data redundancy and to improve search performance. Each primary shard is configured with one replica shard by default and may be configured with several. The replica shard count can be changed dynamically and increased or decreased by ES on demand.
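The reason the primary shard count is fixed after creation is that ES routes each document to a shard with a formula of the form shard = hash(routing) % number_of_primary_shards, where the routing value is the document ID by default (ES uses a murmur3 hash). A simplified sketch, substituting crc32 for murmur3:

```python
import zlib

NUMBER_OF_PRIMARY_SHARDS = 5  # default count; fixed once the index is created

def shard_for(doc_id: str, num_shards: int = NUMBER_OF_PRIMARY_SHARDS) -> int:
    """Route a document to a primary shard by hashing its routing value.
    ES uses murmur3 on the document ID by default; crc32 stands in here."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards

# Changing num_shards would send the same ID to a different shard, which is
# why the primary shard count cannot be changed after the index is created.
shard = shard_for("user-42")
```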
A type is a logical partition within an index, comparable to a table in a database, with a meaning determined by the user's needs. One or more types can be defined within an index. For example, an index might define one type for storing user data, one for log data, and one for comment data.
A document is the most basic unit of an index and can be viewed as a row in a database table. Documents are expressed in JSON, and documents of the same type resemble each other to a certain degree.
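For illustration, here is what a document and its coordinates might look like: the JSON body plus the index/type/ID metadata that locates it. The index name, type name, and field values are made up for this example:

```python
import json

# A hypothetical document stored in a "blog" index under a "user" type.
document = {
    "_index": "blog",   # which index stores the document (like a database)
    "_type": "user",    # logical type within the index (like a table)
    "_id": "1",         # document ID within that type (like a row key)
    "_source": {        # the JSON body of the document itself
        "name": "Zhang San",
        "age": 28,
        "registered": "2019-01-01",
    },
}

print(json.dumps(document, indent=2))
```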
A mapping is the outline of the documents stored in an index. When creating the index, the types and relevant attributes of fields can be predefined (for example, whether a field is stored and which analyzer it uses). Users can define, according to their needs, how text is split into tokens, which tokens are filtered out, and which text requires additional processing.
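A sketch of what a mapping definition might look like for a hypothetical user type; the field names, types, and analyzer choice are assumptions for illustration, not taken from the source:

```python
# Hypothetical mapping: predefines field types and attributes at index
# creation time, including whether a field is stored separately and
# which analyzer splits its text into tokens.
mapping = {
    "properties": {
        "name": {
            "type": "text",
            "analyzer": "standard",  # controls how this field's text is tokenized
        },
        "age": {"type": "integer"},
        "registered": {
            "type": "date",
            "store": True,  # keep this field retrievable on its own
        },
    }
}
```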
Recovery, also called data redistribution, means that ES reassigns index shards according to the load on each machine when nodes are added or removed. Data recovery is also triggered when a failed node restarts.
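A toy sketch of even redistribution: when the node list changes, shards are reassigned round-robin across the current nodes. Real ES shard allocation weighs load, disk usage, and other factors, so this only illustrates the even-spread idea:

```python
def redistribute(shards, nodes):
    """Assign each shard to a node round-robin so data spreads evenly.
    A toy model of redistribution; real ES allocation is richer."""
    return {shard: nodes[i % len(nodes)] for i, shard in enumerate(shards)}

shards = [f"shard-{i}" for i in range(5)]
before = redistribute(shards, ["node-1", "node-2"])
after = redistribute(shards, ["node-1", "node-2", "node-3"])  # a node joins
```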