Hi,
If this happened out of the blue and no one was working with the system at the time, then most likely there is a problem with the disks, the network or another infrastructure element.
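In general, I'd recommend trying to manually assign the shard to a node with a command like the one below. Keep in mind that allocate_empty_primary creates an empty primary, so any data the shard previously held is lost (this is what "accept_data_loss": true acknowledges). In the command: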
- <index_name> is the name of the index whose shard is being rerouted
- <shard_id> is the ID/number of the shard
- <node_name> is the name of the node you'll be rerouting the shard to
POST /_cluster/reroute
{
  "commands": [
    {
      "allocate_empty_primary": {
        "index": "<index_name>",
        "shard": <shard_id>,
        "node": "<node_name>",
        "accept_data_loss": true
      }
    }
  ]
}
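If you are not sure which values to fill in, requests like the ones below may help. Treat them as a sketch rather than a required step. The first lists the shards of the index with their state and, for unassigned ones, the reason. The second asks the cluster to explain why a given primary is not allocated. The column selection in h= is just one reasonable choice, and the placeholders are the same as above.

GET /_cat/shards/<index_name>?v&h=index,shard,prirep,state,unassigned.reason,node

GET /_cluster/allocation/explain
{
  "index": "<index_name>",
  "shard": <shard_id>,
  "primary": true
}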
It is also worth noting that this is a multi-node environment, so setting up replication for shards is highly recommended, for exactly this reason. A replica is a 1:1 copy of a shard, and its main function is to take over when the primary shard becomes unavailable. Replicas also serve search queries, so although they increase disk consumption, they can in return contribute to faster responses for ELS users.
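As a rough sketch of what that could look like, the number of replicas on an existing index can be changed through the index settings API. The value 1 is only an example, and <index_name> is the same placeholder as in the reroute command. Each replica needs a node other than the one holding its primary, otherwise it will stay unassigned.

PUT /<index_name>/_settings
{
  "index": {
    "number_of_replicas": 1
  }
}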