:_mod-docs-content-type: PROCEDURE

[id="migrating-ceph-mds_{context}"]
= Migrating {Ceph} MDS to new nodes within the existing cluster

You can migrate the MDS daemon when {rhos_component_storage_file_first_ref}, deployed with either a cephfs-native or ceph-nfs back end, is part of the overcloud deployment. The MDS migration is performed by `cephadm`, and you move the placement of the daemons from a hosts-based approach to a label-based approach.
ifeval::["{build}" != "upstream"]
This ensures that you can visualize the status of the cluster and where the daemons are placed by using the `ceph orch host` command. You can also get a general view of how the daemons are co-located within a given host, as described in the Red Hat Knowledgebase article https://access.redhat.com/articles/1548993[Red Hat Ceph Storage: Supported configurations].
endif::[]
ifeval::["{build}" != "downstream"]
This ensures that you can visualize the status of the cluster and where the daemons are placed by using the `ceph orch host` command, and get a general view of how the daemons are co-located within a given host.
endif::[]

.Prerequisites

* Complete the tasks in your {rhos_prev_long} {rhos_prev_ver} environment. For more information, see xref:red-hat-ceph-storage-prerequisites_configuring-network[{Ceph} prerequisites].

.Procedure

. Verify that the {CephCluster} cluster is healthy and check the MDS status:
+
----
$ sudo cephadm shell -- ceph fs ls
name: cephfs, metadata pool: manila_metadata, data pools: [manila_data ]

$ sudo cephadm shell -- ceph mds stat
cephfs:1 {0=mds.controller-2.oebubl=up:active} 2 up:standby

$ sudo cephadm shell -- ceph fs status cephfs

cephfs - 0 clients
======
RANK  STATE            MDS                ACTIVITY     DNS    INOS   DIRS   CAPS
 0    active  mds.controller-2.oebubl  Reqs:    0 /s   696    196    173      0
      POOL          TYPE     USED  AVAIL
manila_metadata  metadata   152M   141G
  manila_data      data    3072M   141G
    STANDBY MDS
mds.controller-0.anwiwd
mds.controller-1.cwzhog
----
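+
The commands above report only the file system and MDS state. If you also want to confirm the overall health of the {CephCluster} cluster before you continue, you can optionally run a general status check, for example:
+
----
$ sudo cephadm shell -- ceph -s
$ sudo cephadm shell -- ceph health detail
----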
. Retrieve more detailed information on the Ceph File System (CephFS) MDS status:
+
----
$ sudo cephadm shell -- ceph fs dump

e8
enable_multiple, ever_enabled_multiple: 1,1
default compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
legacy client fscid: 1

Filesystem 'cephfs' (1)
fs_name cephfs
epoch   5
flags   12 joinable allow_snaps allow_multimds_snaps
created 2024-01-18T19:04:01.633820+0000
modified        2024-01-18T19:04:05.393046+0000
tableserver     0
root    0
session_timeout 60
session_autoclose       300
max_file_size   1099511627776
required_client_features        {}
last_failure    0
last_failure_osd_epoch  0
compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2}
max_mds 1
in      0
up      {0=24553}
failed
damaged
stopped
data_pools      [7]
metadata_pool   9
inline_data     disabled
balancer
standby_count_wanted    1
[mds.mds.controller-2.oebubl{0:24553} state up:active seq 2 addr [v2:172.17.3.114:6800/680266012,v1:172.17.3.114:6801/680266012] compat {c=[1],r=[1],i=[7ff]}]

Standby daemons:

[mds.mds.controller-0.anwiwd{-1:14715} state up:standby seq 1 addr [v2:172.17.3.20:6802/3969145800,v1:172.17.3.20:6803/3969145800] compat {c=[1],r=[1],i=[7ff]}]
[mds.mds.controller-1.cwzhog{-1:24566} state up:standby seq 1 addr [v2:172.17.3.43:6800/2227381308,v1:172.17.3.43:6801/2227381308] compat {c=[1],r=[1],i=[7ff]}]
dumped fsmap epoch 8
----

. Check the OSD blocklist and clean up the client list:
+
----
$ sudo cephadm shell -- ceph osd blocklist ls
..
..
$ for item in $(sudo cephadm shell -- ceph osd blocklist ls | awk '{print $1}'); do
      sudo cephadm shell -- ceph osd blocklist rm $item;
  done
----
+
[NOTE]
====
When a file system client is unresponsive or misbehaving, its access to the file system might be forcibly terminated. This process is called eviction. Evicting a CephFS client prevents it from communicating further with MDS daemons and OSD daemons. Ordinarily, a blocklisted client cannot reconnect to the servers; you must unmount and then remount the client. However, permitting a client that was evicted to attempt to reconnect can be useful. Because CephFS uses the RADOS OSD blocklist to control client eviction, you can permit CephFS clients to reconnect by removing them from the blocklist.
====

. Get the hosts that are currently part of the {Ceph} cluster:
+
----
[ceph: root@controller-0 /]# ceph orch host ls
HOST                        ADDR           LABELS          STATUS
cephstorage-0.redhat.local  192.168.24.25  osd
cephstorage-1.redhat.local  192.168.24.50  osd
cephstorage-2.redhat.local  192.168.24.47  osd
controller-0.redhat.local   192.168.24.24  _admin mgr mon
controller-1.redhat.local   192.168.24.42  mgr _admin mon
controller-2.redhat.local   192.168.24.37  mgr _admin mon
6 hosts in cluster
----

. Apply the MDS labels to the target nodes:
+
----
$ for item in $(sudo cephadm shell -- ceph orch host ls --format json | jq -r '.[].hostname'); do
      sudo cephadm shell -- ceph orch host label add $item mds;
  done
----
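+
The loop above adds the `mds` label to every host in the cluster; you remove the label from the Controller nodes later in this procedure. If you prefer to label only specific target nodes, you can optionally add the label host by host instead. The host name in the following example is taken from this cluster:
+
----
$ sudo cephadm shell -- ceph orch host label add cephstorage-0.redhat.local mds
----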
. Verify that all the hosts have the MDS label:
+
----
$ sudo cephadm shell -- ceph orch host ls

HOST                        ADDR           LABELS
cephstorage-0.redhat.local  192.168.24.11  osd mds
cephstorage-1.redhat.local  192.168.24.12  osd mds
cephstorage-2.redhat.local  192.168.24.47  osd mds
controller-0.redhat.local   192.168.24.35  _admin mon mgr mds
controller-1.redhat.local   192.168.24.53  mon _admin mgr mds
controller-2.redhat.local   192.168.24.10  mon _admin mgr mds
----

. Dump the current MDS spec:
+
----
$ SPEC_DIR=${SPEC_DIR:-"$PWD/ceph_specs"}
$ mkdir -p ${SPEC_DIR}
$ sudo cephadm shell -- ceph orch ls --export mds > ${SPEC_DIR}/mds
----

. Edit the retrieved spec and replace the `placement.hosts` section with `placement.label`:
+
----
service_type: mds
service_id: mds
service_name: mds.mds
placement:
  label: mds
----

. Use the Ceph orchestrator to apply the new MDS spec:
+
----
$ SPEC_DIR=${SPEC_DIR:-"$PWD/ceph_specs"}
$ sudo cephadm shell -m ${SPEC_DIR}/mds -- ceph orch apply -i /mnt/mds
Scheduling new mds deployment ...
----
+
This results in an increased number of MDS daemons.

. Check the new standby daemons that are temporarily added to the CephFS file system:
+
----
$ sudo cephadm shell -- ceph fs dump

Active

standby_count_wanted    1
[mds.mds.controller-0.awzplm{0:463158} state up:active seq 307 join_fscid=1 addr [v2:172.17.3.20:6802/51565420,v1:172.17.3.20:6803/51565420] compat {c=[1],r=[1],i=[7ff]}]

Standby daemons:

[mds.mds.cephstorage-1.jkvomp{-1:463800} state up:standby seq 1 join_fscid=1 addr [v2:172.17.3.135:6820/2075903648,v1:172.17.3.135:6821/2075903648] compat {c=[1],r=[1],i=[7ff]}]
[mds.mds.controller-2.gfrqvc{-1:475945} state up:standby seq 1 addr [v2:172.17.3.114:6800/2452517189,v1:172.17.3.114:6801/2452517189] compat {c=[1],r=[1],i=[7ff]}]
[mds.mds.cephstorage-0.fqcshx{-1:476503} state up:standby seq 1 join_fscid=1 addr [v2:172.17.3.92:6820/4120523799,v1:172.17.3.92:6821/4120523799] compat {c=[1],r=[1],i=[7ff]}]
[mds.mds.cephstorage-2.gnfhfe{-1:499067} state up:standby seq 1 addr [v2:172.17.3.79:6820/2448613348,v1:172.17.3.79:6821/2448613348] compat {c=[1],r=[1],i=[7ff]}]
[mds.mds.controller-1.tyiziq{-1:499136} state up:standby seq 1 addr [v2:172.17.3.43:6800/3615018301,v1:172.17.3.43:6801/3615018301] compat {c=[1],r=[1],i=[7ff]}]
----

. To migrate the MDS daemons to the target nodes, set the MDS affinity that manages the MDS failover:
+
[NOTE]
You can elect a dedicated MDS as "active" for a particular file system. To configure this preference, CephFS provides a configuration option for MDS called `mds_join_fs`, which enforces this affinity. When failing over MDS daemons, the cluster monitors prefer standby daemons with `mds_join_fs` equal to the file system name of the failed rank. If no standby exists with `mds_join_fs` equal to the file system name, the monitors choose an unqualified standby as a replacement.
+
----
$ sudo cephadm shell -- ceph config set mds.mds.cephstorage-0.fqcshx mds_join_fs cephfs
----
+
Replace `mds.mds.cephstorage-0.fqcshx` with the daemon deployed on `cephstorage-0` that you retrieved in the previous step.

. Remove the MDS labels from the Controller nodes and force the MDS failover to the target node:
+
----
$ for i in 0 1 2; do sudo cephadm shell -- ceph orch host label rm "controller-$i.redhat.local" mds; done

Removed label mds from host controller-0.redhat.local
Removed label mds from host controller-1.redhat.local
Removed label mds from host controller-2.redhat.local
----
+
The switch to the target node happens in the background.
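+
If you want to follow the failover while it completes, you can optionally poll the file system status, for example:
+
----
$ sudo cephadm shell -- ceph fs status cephfs
----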
+
The new active MDS is the daemon that you configured by setting the `mds_join_fs` option.

. Check the result of the failover and the newly deployed daemons:
+
----
$ sudo cephadm shell -- ceph fs dump
…
…

standby_count_wanted    1
[mds.mds.cephstorage-0.fqcshx{0:476503} state up:active seq 168 join_fscid=1 addr [v2:172.17.3.92:6820/4120523799,v1:172.17.3.92:6821/4120523799] compat {c=[1],r=[1],i=[7ff]}]

Standby daemons:

[mds.mds.cephstorage-2.gnfhfe{-1:499067} state up:standby seq 1 addr [v2:172.17.3.79:6820/2448613348,v1:172.17.3.79:6821/2448613348] compat {c=[1],r=[1],i=[7ff]}]
[mds.mds.cephstorage-1.jkvomp{-1:499760} state up:standby seq 1 join_fscid=1 addr [v2:172.17.3.135:6820/452139733,v1:172.17.3.135:6821/452139733] compat {c=[1],r=[1],i=[7ff]}]

$ sudo cephadm shell -- ceph orch ls

NAME      PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
crash            6/6      10m ago    10d  *
mds.mds          3/3      10m ago    32m  label:mds

$ sudo cephadm shell -- ceph orch ps | grep mds

mds.mds.cephstorage-0.fqcshx  cephstorage-0.redhat.local  running (79m)  3m ago  79m  27.2M  -  17.2.6-100.el9cp  1af7b794f353  2a2dc5ba6d57
mds.mds.cephstorage-1.jkvomp  cephstorage-1.redhat.local  running (79m)  3m ago  79m  21.5M  -  17.2.6-100.el9cp  1af7b794f353  7198b87104c8
mds.mds.cephstorage-2.gnfhfe  cephstorage-2.redhat.local  running (79m)  3m ago  79m  24.2M  -  17.2.6-100.el9cp  1af7b794f353  f3cb859e2a15
----

ifeval::["{build}" != "downstream"]
.Useful resources

* https://docs.ceph.com/en/reef/cephfs/eviction[cephfs - eviction]
* https://docs.ceph.com/en/reef/cephfs/standby/#configuring-mds-file-system-affinity[ceph mds - affinity]
endif::[]