How to restart a datanode on failure with Ambari: Automation

There are many times when you need to restart a dead datanode immediately after a failure.

Here is a script that does this through the Ambari REST API.

First, save the list of datanode hostnames to a file.

curl -s -k -u username:password -H "X-Requested-By:ambari" -i -X GET "http://<ambari-url>:8080/api/v1/clusters/<clustername>/services/HDFS/components/DATANODE" | grep host_name | awk -F: '{print $2}' | sed 's/"//g' > datanode-list.txt
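
The grep/awk/sed pipeline above depends on how Ambari lays out its JSON, so it can be worth testing the extraction against a sample response before pointing it at a live cluster. The following is only a sketch: the function name extract_host_names and the sample hostnames are made up for illustration, and it assumes each "host_name" entry sits on its own line, as in pretty-printed output.

```shell
#!/bin/bash

# Hypothetical helper (not part of Ambari): reads a JSON response on stdin
# and prints one hostname per line. Assumes every "host_name" key is on its
# own line of the response.
extract_host_names() {
  grep '"host_name"' | awk -F: '{print $2}' | sed 's/[", ]//g'
}

# Made-up response fragment used for a dry run.
sample_response='
"host_components" : [
  {
    "HostRoles" : {
      "cluster_name" : "c1",
      "component_name" : "DATANODE",
      "host_name" : "dn1.example.com"
    }
  },
  {
    "HostRoles" : {
      "cluster_name" : "c1",
      "component_name" : "DATANODE",
      "host_name" : "dn2.example.com"
    }
  }
]'

# Print the hostnames found in the sample, one per line.
printf '%s\n' "$sample_response" | extract_host_names
```

Against a real cluster, the output of the curl command can be piped into extract_host_names and redirected to datanode-list.txt in the same way.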

The following script checks the status of each datanode and starts any datanode that is not running. Schedule it with cron if needed.


#!/bin/bash

while read -r line; do

# Get the status of each datanode, reading hostnames line by line from the file.
# The live "state" field is read here: "desired_state" only records the last
# requested state, so it stays STARTED after a datanode has died.
status=$(curl -s -k -u username:password -H "X-Requested-By:ambari" -i -X GET "http://<ambari-url>:8080/api/v1/clusters/<clustername>/hosts/${line}/host_components/DATANODE" | grep '"state"' | awk -F":" '{print $2}' | sed 's/[", ]//g')

# Check whether the datanode is already running
if [ "$status" == "STARTED" ]; then
echo "Datanode ${line} is in Running state"
else
echo "Datanode ${line} is ${status}"

# Restart the dead datanode
curl -u username:password -i -H 'X-Requested-By: ambari' -X PUT -d '{"HostRoles": {"state": "STARTED"}}' "http://<ambari-url>:8080/api/v1/clusters/<clustername>/hosts/${line}/host_components/DATANODE"
fi

done < datanode-list.txt
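
To run the check on a schedule, a crontab entry can call the script at a fixed interval. The sketch below assumes the script is saved as /opt/scripts/restart-datanode.sh and logs to /var/log/restart-datanode.log. Both paths, the five-minute interval, and the function names are placeholders chosen for illustration.

```shell
#!/bin/bash

# Hypothetical locations; adjust to wherever the script and log actually live.
SCRIPT_PATH="/opt/scripts/restart-datanode.sh"
LOG_PATH="/var/log/restart-datanode.log"

# Print a crontab line that runs the script every 5 minutes and appends
# both stdout and stderr to the log file.
cron_entry() {
  printf '*/5 * * * * /bin/bash %s >> %s 2>&1\n' "$SCRIPT_PATH" "$LOG_PATH"
}

# Append the entry to the current user's crontab unless a line mentioning
# the script is already there. Defined only; nothing calls it automatically.
install_cron_entry() {
  local entry
  entry="$(cron_entry)"
  if crontab -l 2>/dev/null | grep -qF "$SCRIPT_PATH"; then
    echo "Cron entry already present"
  else
    { crontab -l 2>/dev/null; echo "$entry"; } | crontab -
  fi
}

# Show the line that would be installed.
cron_entry
```

Calling install_cron_entry by hand adds the line once; running it again leaves the crontab unchanged because the script path is already present.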
