Sunday, November 17, 2019

Exposing the system journal through an HTTP REST endpoint

The system journal records a lot of useful information, from kernel messages to critical errors and informational logs. If I could expose those logs through an HTTP REST endpoint, I could query the status of a server without logging in to it. Cool, isn't it? The script below uses Flask and listens on an arbitrary port (e.g. 5577).

import json
import socket
import subprocess

from flask import Flask, Response

app = Flask(__name__)

@app.route("/journal.json", methods=['GET'])
def get_journal():
  # Kernel messages (-k) since midnight (-S today), one JSON object per line.
  cmd = "sudo journalctl -k -S today -o json"
  output = subprocess.run(cmd.split(), stdout=subprocess.PIPE, encoding='utf-8').stdout
  # Drop blank lines; each remaining entry stays a JSON-encoded string.
  json_list = [line.strip() for line in output.split("\n") if len(line) != 0]
  js = json.dumps({"journal": json_list})
  resp = Response(js, status=200, mimetype='application/json')
  resp.headers['Link'] = 'http://{}'.format(socket.gethostname())
  return resp

def main(port=5577):
  # Bind to all interfaces so the endpoint is reachable from other hosts.
  app.run(host='0.0.0.0', port=port)

if __name__ == '__main__':
  main()
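
On the client side, the response wraps each journal entry as a JSON string inside the `journal` list, so it has to be decoded twice. Here is a minimal sketch of a client, assuming the server above is reachable on port 5577. The function names `parse_journal_payload` and `fetch_journal` are made up for this example.

```python
import json
from urllib.request import urlopen


def parse_journal_payload(payload):
    """Decode the response body into a list of journal entries (dicts)."""
    outer = json.loads(payload)
    # Each element is itself a JSON string, one per journalctl output line.
    return [json.loads(entry) for entry in outer["journal"]]


def fetch_journal(host, port=5577, timeout=10):
    """Fetch and decode today's kernel journal from the endpoint above."""
    url = "http://{}:{}/journal.json".format(host, port)
    with urlopen(url, timeout=timeout) as resp:
        return parse_journal_payload(resp.read().decode("utf-8"))
```

Something like `for e in fetch_journal("myserver"): print(e.get("PRIORITY"), e.get("MESSAGE"))` would then print one line per kernel message, where `myserver` is a placeholder hostname.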

Monday, November 4, 2019

Namenode migration in a CDH cluster

  • Stop the Namenode while it is in standby mode.
  • Back up the `dfs.name.dir` / `dfs.namenode.name.dir` directory.
  • Make sure the backup is restored to the same location on the target Namenode host, with permissions preserved.
  • The ownership is usually `hdfs:hadoop`, in case it gets lost.
  • There are 5 important Namenode settings that need to be carried over to the target Namenode:
    1. `dfs.ha.automatic-failover.enabled` checked
    2. `NameNode Nameservice`, usually `nameservice1`. It can vary from cluster to cluster, so it is good to record it beforehand.
    3. `Mount Points`, same as above
    4. `Quorum-based Storage Journal name`, same as above
    5. Java Opts setting
  • Namenode migration goes hand in hand with the Failover Controller. Make sure both are migrated at the same time, to the same node.
  • Once the backup is done and the settings are recorded, we are good to kick the tyres and get rolling.
  • First of all, delete the "Namenode (Standby)" and "Failover Controller" roles from the old host.
  • Add new roles: select "Namenode" and "Failover Controller" for the new host.
  • Make sure all the Namenode settings mentioned above are in place.
  • Close your palms and pray when you hit the start button on both. Start the Failover Controller first, then the Namenode.
  • You can relax once the Namenode and Failover Controller have started. Next comes a series of service restarts.
  • Do a rolling restart on Data nodes
  • Do a rolling restart on Hive Metastore Server
  • Do a rolling restart on Hive server
  • Do a rolling restart on Node Manager
  • Do a rolling restart on Resource Manager
  • Do a rolling restart on Oozie
  • Do a rolling restart on Httpfs
  • Do a rolling restart on Journal Nodes
  • Lastly, restart the Namenode
  • The show is over. The End.
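
The bookkeeping in the steps above (recording the five settings, then restarting services in order) can be sketched in a few lines of Python. This is only an illustration: the keys are labels copied from the list, not Cloudera Manager API identifiers, and `diff_settings` is a hypothetical helper that compares values you wrote down by hand.

```python
# Labels for the five settings listed above (not API identifiers).
REQUIRED_SETTINGS = (
    "dfs.ha.automatic-failover.enabled",
    "NameNode Nameservice",
    "Mount Points",
    "Quorum-based Storage Journal name",
    "Java Opts",
)

# Restart order, as given in the list above.
RESTART_ORDER = (
    "Data nodes",
    "Hive Metastore Server",
    "Hive server",
    "Node Manager",
    "Resource Manager",
    "Oozie",
    "Httpfs",
    "Journal Nodes",
    "Namenode",
)


def diff_settings(recorded, target):
    """Return {setting: (recorded_value, target_value)} for every mismatch."""
    mismatches = {}
    for name in REQUIRED_SETTINGS:
        old, new = recorded.get(name), target.get(name)
        if old != new:
            mismatches[name] = (old, new)
    return mismatches
```

Calling `diff_settings(recorded, target)` with the values noted from the old and new hosts returns an empty dict when everything matches. A setting that is missing on the new host shows up with `None` as its target value.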