Tuesday, November 20, 2018

Hadoop utils: YML to XML parser

Dealing with XML files, especially Hadoop configuration files, is really painful. So my idea is to keep all the configuration in YAML and write a small parser that converts it to XML. An optional description can follow the value, separated by " | ".

For example, a hadoop-site.yml file:

---
dfs.name.dir : /var/local/hadoop/hdfs/name
dfs.data.dir : /var/local/hadoop/hdfs/data 
dfs.heartbeat.interval : 3
dfs.datanode.address : 0.0.0.0:1004
dfs.datanode.http.address : 0.0.0.0:1006 | Determines where on the local filesystem an DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.

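It will later be converted to hadoop-site.xml:

<configuration>
  <property>
    <name>dfs.datanode.http.address</name>
    <value>0.0.0.0:1006</value>
    <description>Determines where on the local filesystem an DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.</description>
  </property>
  <property>
    <name>dfs.datanode.address</name>
    <value>0.0.0.0:1004</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/var/local/hadoop/hdfs/data</value>
  </property>
  <property>
    <name>dfs.heartbeat.interval</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/var/local/hadoop/hdfs/name</value>
  </property>
</configuration>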

       
Here is the small Python script that I wrote to do the conversion:

import os

import yaml
from xml.etree.ElementTree import Element, SubElement
from xml.etree import ElementTree
from xml.dom import minidom


def prettify(element):
  # Serialize the tree, then re-parse it with minidom to get indented output.
  rough_string = ElementTree.tostring(element, 'utf-8')
  reparsed = minidom.parseString(rough_string)
  return reparsed.toprettyxml(indent="  ")


def read_yaml_file(yaml_file):
  with open(yaml_file, "r") as file:
    # safe_load avoids constructing arbitrary Python objects from the YAML.
    return yaml.safe_load(file)


def generate_xml(yaml_file):
  config = read_yaml_file(yaml_file)
  _top = Element('configuration')
  for key, values in config.items():
    _property = SubElement(_top, 'property')
    _name = SubElement(_property, 'name')
    _value = SubElement(_property, 'value')
    # An optional description follows the value, separated by " | ".
    if "|" in str(values).split():
      value, description = [part.strip() for part in values.split("|", 1)]
    else:
      value = values
      description = ""
    _name.text = key
    _value.text = str(value)
    if description:
      _description = SubElement(_property, 'description')
      _description.text = description
  # Write next to the input file, swapping only the extension.
  xml_file = os.path.splitext(yaml_file)[0] + ".xml"
  with open(xml_file, "w") as file:
    file.write(prettify(_top))


To test it, import the function and call it like this:

from hadoop_xml_parser import generate_xml

if __name__ == '__main__':
  generate_xml("hadoop-site.yml")
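One way to sanity-check the result is to read the generated XML back into a dictionary and compare it with the YAML input. The sketch below uses only the standard library. The helper name read_hadoop_xml is my own and not part of the script above, and it assumes the file follows the configuration/property/name/value layout shown earlier.

```python
from xml.etree import ElementTree


def read_hadoop_xml(xml_text):
    """Parse Hadoop-style configuration XML into {name: (value, description)}.

    Hypothetical helper for checking the converter's output.
    """
    root = ElementTree.fromstring(xml_text)
    result = {}
    for prop in root.findall("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="").strip()
        # <description> is optional, so fall back to an empty string.
        description = prop.findtext("description", default="").strip()
        result[name] = (value, description)
    return result


# Two properties taken from the example above (description shortened).
sample = """<configuration>
  <property>
    <name>dfs.heartbeat.interval</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.datanode.http.address</name>
    <value>0.0.0.0:1006</value>
    <description>Directories that do not exist are ignored.</description>
  </property>
</configuration>"""

parsed = read_hadoop_xml(sample)
print(parsed["dfs.heartbeat.interval"])
```

Note that every value comes back as a string, so a YAML integer such as 3 should be compared against "3".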
                                                                                       

Friday, July 13, 2018

Part 2: Docker networking domain sharing

Following up on Part 1, here is the solution to the problem.

Recall the networks section of alipapa.yml:

networks:
  papa.com:
    driver: bridge 

If I want to give the bridge network an explicit name, can I do something like this?

networks:
  papa.com:
    driver: bridge
    name: papa.com

Let's try it out and bring it up with docker-compose!

ubuntu@ip-172-31-11-243:~$ docker-compose -f alipapa.yml up -d
Creating network "papa.com" with driver "bridge"
Recreating ali01 ... done
Recreating ali02 ... done

I can smell success. But let's dig a little deeper.

ubuntu@ip-172-31-11-243:~$ docker exec -ti ali01 bash
root@ali01:/# hostname -f
ali01.papa.com
root@ali01:/# ping ali01
PING ali01.papa.com (172.18.0.2) 56(84) bytes of data.
64 bytes from ali01.papa.com (172.18.0.2): icmp_seq=1 ttl=64 time=0.036 ms
64 bytes from ali01.papa.com (172.18.0.2): icmp_seq=2 ttl=64 time=0.034 ms
root@ali01:/# ping ali02
PING ali02 (172.18.0.3) 56(84) bytes of data.
64 bytes from ali02.papa.com (172.18.0.3): icmp_seq=1 ttl=64 time=0.076 ms
64 bytes from ali02.papa.com (172.18.0.3): icmp_seq=2 ttl=64 time=0.066 ms
64 bytes from ali02.papa.com (172.18.0.3): icmp_seq=3 ttl=64 time=0.065 ms 

ubuntu@ip-172-31-11-243:~$ docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
8e0851dec6fd        papa.com            bridge              local
07cb97e27689        bridge              bridge              local
e81946364c3d        host                host                local
b285c6c7236e        none                null                local

5fb1bdc9f13e        ubuntu_papa.com     bridge              local

Indeed, the problem is solved! The network is now called papa.com, and both ali01.papa.com and ali02.papa.com resolve nicely. Thanks to Docker's embedded DNS server, name resolution works out of the box!
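The difference between this run and the one in Part 1 comes down to how Compose names networks. As far as I understand it, without an explicit name the network is created as <project>_<network key>, where the project name defaults to the directory the compose file lives in (here the ubuntu home directory). With name: set, that name is used as-is. The function below is a simplified model of that rule, not Compose's actual code, and compose_network_name is a name I made up for illustration.

```python
def compose_network_name(project, network_key, explicit_name=None):
    """Simplified model of how docker-compose picks a network's real name."""
    if explicit_name is not None:
        # `name:` in the compose file overrides the default naming.
        return explicit_name
    # Default: prefix the key from the `networks:` section with the project.
    return "{}_{}".format(project, network_key)


# Part 1: no `name:` given, project assumed to be "ubuntu".
print(compose_network_name("ubuntu", "papa.com"))
# Part 2: `name: papa.com` set explicitly.
print(compose_network_name("ubuntu", "papa.com", explicit_name="papa.com"))
```

The two results correspond to the ubuntu_papa.com and papa.com entries in the docker network ls output above.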

Part 1: Docker networking domain sharing

Take a look at this test YAML file:

ubuntu@ip-172-31-11-243:~$ cat alipapa.yml
version: "3.5"
services:
  ali01:
    image: ubuntu:16.04
    hostname: ali01
    container_name: ali01
    domainname: papa.com
    networks:
      papa.com:
        aliases:
          - ali01.papa.com
    entrypoint: sleep infinity
  ali02:
    image: ubuntu:16.04
    hostname: ali02
    container_name: ali02
    domainname: papa.com
    networks:
      papa.com:
        aliases:
          - ali02.papa.com
    entrypoint: sleep infinity
networks:
  papa.com:
    driver: bridge

Suppose you bring up these containers, open a shell in ali01, and ping ali02. What result would you expect?

ubuntu@ip-172-31-11-243:~$ docker-compose -f alipapa.yml up -d
Creating network "ubuntu_papa.com" with driver "bridge"
Creating ali02 ... done
Creating ali01 ... done
ubuntu@ip-172-31-11-243:~$ docker ps -a
CONTAINER ID        IMAGE                  COMMAND                  CREATED             STATUS                       PORTS               NAMES
fdbfb1a48686        ubuntu:16.04           "sleep infinity"         7 seconds ago       Up 6 seconds                                     ali02
7566e8c7db8b        ubuntu:16.04           "sleep infinity"         7 seconds ago       Up 5 seconds                                     ali01

ubuntu@ip-172-31-11-243:~$ docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
07cb97e27689        bridge              bridge              local
e81946364c3d        host                host                local
b285c6c7236e        none                null                local
5fb1bdc9f13e        ubuntu_papa.com     bridge              local

Wait... I never created a network called ubuntu_papa.com. Why is it there? Shouldn't it be just papa.com?

root@ali01:/# ping ali02
PING ali02 (172.20.0.3) 56(84) bytes of data.
64 bytes from ali02.ubuntu_papa.com (172.20.0.3): icmp_seq=1 ttl=64 time=0.087 ms
64 bytes from ali02.ubuntu_papa.com (172.20.0.3): icmp_seq=2 ttl=64 time=0.053 ms
64 bytes from ali02.ubuntu_papa.com (172.20.0.3): icmp_seq=3 ttl=64 time=0.062 ms

root@ali01:/# ping ali02.papa.com
PING ali02.papa.com (172.20.0.3) 56(84) bytes of data.
64 bytes from ali02.ubuntu_papa.com (172.20.0.3): icmp_seq=1 ttl=64 time=0.063 ms
64 bytes from ali02.ubuntu_papa.com (172.20.0.3): icmp_seq=2 ttl=64 time=0.061 ms
64 bytes from ali02.ubuntu_papa.com (172.20.0.3): icmp_seq=3 ttl=64 time=0.065 ms

Scratching head moment. 😓😓😓




docker-compose dilemma, don't think twice, just upgrade it!

Hi,

I have been hitting the same problem with docker-compose lately. It looks like this:

ubuntu@ip-172-31-11-243:~/apache-hadoop-docker$ docker-compose -v

docker-compose version 1.8.0, build unknown

ubuntu@ip-172-31-11-243:~/apache-hadoop-docker$ docker-compose -f hdfs-cluster-nonkerberized.yml  up
ERROR: Version in "./hdfs-cluster-nonkerberized.yml" is unsupported. You might be seeing this error because you're using the wrong Compose file version. Either specify a version of "2" (or "2.0") and place your service definitions under the `services` key, or omit the `version` key and place your service definitions at the root of the file to use version 1.
For more on the Compose file format versions, see https://docs.docker.com/compose/compose-file/
ubuntu@ip-172-31-11-243:~/apache-hadoop-docker$


There is nothing wrong with a docker-compose YAML file that starts with version: 3.0 or above. Don't blame your YAML file, even if you are new to writing Compose files. The problem lies in the docker-compose package that ships with Ubuntu 16.04 (Xenial): version 1.8.0 is too old to understand the version 3 file format. Don't think twice, just go ahead and upgrade it. Your life will be much better afterwards.

Assuming you are on Ubuntu Xenial, here are the upgrade steps (run them as root or with sudo):

> mv /usr/bin/docker-compose /usr/bin/docker-compose-old

> curl -L https://github.com/docker/compose/releases/download/1.20.0/docker-compose-`uname -s`-`uname -m` -o /usr/bin/docker-compose

> chmod +x /usr/bin/docker-compose
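After the upgrade, it is worth confirming that the new binary is the one being picked up. The sketch below parses a version banner like the one printed by docker-compose -v earlier and compares it with 1.10.0, which, to my knowledge, is the first release that understands version 3 files. Treat that threshold as an assumption and check the Compose documentation for the file version you use. The function names are my own.

```python
import re

# Assumption: Compose file format 3.0 needs docker-compose 1.10.0 or newer.
MIN_VERSION_FOR_V3 = (1, 10, 0)


def parse_compose_version(banner):
    """Extract (major, minor, patch) from a `docker-compose -v` banner."""
    match = re.search(r"version (\d+)\.(\d+)\.(\d+)", banner)
    if match is None:
        raise ValueError("unrecognized version banner: {!r}".format(banner))
    return tuple(int(part) for part in match.groups())


def supports_v3(banner):
    """Return True if the reported version can read version 3 files."""
    return parse_compose_version(banner) >= MIN_VERSION_FOR_V3


# The banner from the stock Xenial package shown earlier.
print(supports_v3("docker-compose version 1.8.0, build unknown"))
```

Comparing tuples of integers rather than raw strings matters here: as strings, "1.8.0" would sort after "1.10.0".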


Enjoy!