Apache Hadoop Testbed

Note

These notes were written during the setup and still need to be reviewed and completed.

Scope: YARN, HDFS, Spark, Airflow, NiFi, ZooKeeper, Kafka, Jupyter, HBase

Platform: four virtual servers (1x controller, 3x worker) running CentOS 7, each with 2 cores, 4 GB RAM and a 64 GB HDD, connected to a private network via IPsec

Interfaces (not public!)

Preparation

  • Update all machines

  • Create a regular user account on all nodes (not root!)

  • Enable SSH key-based authentication between all nodes (distribute and authorize the SSH keys)
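
The key distribution from the last step could look roughly like the sketch below. The node names, the user name and the helper name distribute_key are placeholders and not the real values of this testbed. By default the function only prints the commands (DRY_RUN=1).

```shell
#!/bin/bash
# Sketch: print (or run) the commands that copy one SSH public key to all nodes.
# NODES is a placeholder list, not the real host names of the testbed.
NODES="${NODES:-controller worker1 worker2 worker3}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, anything else = execute them

distribute_key() {
    local user="$1" key="${2:-$HOME/.ssh/id_ed25519}"
    local node
    # Create the key pair once if it does not exist yet (skipped in dry-run mode).
    if [ "$DRY_RUN" != "1" ] && [ ! -f "$key" ]; then
        ssh-keygen -t ed25519 -N "" -f "$key"
    fi
    for node in $NODES; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "ssh-copy-id -i ${key}.pub ${user}@${node}"
        else
            # Appends the public key to ~/.ssh/authorized_keys on the remote node.
            ssh-copy-id -i "${key}.pub" "${user}@${node}"
        fi
    done
}
```

Calling `DRY_RUN=0 distribute_key <user>` on every node gives password-less logins in both directions between all nodes.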

Spark

https://spark.apache.org/docs/latest/spark-standalone.html

Hadoop

https://hadoop.apache.org/releases.html

Configuration changes:

hadoop/etc/hadoop/core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://testbed.nebel.work:8020</value>
</property>
hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr

JAVA_HOME=/usr/lib/jvm/java-1.8.0/
JRE_HOME=/usr/lib/jvm/java-1.8.0/jre
PATH=$JAVA_HOME/bin:"$PATH"
export JAVA_HOME JRE_HOME
.bashrc:
export JAVA_HOME=/usr
export PATH=$PATH:/<INSTALLATIONSPFAD>/hadoop/bin:/<INSTALLATIONSPFAD>/spark/bin
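
The fs.defaultFS property shown above has to sit inside a <configuration> element in core-site.xml. A minimal sketch that writes the complete file; write_core_site is a hypothetical helper name, and the target path and URI are passed as arguments:

```shell
#!/bin/bash
# Sketch: write a complete core-site.xml around the fs.defaultFS property.
# write_core_site is a hypothetical helper; $1 = target file, $2 = NameNode URI.
write_core_site() {
    local target="$1" default_fs="$2"
    cat > "$target" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>${default_fs}</value>
  </property>
</configuration>
EOF
}
```

Example call: `write_core_site hadoop/etc/hadoop/core-site.xml hdfs://testbed.nebel.work:8020`. Note that this overwrites an existing file.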

Hive

https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-RunningHive

http://www.mtitek.com/tutorials/bigdata/hive/install.php

  • download, tar

  • export HIVE_HOME

  • start

Nifi

https://nifi.apache.org/docs.html

  • download, tar, start

Airflow

https://airflow.apache.org/docs/apache-airflow/stable/start.html

Because the sqlite3 shipped with CentOS 7 is outdated, a newer version has to be built and installed from source (configure, make, make install).

Installing Airflow
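
Whether the rebuild is needed can be checked beforehand. The sketch below compares the installed sqlite3 version with a minimum version; the value 3.15.0 is an assumption taken from the Airflow 2.x prerequisites and should be verified for the release in use. The helper names are hypothetical, and the comparison relies on GNU sort -V.

```shell
#!/bin/bash
# Sketch: decide whether the installed sqlite3 is new enough for Airflow.
# MIN_SQLITE is an assumption (Airflow 2.x prerequisites); check it for your release.
MIN_SQLITE="${MIN_SQLITE:-3.15.0}"

# version_ge A B: succeeds if version A >= version B (needs GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

check_sqlite() {
    local current
    # "sqlite3 -version" prints the version number as the first field.
    current="$(sqlite3 -version 2>/dev/null | awk '{print $1}')"
    if [ -z "$current" ]; then
        echo "sqlite3 not found"
        return 2
    fi
    if version_ge "$current" "$MIN_SQLITE"; then
        echo "sqlite3 $current is recent enough"
    else
        echo "sqlite3 $current is older than $MIN_SQLITE, build a newer one from source"
        return 1
    fi
}
```

After a source build into /usr/local, Python may still load the old library; in that case the new one usually has to be made visible first, for example via LD_LIBRARY_PATH.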
mkdir venv
virtualenv venv/
cd venv/
source bin/activate
export AIRFLOW_HOME=~/airflow
pip install apache-airflow

airflow db init
airflow users create --username XXX --firstname Michael --lastname Nebel --role Admin --email ...@.....de
airflow webserver --port 8000 -D &
nohup airflow scheduler &
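
The plain pip install above pulls whatever dependency versions are current. The Airflow documentation recommends installing against a constraints file instead. A sketch that builds the constraints URL; the helper name is hypothetical and the version numbers in the comments are examples, not the ones used on this testbed:

```shell
#!/bin/bash
# Sketch: build the constraints URL for a pinned Airflow installation.
# airflow_constraint_url is a hypothetical helper; $1 = Airflow version, $2 = Python major.minor.
airflow_constraint_url() {
    local airflow_version="$1" python_version="$2"
    echo "https://raw.githubusercontent.com/apache/airflow/constraints-${airflow_version}/constraints-${python_version}.txt"
}

# Usage inside the activated virtualenv (example versions, adjust as needed):
#   AIRFLOW_VERSION=2.7.3
#   PYTHON_VERSION="$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')"
#   pip install "apache-airflow==${AIRFLOW_VERSION}" \
#       --constraint "$(airflow_constraint_url "$AIRFLOW_VERSION" "$PYTHON_VERSION")"
```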

Kafka

https://kafka.apache.org/quickstart

  • download, tar, start

Kafdrop

https://github.com/obsidiandynamics/kafdrop/

  • download release, untar

  • if necessary, update Java (yum install java-11-openjdk-devel; update-alternatives --config javac; update-alternatives --config java)

  • ./mvnw install

  • java --add-opens=java.base/sun.nio.ch=ALL-UNNAMED -jar target/kafdrop-3.27.0.jar --kafka.brokerConnect=localhost:9092 &
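
Kafdrop connects to the broker given in the last step (localhost:9092), so the broker should already be listening when Kafdrop starts. A sketch of a wait loop that could run before the java command; wait_for_port is a hypothetical helper and relies on the /dev/tcp feature of bash:

```shell
#!/bin/bash
# Sketch: wait until a TCP port accepts connections (bash only, uses /dev/tcp).
# wait_for_port HOST PORT [TRIES]: one connection attempt per second, default 30 tries.
wait_for_port() {
    local host="$1" port="$2" tries="${3:-30}" attempt=0
    # The subshell opens the connection and closes it again when it exits.
    while ! (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; do
        attempt=$((attempt + 1))
        [ "$attempt" -ge "$tries" ] && return 1
        sleep 1
    done
    return 0
}

# Usage:
#   wait_for_port localhost 9092 60 || { echo "Kafka broker not reachable"; exit 1; }
```

For a local port a failed attempt returns immediately; for an unreachable remote host a single attempt can block considerably longer than one second.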

Jupyter

https://jupyter.org

Installing Jupyter
. venv/bin/activate
pip install jupyterlab
jupyter notebook --generate-config
emacs -nw .jupyter/jupyter_notebook_config.py   # adjust c.NotebookApp.ip
jupyter-server password
mkdir -p jupyter/work
Jupyter start script
cat init.d/jupyter
#!/bin/bash

# check the user
if [ $UID -ne XXXXXX ];then echo "you are not user XXXX!";exit 1;fi


################################
. /<INSTALLATIONSPFAD>/venv/bin/activate

case "$1" in
    start)
        echo -e "\e[1m*** Start $0 ***\e[00m"
        cd /<INSTALLATIONSPFAD>/jupyter/work
        jupyter-server > /<INSTALLATIONSPFAD>/logs/jupyter.log 2>&1 &
        ;;
    stop)
        echo -e "\e[1m*** Stop $0 ***\e[00m"
        cd /<INSTALLATIONSPFAD>/jupyter/work
        killall jupyter-server
        ;;
    restart)
        ${0} stop && ${0} start
        ;;
    status)
        echo -e "\e[1m*** Status $0 ***\e[00m"
        echo ""
        cd /<INSTALLATIONSPFAD>/jupyter/work
        jupyter-server list
        wget -q -O /dev/null -S   http://testbed.nebel.work:8888/  2>&1 | grep  HTTP
        ;;
    *)
        echo "usage: $0 {start|stop|restart|status}"
        exit 1
esac
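
The stop branch above uses killall, which ends every jupyter-server process it is allowed to signal. A possible alternative is to remember the PID at start and stop only that process. A minimal sketch; PIDFILE, LOGFILE and the function names are hypothetical and not part of the script above:

```shell
#!/bin/bash
# Sketch: start/stop one background process via a PID file instead of killall.
# PIDFILE and LOGFILE are hypothetical defaults; the command to run is passed as arguments.
PIDFILE="${PIDFILE:-/tmp/jupyter-server.pid}"
LOGFILE="${LOGFILE:-/dev/null}"

start_service() {
    # Refuse to start a second instance while the recorded PID is still alive.
    if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
        echo "already running (PID $(cat "$PIDFILE"))"
        return 1
    fi
    "$@" > "$LOGFILE" 2>&1 &
    # $! is the PID of the process that was just put into the background.
    echo $! > "$PIDFILE"
}

stop_service() {
    [ -f "$PIDFILE" ] || { echo "not running"; return 0; }
    # Sends SIGTERM only to the recorded process.
    kill "$(cat "$PIDFILE")" 2>/dev/null
    rm -f "$PIDFILE"
}
```

In the script above, the start branch would then call `start_service jupyter-server` and the stop branch `stop_service`.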