Big Data Tools: Superset

guduadmin, 11 days ago

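Overview

Apache Superset is an open-source, modern, lightweight BI tool. It connects to many data sources, offers a rich set of chart types, supports custom dashboards, and has a friendly, easy-to-use interface.

Because Superset integrates with common big-data query engines such as Trino, Hive, Kylin, and Druid, and supports custom dashboards, it works well as the visualization tool for a data warehouse, serving the warehouse's ADS layer.

[Figure 1]

Official site: https://superset.apache.org/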

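Installation notes

  • Superset has no official Windows support (hardly a limitation, since few people run servers on Windows).

  • Superset is a web application written in Python and requires Python 3.6 or later (this guide uses 3.7).

  • Superset recommends giving the virtual machine at least 8 GB of RAM and at least 40 GB of disk, so there is enough room for the operating system and all required dependencies.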
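
The notes above can be turned into a quick pre-flight check. This is a minimal sketch, not part of Superset: the function names `total_ram_gb` and `preflight` are hypothetical, the thresholds are copied from the list above, and the RAM lookup relies on `os.sysconf`, which is only available on Unix-like systems.

```python
import os
import shutil
import sys

MIN_PYTHON = (3, 6)   # minimum Python version from the notes above
MIN_RAM_GB = 8        # recommended RAM from the notes above
MIN_DISK_GB = 40      # recommended disk size from the notes above


def total_ram_gb():
    """Total physical RAM in GB, or None where os.sysconf is unavailable."""
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        return None
    return pages * page_size / 1024 ** 3


def preflight(path="/", min_python=MIN_PYTHON,
              min_ram_gb=MIN_RAM_GB, min_disk_gb=MIN_DISK_GB):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.platform.startswith("win"):
        problems.append("Windows is not officially supported")
    if sys.version_info[:2] < min_python:
        problems.append("Python %d.%d+ is required" % min_python)
    ram_gb = total_ram_gb()
    if ram_gb is not None and ram_gb < min_ram_gb:
        problems.append("RAM is %.1f GB, %d GB recommended" % (ram_gb, min_ram_gb))
    disk_gb = shutil.disk_usage(path).total / 1024 ** 3
    if disk_gb < min_disk_gb:
        problems.append("disk is %.0f GB, %d GB recommended" % (disk_gb, min_disk_gb))
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARNING:", problem)
```

Run it with the interpreter you plan to use for Superset; it prints nothing when every check passes.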

    Python environment

    Install and update dependencies

    # 1. Install the build dependencies
    yum -y install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel
    # 2. Install or update gcc
    yum install gcc -y
    # 3. Python 3.7 and later also need libffi-devel
    yum install libffi-devel -y
    

    Download and install Python

    Budget constraints often mean a single development server has to host several Python versions for different "customers", so Miniconda is recommended for switching between them. The Superset project also strongly recommends installing Superset inside a virtual environment.

    Install Conda

    Miniconda3-latest-Linux-x86_64.sh

    # 1. Run the installer and follow the prompts
    bash Miniconda3-latest-Linux-x86_64.sh
    # 2. Hold Enter to page through the license; when asked whether you accept it, type yes
    Please answer 'yes' or 'no':'
    >>> yes
    # 3. When asked to confirm the install location (the default shown is /root/miniconda3), enter the path you want
    [/root/miniconda3] >>> /opt/module/miniconda3
    # 4. When asked whether to run conda init, type yes
    Do you wish the installer to initialize Miniconda3
    by running conda init? [yes|no]
    [no] >>> yes
    # 5. The following output indicates a successful install
    ==> For changes to take effect, close and re-open your current shell. <==
    If you'd prefer that conda's base environment not be activated on startup,
       set the auto_activate_base parameter to false:
    conda config --set auto_activate_base false
    Thank you for installing Miniconda3!
    # 6. Disable auto-activation of base: after installation, every new terminal activates the default base environment; the command below turns that off
    [root@paratera128 ~]# conda config --set auto_activate_base false
    # 7. Configure conda mirrors hosted in China; adding more than one is a good idea
    conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
    conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    conda config --set show_channel_urls yes
    
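    Configure the Python environment

    Installing Python with conda is simple. For recent Superset releases, Python 3.7 or 3.8 is the best choice.

    # 1. Create an environment with a specific Python version
    conda create --name superset python=3.7
    # 2. Activate the superset environment; work inside this conda Python 3.7 environment without affecting the host's Python
    conda activate superset
    # 3. Leave the current environment
    conda deactivate
    # 4. Delete the virtual environment
    conda env remove -n superset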
    

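    Deploy Superset (Docker)

    Install and start

    # Clone Superset with git; the project provides a simple Docker Compose installation (with separate development and production configurations)
    [root@paratera128 opt]# git clone https://github.com/apache/superset.git
    # Enter the project directory
    [root@paratera128 opt]# cd superset
    # This installation method depends heavily on the Docker Compose and Docker Engine versions. My local versions are shown below; with them, the version field in the downloaded docker-compose.yml must be changed to 3.6 or lower. For the mapping, search for "docker and docker-compose version compatibility".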
    [root@paratera128 ~]# docker --version
    Docker version 18.03.1-ce, build 9ee9f40
    (superset) [root@paratera128 ~]# docker-compose --version
    docker-compose version 1.26.2, build eefe0d31
    # Grant permissions on the startup scripts
    [root@paratera128 superset]# chmod 777 docker
    [root@paratera128 superset]# cd docker/
    [root@paratera128 docker]# ls
    docker-bootstrap.sh  docker-ci.sh  docker-frontend.sh  docker-init.sh  frontend-mem-nag.sh  pythonpath_dev  README.md  run-server.sh
    [root@paratera128 docker]# chmod 777 *
    # Pull the images and start the containers (both can be done in one step)
    [root@paratera128 superset]# docker-compose -f docker-compose-non-dev.yml pull
    [root@paratera128 superset]# docker-compose -f docker-compose-non-dev.yml up -d
    superset_cache is up-to-date
    superset_db is up-to-date
    Starting superset_worker_beat ... done
    Starting superset_app         ... done
    Starting superset_worker      ... done
    Starting superset_init        ... done
    # Create the admin user
    [root@paratera128 superset]# docker exec -it superset_app flask fab create-admin
    Username [admin]: admin
    User first name [admin]: admin
    User last name [user]: admin
    Email [admin@fab.org]: admin
    Password:
    Repeat for confirmation:
    Loaded your LOCAL configuration at [/app/docker/pythonpath_dev/superset_config.py]
    logging was configured successfully
    2022-07-26 04:10:42,285:INFO:superset.utils.logging_configurator:logging was configured successfully
    2022-07-26 04:10:42,293:INFO:root:Configured event logger of type 
    /usr/local/lib/python3.8/site-packages/flask_caching/__init__.py:201: UserWarning: Flask-Caching: CACHE_TYPE is set to null, caching is effectively disabled.
      warnings.warn(
    Recognized Database Authentications.
    Error! User already exists admin
    # Initialize the database
    [root@paratera128 superset]# docker exec -it superset_app superset db upgrade
    Loaded your LOCAL configuration at [/app/docker/pythonpath_dev/superset_config.py]
    logging was configured successfully
    2022-07-26 04:11:58,693:INFO:superset.utils.logging_configurator:logging was configured successfully
    2022-07-26 04:11:58,700:INFO:root:Configured event logger of type 
    /usr/local/lib/python3.8/site-packages/flask_caching/__init__.py:201: UserWarning: Flask-Caching: CACHE_TYPE is set to null, caching is effectively disabled.
      warnings.warn(
    INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
    INFO  [alembic.runtime.migration] Will assume transactional DDL.
    # Initialize Superset (roles and permissions)
    [root@paratera128 superset]# docker exec -it superset_app superset init
    Loaded your LOCAL configuration at [/app/docker/pythonpath_dev/superset_config.py]
    logging was configured successfully
    2022-07-26 04:12:47,375:INFO:superset.utils.logging_configurator:logging was configured successfully
    2022-07-26 04:12:47,382:INFO:root:Configured event logger of type 
    /usr/local/lib/python3.8/site-packages/flask_caching/__init__.py:201: UserWarning: Flask-Caching: CACHE_TYPE is set to null, caching is effectively disabled.
      warnings.warn(
    Syncing role definition
    2022-07-26 04:12:50,958:INFO:superset.security.manager:Syncing role definition
    Syncing Admin perms
    2022-07-26 04:12:50,980:INFO:superset.security.manager:Syncing Admin perms
    Syncing Alpha perms
    2022-07-26 04:12:51,220:INFO:superset.security.manager:Syncing Alpha perms
    Syncing Gamma perms
    2022-07-26 04:12:51,391:INFO:superset.security.manager:Syncing Gamma perms
    Syncing granter perms
    2022-07-26 04:12:51,554:INFO:superset.security.manager:Syncing granter perms
    Syncing sql_lab perms
    2022-07-26 04:12:51,705:INFO:superset.security.manager:Syncing sql_lab perms
    Fetching a set of all perms to lookup which ones are missing
    2022-07-26 04:12:51,874:INFO:superset.security.manager:Fetching a set of all perms to lookup which ones are missing
    Creating missing datasource permissions.
    2022-07-26 04:12:52,034:INFO:superset.security.manager:Creating missing datasource permissions.
    Creating missing database permissions.
    2022-07-26 04:12:52,044:INFO:superset.security.manager:Creating missing database permissions.
    Cleaning faulty perms
    2022-07-26 04:12:52,056:INFO:superset.security.manager:Cleaning faulty perms
    # Load the example data (optional)
    [root@paratera128 yum]# docker exec -it superset_app superset load_examples
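
The initialization steps above can also be scripted. This is a minimal sketch under stated assumptions: the helper `init_commands` is hypothetical, the container name `superset_app` and the admin/admin credentials come from the session above, and the steps are ordered with the schema migration first, which is the order generally shown in the Superset installation docs. Passing the `create-admin` values as flags avoids the interactive prompts.

```python
import shlex

CONTAINER = "superset_app"  # container_name used in the session above


def init_commands(container=CONTAINER, load_examples=False):
    """Build the docker exec command lines, schema migration first."""
    steps = [
        # Create or upgrade the metadata tables
        ["superset", "db", "upgrade"],
        # Create the admin user without interactive prompts
        ["superset", "fab", "create-admin",
         "--username", "admin", "--firstname", "admin", "--lastname", "admin",
         "--email", "admin@fab.org", "--password", "admin"],
        # Sync roles and permissions
        ["superset", "init"],
    ]
    if load_examples:
        steps.append(["superset", "load_examples"])
    return [["docker", "exec", container] + step for step in steps]


if __name__ == "__main__":
    # Print the commands for review instead of running them
    for argv in init_commands():
        print(" ".join(shlex.quote(arg) for arg in argv))
```

To execute the commands rather than print them, each list can be passed to `subprocess.run(argv, check=True)`.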
    

    Docker Compose configuration

    # Image version, user, and mounted-volume variables for docker-compose
    x-superset-image: &superset-image apache/superset:latest
    x-superset-user: &superset-user root
    x-superset-depends-on: &superset-depends-on
      - db
      - redis
    x-superset-volumes: &superset-volumes
      # /app/pythonpath_docker will be appended to the PYTHONPATH in the final container
      - ./docker:/app/docker
      - ./superset:/app/superset
      - ./superset-frontend:/app/superset-frontend
      - superset_home:/app/superset_home
      - ./tests:/app/tests
    version: "3.6"
    services:
    # Cache for Superset's Flask-Caching: stores state from user actions, such as dashboard filter state and Explore chart/table data
      redis:
        image: redis:latest
        container_name: superset_cache
        restart: unless-stopped
        ports:
          - "127.0.0.1:6379:6379"
        volumes:
          - redis:/data
    # PostgreSQL database (optional)
      db:
        env_file: docker/.env
        image: postgres:14
        container_name: superset_db
        restart: unless-stopped
        ports:
          - "127.0.0.1:5432:5432"
        volumes:
          - db_home:/var/lib/postgresql/data
    # Superset server instance
      superset:
        env_file: docker/.env
        image: *superset-image
        container_name: superset_app
        command: ["/app/docker/docker-bootstrap.sh", "app"]
        restart: unless-stopped
        ports:
          - 8088:8088
        user: *superset-user
        depends_on: *superset-depends-on
        volumes: *superset-volumes
        environment:
          CYPRESS_CONFIG: "${CYPRESS_CONFIG}"
    volumes:
      superset_home:
        external: false
      db_home:
        external: false
      redis:
        external: false
    

    Deploy Superset (pip, in a virtual environment)

    Install and start

    # Activate the superset environment
    [root@paratera128 ~]# conda activate superset
    (superset) [root@paratera128 ~]#
    # Install system dependencies
    yum install -y python-setuptools
    yum install -y gcc gcc-c++ libffi-devel python-devel python-pip python-wheel openssl-devel cyrus-sasl-devel openldap-devel
    # Install or upgrade setuptools and pip
    pip install --upgrade setuptools pip -i https://pypi.douban.com/simple/
    # Install Superset
    pip install apache-superset -i https://pypi.douban.com/simple/
    # Or install a specific version
    pip install apache-superset==1.3.0 -i https://pypi.tuna.tsinghua.edu.cn/simple/
    

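    Output like the following means the installation succeeded. The WARNING can be ignored: it only notes that running as root may grant excessive permissions, and it will not appear in a production environment.

    [Figure 2]

    Create the admin user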

    (superset) [root@paratera128 ~]# export FLASK_APP=superset
    (superset) [root@paratera128 ~]# flask fab create-admin
    Username [admin]: admin
    User first name [admin]: admin
    User last name [user]: admin
    Email [admin@fab.org]: admin
    Password:
    Repeat for confirmation:
    logging was configured successfully
    2022-07-25 18:23:46,139:INFO:superset.utils.logging_configurator:logging was configured successfully
    2022-07-25 18:23:46,156:INFO:root:Configured event logger of type 
    /opt/module/miniconda3/envs/superset/lib/python3.7/site-packages/flask_caching/__init__.py:120: UserWarning: Flask-Caching: CACHE_TYPE is set to null, caching is effectively disabled.
      "Flask-Caching: CACHE_TYPE is set to null, "
    Recognized Database Authentications.
    Admin User admin created.
    

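    Initialize the database

    Superset is ultimately a web application with its own database, which must be initialized.

    # Install dataclasses, then initialize the Superset database
    pip install dataclasses
    superset db upgrade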
    

    If you see: UserWarning: Flask-Caching: CACHE_TYPE is set to null, caching is effectively disabled.

    Open python3.7/site-packages/superset/config.py in an editor:

    Search for "CACHE_TYPE" and change every occurrence to "simple".
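
An alternative that avoids editing files under site-packages is to override the setting in a `superset_config.py` file. This is a minimal sketch: it assumes the file sits in a directory on `PYTHONPATH` (or that the `SUPERSET_CONFIG_PATH` environment variable points at it), and the timeout value is illustrative only.

```python
# superset_config.py -- local overrides loaded on top of Superset's defaults

# Flask-Caching settings. "simple" is the in-memory backend named above;
# newer Flask-Caching releases call the same backend "SimpleCache".
CACHE_CONFIG = {
    "CACHE_TYPE": "simple",
    "CACHE_DEFAULT_TIMEOUT": 300,  # seconds; illustrative value
}
```

The in-memory backend is kept separately by each worker process, so it suits a single-node trial more than a multi-worker production setup, where a shared cache such as Redis is the usual choice.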

    Initialize base data

    (superset) [root@paratera128 local]# superset init
    logging was configured successfully
    2022-07-25 02:24:19,136:INFO:superset.utils.logging_configurator:logging was configured successfully
    2022-07-25 02:24:19,148:INFO:root:Configured event logger of type 
    /opt/module/miniconda3/envs/superset/lib/python3.7/site-packages/flask_caching/__init__.py:120: UserWarning: Flask-Caching: CACHE_TYPE is set to null, caching is effectively disabled.
      "Flask-Caching: CACHE_TYPE is set to null, "
    Syncing role definition
    2022-07-25 02:24:27,821:INFO:superset.security.manager:Syncing role definition
    Syncing Admin perms
    2022-07-25 02:24:27,920:INFO:superset.security.manager:Syncing Admin perms
    Syncing Alpha perms
    2022-07-25 02:24:28,026:INFO:superset.security.manager:Syncing Alpha perms
    Syncing Gamma perms
    2022-07-25 02:24:28,410:INFO:superset.security.manager:Syncing Gamma perms
    Syncing granter perms
    2022-07-25 02:24:28,741:INFO:superset.security.manager:Syncing granter perms
    Syncing sql_lab perms
    2022-07-25 02:24:29,045:INFO:superset.security.manager:Syncing sql_lab perms
    Fetching a set of all perms to lookup which ones are missing
    2022-07-25 02:24:29,687:INFO:superset.security.manager:Fetching a set of all perms to lookup which ones are missing
    Creating missing datasource permissions.
    2022-07-25 02:24:29,769:INFO:superset.security.manager:Creating missing datasource permissions.
    Creating missing database permissions.
    2022-07-25 02:24:29,776:INFO:superset.security.manager:Creating missing database permissions.
    Cleaning faulty perms
    2022-07-25 02:24:29,780:INFO:superset.security.manager:Cleaning faulty perms
    

    Start the service

    # Start from the command line with gunicorn: five worker processes, all bound to 192.168.137.128:8080
    (superset) [root@paratera128 local]# gunicorn --workers 5 --timeout 120 --bind 192.168.137.128:8080 "superset.app:create_app()" --daemon
    [2022-07-25 02:28:47 -0700] [104753] [INFO] Starting gunicorn 20.0.4
    [2022-07-25 02:28:47 -0700] [104753] [INFO] Listening at: http://192.168.137.128:8080 (104753)
    [2022-07-25 02:28:47 -0700] [104753] [INFO] Using worker: sync
    [2022-07-25 02:28:47 -0700] [104756] [INFO] Booting worker with pid: 104756
    [2022-07-25 02:28:47 -0700] [104757] [INFO] Booting worker with pid: 104757
    [2022-07-25 02:28:47 -0700] [104758] [INFO] Booting worker with pid: 104758
    [2022-07-25 02:28:47 -0700] [104759] [INFO] Booting worker with pid: 104759
    [2022-07-25 02:28:47 -0700] [104760] [INFO] Booting worker with pid: 104760
    logging was configured successfully
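
The same gunicorn options can be kept in a configuration file instead of on the command line. This is a minimal sketch: the file name `gunicorn.conf.py` is a choice rather than a requirement, and the values simply mirror the command above.

```python
# gunicorn.conf.py -- mirrors the command-line options used above
bind = "192.168.137.128:8080"  # address and port to listen on
workers = 5                    # number of worker processes
timeout = 120                  # seconds before an unresponsive worker is restarted
daemon = True                  # detach from the terminal and run in the background
```

Start the server with `gunicorn -c gunicorn.conf.py "superset.app:create_app()"`.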
    

    Troubleshooting

    Additional dependencies to install:

    pip install flask -i https://pypi.tuna.tsinghua.edu.cn/simple
    pip install wtforms_json -i https://pypi.tuna.tsinghua.edu.cn/simple
    pip install flask_appbuilder -i https://pypi.tuna.tsinghua.edu.cn/simple
    pip install flask_compress -i https://pypi.tuna.tsinghua.edu.cn/simple
    pip install celery -i https://pypi.tuna.tsinghua.edu.cn/simple
    pip install flask_migrate -i https://pypi.tuna.tsinghua.edu.cn/simple
    pip install flask_talisman -i https://pypi.tuna.tsinghua.edu.cn/simple
    pip install flask_caching -i https://pypi.tuna.tsinghua.edu.cn/simple
    pip install sqlparse -i https://pypi.tuna.tsinghua.edu.cn/simple
    pip install bleach -i https://pypi.tuna.tsinghua.edu.cn/simple
    pip install markdown -i https://pypi.tuna.tsinghua.edu.cn/simple
    pip install numpy -i https://pypi.tuna.tsinghua.edu.cn/simple
    pip install pandas -i https://pypi.tuna.tsinghua.edu.cn/simple
    pip install parsedatetime -i https://pypi.tuna.tsinghua.edu.cn/simple
    pip install pathlib2 -i https://pypi.tuna.tsinghua.edu.cn/simple
    pip install simplejson -i https://pypi.tuna.tsinghua.edu.cn/simple
    pip install humanize -i https://pypi.tuna.tsinghua.edu.cn/simple
    pip install python-geohash -i https://pypi.tuna.tsinghua.edu.cn/simple
    pip install polyline -i https://pypi.tuna.tsinghua.edu.cn/simple
    pip install geopy -i https://pypi.tuna.tsinghua.edu.cn/simple
    pip install sqlalchemy -i https://pypi.tuna.tsinghua.edu.cn/simple
    pip install sqlalchemy-utils -i https://pypi.tuna.tsinghua.edu.cn/simple
    pip install cryptography -i https://pypi.tuna.tsinghua.edu.cn/simple
    pip install backoff -i https://pypi.tuna.tsinghua.edu.cn/simple
    pip install msgpack -i https://pypi.tuna.tsinghua.edu.cn/simple
    pip install pyarrow -i https://pypi.tuna.tsinghua.edu.cn/simple
    pip install contextlib2 -i https://pypi.tuna.tsinghua.edu.cn/simple
    pip install croniter -i https://pypi.tuna.tsinghua.edu.cn/simple
    pip install retry -i https://pypi.tuna.tsinghua.edu.cn/simple
    pip install selenium -i https://pypi.tuna.tsinghua.edu.cn/simple
    pip install isodate -i https://pypi.tuna.tsinghua.edu.cn/simple
    # markupsafe 2.1.1 causes errors here; replace it with the older 2.0.1
    (superset) [root@paratera128 superset]# pip show markupsafe
    Name: MarkupSafe
    Version: 2.1.1
    Summary: Safely add untrusted strings to HTML/XML markup.
    Home-page: https://palletsprojects.com/p/markupsafe/
    Author: Armin Ronacher
    Author-email: armin.ronacher@active-4.com
    License: BSD-3-Clause
    Location: /opt/module/miniconda3/envs/superset/lib/python3.7/site-packages
    Requires:
    Required-by: Jinja2, Mako, WTForms
    (superset) [root@paratera128 superset]# python -m pip install markupsafe==2.0.1
    

    Fixing the error: No PIL installation found

    (superset) [root@paratera128 local]# pip install pillow -i https://pypi.tuna.tsinghua.edu.cn/simple
    (superset) [root@paratera128 local]# superset version
    logging was configured successfully
    2022-07-25 02:20:07,976:INFO:superset.utils.logging_configurator:logging was configured successfully
    2022-07-25 02:20:07,983:INFO:root:Configured event logger of type 
    -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
    Superset 1.3.0
    -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
    

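    This completes the Superset installation in the conda virtual environment.

    Access Superset

    Address: http://ip:8088 for the Docker deployment; for the gunicorn deployment above, use the bound address, for example http://192.168.137.128:8080

    Username/password: admin/admin

    [Figure 3]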

    Connect to databases

    [Figure 4]

    MySQL

    [Figure 5]

    Trino

    Connecting to Trino requires the matching driver: https://superset.apache.org/docs/databases/installing-database-drivers/

    pip must be installed first. A fairly recent version is required, so upgrade it after installing.

    [root@paratera128 yum]# yum -y install epel-release
    [root@paratera128 yum]# yum -y install python-pip
    [root@paratera128 yum]# wget https://bootstrap.pypa.io/pip/2.7/get-pip.py
    [root@paratera128 yum]# python3 get-pip.py
    # Install the driver
    [root@paratera128 yum]# pip install sqlalchemy-trino
    # If Superset was deployed with Docker, the driver must also be added to the container
    [root@paratera128 superset]# touch ./docker/requirements-local.txt
    [root@paratera128 superset]# echo "sqlalchemy-trino" >> ./docker/requirements-local.txt
    [root@paratera128 superset]# docker-compose -f docker-compose-non-dev.yml build --force-rm
    [root@paratera128 superset]# docker-compose -f docker-compose-non-dev.yml up
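
Superset's connection form expects a SQLAlchemy URI. The sketch below builds URIs for the two databases covered in this section, following the formats shown in the Superset database documentation. The helper names `mysql_uri` and `trino_uri`, as well as the hosts, credentials, and database names in the usage lines, are hypothetical. Credentials are percent-encoded so that characters such as `@` or `/` do not break the URI.

```python
from urllib.parse import quote_plus


def mysql_uri(user, password, host, database, port=3306):
    """SQLAlchemy URI for MySQL (a MySQL driver must be installed)."""
    return "mysql://%s:%s@%s:%d/%s" % (
        quote_plus(user), quote_plus(password), host, port, database)


def trino_uri(user, host, catalog, port=8080, schema=None):
    """SQLAlchemy URI for Trino via the sqlalchemy-trino driver."""
    uri = "trino://%s@%s:%d/%s" % (quote_plus(user), host, port, catalog)
    return uri + "/" + schema if schema else uri


if __name__ == "__main__":
    # Hypothetical connection details for illustration only
    print(mysql_uri("superset", "p@ss/word", "192.168.137.128", "ads"))
    print(trino_uri("admin", "192.168.137.128", "hive", schema="ads"))
```

Paste the resulting string into the SQLAlchemy URI field when adding the database.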
    

    [Figure 6]

    Report design

    A basic table

    The screenshots show the steps:

    [Figures 7-11]

    Bar chart

    Requirement: count new and returning users for each day of a month.

    [Figures 12-15]

    Pie chart

    Requirement: show each frequency band's share of the data.

    [Figures 16-18]

    Dashboard

    The charts created above have all been saved to the same dashboard.

    [Figure 19]

    To add a chart, drag it onto the dashboard.

    [Figure 20]

    Extending Superset through the API

    Reference: https://superset.apache.org/docs/api

    For example, to query the set of four charts created above, use this endpoint:

    [Figure 21]

    Called without parameters, it returns all columns and all data by default.

    [Figure 22]
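
A chart query like the one described above can be sketched with the Python standard library. The endpoints `/api/v1/security/login` and `/api/v1/chart/` are part of Superset's REST API. The helper names, the base URL, and the `provider: "db"` setting (database-backed authentication) are assumptions for illustration. The helpers only build the requests; `fetch_json` sends them.

```python
import json
import urllib.request

BASE_URL = "http://192.168.137.128:8088"  # hypothetical; use your own host


def login_request(username, password, base_url=BASE_URL):
    """Build the POST that exchanges credentials for a JWT access token."""
    body = json.dumps({
        "username": username,
        "password": password,
        "provider": "db",   # assumes database authentication
        "refresh": True,
    }).encode("utf-8")
    return urllib.request.Request(
        base_url + "/api/v1/security/login", data=body,
        headers={"Content-Type": "application/json"}, method="POST")


def chart_list_request(access_token, base_url=BASE_URL):
    """Build the GET that lists charts, authenticated with the token."""
    return urllib.request.Request(
        base_url + "/api/v1/chart/",
        headers={"Authorization": "Bearer " + access_token}, method="GET")


def fetch_json(request):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

Against a running instance, `fetch_json(login_request("admin", "admin"))["access_token"]` yields the token, and `fetch_json(chart_list_request(token))` returns a JSON object whose `result` list holds the charts.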
