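Big Data Tools: Superset
Overview
Apache Superset is an open-source, modern, lightweight BI tool. It connects to many data sources, offers a rich set of chart types, supports custom dashboards, and has a friendly, easy-to-use interface.
Because Superset connects to common big-data query engines such as Trino, Hive, Kylin, and Druid, and supports custom dashboards, it works well as the visualization tool for a data warehouse, serving the warehouse's ADS (application data service) layer.
Official site: https://superset.apache.org/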
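Installation notes
- Superset has no official Windows support (hardly a surprise: who runs servers on Windows?).
- Superset is a web application written in Python and requires Python 3.6+.
- Superset recommends giving the virtual machine at least 8 GB of RAM and a disk of at least 40 GB, so there is enough room for the operating system and all required dependencies.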
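The notes above can be turned into a small pre-flight check. The following is only a sketch: the function name `find_problems` is made up for illustration, the thresholds are copied from the list above, and the RAM check is left out because the Python standard library has no portable call for it.

```python
import platform
import shutil
import sys

MIN_PYTHON = (3, 6)   # minimum Python version from the notes above
MIN_DISK_GB = 40      # recommended disk size from the notes above


def find_problems(python_version, system_name, disk_gb):
    """Return a list of problems; an empty list means the host looks fine."""
    problems = []
    major_minor = tuple(python_version[:2])
    if major_minor < MIN_PYTHON:
        problems.append("Python %d.%d is older than 3.6" % major_minor)
    if system_name == "Windows":
        problems.append("Windows is not officially supported")
    if disk_gb < MIN_DISK_GB:
        problems.append("disk is %.0f GB, 40 GB recommended" % disk_gb)
    return problems


if __name__ == "__main__":
    # Total size of the root filesystem, converted from bytes to GB.
    total_gb = shutil.disk_usage("/").total / 1024 ** 3
    report = find_problems(sys.version_info, platform.system(), total_gb)
    for line in report or ["host looks fine"]:
        print(line)
```

Running the file on the target server prints one line per problem, or "host looks fine" when none of the checks trip.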
Python environment
Install and update system dependencies
#1. Install the required dependencies
yum -y install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel
#2. Install or update gcc
yum install gcc
#3. Python 3.7 and later also need libffi-devel
yum install libffi-devel -y
Download and install Python
Because budgets are often tight, a single development server usually ends up hosting several Python versions to serve different "customers". Miniconda makes it easy to switch between Python versions, and the Superset project itself strongly recommends installing Superset in a virtual environment.
Install Conda
Installer: Miniconda3-latest-Linux-x86_64.sh
#1. Run the installer and follow the prompts
bash Miniconda3-latest-Linux-x86_64.sh
#2. Hold Enter to page through the license, then type yes to accept it
Please answer 'yes' or 'no':'
>>> yes
#3. Confirm the install location (the default, shown in brackets, is under the current user's home directory)
[/root/miniconda3] >>> /opt/module/miniconda3
#4. When asked whether to initialize conda, type yes
Do you wish the installer to initialize Miniconda3
by running conda init? [yes|no]
[no] >>> yes
#5. Output like the following means the installation succeeded
==> For changes to take effect, close and re-open your current shell. <==
If you'd prefer that conda's base environment not be activated on startup,
set the auto_activate_base parameter to false:
conda config --set auto_activate_base false
Thank you for installing Miniconda3!
#6. Stop auto-activating base: after installation, every new terminal activates the default base environment; the following command turns that off
[root@paratera128 ~]# conda config --set auto_activate_base false
#7. Configure conda mirrors inside China; add several
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
conda config --set show_channel_urls yes
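Python environment setup
Installing Python with conda is very simple. For the Superset releases current at the time of writing (mid-2022), Python 3.7 or 3.8 is the best choice.
#1. Create an environment with a specific Python version
conda create --name superset python=3.7
#2. Activate the superset environment; work happens inside the conda Python 3.7 environment and does not touch the host's Python
conda activate superset
#3. Leave the current environment
conda deactivate
#4. Delete the virtual environment
conda env remove -n superset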
Deploying Superset (Docker)
Install and start
# Clone the Superset repository with git; the project provides a turnkey Docker Compose setup (with separate development and production configurations)
[root@paratera128 opt]# git clone https://github.com/apache/superset.git
# Enter the project directory
[root@paratera128 opt]# cd superset
# This install method depends heavily on the Docker Compose and Docker Engine versions. My local versions are shown below; with them, the version field in the downloaded docker-compose.yml has to be changed to 3.6 or lower. Search for "docker and docker-compose version compatibility" for the full mapping.
[root@paratera128 ~]# docker --version
Docker version 18.03.1-ce, build 9ee9f40
(superset) [root@paratera128 ~]# docker-compose --version
docker-compose version 1.26.2, build eefe0d31
# Make the startup scripts executable
[root@paratera128 superset]# chmod 777 docker
[root@paratera128 superset]# cd docker/
[root@paratera128 docker]# ls
docker-bootstrap.sh docker-ci.sh docker-frontend.sh docker-init.sh frontend-mem-nag.sh pythonpath_dev README.md run-server.sh
[root@paratera128 docker]# chmod 777 *
[root@paratera128 docker]# cd ..
# Pull the images and start the containers (the two steps can be combined)
[root@paratera128 superset]# docker-compose -f docker-compose-non-dev.yml pull
[root@paratera128 superset]# docker-compose -f docker-compose-non-dev.yml up -d
superset_cache is up-to-date
superset_db is up-to-date
Starting superset_worker_beat ... done
Starting superset_app ... done
Starting superset_worker ... done
Starting superset_init ... done
# Create the admin user (in the run below the user already existed, hence the "User already exists" message)
[root@paratera128 superset]# docker exec -it superset_app flask fab create-admin
Username [admin]: admin
User first name [admin]: admin
User last name [user]: admin
Email [admin@fab.org]: admin
Password:
Repeat for confirmation:
Loaded your LOCAL configuration at [/app/docker/pythonpath_dev/superset_config.py]
logging was configured successfully
2022-07-26 04:10:42,285:INFO:superset.utils.logging_configurator:logging was configured successfully
2022-07-26 04:10:42,293:INFO:root:Configured event logger of type
/usr/local/lib/python3.8/site-packages/flask_caching/__init__.py:201: UserWarning: Flask-Caching: CACHE_TYPE is set to null, caching is effectively disabled.
warnings.warn(
Recognized Database Authentications.
Error! User already exists admin
# Initialize the database
[root@paratera128 superset]# docker exec -it superset_app superset db upgrade
Loaded your LOCAL configuration at [/app/docker/pythonpath_dev/superset_config.py]
logging was configured successfully
2022-07-26 04:11:58,693:INFO:superset.utils.logging_configurator:logging was configured successfully
2022-07-26 04:11:58,700:INFO:root:Configured event logger of type
/usr/local/lib/python3.8/site-packages/flask_caching/__init__.py:201: UserWarning: Flask-Caching: CACHE_TYPE is set to null, caching is effectively disabled.
warnings.warn(
INFO [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO [alembic.runtime.migration] Will assume transactional DDL.
# Initialize Superset
[root@paratera128 superset]# docker exec -it superset_app superset init
Loaded your LOCAL configuration at [/app/docker/pythonpath_dev/superset_config.py]
logging was configured successfully
2022-07-26 04:12:47,375:INFO:superset.utils.logging_configurator:logging was configured successfully
2022-07-26 04:12:47,382:INFO:root:Configured event logger of type
/usr/local/lib/python3.8/site-packages/flask_caching/__init__.py:201: UserWarning: Flask-Caching: CACHE_TYPE is set to null, caching is effectively disabled.
warnings.warn(
Syncing role definition
2022-07-26 04:12:50,958:INFO:superset.security.manager:Syncing role definition
Syncing Admin perms
2022-07-26 04:12:50,980:INFO:superset.security.manager:Syncing Admin perms
Syncing Alpha perms
2022-07-26 04:12:51,220:INFO:superset.security.manager:Syncing Alpha perms
Syncing Gamma perms
2022-07-26 04:12:51,391:INFO:superset.security.manager:Syncing Gamma perms
Syncing granter perms
2022-07-26 04:12:51,554:INFO:superset.security.manager:Syncing granter perms
Syncing sql_lab perms
2022-07-26 04:12:51,705:INFO:superset.security.manager:Syncing sql_lab perms
Fetching a set of all perms to lookup which ones are missing
2022-07-26 04:12:51,874:INFO:superset.security.manager:Fetching a set of all perms to lookup which ones are missing
Creating missing datasource permissions.
2022-07-26 04:12:52,034:INFO:superset.security.manager:Creating missing datasource permissions.
Creating missing database permissions.
2022-07-26 04:12:52,044:INFO:superset.security.manager:Creating missing database permissions.
Cleaning faulty perms
2022-07-26 04:12:52,056:INFO:superset.security.manager:Cleaning faulty perms
# Load the example data (optional)
[root@paratera128 yum]# docker exec -it superset_app superset load_examples
Docker Compose configuration
# docker-compose anchors: image version, user, and mounted volumes
x-superset-image: &superset-image apache/superset:latest
x-superset-user: &superset-user root
x-superset-depends-on: &superset-depends-on
  - db
  - redis
x-superset-volumes: &superset-volumes
  # /app/pythonpath_docker will be appended to the PYTHONPATH in the final container
  - ./docker:/app/docker
  - ./superset:/app/superset
  - ./superset-frontend:/app/superset-frontend
  - superset_home:/app/superset_home
  - ./tests:/app/tests

version: "3.6"
services:
  # Redis backs Superset's Flask-Caching cache, which stores the results of recent user actions such as dashboard filter state and Explore chart/table data
  redis:
    image: redis:latest
    container_name: superset_cache
    restart: unless-stopped
    ports:
      - "127.0.0.1:6379:6379"
    volumes:
      - redis:/data

  # PostgreSQL database (optional)
  db:
    env_file: docker/.env
    image: postgres:14
    container_name: superset_db
    restart: unless-stopped
    ports:
      - "127.0.0.1:5432:5432"
    volumes:
      - db_home:/var/lib/postgresql/data

  # Superset server instance
  superset:
    env_file: docker/.env
    image: *superset-image
    container_name: superset_app
    command: ["/app/docker/docker-bootstrap.sh", "app"]
    restart: unless-stopped
    ports:
      - 8088:8088
    user: *superset-user
    depends_on: *superset-depends-on
    volumes: *superset-volumes
    environment:
      CYPRESS_CONFIG: "${CYPRESS_CONFIG}"

volumes:
  superset_home:
    external: false
  db_home:
    external: false
  redis:
    external: false
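After `docker-compose up -d` returns, the containers may still be starting. A minimal sketch of a readiness poll follows, assuming the app is published on port 8088 as in the configuration above and that Superset's `/health` endpoint answers with HTTP 200 once it is up. The function name `wait_until_healthy` and its parameters are made up for illustration.

```python
import time
import urllib.error
import urllib.request


def wait_until_healthy(url, attempts=30, delay=2.0, fetch=None, sleep=time.sleep):
    """Poll `url` until it answers HTTP 200. Return True on success, False if all attempts fail."""

    def default_fetch(target):
        # Plain GET with a short timeout; returns the HTTP status code.
        with urllib.request.urlopen(target, timeout=5) as resp:
            return resp.status

    fetch = fetch or default_fetch
    for _ in range(attempts):
        try:
            if fetch(url) == 200:
                return True
        except (urllib.error.URLError, OSError):
            pass  # connection refused or HTTP error: the container is probably still starting
        sleep(delay)
    return False
```

For the deployment above the call would be `wait_until_healthy("http://localhost:8088/health")`. The `fetch` and `sleep` parameters exist only so the loop can be exercised without a running server.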
Deploying Superset (pip in a virtual environment)
Install and start
# Activate the superset environment
[root@paratera128 ~]# conda activate superset
(superset) [root@paratera128 ~]#
# Install system dependencies
yum install -y python-setuptools
yum install -y gcc gcc-c++ libffi-devel python-devel python-pip python-wheel openssl-devel cyrus-sasl-devel openldap-devel
# Install (or upgrade) setuptools and pip
pip install --upgrade setuptools pip -i https://pypi.douban.com/simple/
# Install Superset
pip install apache-superset -i https://pypi.douban.com/simple/
# Or install a specific version
pip install apache-superset==1.3.0 -i https://pypi.tuna.tsinghua.edu.cn/simple/
A success message from pip at this point means the installation worked. The WARNING can be ignored: it only points out that running pip as root may grant excessive permissions, and a production environment would not show it.
Create the admin user
(superset) [root@paratera128 ~]# export FLASK_APP=superset
(superset) [root@paratera128 ~]# flask fab create-admin
Username [admin]: admin
User first name [admin]: admin
User last name [user]: admin
Email [admin@fab.org]: admin
Password:
Repeat for confirmation:
logging was configured successfully
2022-07-25 18:23:46,139:INFO:superset.utils.logging_configurator:logging was configured successfully
2022-07-25 18:23:46,156:INFO:root:Configured event logger of type
/opt/module/miniconda3/envs/superset/lib/python3.7/site-packages/flask_caching/__init__.py:120: UserWarning: Flask-Caching: CACHE_TYPE is set to null, caching is effectively disabled.
"Flask-Caching: CACHE_TYPE is set to null, "
Recognized Database Authentications.
Admin User admin created.
Initialize the database
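At bottom, Superset is just a web application that ships with its own database, and that database has to be initialized.
# Install dataclasses, then initialize the Superset database
pip install dataclasses
superset db upgrade
If you see: UserWarning: Flask-Caching: CACHE_TYPE is set to null, caching is effectively disabled.
Open python3.7/site-packages/superset/config.py in an editor,
search for "CACHE_TYPE", and change every occurrence to "simple".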
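Editing the file under site-packages works, but the change is lost whenever the package is reinstalled or upgraded. An alternative, sketched here under assumptions, is a separate override file: Superset imports a module named `superset_config` if it finds one on `PYTHONPATH`, or at the path given in the `SUPERSET_CONFIG_PATH` environment variable. The timeout value below is illustrative and not taken from the article.

```python
# superset_config.py: a local override file (where it lives is up to you)

# Metadata cache. "simple" is the value suggested in the text above: an
# in-process cache that is fine for a single test server.
CACHE_CONFIG = {
    "CACHE_TYPE": "simple",
    "CACHE_DEFAULT_TIMEOUT": 300,  # seconds; illustrative value
}

# Cache for chart query results, configured the same way.
DATA_CACHE_CONFIG = {
    "CACHE_TYPE": "simple",
    "CACHE_DEFAULT_TIMEOUT": 300,  # seconds; illustrative value
}
```

With the file saved at, say, /opt/module/superset_config.py (a hypothetical location), running `export SUPERSET_CONFIG_PATH=/opt/module/superset_config.py` before `superset db upgrade` should make the warning go away without touching the installed package.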
Initialize the base data
(superset) [root@paratera128 local]# superset init
logging was configured successfully
2022-07-25 02:24:19,136:INFO:superset.utils.logging_configurator:logging was configured successfully
2022-07-25 02:24:19,148:INFO:root:Configured event logger of type
/opt/module/miniconda3/envs/superset/lib/python3.7/site-packages/flask_caching/__init__.py:120: UserWarning: Flask-Caching: CACHE_TYPE is set to null, caching is effectively disabled.
"Flask-Caching: CACHE_TYPE is set to null, "
Syncing role definition
2022-07-25 02:24:27,821:INFO:superset.security.manager:Syncing role definition
Syncing Admin perms
2022-07-25 02:24:27,920:INFO:superset.security.manager:Syncing Admin perms
Syncing Alpha perms
2022-07-25 02:24:28,026:INFO:superset.security.manager:Syncing Alpha perms
Syncing Gamma perms
2022-07-25 02:24:28,410:INFO:superset.security.manager:Syncing Gamma perms
Syncing granter perms
2022-07-25 02:24:28,741:INFO:superset.security.manager:Syncing granter perms
Syncing sql_lab perms
2022-07-25 02:24:29,045:INFO:superset.security.manager:Syncing sql_lab perms
Fetching a set of all perms to lookup which ones are missing
2022-07-25 02:24:29,687:INFO:superset.security.manager:Fetching a set of all perms to lookup which ones are missing
Creating missing datasource permissions.
2022-07-25 02:24:29,769:INFO:superset.security.manager:Creating missing datasource permissions.
Creating missing database permissions.
2022-07-25 02:24:29,776:INFO:superset.security.manager:Creating missing database permissions.
Cleaning faulty perms
2022-07-25 02:24:29,780:INFO:superset.security.manager:Cleaning faulty perms
Start the service
# Start from the command line with five worker processes, all serving on 192.168.137.128:8080
(superset) [root@paratera128 local]# gunicorn --workers 5 --timeout 120 --bind 192.168.137.128:8080 --daemon "superset.app:create_app()"
[2022-07-25 02:28:47 -0700] [104753] [INFO] Starting gunicorn 20.0.4
[2022-07-25 02:28:47 -0700] [104753] [INFO] Listening at: http://192.168.137.128:8080 (104753)
[2022-07-25 02:28:47 -0700] [104753] [INFO] Using worker: sync
[2022-07-25 02:28:47 -0700] [104756] [INFO] Booting worker with pid: 104756
[2022-07-25 02:28:47 -0700] [104757] [INFO] Booting worker with pid: 104757
[2022-07-25 02:28:47 -0700] [104758] [INFO] Booting worker with pid: 104758
[2022-07-25 02:28:47 -0700] [104759] [INFO] Booting worker with pid: 104759
[2022-07-25 02:28:47 -0700] [104760] [INFO] Booting worker with pid: 104760
logging was configured successfully
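The same gunicorn options can live in a configuration file instead of the command line, which is easier to keep under version control. gunicorn configuration files are plain Python. The sketch below mirrors the flags used above; the file name and the log path are assumptions made for illustration.

```python
# gunicorn.conf.py: each assignment corresponds to a command-line flag

bind = "192.168.137.128:8080"  # --bind: address and port to listen on
workers = 5                    # --workers: number of worker processes
timeout = 120                  # --timeout: seconds before a silent worker is restarted
daemon = True                  # --daemon: detach from the terminal

# In daemon mode nothing is printed to the terminal, so send errors to a file.
# The path is a hypothetical example.
errorlog = "/tmp/superset-gunicorn-error.log"
```

The server would then be started with `gunicorn -c gunicorn.conf.py "superset.app:create_app()"`.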
Troubleshooting
Install the following additional dependencies:
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple \
flask wtforms_json flask_appbuilder flask_compress celery flask_migrate \
flask_talisman flask_caching sqlparse bleach markdown numpy pandas \
parsedatetime pathlib2 simplejson humanize python-geohash polyline geopy \
sqlalchemy sqlalchemy-utils cryptography backoff msgpack pyarrow \
contextlib2 croniter retry selenium isodate
# markupsafe 2.1.1 raises an error here; replace it with the older 2.0.1
(superset) [root@paratera128 superset]# pip show markupsafe
Name: MarkupSafe
Version: 2.1.1
Summary: Safely add untrusted strings to HTML/XML markup.
Home-page: https://palletsprojects.com/p/markupsafe/
Author: Armin Ronacher
Author-email: armin.ronacher@active-4.com
License: BSD-3-Clause
Location: /opt/module/miniconda3/envs/superset/lib/python3.7/site-packages
Requires:
Required-by: Jinja2, Mako, WTForms
(superset) [root@paratera128 superset]# python -m pip install markupsafe==2.0.1
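The article does not say which error MarkupSafe 2.1.1 produced. A common cause, offered here only as a likely explanation, is that MarkupSafe 2.1 removed `soft_unicode`, which older Jinja2 releases still import. Under that assumption, a small helper (the name `needs_markupsafe_pin` is made up) can decide from the version string printed by `pip show markupsafe` whether the downgrade applies:

```python
def needs_markupsafe_pin(version):
    """Return True when `version` (for example "2.1.1") is 2.1 or newer."""
    # Compare only major and minor; the patch level does not matter here.
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (2, 1)


if __name__ == "__main__":
    for candidate in ("2.0.1", "2.1.1"):
        print(candidate, needs_markupsafe_pin(candidate))
```

If the helper returns True for the installed version and Superset fails on import, pinning to 2.0.1 as shown above is the fix the article used.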
Error: No PIL installation found
Fix:
(superset) [root@paratera128 local]# pip install pillow -i https://pypi.tuna.tsinghua.edu.cn/simple
(superset) [root@paratera128 local]# superset version
logging was configured successfully
2022-07-25 02:20:07,976:INFO:superset.utils.logging_configurator:logging was configured successfully
2022-07-25 02:20:07,983:INFO:root:Configured event logger of type
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Superset 1.3.0
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
This completes the installation of Superset in a conda virtual environment.
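Accessing Superset
URL: http://ip:8088 (the Docker deployment publishes port 8088; the gunicorn command above binds port 8080 instead)
Username/password: admin/admin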
Connecting to databases
MySQL
Trino
Connecting to Trino requires installing the matching driver: https://superset.apache.org/docs/databases/installing-database-drivers/
pip has to be installed first. The driver needs a fairly recent pip, so upgrade pip after installing it.
[root@paratera128 yum]# yum -y install epel-release
[root@paratera128 yum]# yum -y install python-pip
# Use the get-pip.py that matches your interpreter: the script under /pip/2.7/ is for Python 2 only, and older Python 3 releases have their own versioned script under /pip/<version>/
[root@paratera128 yum]# wget https://bootstrap.pypa.io/get-pip.py
[root@paratera128 yum]# python3 get-pip.py
# Install the driver
[root@paratera128 yum]# pip install sqlalchemy-trino
# If Superset runs in Docker, the driver also has to be installed inside the container
[root@paratera128 superset]# touch ./docker/requirements-local.txt
[root@paratera128 superset]# echo "sqlalchemy-trino" >> ./docker/requirements-local.txt
[root@paratera128 superset]# docker-compose -f docker-compose-non-dev.yml build --force-rm
[root@paratera128 superset]# docker-compose -f docker-compose-non-dev.yml up
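Once the driver is installed, Superset asks for a SQLAlchemy URI when a database is added. The sketch below builds URIs in the formats the Superset documentation lists for Trino (`trino://{username}:{password}@{hostname}:{port}/{catalog}`) and MySQL (`mysql://{username}:{password}@{host}/{database}`). The helper names and the sample values are made up, and the MySQL form assumes the mysqlclient driver is installed.

```python
from urllib.parse import quote


def trino_uri(username, password, hostname, port, catalog):
    """Build a Trino SQLAlchemy URI, percent-encoding the credentials."""
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return "trino://%s:%s@%s:%d/%s" % (user, secret, hostname, port, catalog)


def mysql_uri(username, password, host, database):
    """Build a MySQL SQLAlchemy URI, percent-encoding the credentials."""
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return "mysql://%s:%s@%s/%s" % (user, secret, host, database)


if __name__ == "__main__":
    # Hypothetical hosts and credentials, for illustration only.
    print(trino_uri("admin", "p@ss", "trino.example.com", 8080, "hive"))
    print(mysql_uri("root", "secret", "192.168.137.128", "demo"))
```

Percent-encoding matters when a password contains characters such as `@` or `/`, which would otherwise be read as part of the URI structure.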
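Report design
A basic Table
The screenshot speaks for itself.
Bar chart
Requirement: count new and returning users for each day of one month
Pie chart
Show each frequency band's share of the data
Dashboard
The charts created above have all been saved to the same dashboard.
To add a chart, simply drag it in.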
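Building on the API
Reference: https://superset.apache.org/docs/api
For example, to query the collection of four charts created above, you can use the chart list endpoint.
Called without parameters, it returns all columns and all rows by default (long lists are paginated).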
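A minimal client sketch for that call, using only the standard library: it logs in through `/api/v1/security/login` to obtain a JWT, then requests `/api/v1/chart/` with the token as a Bearer header. The helper names are made up, and the base URL and admin/admin credentials are the ones used earlier in this article.

```python
import json
import urllib.request


def login_request(base_url, username, password):
    """Build the POST request that exchanges credentials for an access token."""
    payload = {"username": username, "password": password, "provider": "db", "refresh": True}
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/security/login",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chart_list_request(base_url, access_token):
    """Build the GET request that lists charts, authenticated with a Bearer token."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/chart/",
        headers={"Authorization": "Bearer " + access_token},
    )


def fetch_json(request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    base = "http://localhost:8088"  # replace with your Superset address
    token = fetch_json(login_request(base, "admin", "admin"))["access_token"]
    for chart in fetch_json(chart_list_request(base, token))["result"]:
        print(chart.get("id"), chart.get("slice_name"))
```

Building the requests separately from sending them keeps the URL and header logic easy to inspect without a running server.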