Commit 7cdca7bf authored by lenboo

merge from upstream

parents 8dd7ce29 ad17f789
+62 −40
@@ -4,68 +4,89 @@ Easy Scheduler

> Easy Scheduler for Big Data

### Design features

A distributed, easily extensible visual DAG workflow scheduling system. It is dedicated to untangling the complex dependencies in data processing pipelines so that the scheduling system works `out of the box`.
Its main objectives are as follows:

 - Associate tasks in a DAG according to their dependencies, and visualize the running state of each task in real time.
 - Support many task types: Shell, MR, Spark, SQL (mysql, postgresql, hive, sparksql), Python, Sub_Process, Procedure, etc.
 - Support timed, dependency-based, and manual scheduling of workflows, manual pause/stop/resume, failure retry/alarm, recovery from a specified node, killing tasks, etc.
 - Support workflow priority, task priority, task failover, and task timeout alarm/failure.
 - Support workflow global parameters and custom node parameters.
 - Support online upload/download and management of resource files, plus online file creation and editing.
 - Support viewing and scrolling task logs online, and downloading logs.
 - Implement cluster HA: the Master and Worker clusters are decentralized through Zookeeper.
 - Support online viewing of `Master/Worker` CPU load, memory, and CPU usage.
 - Support tree and Gantt chart views of workflow run history, plus task status and process status statistics.
 - Support backfilling (re-running a workflow for past dates).
 - Support multi-tenancy.
 - Support internationalization.
 - And more for you to explore.
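
As a rough illustration of the first objective, the sketch below orders tasks so that each one runs only after the tasks it depends on. It uses Python's standard `graphlib` module (Python 3.9+). The task names and the `execution_order` helper are hypothetical, and this is not EasyScheduler's actual scheduling code.

```python
from graphlib import TopologicalSorter


def execution_order(dependencies):
    """Return task names in an order where every task follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical workflow: each key maps a task to the set of tasks it depends on.
workflow = {
    "extract": set(),
    "clean": {"extract"},
    "aggregate": {"clean"},
    "report": {"aggregate", "clean"},
}

order = execution_order(workflow)
print(order)
```

A cyclic dependency is not a valid DAG; `TopologicalSorter` raises `graphlib.CycleError` in that case.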

### Comparison with similar scheduler systems

  | EasyScheduler | Azkaban | Airflow
-- | -- | -- | --
**Stability** |   |   |  
Single point of failure | No. Decentralized multi-master and multi-worker | Yes. Single combined Web and Scheduler node | Yes. Single Scheduler
Additional HA requirements | Not required (HA is built in) | DB | Celery / Dask / Mesos + Load Balancer + DB
Overload processing | Task queue mechanism: the number of schedulable tasks per machine is configurable, and excess tasks are cached in the task queue instead of jamming the machine. | The server jams when there are too many tasks | The server jams when there are too many tasks
**Easy to use** |   |   |  
DAG monitoring interface | Key information such as task status, task type, retry count, the machine running the task, and visual variables is visible at a glance. | Only task status can be seen | Can't visually distinguish task types
Visual process definition | Yes. All process definition operations are visual: drag tasks to draw DAGs and configure data sources and resources. An API is also provided for third-party systems. | No. DAGs are defined in a custom DSL and uploaded | No. DAGs are drawn in Python code, which is inconvenient, especially for business users who can't write code.
Quick deployment | One-click deployment | Complex cluster deployment | Complex cluster deployment
**Features** |   |   |  
Suspend and resume | Supports pause and resume | No. The workflow must be killed and then re-run | No. The workflow must be killed and then re-run
Multi-tenant support | Users on EasyScheduler can be mapped many-to-one or one-to-one to Hadoop users through tenants, which is very important for scheduling big data jobs. | No | No
Task type | Supports traditional shell tasks as well as big data platform tasks: MR, Spark, SQL (mysql, postgresql, hive, sparksql), Python, Procedure, Sub_Process | shell, gobblin, hadoopJava, java, hive, pig, spark, hdfsToTeradata, teradataToHdfs | BashOperator, DummyOperator, MySqlOperator, HiveOperator, EmailOperator, HTTPOperator, SqlOperator
Compatibility | Supports scheduling big data jobs such as Spark, Hive, and MR. Multi-tenant support also makes it a better fit for big data workloads. | Without multi-tenant support, it is not flexible enough for business use on a big data platform. | Without multi-tenant support, it is not flexible enough for business use on a big data platform.
**Scalability** |   |   |  
Custom task types supported | Yes | Yes | Yes
Cluster scaling supported | Yes. Scheduling is distributed, and overall scheduling capacity grows linearly with cluster size. Masters and Workers can be brought online and offline dynamically. | Yes, but complicated. Executors scale horizontally | Yes, but complicated. Executors scale horizontally
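
The "Overload processing" row describes a cap on concurrently running tasks per machine, with excess tasks waiting in a queue. The sketch below shows that general idea with Python's standard `ThreadPoolExecutor`. The `run_with_limit` helper and the limit of 2 are assumptions made for illustration, not EasyScheduler's implementation or configuration.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_with_limit(task_ids, max_concurrent):
    """Run tasks with at most max_concurrent in flight; return (results, peak)."""
    lock = threading.Lock()
    state = {"running": 0, "peak": 0}

    def run_task(task_id):
        with lock:
            state["running"] += 1
            state["peak"] = max(state["peak"], state["running"])
        time.sleep(0.01)  # stand-in for real work
        with lock:
            state["running"] -= 1
        return task_id

    # Tasks beyond max_concurrent wait in the executor's internal queue.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        results = list(pool.map(run_task, task_ids))
    return results, state["peak"]


results, peak = run_with_limit(range(10), max_concurrent=2)
print(results, peak)
```

All ten tasks complete, but the observed peak concurrency never exceeds the configured limit.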

### System screenshots

![image](https://user-images.githubusercontent.com/48329107/61368744-1f5f3b00-a8c1-11e9-9cf1-10f8557a6b3b.png)

![image](https://user-images.githubusercontent.com/48329107/61368966-9dbbdd00-a8c1-11e9-8dcc-a9469d33583e.png)

![image](https://user-images.githubusercontent.com/48329107/61372146-f347b800-a8c8-11e9-8882-66e8934ada23.png)


### Documentation

- <a href="https://analysys.github.io/easyscheduler_docs_cn/后端部署文档.html" target="_blank">Backend deployment documentation</a>

- <a href="https://analysys.github.io/easyscheduler_docs_cn/前端部署文档.html" target="_blank">Front-end deployment documentation</a>

- [**User manual**](https://analysys.github.io/easyscheduler_docs_cn/系统使用手册.html?_blank "User manual") 

- [**Upgrade document**](https://analysys.github.io/easyscheduler_docs_cn/升级文档.html?_blank "Upgrade document") 

- <a href="http://52.82.13.76:8888" target="_blank">Online Demo</a> 

For more documentation, please refer to the <a href="https://analysys.github.io/easyscheduler_docs_cn/" target="_blank">EasyScheduler online documentation</a>

### Recent R&D plan
Work plan of Easy Scheduler: [R&D plan](https://github.com/analysys/EasyScheduler/projects/1). The `In Develop` card holds the features of version 1.1.0, and the `TODO` card holds items to be done (including feature ideas).

### How to contribute code

Code contributions are very welcome. For the code submission process, please refer to:
https://github.com/analysys/EasyScheduler/blob/master/CONTRIBUTING.md

### Thanks

Easy Scheduler uses many excellent open source projects, such as google guava, guice, grpc, netty, ali bonecp, quartz, and many Apache open source projects.
Easy Scheduler exists only because it stands on the shoulders of these projects, and we are very grateful to all the open source software we use. We hope to be not only beneficiaries of open source but also contributors, so we decided to open-source Easy Scheduler and have committed to maintaining it for the long term. We also hope that partners with the same passion and conviction for open source will join us and contribute!

### Help
The fastest way to get a response from our developers is to submit an issue or add our WeChat: 510570367
 
 
@@ -75,3 +96,4 @@ The fastest way to get response from our developers is to submit issues, or ad



README_zh_CN.md

0 → 100644
+77 −0
Easy Scheduler
============
[![License](https://img.shields.io/badge/license-Apache%202-4EB1BA.svg)](https://www.apache.org/licenses/LICENSE-2.0.html)

> Easy Scheduler for Big Data

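**Design features:** A distributed, easily extensible visual DAG workflow scheduling system. It is dedicated to untangling the complex dependencies in data processing pipelines so that the scheduling system works `out of the box`.
Its main objectives are as follows:
 - Associate tasks in a DAG according to their dependencies, and visualize the running state of each task in real time
 - Support many task types: Shell, MR, Spark, SQL (mysql, postgresql, hive, sparksql), Python, Sub_Process, Procedure, etc.
 - Support timed, dependency-based, and manual scheduling of workflows, manual pause/stop/resume, failure retry/alarm, recovery from a specified node, killing tasks, etc.
 - Support workflow priority, task priority, task failover, and task timeout alarm/failure
 - Support workflow global parameters and custom node parameters
 - Support online upload/download and management of resource files, plus online file creation and editing
 - Support viewing and scrolling task logs online, and downloading logs
 - Implement cluster HA: the Master and Worker clusters are decentralized through Zookeeper
 - Support online viewing of `Master/Worker` CPU load, memory, and CPU usage
 - Support tree and Gantt chart views of workflow run history, plus task status and process status statistics
 - Support backfilling
 - Support multi-tenancy
 - Support internationalization
 - And more for you to explore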

### Comparison with similar scheduler systems

![Scheduler system comparison](http://geek.analysys.cn/static/upload/47/2019-03-01/9609ca82-cf8b-4d91-8dc0-0e2805194747.jpeg)

### System screenshots

![](http://geek.analysys.cn/static/upload/221/2019-03-29/0a9dea80-fb02-4fa5-a812-633b67035ffc.jpeg)

![](http://geek.analysys.cn/static/upload/221/2019-04-01/83686def-a54f-4169-8cae-77b1f8300cc1.png)

![](http://geek.analysys.cn/static/upload/221/2019-03-29/83c937c7-1793-4d7a-aa28-b98460329fe0.jpeg)

### Documentation

- <a href="https://analysys.github.io/easyscheduler_docs_cn/后端部署文档.html" target="_blank">Backend deployment documentation</a>

- <a href="https://analysys.github.io/easyscheduler_docs_cn/前端部署文档.html" target="_blank">Front-end deployment documentation</a>

- [**User manual**](https://analysys.github.io/easyscheduler_docs_cn/系统使用手册.html?_blank "User manual") 

- [**Upgrade document**](https://analysys.github.io/easyscheduler_docs_cn/升级文档.html?_blank "Upgrade document") 

- <a href="http://52.82.13.76:8888" target="_blank">Online Demo</a> 

For more documentation, please refer to the <a href="https://analysys.github.io/easyscheduler_docs_cn/" target="_blank">easyscheduler Chinese online documentation</a>


### Recent R&D plan

Work plan of EasyScheduler: <a href="https://github.com/analysys/EasyScheduler/projects/1" target="_blank">R&D plan</a>. The In Develop card holds the features of version 1.1.0, and the TODO card holds items to be done (including feature ideas).

### How to contribute code

Code contributions are very welcome. For the code submission process, please refer to:
https://github.com/analysys/EasyScheduler/blob/master/CONTRIBUTING.md


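### Thanks

Easy Scheduler uses many excellent open source projects, such as google guava, guice, grpc, netty, ali bonecp, quartz, and many Apache open source projects.
Easy Scheduler exists only because it stands on the shoulders of these projects, and we are very grateful to all the open source software we use. We hope to be not only beneficiaries of open source but also contributors, so we decided to open-source Easy Scheduler and have committed to maintaining it for the long term. We also hope that partners with the same passion and conviction for open source will join us and contribute!


### Help
The fastest way to get a response from our developers is to submit an issue or add our WeChat: 510570367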
 






+31 −0
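Easy Scheduler Release 1.0.4
===
Easy Scheduler 1.0.4 is the fifth release in the 1.x series.

Enhancements:
===
- [[EasyScheduler-482]](https://github.com/analysys/EasyScheduler/issues/482) Email titles in SQL tasks now support custom variables
- [[EasyScheduler-483]](https://github.com/analysys/EasyScheduler/issues/483) If sending the email fails in a SQL task, the SQL task is marked as failed
- [[EasyScheduler-484]](https://github.com/analysys/EasyScheduler/issues/484) Changed the replacement rule for custom variables in SQL tasks to support replacement involving multiple single and double quotes
- [[EasyScheduler-485]](https://github.com/analysys/EasyScheduler/issues/485) When creating a resource file, verify whether the file already exists on HDFS
- [[EasyScheduler-486]](https://github.com/analysys/EasyScheduler/issues/486) When the shell process exits while the YARN state is not yet final, wait before determining the result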
Fixes
===
- [[EasyScheduler-198]](https://github.com/analysys/EasyScheduler/issues/198) The process definition list is sorted by schedule status and update time
- [[EasyScheduler-419]](https://github.com/analysys/EasyScheduler/issues/419) Fixed online file creation returning success although the HDFS file was not created
- [[EasyScheduler-481]](https://github.com/analysys/EasyScheduler/issues/481) Fixed schedules that could not be taken offline when the job does not exist
- [[EasyScheduler-425]](https://github.com/analysys/EasyScheduler/issues/425) Killing a task now also kills its child processes
- [[EasyScheduler-422]](https://github.com/analysys/EasyScheduler/issues/422) Fixed the update time and size not being refreshed when a resource file is updated
- [[EasyScheduler-431]](https://github.com/analysys/EasyScheduler/issues/431) Fixed tenant deletion failing when HDFS is not started


Thanks:
===
Last but not least, this release would not exist without the contributions of the following partners:

Baoqi, jimmy201602, samz406, petersear, millionfor, hyperknob, fanguanqun, yangqinlong, qq389401879, feloxx, coding-now, hymzcn, nysyxxg, chgxtony, gj-zhang, xianhu, sunnyingit,
zhengqiangtan

And the many enthusiastic partners in the WeChat group! Thank you very much!
+23 −0
Easy Scheduler Release 1.0.4
===
Easy Scheduler 1.0.4 is a release in the 1.x series.

Enhancements:
===
- [[EasyScheduler-597]](https://github.com/analysys/EasyScheduler/issues/597) Child process cannot inherit the parent's receivers and cc

Fixes
===
- [[EasyScheduler-516]](https://github.com/analysys/EasyScheduler/issues/516) The task instance of MR cannot stop in some cases
- [[EasyScheduler-594]](https://github.com/analysys/EasyScheduler/issues/594) After a soft kill of a task, the processes (parent and child) still exist


Thanks:
===
Last but not least, this release would not exist without the contributions of the following partners:

Baoqi, jimmy201602, samz406, petersear, millionfor, hyperknob, fanguanqun, yangqinlong, qq389401879, feloxx, coding-now, hymzcn, nysyxxg, chgxtony, gj-zhang, xianhu, sunnyingit,
zhengqiangtan, chinashenkai

And the many enthusiastic partners in the WeChat group! Thank you very much!
+55 −0
Easy Scheduler Release 1.1.0
===
Easy Scheduler 1.1.0 is the sixth release in the 1.x series.

New features:
===
- [[EasyScheduler-391](https://github.com/analysys/EasyScheduler/issues/391)] Run a process under a specified tenant user
- [[EasyScheduler-288](https://github.com/analysys/EasyScheduler/issues/288)] Feature/qiye_weixin
- [[EasyScheduler-189](https://github.com/analysys/EasyScheduler/issues/189)] Security support such as Kerberos
- [[EasyScheduler-398](https://github.com/analysys/EasyScheduler/issues/398)] The administrator has a tenant (install.sh sets the default tenant) and can create resources, projects, and data sources (limited to one administrator)
- [[EasyScheduler-293](https://github.com/analysys/EasyScheduler/issues/293)] Parameters selected when clicking to run a process cannot be viewed anywhere and are not saved
- [[EasyScheduler-401](https://github.com/analysys/EasyScheduler/issues/401)] It is easy to accidentally schedule a run every second; after a schedule is set, the page could show the next trigger time
- [[EasyScheduler-493](https://github.com/analysys/EasyScheduler/pull/493)] Add Kerberos authentication for data sources, update the FAQ, and add resource upload to S3


Enhancements:
===
- [[EasyScheduler-227](https://github.com/analysys/EasyScheduler/issues/227)] Upgrade spring-boot to 2.1.x and spring to 5.x
- [[EasyScheduler-434](https://github.com/analysys/EasyScheduler/issues/434)] The number of worker nodes is inconsistent between zk and mysql
- [[EasyScheduler-435](https://github.com/analysys/EasyScheduler/issues/435)] Validation of the email format
- [[EasyScheduler-441](https://github.com/analysys/EasyScheduler/issues/441)] Nodes prohibited from running are included in the completed-node check
- [[EasyScheduler-400](https://github.com/analysys/EasyScheduler/issues/400)] On the home page, the queue statistics are inconsistent and the command statistics show no data
- [[EasyScheduler-395](https://github.com/analysys/EasyScheduler/issues/395)] For processes recovered by fault tolerance, the status must not be **Running**
- [[EasyScheduler-529](https://github.com/analysys/EasyScheduler/issues/529)] Optimize polling tasks from zookeeper
- [[EasyScheduler-242](https://github.com/analysys/EasyScheduler/issues/242)] Performance issue when worker-server nodes fetch tasks
- [[EasyScheduler-352](https://github.com/analysys/EasyScheduler/issues/352)] Worker grouping and queue consumption issues
- [[EasyScheduler-461](https://github.com/analysys/EasyScheduler/issues/461)] Account and password information must be encrypted when viewing data source parameters
- [[EasyScheduler-396](https://github.com/analysys/EasyScheduler/issues/396)] Optimize the Dockerfile and link it with github to build images automatically
- [[EasyScheduler-389](https://github.com/analysys/EasyScheduler/issues/389)] Service monitor cannot detect changes of master/worker
- [[EasyScheduler-511](https://github.com/analysys/EasyScheduler/issues/511)] Support recovering a process from stopped/killed nodes
- [[EasyScheduler-399](https://github.com/analysys/EasyScheduler/issues/399)] HadoopUtils operates as the specified user rather than the **deployment user**

Fixes:
===
- [[EasyScheduler-394](https://github.com/analysys/EasyScheduler/issues/394)] When master and worker are deployed on the same machine, restarting the master and worker services prevents previously scheduled tasks from continuing to be scheduled
- [[EasyScheduler-469](https://github.com/analysys/EasyScheduler/issues/469)] Fix naming errors on the monitor page
- [[EasyScheduler-392](https://github.com/analysys/EasyScheduler/issues/392)] Feature request: fix email regex check
- [[EasyScheduler-405](https://github.com/analysys/EasyScheduler/issues/405)] On the schedule add/edit page, the start time and end time cannot be the same
- [[EasyScheduler-517](https://github.com/analysys/EasyScheduler/issues/517)] Backfill - sub-workflow - time parameters
- [[EasyScheduler-532](https://github.com/analysys/EasyScheduler/issues/532)] Python node does not execute
- [[EasyScheduler-543](https://github.com/analysys/EasyScheduler/issues/543)] Optimize the safety of datasource connection params
- [[EasyScheduler-569](https://github.com/analysys/EasyScheduler/issues/569)] Scheduled tasks cannot be truly stopped
- [[EasyScheduler-463](https://github.com/analysys/EasyScheduler/issues/463)] Email validation does not support mailboxes with uncommon suffixes




Thanks:
===
Last but not least, this release would not exist without the contributions of the following partners:

Baoqi, jimmy201602, samz406, petersear, millionfor, hyperknob, fanguanqun, yangqinlong, qq389401879, chgxtony, Stanfan, lfyee, thisnew, hujiang75277381, sunnyingit, lgbo-ustc, ivivi, lzy305, JackIllkid, telltime, lipengbo2018, wuchunfu, telltime

And the many enthusiastic partners in the WeChat group! Thank you very much!