Data Center Manager Leveraging OpenStack

This is an idea from last year for data center management software based on OpenStack.

Abstract

OpenStack lets users provision and manage cloud services, including compute instances, storage, and networking, in a convenient way. Data centers, meanwhile, need a converged, uniform management solution that can provision, monitor, manage, and diagnose servers, and that can work seamlessly with other existing IT solutions. This data center manager addresses that need by providing an open-source, easy-to-customize, converged management solution built on OpenStack technologies and a plugin mechanism.

Problem statement

Many software products manage machines in the data center, but none of them seems to meet all data center requirements. Management software from hardware vendors focuses on the machines those vendors manufacture, while existing third-party software offers only high-level management because it lacks knowledge of specific hardware.

As a result, the IT department has to use several management tools, which brings disadvantages: a higher learning cost, the struggle with different software interfaces, and no easy way to do secondary development or to integrate with existing IT systems.

Our solution

The solution proposes a new data center manager (henceforth DCM) that is customizable, modular, and supports plugins provided by third parties or hardware vendors. It also leverages the OpenStack projects Ceilometer (telemetry), Ironic (bare metal), Glance (images), and Heat (orchestration).

Customizable

First, DCM provides a RESTful API in addition to the traditional CLI and web UI, which makes it easy to integrate with other IT systems, such as machine lifecycle management. Users could even develop a mobile app to manage the data center. Second, users can define their monitoring targets and event trigger thresholds in a template, and select corrective actions for each event if necessary. Last but not least, an IT department, hardware vendor, or third party can easily add its own solution based on its specific requirements or hardware capabilities.
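As a minimal sketch of what such a template might contain, the Python below models one monitoring rule as plain data and validates a template before use. The field names (`target`, `threshold`, `action`), the `role` key, and the sample values are all assumptions made for illustration; the proposal itself does not define a template schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitorRule:
    """One entry of a hypothetical DCM monitoring template."""
    target: str       # metric to watch (name is illustrative)
    threshold: float  # value at or above which the event fires
    action: str       # corrective action to select for the event


def parse_template(raw: dict) -> list:
    """Validate a template dictionary (e.g. a decoded JSON body) into rules."""
    rules = []
    for entry in raw.get("rules", []):
        missing = {"target", "threshold", "action"} - entry.keys()
        if missing:
            raise ValueError(f"rule is missing fields: {sorted(missing)}")
        rules.append(
            MonitorRule(entry["target"], float(entry["threshold"]), entry["action"])
        )
    return rules


# Hypothetical default template for a database server role.
DB_SERVER_TEMPLATE = {
    "role": "mysql-database-server",
    "rules": [
        {"target": "cpu_temperature_celsius", "threshold": 85, "action": "notify"},
        {"target": "dimm_correctable_errors", "threshold": 100, "action": "retire_page"},
    ],
}
```

Keeping the template as plain data means the same document could be submitted through the RESTful API, the CLI, or the web UI without any of them needing their own format.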

Modular

Following usual OpenStack practice, components within a project communicate through messages over AMQP (via oslo.messaging), and different projects communicate through RESTful APIs. This provides good isolation, so that the failure of one component doesn't affect other components in the same project or in other projects.
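The isolation argument can be illustrated with a toy message bus. The sketch below uses Python's standard `queue` module as an in-memory stand-in for a broker; it is not an AMQP client and does not use oslo.messaging, and the `Bus` class, the `drain` helper, and the topic name are hypothetical.

```python
import queue


class Bus:
    """In-memory stand-in for a message broker (NOT a real AMQP client)."""

    def __init__(self):
        self._queues = {}

    def subscribe(self, topic: str) -> queue.Queue:
        """Give the caller its own queue for a topic."""
        q = queue.Queue()
        self._queues.setdefault(topic, []).append(q)
        return q

    def publish(self, topic: str, message: dict) -> int:
        """Put a copy of the message on every subscriber queue; return the count."""
        subscribers = self._queues.get(topic, [])
        for q in subscribers:
            q.put(dict(message))
        return len(subscribers)


def drain(q: queue.Queue, handler) -> int:
    """Process all queued messages; handler errors are contained and counted."""
    failures = 0
    while not q.empty():
        msg = q.get()
        try:
            handler(msg)
        except Exception:
            failures += 1
    return failures
```

Because each subscriber owns its queue, a consumer whose handler raises does not prevent the publisher from sending or another consumer from receiving the same event.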

Pluggable

The plugin interface lets hardware vendors implement failure handling or diagnostics solutions based on their own hardware. It also allows the best features of currently available system management software to be integrated.
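One way such a plugin interface could look is sketched below in Python: an abstract base class that vendors subclass, plus a registry keyed by vendor name. The names `DiagnosticPlugin`, `register`, `diagnose`, and `ExampleVendorPlugin`, as well as the shape of the returned dictionary, are hypothetical; the proposal does not specify the interface.

```python
from abc import ABC, abstractmethod


class DiagnosticPlugin(ABC):
    """Hypothetical interface a hardware vendor would implement."""

    vendor = ""  # vendor name used as the registry key

    @abstractmethod
    def diagnose(self, node_id: str) -> dict:
        """Return a vendor-specific diagnostic report for one machine."""


_REGISTRY = {}


def register(plugin: DiagnosticPlugin) -> None:
    """Make a plugin available under its (case-insensitive) vendor name."""
    _REGISTRY[plugin.vendor.lower()] = plugin


def diagnose(vendor: str, node_id: str) -> dict:
    """Dispatch to the vendor plugin, or report that none is installed."""
    plugin = _REGISTRY.get(vendor.lower())
    if plugin is None:
        return {"node": node_id, "status": "unsupported"}
    return plugin.diagnose(node_id)


class ExampleVendorPlugin(DiagnosticPlugin):
    """Illustrative plugin; a real one would query the vendor's hardware."""

    vendor = "ExampleVendor"

    def diagnose(self, node_id: str) -> dict:
        return {"node": node_id, "status": "ok", "checked": ["dimm", "fan"]}


register(ExampleVendorPlugin())
```

The core only depends on the abstract interface, so hardware without a plugin still gets a generic answer instead of an error.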

This data center manager provides three basic functions:

Deploy and Configure

Currently, two ways to deploy and configure are planned: golden images, and Puppet plus OpenStack Heat. The former lets users deploy their own customized golden image to a large number of identical machines, and leverages Glance; the latter is the existing Puppet approach, which can be integrated into this data center manager through OpenStack Heat. We are also watching the development of OpenStack TripleO and plan to integrate TripleO into DCM.

Monitor/Manage

All hardware device information and status are displayed. Once a failure or a sign of failure occurs, DCM notifies the user of the event and takes corrective actions (such as retiring or rebooting the machine) or preventive actions (such as marking a bad page in a DIMM as unavailable if there are a number of sticky bits), based on the user's customized policy. Critical software or services running on these physical machines are monitored in the same way. Users can define their monitoring targets, thresholds, and actions in the template. DCM also provides some default templates based on the role of a physical machine, such as MySQL database server or Apache web server. These functions use OpenStack Ceilometer and Ironic.
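The threshold-then-action flow described here can be sketched in a few lines of Python. This is only an illustration under assumptions: the metric names, threshold values, and handler functions are invented, readings are passed in as a plain dictionary rather than fetched from Ceilometer, and the handlers only return log strings instead of calling Ironic.

```python
def evaluate(readings: dict, rules: list) -> list:
    """Return (target, action) pairs for every rule whose threshold is reached."""
    triggered = []
    for rule in rules:
        value = readings.get(rule["target"])
        if value is not None and value >= rule["threshold"]:
            triggered.append((rule["target"], rule["action"]))
    return triggered


def run_actions(node: str, triggered: list, handlers: dict) -> list:
    """Call the handler for each triggered action and collect log lines."""
    log = []
    for target, action in triggered:
        handler = handlers.get(action)
        if handler is None:
            log.append(f"{node}: no handler for action '{action}' ({target})")
        else:
            log.append(handler(node, target))
    return log


# Hypothetical policy: one corrective and one preventive rule.
RULES = [
    {"target": "cpu_temperature_celsius", "threshold": 90, "action": "reboot"},
    {"target": "dimm_sticky_bits", "threshold": 8, "action": "retire_page"},
]

# Hypothetical handlers; real ones would act on the machine.
HANDLERS = {
    "reboot": lambda node, target: f"{node}: reboot requested ({target})",
    "retire_page": lambda node, target: f"{node}: bad page marked unavailable ({target})",
}
```

For example, `evaluate({"cpu_temperature_celsius": 72, "dimm_sticky_bits": 12}, RULES)` triggers only the preventive `retire_page` rule, because the temperature reading is below its threshold.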

Diagnostics

Sometimes a user is notified of a hardware error event and wants to know the possible root cause and the status of the related hardware. The diagnostics function serves this purpose. Online diagnostics does this without impacting the business workload or the running operating system. Offline diagnostics automatically reboots the machine, enters UEFI, and executes hardware tests that might erase user data.
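The difference between the two modes can be made explicit in code. The Python sketch below returns the planned steps for each mode; the `DiagnosticMode` enum, the `plan_diagnostics` function, and in particular the explicit `allow_data_loss` consent flag are assumptions added for illustration, since the proposal only states that offline tests might erase user data.

```python
from enum import Enum


class DiagnosticMode(Enum):
    ONLINE = "online"    # runs while the operating system keeps running
    OFFLINE = "offline"  # reboots into UEFI; tests may erase user data


def plan_diagnostics(mode: DiagnosticMode, allow_data_loss: bool = False) -> list:
    """Return the ordered steps for a diagnostics run in the given mode."""
    if mode is DiagnosticMode.ONLINE:
        return ["run hardware checks alongside the running operating system"]
    if not allow_data_loss:
        # Assumed safeguard: destructive tests need explicit consent.
        raise PermissionError("offline diagnostics may erase user data")
    return ["reboot the machine", "enter UEFI", "execute hardware tests"]
```

Gating the offline path behind an explicit flag is one possible design; it keeps an automated policy from triggering a destructive test by accident.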

Here is the architecture diagram.

Evidence the solution works

OpenStack is used to provision, manage, and monitor virtual machines to provide
cloud services, which shows that these technologies work well with virtual
machines. As long as there is an interface to physical machines, the same
technologies should work with physical machines too, and OpenStack Ironic can
play this role. Although the original purpose of Ironic is to provision cloud
services on physical machines, it can be used as an interface to work with
physical machines.

A working prototype that addresses the monitor/manage function has been
implemented and works in our local machine room. Obviously, it needs to be
extended to cover all the features mentioned above.

Competitive approaches

Currently, there are two kinds of system management software: software provided
by hardware vendors and software provided by third-party software vendors.
Neither is considered to fully address data center requirements.

Software provided by hardware vendors might work well with the vendor's own
hardware, but it works poorly with, or does not support at all, machines from
other vendors. In most cases, one data center hosts physical machines from
different vendors.

Software provided by third-party software vendors might work with all physical
machines, but it offers only general or high-level features because it doesn't
know the details or specifics of each vendor's hardware.

The biggest advantage of this DCM is that it is software-defined at every
level. Users can specify the monitoring targets and thresholds via the template
mechanism; they can also define how targets are monitored, and the corresponding
corrective or preventive actions once an event occurs, via the plugin mechanism.
In addition, the plugin mechanism allows hardware vendors to provide their own
diagnostic solutions based on their hardware or firmware features.

References

Ceilometer architecture:  
http://docs.openstack.org/developer/ceilometer/architecture.html

Ironic architecture: 
http://docs.openstack.org/developer/ironic/dev/architecture.html

Time: 2024-10-13 11:38:19
