Hot Topics on Data Center (HotDC) 2018

Keynote Session

Accelerate Machine Intelligence: An Edge to Cloud Continuum

Hadi Esmaeilzadeh - UCSD

Background

open source: http://act-lab.org/artifacts

Data grows at an unprecedented rate
new landscape of computing: personalized and targeted experiences for users
growing gap between data and compute
power/energy efficiency is a primary concern
approximate computing
AxGames: https://www.researchgate.net/publication/303905276_AxGames_Towards_Crowdsourcing_Quality_Target_Determination_in_Approximate_Computing
machines learn to extract insights from data - two disjoint solutions for ML
distributed computing + FPGA / ASIC chips
ordinary users should not have to write VHDL / Verilog anywhere in the full stack

CoSMIC stack

how to distribute

understanding machine learning - solving an optimization problem
abstraction between algorithm and acceleration system - parallelized stochastic gradient descent solver (targets FPGA, GPU, ASIC, CGRA, Xeon Phi)
leverage linearity of differentiation for distributed learning
programming and compilation
- build a new language for math
- dataflow graph generation
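
The "linearity of differentiation" point can be illustrated with a minimal sketch: the gradient of a summed loss equals the sum of the gradients computed on each data partition, so workers can compute partial gradients independently and an aggregator only has to add them. The linear model, the data, and every function name below are assumptions made for illustration; this is not CoSMIC's language or API.

```python
# Sketch: the gradient of a summed loss equals the sum of per-partition gradients.
# The linear model, data, and all names are illustrative assumptions.

def partial_gradient(w, samples):
    """Gradient of sum((w.x - y)^2) / 2 over the given samples."""
    grad = [0.0] * len(w)
    for x, y in samples:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for i, xi in enumerate(x):
            grad[i] += err * xi
    return grad

def aggregate(grads):
    """Element-wise sum of the workers' partial gradients."""
    return [sum(parts) for parts in zip(*grads)]

def sgd_step(w, partitions, lr):
    """One synchronous step; each partition stands in for one worker."""
    n = sum(len(p) for p in partitions)
    total = aggregate([partial_gradient(w, p) for p in partitions])
    return [wi - lr * gi / n for wi, gi in zip(w, total)]

# Samples generated from y = 1*x0 + 2*x1.
data = [([1.0, 2.0], 5.0), ([2.0, 1.0], 4.0), ([3.0, 3.0], 9.0), ([0.0, 1.0], 2.0)]
partitions = [data[:2], data[2:]]
w = [0.0, 0.0]

# Splitting the data across workers does not change the gradient.
whole = partial_gradient(w, data)
split = aggregate([partial_gradient(w, p) for p in partitions])

for _ in range(2000):
    w = sgd_step(w, partitions, lr=0.05)
```

Because only the partial gradients cross the network, the same solver structure can sit on top of very different accelerators.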

how to design customizable accelerator

multi-threading acceleration
connectivity and bussing
PE architecture - make hardware simple

how to reduce overhead of distributed coordination

specialized system software in CoSMIC

benchmarks

16-node CoSMIC with UltraScale+ FPGAs offers 18.8x speedup over 16-node Spark with E3 Skylake CPUs
the FPGA accounts for 66% of the speedup and software for 34%

RoboX Accelerator Architecture

DNNs tolerate low-bitwidth operations - bit-level
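
A rough way to see why low bitwidths can be tolerated is to quantize a dot product and compare it with the floating-point result. The symmetric uniform quantizer below is a common textbook scheme chosen for illustration; it is an assumption, not the scheme used by the accelerator discussed in the talk, and all names and values are made up.

```python
def quantize(values, bits):
    """Symmetric uniform quantization to signed integers of the given bitwidth."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in values) / qmax
    return [round(v / scale) for v in values], scale

def quantized_dot(a, b, bits):
    """Integer dot product on quantized operands, rescaled back to real values."""
    qa, sa = quantize(a, bits)
    qb, sb = quantize(b, bits)
    return sum(x * y for x, y in zip(qa, qb)) * sa * sb

weights = [0.5, -0.25, 0.75, 0.1]
inputs = [1.0, 2.0, -1.0, 0.5]

exact = sum(w * x for w, x in zip(weights, inputs))
approx8 = quantized_dot(weights, inputs, 8)   # 8-bit operands
approx4 = quantized_dot(weights, inputs, 4)   # 4-bit operands
```

The error grows as the bitwidth shrinks, which is the trade-off a bit-level design can exploit layer by layer.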

Making Cloud Systems Reliable and Dependable: Challenges and Opportunities

Lidong Zhou - MSRA

Background

system reliability:

Fault Tolerance
Redundancies
State Machine Replication
Paxos
Erasure Coding
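
As a minimal sketch of the erasure-coding idea, a single XOR parity block lets any one lost data block be rebuilt from the survivors. Production systems typically use stronger codes such as Reed-Solomon; the block contents and function name here are illustrative assumptions.

```python
def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

data = [b"abcd", b"efgh", b"ijkl"]
parity = xor_blocks(data)              # stored alongside the data blocks

# Suppose data[1] is lost: XOR the surviving blocks with the parity block.
recovered = xor_blocks([data[0], data[2], parity])
```

This costs one extra block of storage instead of a full replica, at the price of extra reads during recovery.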

Real-World Gray Failures in Cloud

redundancies in data center networking
active device and link failure localization in data center
NetBouncer: large-scale path probing and diagnosis
NetBouncer: leverage the power of scale
root cause of the gray failure - requests are stuck due to a network issue while heartbeats still look normal
Insight: detect the errors that requesters see
- critical gray failures are observable
- from error handling to error reporting

Solution - Panorama

Analysis - automatically convert a software component into an in-situ observer
Runtime - observers send observations to a local observation store (LOS)
- locate ob-boundary
- observations not always direct
- observations split to ob-origin & ob-sink
- match ob-origin & ob-sink
Detect what "requesters" see
- failures that matter are observable to requesters
- turn error handlers into error reporters
- enables construction of in-situ observers
- https://github.com/ryanphuang/panorama
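
The "turn error handlers into error reporters" idea can be sketched as follows: the requester's existing exception handler additionally records what it observed about the component it was calling. The class, the function names, and the simple majority rule for the verdict are all hypothetical; they are not Panorama's actual API or decision logic.

```python
import time
from collections import defaultdict

class LocalObservationStore:
    """Hypothetical stand-in for a local observation store (LOS)."""

    def __init__(self):
        self._obs = defaultdict(list)   # subject -> list of observations

    def report(self, observer, subject, ok, context=""):
        self._obs[subject].append((time.time(), observer, ok, context))

    def verdict(self, subject, window=10):
        """Assumed rule: unhealthy if most recent observations are failures."""
        recent = self._obs[subject][-window:]
        if not recent:
            return "unknown"
        failures = sum(1 for _, _, ok, _ in recent if not ok)
        return "unhealthy" if failures > len(recent) / 2 else "healthy"

def fetch_block(store, send, observer="frontend", subject="storage-1"):
    """A requester whose error handler also reports what it saw."""
    try:
        result = send()
        store.report(observer, subject, True, "fetch_block")
        return result
    except TimeoutError as exc:
        # Added line: the handler becomes a reporter.
        store.report(observer, subject, False, f"fetch_block: {exc}")
        return None   # the original handling (here: give up) is unchanged

def stuck_request():
    raise TimeoutError("no reply within deadline")

store = LocalObservationStore()
for _ in range(3):
    fetch_block(store, stuck_request)
fetch_block(store, lambda: b"data")
```

Here `storage-1` may still answer heartbeats, yet the requester's view marks it unhealthy because three of its four recent requests timed out.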

Reliability of Large-Scale Distributed Systems

foundation reliability
rethink cloud reliability: new theory & new method
understand gray failure
systematic and comprehensive observations

paper: Gray Failure: The Achilles' Heel of Cloud-Scale Systems

Deconstructing RDMA-enabled Distributed Transactions: Hybrid is Better!

Haibo Chen - SJTU

Background

(Distributed) Transactions were slow
High cost for distributed TX - usually 10s~100s of thousands of TPS (SIGMOD'12)
only 4% of wall-clock time spent in useful data processing

new features:

RDMA: remote direct memory access
- ultra-low latency (5 us)
- ultra high throughput
NVM: Non-volatile memory

An Active Line of Research of RDMA-enabled TX

DrTM - DrTM (SOSP 2015), DrTM-R (EuroSys 2016), DrTM-B (USENIX ATC 2017)
FaRM - FaRM-KV (NSDI 2014), FaRM-TX (SOSP 2015)
FaSST (OSDI 2016)
LITE (SOSP 2017)

Transactions (TXs)

protocols - OCC, 2PL, SI, ...
implementations on hardware devices - CX3, CX4, CX5, RoCE, one-sided, two-sided, ...
OLTP workloads - TPC-C, TPC-E, TATP, Smallbank

Main: Use RDMA in TXs

outline:

RDMA primitive-level analysis
Phase-by-phase analysis for TX
DrTM+H: Putting it all together

content:

phase: Exe/Val/Log/Commit
offloading with one-sided primitives improves performance
one-sided primitives scale well on modern RNICs
Execution framework & DrTM+H: https://github.com/SJTU-IPADS/drtmh
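
The Exe/Val/Log/Commit phases can be made concrete with a single-process sketch of optimistic concurrency control (OCC). It is only meant to show what each phase does; it has no locking, no networking, and no RDMA, and the class and method names are illustrative assumptions rather than DrTM+H's interface.

```python
class Store:
    def __init__(self, data):
        self.records = {k: [v, 0] for k, v in data.items()}  # key -> [value, version]
        self.log = []

class Transaction:
    def __init__(self, store):
        self.store = store
        self.read_set = {}    # key -> version observed during execution
        self.write_set = {}   # key -> buffered new value

    # Execution phase: read versions, buffer writes locally.
    def read(self, key):
        if key in self.write_set:
            return self.write_set[key]
        value, version = self.store.records[key]
        self.read_set[key] = version
        return value

    def write(self, key, value):
        self.write_set[key] = value

    def commit(self):
        # Validation phase: abort if anything we read has changed.
        for key, seen in self.read_set.items():
            if self.store.records[key][1] != seen:
                return False
        # Logging phase: record the writes before applying them.
        self.store.log.append(dict(self.write_set))
        # Commit phase: install the writes and bump the versions.
        for key, value in self.write_set.items():
            record = self.store.records[key]
            record[0] = value
            record[1] += 1
        return True

store = Store({"a": 100, "b": 0})
t1, t2 = Transaction(store), Transaction(store)
t1.write("a", t1.read("a") - 10)
t1.write("b", t1.read("b") + 10)
t2.write("a", t2.read("a") - 50)   # t2 reads "a" before t1 commits
ok1 = t1.commit()
ok2 = t2.commit()                  # fails validation: "a" changed underneath it
```

In a distributed setting each phase becomes a set of remote operations, which is where the choice between one-sided and two-sided primitives comes in.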

RDMA in Data Centers: from Cloud Computing to Machine Learning

Chuanxiong Guo - ByteDance

Background

Data center networks (DCNs) offer many services
- single ownership
- large scale
- bisection bandwidth
TCP/IP does not work well
- latency
- bandwidth
- processing overhead (40G) - 12% CPU at receiver & 6% CPU at sender

RDMA over Commodity Ethernet (RoCEv2)

no CPU overhead
single QP: 88 Gb/s at 1.7% CPU usage (TCP with 8 connections: 30-50 Gb/s, client 2.6% & server 4.3% CPU)
RoCEv2 needs a lossless Ethernet network
- PFC (priority-based flow control) - hop-by-hop flow control
- DCQCN - sender-switch-receiver (RP-CP-NP)
the slow-receiver symptom - ToR to NIC is 40 Gb/s & NIC to server is 64 Gb/s; the NIC may generate a large number of PFC pause frames
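
A toy discrete-time model shows how hop-by-hop pausing behaves when the receiving side drains its buffer more slowly than traffic arrives, for whatever reason: the queue oscillates between two thresholds and a pause is issued on every cycle. The XOFF/XON thresholds, the units, and the function name are assumptions; real PFC operates per priority class on actual buffer occupancy.

```python
def simulate_pfc(arrival, drain, xoff, xon, steps):
    """Return (number of pauses issued, peak queue length).

    arrival: units enqueued per step while the upstream is not paused
    drain:   units the receiver removes per step
    xoff:    queue length at which a pause is sent upstream
    xon:     queue length at which the upstream is resumed
    """
    queue, paused, pauses, peak = 0, False, 0, 0
    for _ in range(steps):
        if not paused:
            queue += arrival
        queue = max(0, queue - drain)
        peak = max(peak, queue)
        if not paused and queue >= xoff:
            paused = True
            pauses += 1
        elif paused and queue <= xon:
            paused = False
    return pauses, peak

slow = simulate_pfc(arrival=40, drain=30, xoff=100, xon=20, steps=100)
fast = simulate_pfc(arrival=40, drain=40, xoff=100, xon=20, steps=100)
```

Nothing is dropped in either run, but the slow case keeps pausing its upstream hop, and in a real network those pauses can propagate further back.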

RDMA for DNN Training Acceleration

understanding using DNN
DNN training: back-propagation (BP)
Distributed ML training on GPUs with mini-batches
RDMA acceleration: ResNet / RNNs / DNN (RDMA performs better than TCP)

Highlighted Research Session

Congestion Control Mechanisms in Data Center Networks

Wei Bai - MSRA

Achieving low latency in DCNs

queueing delay - PIAS (NSDI 2015)
packet-loss retransmission delay - TLT

PIAS

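Flow completion time (FCT) is the key issue
design goals: flow information cannot be assumed known in advance, and the scheme can be deployed quickly on existing devices
PIAS uses a multi-level feedback queue (MLFQ) to emulate shortest job first (SJF)
three functions in PIAS:
- packet tagging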
- switch
- rate control
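
The packet-tagging function can be sketched as follows: a flow starts in the highest-priority queue and is demoted once the bytes it has sent cross each threshold, so short flows finish at high priority without the sender knowing flow sizes in advance. The threshold values, the MTU, and the function names are illustrative assumptions, not the values used by PIAS.

```python
import bisect

# Demotion thresholds in bytes (assumed values, for illustration only).
THRESHOLDS = [100_000, 1_000_000]

def priority_for(bytes_sent, thresholds=THRESHOLDS):
    """0 is the highest priority; a flow is demoted as it sends more bytes."""
    return bisect.bisect_right(thresholds, bytes_sent)

def tag_flow(flow_size, mtu=1500, thresholds=THRESHOLDS):
    """Return the priority tag carried by each packet of a flow."""
    tags, sent = [], 0
    while sent < flow_size:
        tags.append(priority_for(sent, thresholds))
        sent += min(mtu, flow_size - sent)
    return tags

short_flow = tag_flow(3_000)      # stays in the top queue
medium_flow = tag_flow(150_000)   # demoted once after 100 KB
```

Switches then only need strict priority queueing on the tag, which existing commodity hardware already supports.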

TLT

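get the benefits of both lossy and lossless networks at the same time
using PFC to eliminate congestion packet losses
packet loss:
- middle - fast retransmissions
- tail - timeout retransmissions
- identify important packets; when the switch queue exceeds a threshold, drop unimportant packets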
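
The selective-dropping rule in the last bullet can be sketched as a simple admission check at the switch queue. How packets are marked important is left abstract here (the `important` flag is an assumption), PFC itself is omitted, and the hard capacity limit is only a stand-in for the buffer size.

```python
def enqueue(queue, packet, threshold, capacity):
    """Admit a packet unless the queue is past the threshold and it is unimportant."""
    if len(queue) >= capacity:
        return False                      # buffer exhausted
    if len(queue) >= threshold and not packet["important"]:
        return False                      # shed unimportant traffic early
    queue.append(packet)
    return True

queue = []
arrivals = [{"id": i, "important": False} for i in range(6)]
arrivals += [{"id": 6, "important": True}, {"id": 7, "important": True}]

accepted = [p["id"] for p in arrivals
            if enqueue(queue, p, threshold=3, capacity=5)]
```

The space between the threshold and the capacity is effectively reserved for important packets, such as those whose loss would otherwise trigger a timeout.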

Understanding the challenges of Scaling Distributed DNN Training

Cheng Li - USTC

Deep learning is growing fast
DNN - Deep Neural Networks
benefit: more data / bigger models / more computation
Jeff Dean - Google

Distributed DNN

Model or data parallelism
- data parallelism is a primary choice
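BSP / ASP - BSP is the usual choice (ASP may not converge)
- Bulk Synchronous Parallel - workers synchronize at fixed points
- Asynchronous Parallel
network / server / other bottlenecks for parallelism
use measurements to identify the constraints that limit compute capability
- compression overhead introduced by compressed data transfer
system design
- elastic system design
- weakest-link (short-board) effect - the slowest part limits the final compute speed
- how to quickly resize the system, etc. - message-bus stream processing - using a producer-consumer model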
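
The weakest-link effect under BSP can be shown with a small timing model: every step ends at a barrier, so each step takes as long as its slowest worker, whereas asynchronous workers never wait for each other. The per-step timings below are made-up numbers, and the model ignores communication cost and the convergence issues that make ASP risky.

```python
def bsp_time(step_times):
    """Total time with a barrier after every step: sum of per-step maxima."""
    return sum(max(step) for step in step_times)

def asp_time(step_times):
    """Total time without barriers: the slowest worker's own total."""
    return max(sum(worker) for worker in zip(*step_times))

# step_times[s][w] = time worker w needs for step s (illustrative values).
step_times = [
    [1, 1, 4],
    [4, 1, 1],
    [1, 4, 1],
]

bsp_total = bsp_time(step_times)
asp_total = asp_time(step_times)
```

Each worker does the same total amount of work, yet a different straggler in every step doubles the BSP completion time in this example.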

Octopus: an RDMA-enabled Distributed Persistent Memory File System

Youyou Lu - Tsinghua

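distributed file system design
non-volatile memory - in-memory storage
DRAM Limitations
- Cell Density
- Refresh - performance / power consumption
NVDIMM - retains data after power loss
Intel 3D XPoint - near-DRAM latency, high capacity, non-volatile on power loss
RDMA - used in high-performance environments
DiskGluster - latency comes from the HDD | MemGluster - latency comes from software
RDMA-enabled Distributed File System
- shared data management
- New data flow strategies
- Efficient RPC design
- Concurrency control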

Design

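I/O handling
- organize all NVMM into a single shared space
- reduce data copies in the DFS (from 7 to 4)
- the server looks up the data's storage address; the client obtains the address and then fetches the data itself (shifting the work to the client)
Metadata RPC
Collect-Dispatch Distributed Transaction
performance evaluation
- tests between servers on a LAN - bandwidth reaches 88% of network bandwidth
- tests on the Hadoop platform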
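
The I/O path in which the server only returns an address and the client fetches the data itself can be sketched with a simulated memory pool. The `bytearray` stands in for NVMM registered for remote access and `rdma_read` for a one-sided RDMA read; all class and function names are hypothetical and no real RDMA library is used.

```python
class Server:
    def __init__(self, pool_size):
        self.pool = bytearray(pool_size)   # stand-in for the shared NVMM pool
        self.index = {}                    # path -> (offset, length)
        self.next_free = 0

    def create(self, path, data):
        offset = self.next_free
        self.pool[offset:offset + len(data)] = data
        self.index[path] = (offset, len(data))
        self.next_free += len(data)

    def lookup(self, path):
        """Metadata RPC: return only where the data lives, not the data."""
        return self.index[path]

def rdma_read(pool, offset, length):
    """Stand-in for a one-sided read; no server-side code runs here."""
    return bytes(pool[offset:offset + length])

def client_read(server, path):
    offset, length = server.lookup(path)            # small RPC to the server
    return rdma_read(server.pool, offset, length)   # bulk transfer by the client

server = Server(pool_size=64)
server.create("/a", b"hello")
server.create("/b", b"octopus")
payload = client_read(server, "/b")
```

The server's CPU handles only the small lookup, while the bulk data movement is driven by the client.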

Short Talk

Computer Organization and Design Course with FPGA Cloud, Ke Zhang (ICT, CAS)

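new technologies: AI / IoT

improve hardware/software co-design skills - CPU / GPU / FPGA / ASIC

ZyForce platform - virtual FPGA labs

ActionFlow: A Framework for Fast Multi-Robots Application Development, Jimin Han (UCAS)

UCAS senior undergraduate - started August 2018

rapid development of robot applications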

Labeled Network Stack, Yifan Shen (ICT, CAS)

Caching or Not: Rethinking Virtual File System for Non-Volatile Main Memory, Ying Wang (ICT, CAS)

Data Motif-based Proxy Benchmarks for Big Data and AI Workloads, Chen Zheng (ICT, CAS)

Original post: https://www.cnblogs.com/tinoryj/p/Hot-Topics-on-Data-Center-HotDC-2018.html

Time: 2024-11-06 11:06:41
