Akka(17): Stream:数据流基础组件-Source,Flow,Sink简介

在大数据程序流行的今天,许多程序都面临着共同的难题:程序输入数据趋于无限大,抵达时间又不确定。一般的解决方法是采用回调函数(callback-function)来实现的,但这样的解决方案很容易造成“回调地狱(callback hell)”,即所谓的“goto-hell”:程序控制跳来跳去很难跟踪,特别是一些变量如果在回调函数中更改后产生不可预料的结果。数据流(stream)是一种解决问题的有效编程方式。Stream是一个抽象概念,能把程序数据输入过程和其它细节隐蔽起来,通过申明方式把数据处理过程描述出来,使整体程序逻辑更容易理解跟踪。当然,牺牲的是对一些运算细节的控制能力。我们在前面介绍过scalaz-stream,它与akka-stream的主要区别在于:

1、scalaz-stream是pull模式的,而akka-stream是push模式的。pull模式的缺点是接收数据效率问题,因为在这种模式里程序必须不断重复检测(polling)输入端口是否有数据存在。而push模式则会把数据推到输入端口后直接进入程序,但如果数据源头动作太快程序无法及时处理所有推送的数据时就会造成所谓的数据溢出问题,遗失数据。不过akka-stream实现了reactive-stream的back-pressure规范:数据发送方和接收方之间互动提示,使过快的数据产生能按接收方要求慢下来甚至暂时停下来。

2、scalaz-sstream和akka-stream的数据流都是一种申明式的数据处理流程描述,属于一种运算方案,最终都需要某种运算器来对数据流按运算方案进行具体的运算,得出运算结果和产生副作用。scalaz-stream的运算器是自备的函数式程序,特点是能很好的控制线程使用和进行并行运算。akka-stream的运算器是materializer。materializer在actor系统上运行,具备了actor模式程序的优点包括:消息驱动、集群运算、监管策略(SupervisorStrategy)等等。

akka-stream的数据流是由三类基础组件组合而成,不同的组合方式代表不同的数据处理及表达功能。三类组件分别是:

1、Source:数据源。akka-stream属于push模式,所以Source也就是Publisher(数据发布方),Source的形状SourceShape代表只有一个输出端口的形状。Source可以从单值、集合、某种Publisher或另一个数据流产生数据流的元素(stream-element),包括:

  /**
   * Helper to create [[Source]] from `Iterable`.
   * Example usage: `Source(Seq(1,2,3))`
   *
   * Starts a new `Source` from the given `Iterable`. This is like starting from an
   * Iterator, but every Subscriber directly attached to the Publisher of this
   * stream will see an individual flow of elements (always starting from the
   * beginning) regardless of when they subscribed.
   */
  def apply[T](iterable: immutable.Iterable[T]): Source[T, NotUsed] =
    single(iterable).mapConcat(ConstantFun.scalaIdentityFunction).withAttributes(DefaultAttributes.iterableSource)

  /**
   * Create a `Source` with one element.
   * Every connected `Sink` of this stream will see an individual stream consisting of one element.
   */
  def single[T](element: T): Source[T, NotUsed] =
    fromGraph(new GraphStages.SingleSource(element))

  /**
   * Helper to create [[Source]] from `Iterator`.
   * Example usage: `Source.fromIterator(() => Iterator.from(0))`
   *
   * Start a new `Source` from the given function that produces anIterator.
   * The produced stream of elements will continue until the iterator runs empty
   * or fails during evaluation of the `next()` method.
   * Elements are pulled out of the iterator in accordance with the demand coming
   * from the downstream transformation steps.
   */
  def fromIterator[T](f: () ⇒ Iterator[T]): Source[T, NotUsed] =
    apply(new immutable.Iterable[T] {
      override def iterator: Iterator[T] = f()
      override def toString: String = "() => Iterator"
    })

  /**
   * Starts a new `Source` from the given `Future`. The stream will consist of
   * one element when the `Future` is completed with a successful value, which
   * may happen before or after materializing the `Flow`.
   * The stream terminates with a failure if the `Future` is completed with a failure.
   */
  def fromFuture[T](future: Future[T]): Source[T, NotUsed] =
    fromGraph(new FutureSource(future))

  /**
   * Helper to create [[Source]] from `Publisher`.
   *
   * Construct a transformation starting with given publisher. The transformation steps
   * are executed by a series of [[org.reactivestreams.Processor]] instances
   * that mediate the flow of elements downstream and the propagation of
   * back-pressure upstream.
   */
  def fromPublisher[T](publisher: Publisher[T]): Source[T, NotUsed] =
    fromGraph(new PublisherSource(publisher, DefaultAttributes.publisherSource, shape("PublisherSource")))

  /**
   * A graph with the shape of a source logically is a source, this method makes
   * it so also in type.
   */
  def fromGraph[T, M](g: Graph[SourceShape[T], M]): Source[T, M] = g match {
    case s: Source[T, M]         ⇒ s
    case s: javadsl.Source[T, M] ⇒ s.asScala
    case other ⇒ new Source(
      LinearTraversalBuilder.fromBuilder(other.traversalBuilder, other.shape, Keep.right),
      other.shape)
  }

下面还有几个特殊的Source:

  /**
   * A `Source` with no elements, i.e. an empty stream that is completed immediately for every connected `Sink`.
   */
  def empty[T]: Source[T, NotUsed] = _empty
  private[this] val _empty: Source[Nothing, NotUsed] =
    Source.fromGraph(EmptySource)

  /**
   * Create a `Source` that will continually emit the given element.
   */
  def repeat[T](element: T): Source[T, NotUsed] = {
    val next = Some((element, element))
    unfold(element)(_ ⇒ next).withAttributes(DefaultAttributes.repeat)
  }

  /**
   * Creates [[Source]] that will continually produce given elements in specified order.
   *
   * Starts a new ‘cycled‘ `Source` from the given elements. The producer stream of elements
   * will continue infinitely by repeating the sequence of elements provided by function parameter.
   */
  def cycle[T](f: () ⇒ Iterator[T]): Source[T, NotUsed] = {
    val iterator = Iterator.continually { val i = f(); if (i.isEmpty) throw new IllegalArgumentException("empty iterator") else i }.flatten
    fromIterator(() ⇒ iterator).withAttributes(DefaultAttributes.cycledSource)
  }

2、Sink:数据终端。属于数据元素的使用方,主要作用是消耗数据流中的元素。SinkShape是有一个输入端的数据流形状。Sink消耗流元素的例子有:

  /**
   * A `Sink` that will consume the stream and discard the elements.
   */
  def ignore: Sink[Any, Future[Done]] = fromGraph(GraphStages.IgnoreSink)

  /**
   * A `Sink` that will invoke the given procedure for each received element. The sink is materialized
   * into a [[scala.concurrent.Future]] will be completed with `Success` when reaching the
   * normal end of the stream, or completed with `Failure` if there is a failure signaled in
   * the stream..
   */
  def foreach[T](f: T ⇒ Unit): Sink[T, Future[Done]] =
    Flow[T].map(f).toMat(Sink.ignore)(Keep.right).named("foreachSink")

注意,akka-stream实际是在actor上进行运算的。actor的内部状态最终可以形成运算结果。上面的例子可以得出Sink的运算结果是Future[??]类型的。

3、Flow:数据处理节点。对通过输入端口输入数据流的元素进行转变处理(transform)后经过输出端口输出。FlowShape有一个输入端和一个输出端。

在akka-stream里数据流组件一般被称为数据流图(graph)。我们可以用许多数据流图组成更大的stream-graph。

akka-stream最简单的完整(或者闭合)线性数据流(linear-stream)就是直接把一个Source和一个Sink相接。这种方式代表一种对数据流所有元素的直接表现,如:source.runWith(Sink.foreach(println))。我们可以用Source.via来连接Flow,用Source.to连接Sink:

  override def via[T, Mat2](flow: Graph[FlowShape[Out, T], Mat2]): Repr[T] = viaMat(flow)(Keep.left)

  override def viaMat[T, Mat2, Mat3](flow: Graph[FlowShape[Out, T], Mat2])(combine: (Mat, Mat2) ⇒ Mat3): Source[T, Mat3] = {
    val toAppend =
      if (flow.traversalBuilder eq Flow.identityTraversalBuilder)
        LinearTraversalBuilder.empty()
      else
        flow.traversalBuilder

    new Source[T, Mat3](
      traversalBuilder.append(toAppend, flow.shape, combine),
      SourceShape(flow.shape.out))
  }

  /**
   * Connect this [[akka.stream.scaladsl.Source]] to a [[akka.stream.scaladsl.Sink]],
   * concatenating the processing steps of both.
   */
  def to[Mat2](sink: Graph[SinkShape[Out], Mat2]): RunnableGraph[Mat] = toMat(sink)(Keep.left)

  /**
   * Connect this [[akka.stream.scaladsl.Source]] to a [[akka.stream.scaladsl.Sink]],
   * concatenating the processing steps of both.
   */
  def toMat[Mat2, Mat3](sink: Graph[SinkShape[Out], Mat2])(combine: (Mat, Mat2) ⇒ Mat3): RunnableGraph[Mat3] = {
    RunnableGraph(traversalBuilder.append(sink.traversalBuilder, sink.shape, combine))
  }

可以发现via和to分别是viaMat和toMat的简写,分别固定了Keep.left。意思是选择左边数据流图的运算结果。我们上面提过akka-stream是在actor系统里处理数据流元素的。在这个过程中同时可以用actor内部状态来产生运算结果。via和to连接了左右两个graph,并且选择了左边graph的运算结果。我们可以用viaMat和toMat来选择右边graph运算结果。这是通过combine: (Mat,Mat2)=>Mat3这个函数实现的。akka-stream提供了一个Keep对象来表达这种选择:

/**
 * Convenience functions for often-encountered purposes like keeping only the
 * left (first) or only the right (second) of two input values.
 */
object Keep {
  private val _left = (l: Any, r: Any) ⇒ l
  private val _right = (l: Any, r: Any) ⇒ r
  private val _both = (l: Any, r: Any) ⇒ (l, r)
  private val _none = (l: Any, r: Any) ⇒ NotUsed

  def left[L, R]: (L, R) ⇒ L = _left.asInstanceOf[(L, R) ⇒ L]
  def right[L, R]: (L, R) ⇒ R = _right.asInstanceOf[(L, R) ⇒ R]
  def both[L, R]: (L, R) ⇒ (L, R) = _both.asInstanceOf[(L, R) ⇒ (L, R)]
  def none[L, R]: (L, R) ⇒ NotUsed = _none.asInstanceOf[(L, R) ⇒ NotUsed]
}

既然提到运算结果的处理方式,我们就来看看Source,Flow,Sink的类型参数:

Source[+Out, +Mat]       //Out代表元素类型,Mat为运算结果类型
Flow[-In, +Out, +Mat]    //In,Out为数据流元素类型,Mat是运算结果类型
Sink[-In, +Mat]          //In是数据元素类型,Mat是运算结果类型

Keep对象提供的是对Mat的选择。上面源代码中to,toMat函数的返回结果都是RunnableGraph[Mat3],也就是说只有连接了Sink的数据流才能进行运算。RunnableGraph提供一个run()函数来运算数据流:

/**
 * Flow with attached input and output, can be executed.
 */
final case class RunnableGraph[+Mat](override val traversalBuilder: TraversalBuilder) extends Graph[ClosedShape, Mat] {
  override def shape = ClosedShape

  /**
   * Transform only the materialized value of this RunnableGraph, leaving all other properties as they were.
   */
  def mapMaterializedValue[Mat2](f: Mat ⇒ Mat2): RunnableGraph[Mat2] =
    copy(traversalBuilder.transformMat(f.asInstanceOf[Any ⇒ Any]))

  /**
   * Run this flow and return the materialized instance from the flow.
   */
  def run()(implicit materializer: Materializer): Mat = materializer.materialize(this)
...

上面shape = ClosedShape代表RunnableGraph的形状是闭合的(ClosedShape),意思是说:一个可运行的graph所有输人输出端口都必须是连接的。

下面我们就用一个最简单的线性数据流来做些详细解释:

import akka.actor._
import akka.stream._
import akka.stream.scaladsl._
import akka._
import scala.concurrent._

object SourceDemo extends App {
  implicit val sys=ActorSystem("demo")
  implicit val mat=ActorMaterializer()
  implicit val ec=sys.dispatcher

  val s1: Source[Int,NotUsed] = Source(1 to 10)
  val sink: Sink[Any,Future[Done]] = Sink.foreach(println)
  val rg1: RunnableGraph[NotUsed] = s1.to(sink)
  val rg2: RunnableGraph[Future[Done]]  = s1.toMat(sink)(Keep.right)
  val res1: NotUsed = rg1.run()

  Thread.sleep(1000)

  val res2: Future[Done] = rg2.run()
  res2.andThen {
    case _ => sys.terminate()
  }

}

我们把焦点放在特别注明的结果类型上面:Source的运算结果Mat类型是NotUsed,Sink的运算结果Mat类型是Future[Done]。从上面这段代码我们看到用toMat选择返回Sink的运算结果Future[Done]才能捕捉到运算终止节点。下面的另一个例子包括了一些组合动作:

  val seq = Seq[Int](1,2,3)
  def toIterator() = seq.iterator
  val flow1: Flow[Int,Int,NotUsed] = Flow[Int].map(_ + 2)
  val flow2: Flow[Int,Int,NotUsed] = Flow[Int].map(_ * 3)
  val s2 = Source.fromIterator(toIterator)
  val s3 = s1 ++ s2

  val s4: Source[Int,NotUsed] = s3.viaMat(flow1)(Keep.right)
  val s5: Source[Int,NotUsed] = s3.via(flow1).async.viaMat(flow2)(Keep.right)
  val s6: Source[Int,NotUsed] = s4.async.viaMat(flow2)(Keep.right)
  (s5.toMat(sink)(Keep.right).run()).andThen {case _ => sys.terminate()}

一般来讲,数据流元素的所有处理过程都合并在一个actor上进行(steps-fusing),这样可以免去actor之间的消息传递,但同时也会限制数据元素的并行处理。aync的作用是指定左边的graph在一个独立的actor上运行。注意:s6=s5。

从上面例子里的组合结果类型我们发现:把一个Flow连接到一个Source上形成了一个新的Source。

实际上我们可以用akka-stream Source提供的方法糖来直接运算数据流,如下:

  s1.runForeach(println)
  val fres = s6.runFold(0)(_ + _)
  fres.onSuccess{case a => println(a)}
  fres.andThen{case _ => sys.terminate()}

下面是Source中的一些runner:

 /**
   * Shortcut for running this `Source` with a fold function.
   * The given function is invoked for every received element, giving it its previous
   * output (or the given `zero` value) and the element as input.
   * The returned [[scala.concurrent.Future]] will be completed with value of the final
   * function evaluation when the input stream ends, or completed with `Failure`
   * if there is a failure signaled in the stream.
   */
  def runFold[U](zero: U)(f: (U, Out) ⇒ U)(implicit materializer: Materializer): Future[U] = runWith(Sink.fold(zero)(f))

  /**
   * Shortcut for running this `Source` with a foldAsync function.
   * The given function is invoked for every received element, giving it its previous
   * output (or the given `zero` value) and the element as input.
   * The returned [[scala.concurrent.Future]] will be completed with value of the final
   * function evaluation when the input stream ends, or completed with `Failure`
   * if there is a failure signaled in the stream.
   */
  def runFoldAsync[U](zero: U)(f: (U, Out) ⇒ Future[U])(implicit materializer: Materializer): Future[U] = runWith(Sink.foldAsync(zero)(f))

  /**
   * Shortcut for running this `Source` with a reduce function.
   * The given function is invoked for every received element, giving it its previous
   * output (from the second element) and the element as input.
   * The returned [[scala.concurrent.Future]] will be completed with value of the final
   * function evaluation when the input stream ends, or completed with `Failure`
   * if there is a failure signaled in the stream.
   *
   * If the stream is empty (i.e. completes before signalling any elements),
   * the reduce stage will fail its downstream with a [[NoSuchElementException]],
   * which is semantically in-line with that Scala‘s standard library collections
   * do in such situations.
   */
  def runReduce[U >: Out](f: (U, U) ⇒ U)(implicit materializer: Materializer): Future[U] =
    runWith(Sink.reduce(f))

  /**
   * Shortcut for running this `Source` with a foreach procedure. The given procedure is invoked
   * for each received element.
   * The returned [[scala.concurrent.Future]] will be completed with `Success` when reaching the
   * normal end of the stream, or completed with `Failure` if there is a failure signaled in
   * the stream.
   */
  // FIXME: Out => Unit should stay, right??
  def runForeach(f: Out ⇒ Unit)(implicit materializer: Materializer): Future[Done] = runWith(Sink.foreach(f))

它们的功能都是通过runWith实现的:

 /**
   * Connect this `Source` to a `Sink` and run it. The returned value is the materialized value
   * of the `Sink`, e.g. the `Publisher` of a [[akka.stream.scaladsl.Sink#publisher]].
   */
  def runWith[Mat2](sink: Graph[SinkShape[Out], Mat2])(implicit materializer: Materializer): Mat2 = toMat(sink)(Keep.right).run()

实际上是使用了Sink类里的对应方法Sink.???。

下面是本次的示范源代码:

import akka.actor._
import akka.stream._
import akka.stream.scaladsl._
import akka._
import scala.concurrent._

object SourceDemo extends App {
  implicit val sys=ActorSystem("demo")
  implicit val mat=ActorMaterializer()
  implicit val ec=sys.dispatcher

  val s1: Source[Int,NotUsed] = Source(1 to 10)
  val sink: Sink[Any,Future[Done]] = Sink.foreach(println)
  val rg1: RunnableGraph[NotUsed] = s1.to(sink)
  val rg2: RunnableGraph[Future[Done]]  = s1.toMat(sink)(Keep.right)
  val res1: NotUsed = rg1.run()

  Thread.sleep(1000)

  val res2: Future[Done] = rg2.run()
  res2.andThen {
    case _ =>   //sys.terminate()
  }

  val seq = Seq[Int](1,2,3)
  def toIterator() = seq.iterator
  val flow1: Flow[Int,Int,NotUsed] = Flow[Int].map(_ + 2)
  val flow2: Flow[Int,Int,NotUsed] = Flow[Int].map(_ * 3)
  val s2 = Source.fromIterator(toIterator)
  val s3 = s1 ++ s2

  val s4: Source[Int,NotUsed] = s3.viaMat(flow1)(Keep.right)
  val s5: Source[Int,NotUsed] = s3.via(flow1).async.viaMat(flow2)(Keep.right)
  val s6: Source[Int,NotUsed] = s4.async.viaMat(flow2)(Keep.right)
  (s5.toMat(sink)(Keep.right).run()).andThen {case _ => } //sys.terminate()}

  s1.runForeach(println)
  val fres = s6.runFold(0)(_ + _)
  fres.onSuccess{case a => println(a)}
  fres.andThen{case _ => sys.terminate()}

}
时间: 2024-08-16 01:54:38

Akka(17): Stream:数据流基础组件-Source,Flow,Sink简介的相关文章

Akka(18): Stream:组合数据流,组件-Graph components

akka-stream的数据流可以由一些组件组合而成.这些组件统称数据流图Graph,它描述了数据流向和处理环节.Source,Flow,Sink是最基础的Graph.用基础Graph又可以组合更复杂的复合Graph.如果一个Graph的所有端口(输入.输出)都是连接的话就是一个闭合流图RunnableGraph,否则就属于·开放流图PartialGraph.一个完整的(可运算的)数据流就是一个RunnableGraph.Graph的输出出入端口可以用Shape来描述: /** * A Shap

基础组件(一)

1.TextInput 允许用户输入文本的基础组件. 属性 onChangeText 接受一个函数,而此函数会在文本变化时被调用. onSubmitEditing 在文本被提交后(用户按下软键盘上的提交键)调用 实例: 12345678 <TextInput         style={{height: 40}}         placeholder="Type here to translate!"         onChangeText={(text) => th

Stream数据流

1.Collection接口的改进 在Iterable接口里面定义有一个简单的输出:default void forEach(Consumer<? super T> action). 也就是说如果要想进行迭代处理,没有必要去强制使用Iterator完成了. 使用Lamda操作forEach()方法和直接采用方法引用 范例: package cn.demo; import java.util.ArrayList; import java.util.List; import java.util.s

数据分析 关于基础组件与介绍

第二部分 关于基础组件与介绍 基础信息库种类 基础信息库是账户或者自然人的纯真数据库查询系统.系统内积累存储的数据包括有: ü 手机号归属信息 ü IP数据纯真库 ü GPS信息对应地址信息 ü 域名空间身份信息 ü 3G分组域通讯信息 ü VPN服务器基础信息 ü VPN服务器日志信息(最新的区域时间段) ü 国内运输系统基础数据 ü 网络帐号密码查询系统 ü 网络帐号详情搜索查询 3S定位技术 3S 是通过遥感技术(RS).地理信息系统(GIS).全球定位系统(GPS)实现位置确认技术的统称

Ext JS 6学习文档-第3章-基础组件

基础组件 在本章中,你将学习到一些 Ext JS 基础组件的使用.同时我们会结合所学创建一个小项目.这一章我们将学习以下知识点: 熟悉基本的组件 – 按钮,文本框,日期选择器等等 表单字段的校验 菜单和工具栏 设计一个表单 计算器程序– 本章的示例项目 转载请注明出处:http://www.jeeboot.com/archives/1219.html 本章的主要目的是创建一个表单设计和一个计算器示例项目.以下图分别展示了表单设计和计算器设计. 首先,你观察下列表单设计,你会发现我们使用了大量的控

3. playbook基础组件

Playbook playbook是由一个或多个“play”组成的列表.play的主要功能在于将事先归并为一组的主机装扮成事先通过ansible中的task定义好的角色. 从根本上来讲,所谓task无非是调用ansible的一个module.将多个play组织在一个playbook中,即可以让它们联同起来按事先编排的机制同唱一台大戏. playbook基础组件 Hosts和Users playbook中的每一个play的目的都是为了让某个或某些主机以某个指定的用户身份执行任务. hosts用于指

Android 基础组件

基础组件 所有的控件都可以在java代码中创建出来,并且大部分的属性都对应set和get方法,比如 View view = new View(Context context)  context是上下文,是Activity父类,一般传入当前Activity 1.TextView text 文本 setText() getText() textColor文本颜色 #FFFFFF setTextColor(Color.Blue) getTextColor() textSize文本大小   sp set

数据库写库基础组件设计思想与实现

码农一定会遇到写库的繁琐操作,字段少的话数据访问层的SQL语句封装还好实现,可是字段一旦多起来,比如十多个二十多个字段的话,SQL的封装将会是一个巨大的难题,并不是说难度有多大,而是这样的操作很繁琐,况且极容易出错,SQL语句一旦出错很难排查.我也是在开发中遇到了相同的问题,这样的问题总会浪费很多不必要的时间,所以我就想能不能提供一个公共的基础组件来实现繁琐的底层SQL语句操作,我们只需要调用一些简单的借口就能实现数据库的快捷的写库.首先,写库时必要的信息包含:要写入的列名,还有就是数据实体.(

Java Swing界面编程(17)---单行文本输入组件:JTextField

package com.beyole.util; import java.awt.GridLayout; import javax.swing.JFrame; import javax.swing.JLabel; import javax.swing.JTextField; public class test15 { public static void main(String[] args) { JFrame frame = new JFrame("Crystal");// 实例化窗