Scala Collections Explained

Original post: http://stackoverflow.com/questions/1722137/scala-2-8-collections-design-tutorial

There's a 2.8 collection walk-through by Martin Odersky which should probably be your first reference. It has also been supplemented with architectural notes, which will be of particular interest to those who want to design their own collections.

The rest of this answer was written way before any such thing existed (in fact, before 2.8.0 itself was released).

You can find a paper about it as Scala SID #3. Other papers in that area should be interesting as well to people interested in the differences between Scala 2.7 and 2.8.

I'll quote from the paper, selectively, and complement with some thoughts of mine. There are also some images, generated by Matthias at decodified.com, and the original SVG files can be found here.

The collection classes/traits themselves

There are actually three hierarchies of traits for the collections: one for mutable collections, one for immutable collections, and one which doesn't make any assumptions about the collections.

There's also a distinction between parallel, serial and maybe-parallel collections, which was introduced with Scala 2.9. I'll talk about them in the next section. The hierarchy described in this section refers exclusively to non-parallel collections.

The following image shows the non-specific hierarchy introduced with Scala 2.8: 

All elements shown are traits. In the other two hierarchies there are also classes directly inheriting the traits as well as classes which can be viewed as belonging in that hierarchy through implicit conversion to wrapper classes. The legend for these graphs can be found after them.

Graph for immutable hierarchy: 

Graph for mutable hierarchy: 

Legend:

Here's an abbreviated ASCII depiction of the collection hierarchy, for those who can't see the images.

                    Traversable
                         |
                         |
                      Iterable
                         |
      +------------------+--------------------+
     Map                Set                  Seq
      |                  |                    |
      |             +----+----+         +-----+------+
    SortedMap   SortedSet   BitSet   Buffer Vector LinearSeq

Parallel Collections

When Scala 2.9 introduced parallel collections, one of the design goals was to make their use as seamless as possible. In the simplest terms, one can replace a non-parallel (serial) collection with a parallel one, and instantly reap the benefits.

However, since all collections until then were serial, many algorithms using them assumed, and depended on, that fact. Parallel collections fed to methods with such assumptions would fail. For this reason, the entire hierarchy described in the previous section mandates serial processing.

Two new hierarchies were created to support the parallel collections.

The parallel collections hierarchy has the same names for traits, but preceded with Par: ParIterable, ParSeq, ParMap and ParSet. Note that there is no ParTraversable, since any collection supporting parallel access is capable of supporting the stronger ParIterable trait. It doesn't have some of the more specialized traits present in the serial hierarchy either. This whole hierarchy is found under the package scala.collection.parallel.

The classes implementing parallel collections also differ, with ParHashMap and ParHashSet for both mutable and immutable parallel collections, plus ParRange and ParVector implementing immutable.ParSeq and ParArray implementing mutable.ParSeq.

Another hierarchy also exists that mirrors the traits of serial and parallel collections, but with the prefix Gen: GenTraversable, GenIterable, GenSeq, GenMap and GenSet. These traits are parents to both parallel and serial collections. This means that a method taking a Seq cannot receive a parallel collection, but a method taking a GenSeq is expected to work with both serial and parallel collections.

Given the way these hierarchies were structured, code written for Scala 2.8 was fully compatible with Scala 2.9, and demanded serial behavior. Without being rewritten, it cannot take advantage of parallel collections, but the changes required are very small.
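
To make the Seq versus GenSeq distinction concrete, here is a minimal sketch. It assumes Scala 2.9 through 2.12, where par and the Gen* traits are part of the standard library. The names GenSeqDemo, sumAny and sumSerial are hypothetical, chosen only for illustration.

```scala
import scala.collection.GenSeq

object GenSeqDemo {
  // Hypothetical helper: GenSeq is a parent of both serial and parallel sequences.
  def sumAny(xs: GenSeq[Int]): Int = xs.sum

  // Hypothetical helper: Seq admits serial sequences only.
  def sumSerial(xs: Seq[Int]): Int = xs.sum

  val serial = Vector(1, 2, 3, 4)
  val parallel = serial.par // a parallel view of the same elements

  def main(args: Array[String]): Unit = {
    println(sumAny(serial))          // serial input is accepted
    println(sumAny(parallel))        // parallel input is accepted too
    println(sumSerial(parallel.seq)) // must convert back to serial first
    // sumSerial(parallel) would be rejected by the compiler
  }
}
```

Code that should keep its 2.8 serial semantics can leave its Seq signatures untouched; only methods meant to accept parallel input need the wider Gen* type.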

Using Parallel Collections

Any collection can be converted into a parallel one by calling the method par on it. Likewise, any collection can be converted into a serial one by calling the method seq on it.

If the collection was already of the type requested (parallel or serial), no conversion will take place. If one calls seq on a parallel collection or par on a serial collection, however, a new collection with the requested characteristic will be generated.

Do not confuse seq, which turns a collection into a non-parallel collection, with toSeq, which returns a Seq created from the elements of the collection. Calling toSeq on a parallel collection will return a ParSeq, not a serial collection.
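
The conversions described above can be sketched as follows, again assuming Scala 2.9 through 2.12. ParConversionDemo and its value names are hypothetical; the type annotations are there so the compiler checks the claims about seq and toSeq.

```scala
import scala.collection.parallel.ParSeq

object ParConversionDemo {
  val serial = Vector(1, 2, 3)
  val parallel = serial.par

  // Asking for the kind a collection already has performs no conversion.
  val parIsNoOp = parallel.par eq parallel
  val seqIsNoOp = serial.seq eq serial

  // seq turns the parallel collection back into a serial one.
  val backToSerial: Vector[Int] = parallel.seq

  // toSeq only asks for "some sequence", so the result is still parallel.
  val stillParallel: ParSeq[Int] = parallel.toSeq

  def main(args: Array[String]): Unit = {
    println(backToSerial)
    println(stillParallel)
  }
}
```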

The Main Traits

While there are many implementing classes and subtraits, there are some basic traits in the hierarchy, each of which provides more methods or more specific guarantees, while reducing the number of classes that could implement it.

In the following subsections, I'll give a brief description of the main traits and the idea behind them.

Trait TraversableOnce

This trait is pretty much like trait Traversable described below, but with the limitation that you can only use it once. That is, any methods called on a TraversableOnce may render it unusable.

This limitation makes it possible for the same methods to be shared between the collections and Iterator. As a result, a method that works with an Iterator without using Iterator-specific methods can, if rewritten to accept TraversableOnce, work with any collection at all as well as with iterators.

Because TraversableOnce unifies collections and iterators, it does not appear in the previous graphs, which concern themselves only with collections.
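
As a small illustration, assuming Scala 2.8 through 2.12, a single method written against TraversableOnce accepts both a collection and an iterator. The helper name total is hypothetical.

```scala
object TraversableOnceDemo {
  // Hypothetical helper: uses only what TraversableOnce offers.
  def total(xs: TraversableOnce[Int]): Int = xs.foldLeft(0)(_ + _)

  val list = List(1, 2, 3)
  val fromList = total(list)       // a List can be traversed again later
  val fromListAgain = total(list)

  val it = Iterator(1, 2, 3)
  val fromIterator = total(it)     // consumes the iterator
  val exhausted = !it.hasNext      // nothing is left after one traversal

  def main(args: Array[String]): Unit = {
    println(fromList)
    println(fromIterator)
    println(exhausted)
  }
}
```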

Trait Traversable

At the top of the collection hierarchy is trait Traversable. Its only abstract operation is

def foreach[U](f: Elem => U)

The operation is meant to traverse all elements of the collection, and apply the given operation f to each element. The application is done for its side effect only; in fact any function result of f is discarded by foreach.

Traversable objects can be finite or infinite. An example of an infinite traversable object is the stream of natural numbers Stream.from(0). The method hasDefiniteSize indicates whether a collection is possibly infinite. If hasDefiniteSize returns true, the collection is certainly finite. If it returns false, the collection has not been fully elaborated yet, so it might be infinite or finite.

This trait defines methods which can be efficiently implemented in terms of foreach (over 40 of them).
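
The following sketch shows how little is needed to get those methods. It assumes Scala 2.8 through 2.12, where Traversable is a trait of its own with foreach as its only abstract member. Countdown is a hypothetical collection invented for this example.

```scala
object TraversableDemo {
  // Hypothetical collection: counts down from `start` to 1.
  // Only foreach is implemented; everything else is inherited.
  class Countdown(start: Int) extends Traversable[Int] {
    def foreach[U](f: Int => U): Unit = {
      var i = start
      while (i > 0) { f(i); i -= 1 }
    }
  }

  val c = new Countdown(3)
  val doubled = c.map(_ * 2).toList  // map comes for free
  val firstTwo = c.take(2).toList    // so does take
  val sum = c.sum                    // and sum
  val finite = c.hasDefiniteSize     // the inherited default reports a finite collection
  val lazyStream = Stream.from(0).hasDefiniteSize // not fully elaborated yet

  def main(args: Array[String]): Unit = {
    println(doubled)
    println(firstTwo)
    println(sum)
  }
}
```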

Trait Iterable

This trait declares an abstract method iterator that returns an iterator that yields all the collection’s elements one by one. The foreach method in Iterable is implemented in terms of iterator. Subclasses of Iterable often override foreach with a direct implementation for efficiency.

Trait Iterable also adds some less often used methods to Traversable, which can be implemented efficiently only if an iterator is available. They are summarized below.

xs.iterator          An iterator that yields every element in xs, in the same order as foreach traverses elements.
xs takeRight n       A collection consisting of the last n elements of xs (or, some arbitrary n elements, if no order is defined).
xs dropRight n       The rest of the collection except xs takeRight n.
xs sameElements ys   A test whether xs and ys contain the same elements in the same order
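
The four operations in the table can be tried on a List, as in this short sketch (IterableDemo and its value names are hypothetical):

```scala
object IterableDemo {
  val xs = List(1, 2, 3, 4, 5)

  val viaIterator = xs.iterator.toList          // same order as foreach
  val lastTwo = xs takeRight 2                  // the last two elements
  val allButLastTwo = xs dropRight 2            // everything except xs takeRight 2
  val sameOrder = xs sameElements Vector(1, 2, 3, 4, 5) // element-wise, order matters
  val differentOrder = xs sameElements xs.reverse

  def main(args: Array[String]): Unit = {
    println(lastTwo)
    println(allButLastTwo)
    println(sameOrder)
    println(differentOrder)
  }
}
```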

Other Traits

After Iterable there come three base traits which inherit from it: Seq, Set, and Map. All three have an apply method, but the meaning of apply is different in each case. Seq and Map implement the PartialFunction trait, while Set implements the plain function type from an element to Boolean.

I trust the meaning of Seq, Set and Map is intuitive. After them, the classes break up into specific implementations that offer particular performance guarantees, and particular methods made available as a result. Also available are some traits with further refinements, such as LinearSeq, IndexedSeq and SortedSet.
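
A brief sketch of what apply means in each case (ApplyDemo and its value names are hypothetical). It also shows that Seq and Map answer isDefinedAt, and that a Set can be passed wherever a predicate is expected.

```scala
object ApplyDemo {
  val seq = Seq("a", "b", "c")
  val set = Set(1, 2, 3)
  val map = Map("one" -> 1, "two" -> 2)

  val byIndex = seq(1)      // Seq: apply is positional access
  val isMember = set(2)     // Set: apply is a membership test
  val notMember = set(9)
  val byKey = map("two")    // Map: apply is lookup by key

  val seqDefined = seq.isDefinedAt(5)        // index out of range
  val mapDefined = map.isDefinedAt("three")  // missing key
  val filtered = List(1, 5, 3).filter(set)   // the Set used as a predicate

  def main(args: Array[String]): Unit = {
    println(byIndex)
    println(isMember)
    println(byKey)
    println(filtered)
  }
}
```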

The listing below may be improved. Leave a comment with suggestions and I'll fix it.

Base Classes and Traits

  • Traversable -- Basic collection class. Can be implemented just with foreach.

    • TraversableProxy -- Proxy for a Traversable. Just point self to the real collection.
    • TraversableView -- A Traversable with some non-strict methods.
    • TraversableForwarder -- Forwards most methods to underlying, except toString, hashCode, equals, stringPrefix, newBuilder, view and all calls creating a new iterable object of the same kind.
    • mutable.Traversable and immutable.Traversable -- same thing as Traversable, but restricting the collection type.
    • Other special-case Iterable classes, such as MetaData, exist.
    • Iterable -- A collection for which an Iterator can be created (through iterator).
      • IterableProxy, IterableView, mutable.Iterable and immutable.Iterable.
  • Iterator -- A trait which is not a descendant of Traversable. Defines next and hasNext.
    • CountedIterator -- An Iterator defining count, which returns the number of elements seen so far.
    • BufferedIterator -- Defines head, which returns the next element without consuming it.
    • Other special-case Iterator classes, such as Source, exist.
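
To illustrate the Iterator entries above, here is a minimal sketch of next, hasNext and the head method of BufferedIterator. The object and value names are hypothetical; buffered is the standard way to obtain a BufferedIterator from an Iterator.

```scala
object IteratorDemo {
  val it = Iterator(1, 2, 3).buffered // a BufferedIterator[Int]

  val peeked = it.head      // looks at the next element without consuming it
  val first = it.next()     // the same element, now consumed
  val second = it.next()
  val moreLeft = it.hasNext // one element remains

  def main(args: Array[String]): Unit = {
    println(peeked)
    println(first)
    println(second)
    println(moreLeft)
  }
}
```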

The Maps

  • Map -- An Iterable of Tuple2, which also provides methods for retrieving a value (the second element of the tuple) given a key (the first element of the tuple). Extends PartialFunction as well.

    • MapProxy -- A Proxy for a Map.
    • DefaultMap -- A trait implementing some of Map's abstract methods.
    • SortedMap -- A Map whose keys are sorted.
      • immutable.SortedMap

        • immutable.TreeMap -- A class implementing immutable.SortedMap.
    • immutable.Map
      • immutable.MapProxy
      • immutable.HashMap -- A class implementing immutable.Map through key hashing.
      • immutable.IntMap -- A class implementing immutable.Map specialized for Int keys. Uses a tree based on the binary digits of the keys.
      • immutable.ListMap -- A class implementing immutable.Map through lists.
      • immutable.LongMap -- A class implementing immutable.Map specialized for Long keys. See IntMap.
      • There are additional classes optimized for a specific number of elements.
    • mutable.Map
      • mutable.HashMap -- A class implementing mutable.Map through key hashing.
      • mutable.ImmutableMapAdaptor -- A class implementing a mutable.Map from an existing immutable.Map.
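      • mutable.LinkedHashMap -- A class implementing mutable.Map through key hashing, which iterates over its entries in insertion order.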
      • mutable.ListMap -- A class implementing mutable.Map through lists.
      • mutable.MultiMap -- A class accepting more than one distinct value for each key.
      • mutable.ObservableMap -- A mixin which, when mixed with a Map, publishes events to observers through a Publisher interface.
      • mutable.OpenHashMap -- A class based on an open hashing algorithm.
      • mutable.SynchronizedMap -- A mixin which should be mixed with a Map to provide a version of it with synchronized methods.
      • mutable.MapProxy.
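
A few of the map classes listed above, in a short sketch (MapDemo and its value names are hypothetical). It shows key ordering in a TreeMap, in-place updates on a mutable.HashMap, and insertion-order iteration in a mutable.LinkedHashMap.

```scala
import scala.collection.immutable.TreeMap
import scala.collection.mutable

object MapDemo {
  // TreeMap keeps its keys sorted regardless of insertion order.
  val sorted = TreeMap("b" -> 2, "c" -> 3, "a" -> 1)
  val sortedKeys = sorted.keys.toList

  // A mutable map is updated in place.
  val counts = mutable.HashMap.empty[String, Int]
  counts("x") = 1
  counts("x") = counts("x") + 1

  // LinkedHashMap iterates in insertion order.
  val linked = mutable.LinkedHashMap("b" -> 2, "a" -> 1)
  val linkedKeys = linked.keys.toList

  def main(args: Array[String]): Unit = {
    println(sortedKeys)
    println(counts("x"))
    println(linkedKeys)
  }
}
```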

The Sequences

  • Seq -- A sequence of elements. One assumes a well-defined size and element repetition. Extends PartialFunction as well.

    • IndexedSeq -- Sequences that support O(1) element access and O(1) length computation.

      • IndexedSeqView
      • immutable.PagedSeq -- An implementation of IndexedSeq where the elements are produced on-demand by a function passed through the constructor.
      • immutable.IndexedSeq
        • immutable.Range -- A delimited sequence of integers, closed on the lower end, open on the high end, and with a step.

          • immutable.Range.Inclusive -- A Range closed on the high end as well.
          • immutable.Range.ByOne -- A Range whose step is 1.
        • immutable.NumericRange -- A more generic version of Range which works with any Integral.
          • immutable.NumericRange.Inclusive, immutable.NumericRange.Exclusive.
        • immutable.WrappedString, immutable.RichString -- Wrappers which enable seeing a String as a Seq[Char], while still preserving the String methods. I'm not sure what the difference between them is.
      • mutable.IndexedSeq
        • mutable.GenericArray -- A Seq-based array-like structure. Note that the "class" Array is Java's Array, which is more of a memory storage method than a class.
        • mutable.ResizableArray -- Internal class used by classes based on resizable arrays.
        • mutable.PriorityQueue, mutable.SynchronizedPriorityQueue -- Classes implementing prioritized queues -- queues where the elements are dequeued according to an Ordering first, and order of queueing last.
        • mutable.PriorityQueueProxy -- an abstract Proxy for a PriorityQueue.
    • LinearSeq -- A trait for linear sequences, with efficient time for isEmpty, head and tail.
      • immutable.LinearSeq

        • immutable.List -- An immutable, singly-linked list implementation.
        • immutable.Stream -- A lazy-list. Its elements are only computed on-demand, but memoized (kept in memory) afterwards. It can be theoretically infinite.
      • mutable.LinearSeq
        • mutable.DoublyLinkedList -- A list with mutable prev, head (elem) and tail (next).
        • mutable.LinkedList -- A list with mutable head (elem) and tail (next).
        • mutable.MutableList -- A class used internally to implement classes based on mutable lists.
          • mutable.Queue -- A data structure optimized for FIFO (First-In, First-Out) operations.
          • mutable.QueueProxy -- A Proxy for a mutable.Queue.
    • SeqProxy, SeqView, SeqForwarder
    • immutable.Seq
      • immutable.Queue -- A class implementing a FIFO-optimized (First-In, First-Out) data structure. There is no common superclass of both mutable and immutable queues.
      • immutable.Stack -- A class implementing a LIFO-optimized (Last-In, First-Out) data structure. There is no common superclass of both mutable and immutable stacks.
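      • immutable.Vector -- An immutable indexed sequence implemented as a shallow, wide tree, with effectively constant time for random access and updates.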
      • scala.xml.NodeSeq -- A specialized XML class which extends immutable.Seq.
      • immutable.IndexedSeq -- As seen above.
      • immutable.LinearSeq -- As seen above.
    • mutable.ArrayStack -- A class implementing a LIFO-optimized data structure using arrays. Supposedly significantly faster than a normal stack.
    • mutable.Stack, mutable.SynchronizedStack -- Classes implementing a LIFO-optimized data structure.
    • mutable.StackProxy -- A Proxy for a mutable.Stack.
    • mutable.Seq
      • mutable.Buffer -- Sequence of elements which can be changed by appending, prepending or inserting new members.

        • mutable.ArrayBuffer -- An implementation of the mutable.Buffer class, with constant amortized time for the append, update and random access operations. It has some specialized subclasses, such as NodeBuffer.
        • mutable.BufferProxy, mutable.SynchronizedBuffer.
        • mutable.ListBuffer -- A buffer backed by a list. It provides constant time append and prepend, with most other operations being linear.
        • mutable.ObservableBuffer -- A mixin trait which, when mixed with a Buffer, provides notification events through a Publisher interface.
        • mutable.IndexedSeq -- As seen above.
        • mutable.LinearSeq -- As seen above.
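
The sketch below exercises a handful of the sequence classes from the listing: Range, List (a LinearSeq), Vector (an IndexedSeq), and the two main Buffer implementations. SeqDemo and its value names are hypothetical.

```scala
import scala.collection.mutable.{ArrayBuffer, ListBuffer}

object SeqDemo {
  // Range: closed on the lower end, open on the high end, with a step.
  val stepped = (1 until 10 by 3).toList
  // Range.Inclusive: closed on the high end as well.
  val inclusive = (1 to 3).toList

  // LinearSeq: head and tail are the cheap operations.
  val list = List(1, 2, 3)
  val listHead = list.head
  val listTail = list.tail

  // IndexedSeq: access by position is the cheap operation.
  val vector = Vector(10, 20, 30)
  val third = vector(2)

  // Buffers grow in place by appending, prepending or inserting.
  val arrayBuffer = ArrayBuffer(1, 2)
  arrayBuffer += 3           // append
  arrayBuffer.insert(0, 0)   // insert the element 0 at index 0

  val listBuffer = ListBuffer(2)
  listBuffer += 3            // append
  1 +=: listBuffer           // prepend

  def main(args: Array[String]): Unit = {
    println(stepped)
    println(arrayBuffer)
    println(listBuffer)
  }
}
```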

The Sets

  • Set -- A set is a collection that includes at most one of any object.

    • BitSet -- A set of integers stored as a bitset.

      • immutable.BitSet
      • mutable.BitSet
    • SortedSet -- A set whose elements are ordered.
      • immutable.SortedSet

        • immutable.TreeSet -- An implementation of a SortedSet based on a tree.
    • SetProxy -- A Proxy for a Set.
    • immutable.Set
      • immutable.HashSet -- An implementation of Set based on element hashing.
      • immutable.ListSet -- An implementation of Set based on lists.
      • Additional set classes exist to provide optimized implementations for sets from 0 to 4 elements.
      • immutable.SetProxy -- A Proxy for an immutable Set.
    • mutable.Set
      • mutable.HashSet -- An implementation of Set based on element hashing.
      • mutable.ImmutableSetAdaptor -- A class implementing a mutable Set from an immutable Set.
      • mutable.LinkedHashSet -- An implementation of Set based on hashing, with a linked list preserving insertion order.
      • mutable.ObservableSet -- A mixin trait which, when mixed with a Set, provides notification events through a Publisher interface.
      • mutable.SetProxy -- A Proxy for a mutable Set.
      • mutable.SynchronizedSet -- A mixin trait which should be mixed with a Set to provide a version of it with synchronized methods.
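
A short sketch of three of the set flavours listed above (SetDemo and its value names are hypothetical): a plain Set dropping duplicates, a TreeSet keeping its elements ordered, and a BitSet holding small integers.

```scala
import scala.collection.immutable.{BitSet, TreeSet}

object SetDemo {
  // A set includes at most one of any object.
  val unique = Set(1, 1, 2)
  val uniqueSize = unique.size

  // A SortedSet iterates in the order given by its Ordering.
  val sorted = TreeSet(3, 1, 2).toList

  // A BitSet stores non-negative integers as bits.
  val bits = BitSet(5, 1, 3)
  val bitsInOrder = bits.toList
  val hasThree = bits(3)

  def main(args: Array[String]): Unit = {
    println(uniqueSize)
    println(sorted)
    println(bitsInOrder)
  }
}
```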

  • Why the Like classes exist (e.g. TraversableLike)

This was done to achieve maximum code reuse. The concrete generic implementation for classes with a certain structure (a traversable, a map, etc) is done in the Like classes. The classes intended for general consumption, then, override selected methods that can be optimized.

  • What the companion methods are for (e.g. List.companion)

The builder for the classes, i.e., the object which knows how to create instances of that class in a way that can be used by methods like map, is created by a method in the companion object. So, in order to build an object of type X, I need to get that builder from the companion object of X. Unfortunately, there is no way, in Scala, to get from class X to object X. Because of that, there is a method defined in each instance of X, companion, which returns the companion object of class X.

While there might be some use for such a method in normal programs, its target is enabling code reuse in the collection library.
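
As a rough illustration, assuming Scala 2.8 through 2.12, the sketch below starts from a value whose static type is only Iterable, reaches its companion object through companion, and uses the builder from there. The names CompanionDemo, sameObject and built are hypothetical.

```scala
object CompanionDemo {
  // Statically just an Iterable; at run time it is a Vector.
  val xs: Iterable[Int] = Vector(1, 2, 3)

  // The instance hands back the companion object of its own class.
  val sameObject = xs.companion eq Vector

  // The companion knows how to build a new collection of that kind.
  val builder = xs.companion.newBuilder[String]
  builder += "a"
  builder += "b"
  val built = builder.result()

  def main(args: Array[String]): Unit = {
    println(sameObject)
    println(built)
  }
}
```

Library methods use this route so that generic code written once in a parent trait can still produce the concrete collection type of the receiver.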

  • How I know what implicit objects are in scope at a given point

You aren't supposed to care about that. They are implicit precisely so that you don't need to figure out how to make it work.

These implicits exist to enable the methods on the collections to be defined on parent classes but still return a collection of the same type. For example, the map method is defined on TraversableLike, but if you use it on a List you'll get a List back.
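
The effect is easy to observe. In this sketch (SameTypeDemo and its value names are hypothetical) the type annotations are checked by the compiler, so each line only compiles because map returns the same kind of collection it was called on.

```scala
object SameTypeDemo {
  val list: List[Int] = List(1, 2, 3).map(_ + 1)
  val set: Set[Int] = Set(1, 2, 3).map(_ % 2)       // duplicates collapse, as in any Set
  val vector: Vector[String] = Vector(1, 2).map(_.toString)

  // Mapping pairs to pairs yields a Map again.
  val swapped: Map[Int, String] = Map("a" -> 1).map { case (k, v) => (v, k) }

  // Mapping pairs to something else falls back to a more general collection.
  val labels: Iterable[String] = Map("a" -> 1).map { case (k, v) => k + v }

  def main(args: Array[String]): Unit = {
    println(list)
    println(set)
    println(swapped)
    println(labels)
  }
}
```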

Time: 2024-12-18 12:04:03
