Map & Reduce in Scala Collection

What's Map & Reudce

Map 是指將 collection 每個元素，經過指定的 function 處理，產生新的值 (也可以是另一個 Collection)

Map 的特性：

Collection 的每一個元素都是獨立被處理，也就是說跟 collection 的其他元素無關。
經 Map 程序處理後，最後回傳的 collection 資料型別不變。比如說： List 經過 map 程序後，回傳值依然是 List，但其中的元素資料型別視處理的 function 而有所不同。
原本的 collection 內的元素值都不會被改變，最後是回傳一個新的 collection。

延伸： flatMap

Reduce 是指將 collection 的元素做歸納處理後，回傳一個跟元素資料型別相同或者是supertype 的值。

Reduce 特性：

指定的歸納 function，第一次會處理兩個元素值。
每次處理後的結果，再跟下一個元素，再做一次處理，以此類推，直到所有的元素都被處理過。
處理的過程，不見得會依照預期的順序，因此指定的 function 如果沒有結合律的特性，也許結果會不如預期。
回傳最終處理的結果，且資料型別是跟元素相同或者是 supertype。

註：結合律是 (x + y) + z = x + (y + z)。像四則運算的 \(+\) 與 \(\times\) 有結合律，但 \(-\) 與 \(\div\) 沒有。

延伸：reduceLeft & reduceRight

舉例：輸入一個字串的 List，計算其中字串長度的總和。

scala> val lst = List("ABC", "Hello, World!!!", "Apple", "Microsoft")
lst: List[String] = List(ABC, Hello, World!!!, Apple, Microsoft)

scala> val lenLst = lst map { _.length }
lenLst: List[Int] = List(3, 15, 5, 9)

scala> val total = lenLst reduce { _ + _ }
total: Int = 32

或

scala> val lst = List("ABC", "Hello, World!!!", "Apple", "Microsoft")
lst: List[String] = List(ABC, Hello, World!!!, Apple, Microsoft)

scala> lst map { _.length } reduce { _ + _ }
res7: Int = 32

Function Languge Map-Reduce 的概念，後來被應用到 Hadoop 的 Map-Reduce。

Map, flatMap & for-yield

Scala 的 for-yield 處理，實際上是轉成 flatMap 與 map 處理。

舉例：

一層迴圈

scala> for (i <- 1 to 9) yield i + 1
res5: scala.collection.immutable.IndexedSeq[Int] = Vector(2, 3, 4, 5, 6, 7, 8, 9, 10)

scala> (1 to 9) map { _ + 1 }
res6: scala.collection.immutable.IndexedSeq[Int] = Vector(2, 3, 4, 5, 6, 7, 8, 9, 10)

二層迴圈

scala> for (i <- 1 to 9; j <- 1 to i) yield i * j
res7: scala.collection.immutable.IndexedSeq[Int] = Vector(1, 2, 4, 3, 6, 9, 4, 8, 12, 16, 5, 10, 15, 20, 25, 6, 12, 18, 24, 30, 36, 7, 14, 21, 28, 35, 42, 49, 8, 16, 24, 32, 40, 48, 56, 64, 9, 18, 27, 36, 45, 54, 63, 72, 81)

scala> (1 to 9) flatMap { i => (1 to i) map { i * _ } }
res8: scala.collection.immutable.IndexedSeq[Int] = Vector(1, 2, 4, 3, 6, 9, 4, 8, 12, 16, 5, 10, 15, 20, 25, 6, 12, 18, 24, 30, 36, 7, 14, 21, 28, 35, 42, 49, 8, 16, 24, 32, 40, 48, 56, 64, 9, 18, 27, 36, 45, 54, 63, 72, 81)

Option 處理

單個：

scala> val s1 = Some("s1")
s1: Some[String] = Some(s1)

scala> for (s <- s1) yield s + s
res11: Option[String] = Some(s1s1)

scala> s1 map { s => s + s }
res13: Option[String] = Some(s1s1)

多個：

scala> val s1 = Option("s1")
s1: Option[String] = Some(s1)

scala> val s2 = None
s2: None.type = None

scala> for (a <- s1; b <- s2) yield (a, b)
res15: Option[(String, Nothing)] = None

scala> s1 flatMap { a => s2 map { b => (a, b) } }
res17: Option[(String, Nothing)] = None

scala> for (a <- s1; b <- s2) yield a + b
res16: Option[String] = None

scala> s1 flatMap { a => s2 map { b => a + b } }
res18: Option[String] = None

flatMap or for-yield 很適合處理 AND 的情況

舉例：每個 Store 都有一個歸屬的 Store，目前要查詢歸屬的 Store 名稱。

scala> case class Store(id: Int, name: String, belong: Int)
defined class Store

scala> val map = Map(0 -> Store(0, "3C", 0))
map: scala.collection.immutable.Map[Int,Store] = Map(0 -> Store(0,3C,0))

if-else 的版本

val s1 = map.get(0)

scala> if (s1.isDefined) {
     | val s2 = map.get(s1.get.belong)
     | if (s2.isDefined) Some(s2.get.name)
     | else None
     | } else None
res6: Option[String] = Some(3C)

for-yield 版本

scala> for (s1 <- map.get(0); s2 <- map.get(s1.belong)) yield s2.name
res0: Option[String] = Some(3C)

Collection 相關函數

fold, foldLeft, foldRight

fold 與 reduce 很類似，fold 多了可以指定 初始值

scala> val lst = List(1, 2, 3, 4, 5, 6, 7, 8, 9)
lst: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 9)

scala> lst reduce { _ + _ }
res20: Int = 45

scala> lst.fold(100) { _ + _ }
res25: Int = 145

scan, scanLeft, scanRight

scan 可以指定 初始值，第一個元素與初始值處理的結果，再與第二個元素處理，以此類推，最後回傳原本 collection 的資料型別，初始值當作第一個元素。概念跟 map 有點像，map 是獨立處理每個元素，scan 會與上一次處理的結果有關。

scala> val lst = List(1, 2, 3)
lst: List[Int] = List(1, 2, 3)

scala> lst map { _ + 1 }
res26: List[Int] = List(2, 3, 4)

scala> lst.scan(10) { (a, b) => println(a, b); a + b  }
(10,1)
(11,2)
(13,3)
res30: List[Int] = List(10, 11, 13, 16)

groupBy

groupBy 利用指定的 function 回傳值當作 key，自動依 key 分群，回傳一個 Map，Map 內的 value 資料型別，會與原來的 collection 相同。

scala> val lst = List(1, 2, 3, 4, 5, 6, 7, 8, 9)
lst: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 9)

scala> lst groupBy { _ % 3 }
res31: scala.collection.immutable.Map[Int,List[Int]] = Map(2 -> List(2, 5, 8), 1 -> List(1, 4, 7), 0 -> List(3, 6, 9))

scala> val arr = Array(1, 2, 3, 4, 5, 6, 7, 8, 9)
arr: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9)

scala> arr groupBy { _ % 3 }
res32: scala.collection.immutable.Map[Int,Array[Int]] = Map(2 -> Array(2, 5, 8), 1 -> Array(1, 4, 7), 0 -> Array(3, 6, 9))

Zip

將兩個 collection 中的元素，一對一的方式組成兩個元素的 tuple。行為很類似拉鏈。

scala> val lst1 = List("a", "b", "c", "d")
lst1: List[String] = List(a, b, c, d)

scala> val lst2 = List(1, 2, 3)
lst2: List[Int] = List(1, 2, 3)

scala> lst1 zip lst2
res0: List[(String, Int)] = List((a,1), (b,2), (c,3))

scala> lst1.zipWithIndex
res2: List[(String, Int)] = List((a,0), (b,1), (c,2), (d,3))

附錄

Scala Variance & Bounds

Varince

Variance 主要是在討論

If \(T_1\) is subclass of \(T\), is Container[\(T_1\)] is subclass of Container[\(T\)] ?

Variance	Meaning	Scala notation
covariant	C[\(T_1\)] is subclass of C[\(T\)]	[+T]
contravariant	C[\(T\)] is subclass of C[\(T_1\)]	[-T]
invariant	C[\(T_1\)] and C[\(T\)] are not related	[T]

舉例：Covariant

scala> class Covariant[+A]
defined class Covariant

scala> val cv: Covariant[AnyRef] = new Covariant[String]
cv: Covariant[AnyRef] = Covariant@1fc2b765

scala> val cv: Covariant[String] = new Covariant[AnyRef]
<console>:8: error: type mismatch;
 found   : Covariant[AnyRef]
 required: Covariant[String]
       val cv: Covariant[String] = new Covariant[AnyRef]

舉例：Contravariant

scala> class Contravariant[-A]
defined class Contravariant

scala> val cv: Contravariant[String] = new Contravariant[AnyRef]
cv: Contravariant[String] = Contravariant@7bc1a03d

scala> val cv: Contravariant[AnyRef] = new Contravariant[String]
<console>:8: error: type mismatch;
 found   : Contravariant[String]
 required: Contravariant[AnyRef]
       val cv: Contravariant[AnyRef] = new Contravariant[String]

範例來自：Twitter's Scala School - Type & polymorphism basics

記憶方法：

\[ \forall T_1 \in +T, \text{then }T_1 \text{ is subclass of } T \]

\[ \forall T_1 \in -T, \text{then }T_1 \text{ is superclass of } T \]

\[ \begin{equation} -T \\ \uparrow\\ T \\ \uparrow \\ +T \end{equation} \]

ContraVariant 案例：Function1[-T, +R]

eg:

scala> class Test[+A] { def test(a: A): String = a.toString }
<console>:7: error: covariant type A occurs in contravariant position in type A of value a
       class Test[+A] { def test(a: A): String = a.toString }

fix:

scala> class Test[+A] { def test[B >: A](b: B): String = b.toString }
defined class Test

Bounds

Lower Type Bound

A >: B => A is superclass of B. B is the lower bound.

Upper Type Bound

A <: B => A is subclass of B. B is the upper bound.

參考：

Twitter's Scala School - Type & polymorphism basics

工程師的筆記本

2015年7月26日星期日