2015年7月26日 星期日

Map & Reduce in Scala Collection (Session 7)

Map & Reduce in Scala Collection

What's Map & Reudce

Map 是指將 collection 每個元素,經過指定的 function 處理,產生新的值 (也可以是另一個 Collection)

Map 的特性:

  • Collection 的每一個元素都是獨立被處理,也就是說跟 collection 的其他元素無關。
  • 經 Map 程序處理後,最後回傳的 collection 資料型別不變。比如說: List 經過 map 程序後,回傳值依然是 List,但其中的元素資料型別視處理的 function 而有所不同。
  • 原本的 collection 內的元素值都不會被改變,最後是回傳一個的 collection。

延伸: flatMap

Reduce 是指將 collection 的元素做歸納處理後,回傳一個跟元素資料型別相同或者是supertype 的值。

Reduce 特性:

  • 指定的歸納 function,第一次會處理兩個元素值。
  • 每次處理後的結果,再跟下一個元素,再做一次處理,以此類推,直到所有的元素都被處理過。
  • 處理的過程,不見得會依照預期的順序,因此指定的 function 如果沒有結合律的特性,也許結果會不如預期。
  • 回傳最終處理的結果,且資料型別是跟元素相同或者是 supertype。

註:結合律是 (x + y) + z = x + (y + z)。像四則運算的 \(+\) 與 \(\times\) 有結合律,但 \(-\) 與 \(\div\) 沒有。

延伸:reduceLeft & reduceRight

舉例:輸入一個字串的 List,計算其中字串長度的總和。

scala> val lst = List("ABC", "Hello, World!!!", "Apple", "Microsoft")
lst: List[String] = List(ABC, Hello, World!!!, Apple, Microsoft)

scala> val lenLst = lst map { _.length }
lenLst: List[Int] = List(3, 15, 5, 9)

scala> val total = lenLst reduce { _ + _ }
total: Int = 32

scala> val lst = List("ABC", "Hello, World!!!", "Apple", "Microsoft")
lst: List[String] = List(ABC, Hello, World!!!, Apple, Microsoft)

scala> lst map { _.length } reduce { _ + _ }
res7: Int = 32

Function Languge Map-Reduce 的概念,後來被應用到 Hadoop 的 Map-Reduce。

Map, flatMap & for-yield

Scala 的 for-yield 處理,實際上是轉成 flatMapmap 處理。

舉例:

一層迴圈

scala> for (i <- 1 to 9) yield i + 1
res5: scala.collection.immutable.IndexedSeq[Int] = Vector(2, 3, 4, 5, 6, 7, 8, 9, 10)

scala> (1 to 9) map { _ + 1 }
res6: scala.collection.immutable.IndexedSeq[Int] = Vector(2, 3, 4, 5, 6, 7, 8, 9, 10)

二層迴圈

scala> for (i <- 1 to 9; j <- 1 to i) yield i * j
res7: scala.collection.immutable.IndexedSeq[Int] = Vector(1, 2, 4, 3, 6, 9, 4, 8, 12, 16, 5, 10, 15, 20, 25, 6, 12, 18, 24, 30, 36, 7, 14, 21, 28, 35, 42, 49, 8, 16, 24, 32, 40, 48, 56, 64, 9, 18, 27, 36, 45, 54, 63, 72, 81)

scala> (1 to 9) flatMap { i => (1 to i) map { i * _ } }
res8: scala.collection.immutable.IndexedSeq[Int] = Vector(1, 2, 4, 3, 6, 9, 4, 8, 12, 16, 5, 10, 15, 20, 25, 6, 12, 18, 24, 30, 36, 7, 14, 21, 28, 35, 42, 49, 8, 16, 24, 32, 40, 48, 56, 64, 9, 18, 27, 36, 45, 54, 63, 72, 81)

Option 處理

單個:

scala> val s1 = Some("s1")
s1: Some[String] = Some(s1)

scala> for (s <- s1) yield s + s
res11: Option[String] = Some(s1s1)

scala> s1 map { s => s + s }
res13: Option[String] = Some(s1s1)

多個:

scala> val s1 = Option("s1")
s1: Option[String] = Some(s1)

scala> val s2 = None
s2: None.type = None

scala> for (a <- s1; b <- s2) yield (a, b)
res15: Option[(String, Nothing)] = None

scala> s1 flatMap { a => s2 map { b => (a, b) } }
res17: Option[(String, Nothing)] = None

scala> for (a <- s1; b <- s2) yield a + b
res16: Option[String] = None

scala> s1 flatMap { a => s2 map { b => a + b } }
res18: Option[String] = None

flatMap or for-yield 很適合處理 AND 的情況

舉例:每個 Store 都有一個歸屬的 Store,目前要查詢歸屬的 Store 名稱。

scala> case class Store(id: Int, name: String, belong: Int)
defined class Store

scala> val map = Map(0 -> Store(0, "3C", 0))
map: scala.collection.immutable.Map[Int,Store] = Map(0 -> Store(0,3C,0))

if-else 的版本

val s1 = map.get(0)

scala> if (s1.isDefined) {
     | val s2 = map.get(s1.get.belong)
     | if (s2.isDefined) Some(s2.get.name)
     | else None
     | } else None
res6: Option[String] = Some(3C)

for-yield 版本

scala> for (s1 <- map.get(0); s2 <- map.get(s1.belong)) yield s2.name
res0: Option[String] = Some(3C)

Collection 相關函數

fold, foldLeft, foldRight

foldreduce 很類似,fold 多了可以指定 初始值

scala> val lst = List(1, 2, 3, 4, 5, 6, 7, 8, 9)
lst: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 9)

scala> lst reduce { _ + _ }
res20: Int = 45

scala> lst.fold(100) { _ + _ }
res25: Int = 145

scan, scanLeft, scanRight

scan 可以指定 初始值,第一個元素與初始值處理的結果,再與第二個元素處理,以此類推,最後回傳原本 collection 的資料型別,初始值當作第一個元素。概念跟 map 有點像,map 是獨立處理每個元素,scan 會與上一次處理的結果有關。

scala> val lst = List(1, 2, 3)
lst: List[Int] = List(1, 2, 3)

scala> lst map { _ + 1 }
res26: List[Int] = List(2, 3, 4)

scala> lst.scan(10) { (a, b) => println(a, b); a + b  }
(10,1)
(11,2)
(13,3)
res30: List[Int] = List(10, 11, 13, 16)

groupBy

groupBy 利用指定的 function 回傳值當作 key,自動依 key 分群,回傳一個 Map,Map 內的 value 資料型別,會與原來的 collection 相同。

scala> val lst = List(1, 2, 3, 4, 5, 6, 7, 8, 9)
lst: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 9)

scala> lst groupBy { _ % 3 }
res31: scala.collection.immutable.Map[Int,List[Int]] = Map(2 -> List(2, 5, 8), 1 -> List(1, 4, 7), 0 -> List(3, 6, 9))

scala> val arr = Array(1, 2, 3, 4, 5, 6, 7, 8, 9)
arr: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9)

scala> arr groupBy { _ % 3 }
res32: scala.collection.immutable.Map[Int,Array[Int]] = Map(2 -> Array(2, 5, 8), 1 -> Array(1, 4, 7), 0 -> Array(3, 6, 9))

Zip

將兩個 collection 中的元素,一對一的方式組成兩個元素的 tuple。行為很類似 拉鏈

scala> val lst1 = List("a", "b", "c", "d")
lst1: List[String] = List(a, b, c, d)

scala> val lst2 = List(1, 2, 3)
lst2: List[Int] = List(1, 2, 3)

scala> lst1 zip lst2
res0: List[(String, Int)] = List((a,1), (b,2), (c,3))

scala> lst1.zipWithIndex
res2: List[(String, Int)] = List((a,0), (b,1), (c,2), (d,3))

附錄

Scala Variance & Bounds

Varince

Variance 主要是在討論

If \(T_1\) is subclass of \(T\), is Container[\(T_1\)] is subclass of Container[\(T\)] ?

Variance Meaning Scala notation
covariant C[\(T_1\)] is subclass of C[\(T\)] [+T]
contravariant C[\(T\)] is subclass of C[\(T_1\)] [-T]
invariant C[\(T_1\)] and C[\(T\)] are not related [T]

舉例:Covariant

scala> class Covariant[+A]
defined class Covariant

scala> val cv: Covariant[AnyRef] = new Covariant[String]
cv: Covariant[AnyRef] = Covariant@1fc2b765

scala> val cv: Covariant[String] = new Covariant[AnyRef]
<console>:8: error: type mismatch;
 found   : Covariant[AnyRef]
 required: Covariant[String]
       val cv: Covariant[String] = new Covariant[AnyRef]

舉例:Contravariant

scala> class Contravariant[-A]
defined class Contravariant

scala> val cv: Contravariant[String] = new Contravariant[AnyRef]
cv: Contravariant[String] = Contravariant@7bc1a03d

scala> val cv: Contravariant[AnyRef] = new Contravariant[String]
<console>:8: error: type mismatch;
 found   : Contravariant[String]
 required: Contravariant[AnyRef]
       val cv: Contravariant[AnyRef] = new Contravariant[String]

範例來自:Twitter's Scala School - Type & polymorphism basics

記憶方法:

\[ \forall T_1 \in +T, \text{then }T_1 \text{ is subclass of } T \]

\[ \forall T_1 \in -T, \text{then }T_1 \text{ is superclass of } T \]

\[ \begin{equation} -T \\ \uparrow\\ T \\ \uparrow \\ +T \end{equation} \]

ContraVariant 案例:Function1[-T, +R]

eg:

scala> class Test[+A] { def test(a: A): String = a.toString }
<console>:7: error: covariant type A occurs in contravariant position in type A of value a
       class Test[+A] { def test(a: A): String = a.toString }

fix:

scala> class Test[+A] { def test[B >: A](b: B): String = b.toString }
defined class Test

Bounds

Lower Type Bound

A >: B => A is superclass of B. B is the lower bound.

Upper Type Bound

A <: B => A is subclass of B. B is the upper bound.

參考:

張貼留言