scala - Non-integer ids in Spark MLlib ALS

Question

asked Oct 24, 2021 in Technique by 深蓝

I'd like to train a model with Spark MLlib's ALS like this:

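import org.apache.spark.mllib.recommendation.{ALS, Rating}

// Parse "user,item,rating" lines; MLlib's Rating requires Int user and product IDs
val ratings = data.map(_.split(',') match {
  case Array(user, item, rate) => Rating(user.toInt, item.toInt, rate.toDouble)
})
// The 4th argument of ALS.train is the regularization parameter lambda
// (alpha is only used by ALS.trainImplicit)
val model = ALS.train(ratings, rank, numIterations, lambda)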

However, the user IDs I get are stored as Long, and converting them to Int may fail (or silently overflow) for values larger than Int.MaxValue. How can I solve this?
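
To make the problem concrete: an ID above Int.MaxValue (2147483647) either fails to parse as an Int, or, if it is parsed as a Long first, wraps around silently when narrowed to Int. A minimal plain-Scala sketch (no Spark needed); the object LongIdToInt and its methods are hypothetical names used only for illustration:

```scala
// Plain Scala (no Spark): why Long user IDs cannot simply be converted to Int.
object LongIdToInt {
  // Parsing a too-large ID string directly as Int throws NumberFormatException.
  def parseAsInt(s: String): Either[String, Int] =
    try Right(s.toInt)
    catch { case e: NumberFormatException => Left(e.getMessage) }

  // Parsing as Long succeeds, but narrowing Long to Int silently wraps around.
  def narrowToInt(s: String): Int = s.toLong.toInt

  // Safe check before any conversion: does the Long fit in the Int range?
  def fitsInInt(id: Long): Boolean = id >= Int.MinValue && id <= Int.MaxValue

  def main(args: Array[String]): Unit = {
    val rawId = "3000000000" // larger than Int.MaxValue (2147483647)
    println(parseAsInt(rawId).isLeft) // true
    println(narrowToInt(rawId))       // -1294967296
    println(fitsInInt(rawId.toLong))  // false
  }
}
```

The wrapped value is the dangerous case: it produces no error, so two distinct users can end up sharing one ID.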

1 Answer

深蓝 · Answer 1 · 2021-10-23T21:39:42+0000

You can use one of the spark.ml (org.apache.spark.ml) implementations, which support Long IDs. The RDD-based version is significantly less user-friendly than the DataFrame-based one:

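import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.ml.recommendation.ALS.Rating

// ALS.Rating[ID] is generic: here user and item IDs are Long, ratings are Float
val ratings = sc.parallelize(Seq(Rating(1L, 2L, 3.0f), Rating(2L, 3L, 5.0f)))

// Low-level API: returns (RDD[(Long, Array[Float])], RDD[(Long, Array[Float])])
val (userFactors, itemFactors) = ALS.train(ratings)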

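It returns only the factors, whereas the DataFrame-based version returns a model. Be aware that the DataFrame-based estimator casts the user and item columns to Int internally (recent Spark versions throw an error for values outside the Int range), so it accepts Long columns but not IDs larger than Int.MaxValue: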

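import spark.implicits._ // provides toDF; on Spark 1.x use sqlContext.implicits._
val ratingsDF = ratings.toDF // columns "user", "item", "rating" match the ALS defaults

val alsModel = new ALS().fit(ratingsDF) // returns an ALSModel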
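
Because the low-level ALS.train returns only the factors, predictions have to be computed by hand: an ALS rating estimate is the dot product of a user's and an item's latent factor vectors. A minimal plain-Scala sketch, assuming the two factor RDDs have already been collected to local maps (for example with collectAsMap()); FactorScoring and predict are hypothetical names, not part of Spark:

```scala
// Hypothetical helper (not part of Spark): score a (user, item) pair from ALS factors.
object FactorScoring {
  // Predicted rating = dot product of the user's and the item's latent factor vectors.
  // Returns None when either ID has no factors (e.g. it did not occur in training).
  def predict(
      userFactors: collection.Map[Long, Array[Float]],
      itemFactors: collection.Map[Long, Array[Float]],
      user: Long,
      item: Long): Option[Float] =
    for {
      u <- userFactors.get(user)
      i <- itemFactors.get(item)
    } yield u.zip(i).map { case (a, b) => a * b }.sum

  def main(args: Array[String]): Unit = {
    // Toy rank-2 factors; the user ID deliberately exceeds Int.MaxValue
    val users = Map(3000000000L -> Array(1.0f, 2.0f))
    val items = Map(7L -> Array(0.5f, 1.5f))
    println(predict(users, items, 3000000000L, 7L)) // Some(3.5)
    println(predict(users, items, 1L, 7L))          // None
  }
}
```

Collecting the factors is only reasonable for small models; for larger ones the same dot product would normally be computed inside Spark, for example after joining the factor RDDs with the pairs to score.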
