Tag Archive for scala, apache-spark

Spark-Scala: Read a CSV with 2 columns (string, int)

I want to read a CSV that has just two columns, one string and one int. I want to pass a string variable to it, match it against the first column, and return the value from the second (int) column. If no key matches, it should return a dummy value by default.
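A minimal sketch of the lookup logic described here, in plain Scala (2.13+) with no Spark dependency. The names `CsvLookup`, `parse`, and `lookup`, and the default of `-1`, are hypothetical and not from the question. It assumes the file is small enough to hold in memory as a `Map`, and that rows whose second column is not an integer (such as a header row) can be skipped.

```scala
// Hypothetical helper names; plain Scala, no Spark required.
object CsvLookup {
  // Parse "key,value" lines into a Map. Rows that do not have exactly
  // two columns, or whose second column is not an Int, are skipped.
  def parse(lines: Iterator[String]): Map[String, Int] =
    lines.flatMap { line =>
      line.split(",", -1).map(_.trim) match {
        case Array(key, value) => value.toIntOption.map(key -> _)
        case _                 => None
      }
    }.toMap

  // Return the Int for `key`, or `default` when the key is absent.
  // The default of -1 is an assumed dummy value.
  def lookup(table: Map[String, Int], key: String, default: Int = -1): Int =
    table.getOrElse(key, default)
}
```

With Spark, the same `Map` could presumably be built by reading the file into a DataFrame and collecting it to the driver, provided the data is small.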

Best way to find multiple IDs in a list of files in Spark Scala

I have a list of IDs that I want to find in my Parquet files. For each ID I have an idea of which files it could be present in, i.e. I have a mapping such as:
ID1 -> file1, file2
ID2 -> file2, file5
ID -> file3, file4 and so on…

What would be the best way to do such a task in Spark Scala? I have thought of plenty of ways.
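One possible approach, offered as a sketch and not as the best answer: invert the mapping so that each file is associated with the set of IDs to look for in it, which would let each file be read once. The names `FileIndex` and `invert` are hypothetical, and the code is plain Scala (2.13+) with no Spark dependency.

```scala
// Hypothetical helper; plain Scala, no Spark required.
object FileIndex {
  // Turn "ID -> candidate files" into "file -> IDs to look for in that file".
  def invert(idToFiles: Map[String, Seq[String]]): Map[String, Set[String]] =
    idToFiles.toSeq
      .flatMap { case (id, files) => files.map(file => file -> id) }
      .groupMap(_._1)(_._2) // group the IDs by file name
      .map { case (file, ids) => file -> ids.toSet }
}
```

Each file could then be read once and filtered on its own ID set, for example with `Column.isin` on whichever column holds the ID.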

Spark lazy val evaluates twice for dataframe

I have a lazy val, studentDataReader, defined in my code, which eventually reads data from an S3 path. My understanding is that even if I call it multiple times, it should read from S3 only once.
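One possible explanation, stated as an assumption because the asker's code is not shown: a `lazy val` guarantees only that its initializer runs once. If the value it holds is itself lazy, as a Spark DataFrame is, then each action re-executes the underlying read unless the data is cached or persisted. The sketch below imitates this in plain Scala (2.13+) with a collection view standing in for a DataFrame. `LazyValDemo`, `load`, `studentData`, and the path are hypothetical.

```scala
// Plain-Scala analogy; a view stands in for a lazily evaluated DataFrame.
object LazyValDemo {
  var reads = 0 // counts simulated storage reads

  // Stand-in for an expensive read (e.g. from S3); hypothetical.
  private def load(path: String): String = { reads += 1; s"rows from $path" }

  // The initializer runs once, but it only builds a lazy view:
  // nothing is loaded until the view is traversed.
  lazy val studentData = Seq("path/a").view.map(load)

  // Traversing the view twice triggers two reads.
  def runTwice(): Int = {
    reads = 0
    studentData.toList
    studentData.toList
    reads
  }

  // Materializing once and reusing the result triggers one read.
  def runMaterialized(): Int = {
    reads = 0
    val rows = studentData.toList
    rows.size + rows.size
    reads
  }
}
```

In Spark, the rough counterpart of materializing once would be calling `cache()` or `persist()` on the DataFrame before running several actions on it.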

DataFrame still has column after drop

I am attempting to drop a column using the drop function. But the column remains after the drop. The problem is evident in the following code: