PySpark join fields in JSON to a dataframe
I am trying to pull some fields out of a JSON string into a dataframe. I can achieve this by putting each field into its own dataframe and then joining all the dataframes, as below. But is there an easier way to do this? This is just a simplified example; in my project I have many more fields to extract.
Replace Quotes to convert to JSON string
"{""ab"": 4.21, ""cd"": null, ""ef"": 4.62, ""gh"": null, ""ij"": 4.33, ""kl"": 0.91}"
json.loads on records sometimes having nulls
I have a column that sometimes contains a geometry object ({"type": "Point", "coordinates": [123.12345, 456.789]}) and sometimes is null (this is expected; some items do not have geometry).
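`json.loads` raises a `TypeError` when handed `None`, so a small guard around the call handles records without geometry. A minimal sketch (the helper name is my own):

```python
import json

def parse_geometry(value):
    """Parse a geometry JSON string; return None for records without geometry."""
    if value is None:
        return None
    return json.loads(value)

point = parse_geometry('{"type": "Point", "coordinates": [123.12345, 456.789]}')
missing = parse_geometry(None)
```

The same guard works when mapping the function over a column of mixed strings and nulls.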
How to best handle badly concatenated json
I receive a single JSON file from a client which is not valid JSON.
The client concatenates multiple JSON responses into one file:
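The original example of the file contents is not shown here, but if the file is simply several JSON documents glued together (e.g. `{...}{...}{...}`), the standard library can split them without regex tricks: `json.JSONDecoder.raw_decode` parses one document and reports where it ended. A sketch, assuming back-to-back objects optionally separated by whitespace:

```python
import json

def parse_concatenated(text):
    """Yield each JSON document from a string of concatenated JSON responses."""
    decoder = json.JSONDecoder()
    idx = 0
    while idx < len(text):
        # Skip any whitespace between documents
        while idx < len(text) and text[idx].isspace():
            idx += 1
        if idx >= len(text):
            break
        # raw_decode returns the parsed object and the index just past it
        obj, idx = decoder.raw_decode(text, idx)
        yield obj

docs = list(parse_concatenated('{"a": 1}{"b": 2}\n{"c": 3}'))
```

If the real file deviates from this (e.g. documents separated by commas or junk bytes), the loop would need an error-recovery step around `raw_decode`.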