I am trying to parse a string to datetime using pandas but I am getting the following error:
ValueError: Cannot parse both %Z and %z
My date string is:
ds = '03.01.2021 23:00:00.000 GMT+0100'
and the conversion I’m trying is as follows:
format = '%d.%m.%Y %H:%M:%S.%f %Z%z'
dt = pd.to_datetime(ds, format=format)
which is odd, because pandas seems to use an almost identical version of strptime. When I use:
dt = datetime.datetime.strptime(ds, format)
it works as expected. The pandas documentation has a short section titled “Differences to strptime”, but it does not mention using %Z together with %z.
The further explanations of time zones and offsets that I found only cover UTC offsets. In this case I assume that GMT and UTC are effectively the same, so if I parse using:
format = '%d.%m.%Y %H:%M:%S.%f GMT%z'
dt = pd.to_datetime(ds, format=format)
this also works. But what if a datetime carries an offset relative to a different time zone, say EST instead of UTC? For example:
ds = '03.01.2021 23:00:00.000 EST+0100'
Is there any way to convert this directly using pd.to_datetime()?
GMT+0100 is not really a standard format. Usually you only have +0100, and the reference is UTC.
What you could do is extract the offset, convert it to a Timedelta, and add it after parsing:
import re
import pandas as pd

ds = '03.01.2021 23:00:00.000 GMT+0100'
# split into everything before the offset and the trailing +HHMM offset
m = re.search(r'(.*)(\+\d+)$', ds)
H, M = divmod(int(m.group(2)), 100)  # +0100 -> 1 h, 0 min
fmt = '%d.%m.%Y %H:%M:%S.%f %Z'
dt = pd.to_datetime(m.group(1), format=fmt) + pd.Timedelta(60*H+M, unit='min')
Output: Timestamp('2021-01-04 00:00:00+0000', tz='UTC')
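
For the EST case from the question, the same idea could be generalized roughly as below. This is only a sketch under some assumptions: the helper name parse_zone_plus_offset and the ZONE_OFFSETS table are made up for illustration, the abbreviation-to-offset mapping is something you have to supply yourself (abbreviations like EST are ambiguous), and the trailing offset is added to the time in the named zone, as in the snippet above. It also handles negative offsets, which the divmod approach would get wrong.

```python
import re
from datetime import timedelta, timezone

import pandas as pd

# Hypothetical lookup table: abbreviations are ambiguous, so these fixed
# offsets are an assumption and should be adapted to your data.
ZONE_OFFSETS = {
    "GMT": timedelta(0),
    "UTC": timedelta(0),
    "EST": timedelta(hours=-5),
}

# timestamp, zone abbreviation, then a signed HHMM offset
PATTERN = re.compile(
    r"^(?P<stamp>.+?) (?P<zone>[A-Z]+)(?P<sign>[+-])(?P<hh>\d{2})(?P<mm>\d{2})$"
)


def parse_zone_plus_offset(ds, fmt="%d.%m.%Y %H:%M:%S.%f"):
    """Parse strings such as '03.01.2021 23:00:00.000 EST+0100' (illustrative helper)."""
    m = PATTERN.match(ds)
    if m is None:
        raise ValueError(f"unexpected timestamp layout: {ds!r}")
    # Apply the sign to hours and minutes together, so -0130 becomes -90 minutes.
    minutes = int(m["hh"]) * 60 + int(m["mm"])
    if m["sign"] == "-":
        minutes = -minutes
    # Parse the naive timestamp, then attach the fixed offset of the named zone.
    base = pd.to_datetime(m["stamp"], format=fmt)
    localized = base.tz_localize(timezone(ZONE_OFFSETS[m["zone"]]))
    # Same arithmetic as above: shift the localized time by the trailing offset.
    return localized + pd.Timedelta(minutes=minutes)


gmt_result = parse_zone_plus_offset("03.01.2021 23:00:00.000 GMT+0100")
est_result = parse_zone_plus_offset("03.01.2021 23:00:00.000 EST+0100")
```

Whether the trailing offset should be added or subtracted depends on what the producer of the data means by it. Note that %z reads +0100 as "local time is one hour ahead of UTC", so parsing with 'GMT%z' gives 23:00 at +01:00 (22:00 UTC), which is a different instant from the result of the addition here.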