Parsing string to datetime using pandas


I am trying to parse a string to datetime using pandas but I am getting the following error:

ValueError: Cannot parse both %Z and %z

My date string is:

ds = '03.01.2021 23:00:00.000 GMT+0100'

and the conversion I’m trying is as follows:

format = '%d.%m.%Y %H:%M:%S.%f %Z%z'
dt = pd.to_datetime(ds, format=format)

This is surprising, because pandas seems to use an almost identical version of strptime. When I use:

dt = datetime.datetime.strptime(ds, format)

it works as expected. The pandas documentation has a small section titled “Differences to strptime”, but it does not mention using %Z together with %z.
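For reference, here is a minimal standard-library sketch of what the strptime call above returns. As far as I can tell, strptime keeps the %Z text only as a label and takes the actual offset from %z, so the result carries a fixed +01:00 offset that happens to be named GMT. The variable names are just for illustration.

```python
from datetime import datetime, timedelta, timezone

ds = '03.01.2021 23:00:00.000 GMT+0100'
fmt = '%d.%m.%Y %H:%M:%S.%f %Z%z'

dt = datetime.strptime(ds, fmt)

print(dt.tzname())                  # zone label taken from %Z
print(dt.utcoffset())               # offset taken from %z
print(dt.astimezone(timezone.utc))  # the same instant expressed in UTC
```

Note that strptime only accepts UTC, GMT and the machine's local zone names for %Z, so a string using a different abbreviation may not parse on every machine.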

The other explanations I found about timezones and offsets only cover UTC offsets. In this case I assume GMT and UTC are the same anyway, so if I treat GMT as literal text and parse using:

format = '%d.%m.%Y %H:%M:%S.%f GMT%z'
dt = pd.to_datetime(ds, format=format)

this also works. But what if a datetime has an offset relative to a different timezone, such as EST instead of UTC? For example:

ds = '03.01.2021 23:00:00.000 EST+0100'

Is there any way to convert this directly using pd.to_datetime()?


GMT+0100 is not really a standard format. Usually you only have +0100 and the reference is UTC.

What you could do is extract the offset, parse the rest of the string, and then add the offset as a Timedelta:

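import re
import pandas as pd

ds = '03.01.2021 23:00:00.000 GMT+0100'

# Split off the trailing '+HHMM' offset; group(1) keeps everything before it
m = re.search(r'(.*)(\+\d+)$', ds)
H, M = divmod(int(m.group(2)), 100)  # '+0100' -> 1 h, 0 min

# Parse the part ending in the zone name, then add the offset in minutes
fmt = '%d.%m.%Y %H:%M:%S.%f %Z'
dt = pd.to_datetime(m.group(1), format=fmt) + pd.Timedelta(60*H+M, unit='min')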

Output: Timestamp('2021-01-04 00:00:00+0000', tz='UTC')
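For the EST case, one possible approach is to fold the named zone and the offset into a single UTC offset before parsing, so that a plain %z is enough. The sketch below assumes the offset is meant relative to the named zone and relies on a hand-written table of base offsets; BASE_OFFSET_MIN, PATTERN and normalise are hypothetical names, and the table is an assumption you would need to adapt, since abbreviations such as EST are not unique worldwide. Keep in mind that under the %z convention the offset is how far local time is ahead of UTC, so 23:00 at +0100 corresponds to 22:00 UTC.

```python
import re
from datetime import datetime, timedelta

# Assumption: base UTC offsets in minutes for the abbreviations we expect.
BASE_OFFSET_MIN = {"GMT": 0, "UTC": 0, "EST": -5 * 60}

PATTERN = re.compile(
    r"^(?P<stamp>.*\S)\s+(?P<abbr>[A-Za-z]+)"
    r"(?P<sign>[+-])(?P<hh>\d{2})(?P<mm>\d{2})$"
)

def normalise(ds):
    """Rewrite '... ABBR+HHMM' as '... +HHMM' using the total offset from UTC."""
    m = PATTERN.match(ds)
    if m is None:
        raise ValueError(f"unrecognised timestamp: {ds!r}")
    abbr = m["abbr"].upper()
    if abbr not in BASE_OFFSET_MIN:
        raise ValueError(f"unknown zone abbreviation: {abbr}")
    # Offset relative to the named zone, in signed minutes
    rel = int(m["hh"]) * 60 + int(m["mm"])
    if m["sign"] == "-":
        rel = -rel
    total = BASE_OFFSET_MIN[abbr] + rel
    sign = "-" if total < 0 else "+"
    hh, mm = divmod(abs(total), 60)
    return f"{m['stamp']} {sign}{hh:02d}{mm:02d}"

fmt = '%d.%m.%Y %H:%M:%S.%f %z'
clean = normalise('03.01.2021 23:00:00.000 EST+0100')
dt_est = datetime.strptime(clean, fmt)
print(clean, dt_est.utcoffset())
```

With pandas, the normalised string should parse the same way via pd.to_datetime(clean, format=fmt), since the question reports that a format ending in %z works there.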
