What is the cleanest way to replace a hostname in an URL with Python?

  Kiến thức lập trình

In Python, there is a standard library module urllib.parse that deals with parsing URLs:

>>> import urllib.parse
>>> urllib.parse.urlparse("https://127.0.0.1:6443")
ParseResult(scheme='https', netloc='127.0.0.1:6443', path='', params='', query='', fragment='')

There are also properties on urllib.parse.ParseResult that return the hostname and the port:

>>> p.hostname
'127.0.0.1'
>>> p.port
6443

And, by virtue of ParseResult being a namedtuple, it has a _replace() method that returns a new ParseResult with the given field(s) replaced:

>>> p._replace(netloc="foobar.tld")
ParseResult(scheme='https', netloc='foobar.tld', path='', params='', query='', fragment='')

However, it cannot replace hostname or port because they are dynamic properties rather than fields of the tuple:

>>> p._replace(hostname="foobar.tld")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.11/collections/__init__.py", line 455, in _replace
    raise ValueError(f'Got unexpected field names: {list(kwds)!r}')
ValueError: Got unexpected field names: ['hostname']

It might be tempting to simply concatenate the new hostname with the existing port and pass it as the new netloc:

>>> p._replace(netloc='{}:{}'.format("foobar.tld", p.port))
ParseResult(scheme='https', netloc='foobar.tld:6443', path='', params='', query='', fragment='')

However this quickly turns into a mess if we consider

  • the fact that the port is optional;
  • the fact that netloc may also contain the username and possibly the password (e.g. https://user:[email protected]);
  • the fact that IPv6 literals must be wrapped in brackets (i.e. https://::1 isn’t valid but https://[::1] is);
  • and maybe something else that I’m missing.

What is the cleanest, correct way to replace the hostname in a URL in Python?

The solution must handle IPv6 (both as a part of the original URL and as the replacement value), URLs containing username/password, and in general all well-formed URLs.

LEAVE A COMMENT