
I haven't tested it thoroughly, but from what I remember urlparse works fine.


I just checked... it's not complete, but it seems to be mostly fine.

Basically, unlike Java, it doesn't give you an encode() function that takes an arbitrary string. The only urlencode() function expects data representing a query.
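To make the urlencode() point concrete, here's a minimal sketch. It assumes Python 3, where these functions live in urllib.parse (in Python 2 they were in the urllib module); the parameter names and values are made up for illustration.

```python
from urllib.parse import urlencode

# urlencode() takes a mapping or a sequence of (key, value) pairs,
# not an arbitrary string, and quotes each key and value separately.
params = [("q", "a b&c"), ("path", "x/y")]  # hypothetical query data
query = urlencode(params)
print(query)  # q=a+b%26c&path=x%2Fy
```

Since the input is already structured, the "&" inside a value can't be confused with the "&" that separates pairs.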

Obviously, you still have to remember to quote each part of the URL separately. If you build your URL (actually just resource + query + fragment) and then just quote() the whole thing at the end, you're no better off than with Java.

E.g., if you have a path made of two segments, "yadda/yadda" and "foo/bar":

quote("/".join(["yadda/yadda", "foo/bar"]))

yields 'yadda/yadda/foo/bar' (quote() treats "/" as safe by default), which might not be correct if what you want is actually

"/".join(quote(segment, safe="") for segment in ["yadda/yadda", "foo/bar"])

that yields 'yadda%2Fyadda/foo%2Fbar'
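Here are the two variants side by side as a runnable sketch. It assumes Python 3's urllib.parse (quote() lived in urllib in Python 2), and build_path is a hypothetical helper name, not something the library provides.

```python
from urllib.parse import quote


def build_path(segments):
    """Join path segments, percent-encoding each one on its own."""
    # safe="" makes quote() escape "/" too, so a slash inside a segment
    # cannot be mistaken for a segment separator.
    return "/".join(quote(segment, safe="") for segment in segments)


segments = ["yadda/yadda", "foo/bar"]

# Quoting after joining: the segment boundaries are already lost.
naive = quote("/".join(segments))

# Quoting before joining: each segment keeps its own slashes escaped.
per_segment = build_path(segments)

print(naive)        # yadda/yadda/foo/bar
print(per_segment)  # yadda%2Fyadda/foo%2Fbar
```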

kinda error-prone, if you ask me

Also, Python's urlparse doesn't seem to handle path parameters correctly:

urlparse("http://example.com/egypt;p=0/nile;p2=1;p3=2")

only recognizes p2=1;p3=2 as path parameters; the p=0 on the first segment is left inside the path
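A quick way to see this for yourself, assuming Python 3 (where urlparse lives in urllib.parse rather than in the urlparse module):

```python
from urllib.parse import urlparse

parsed = urlparse("http://example.com/egypt;p=0/nile;p2=1;p3=2")

# Only the parameters of the last path segment are split out.
print(parsed.path)    # /egypt;p=0/nile
print(parsed.params)  # p2=1;p3=2
```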

I think part of the confusion is that we think of encoding as "the way in which symbols are mapped onto bytes". But with that meaning it's not correct to talk about "URL encoding", because a part of the URL cannot be converted to ASCII without considering the context (are the / meaningful?) and where it appears in the URL.

It's more like a "URL language", and if we talked about "parsing" or "formatting" instead, imho we'd get fewer ambiguities and misunderstandings.

PS: I just realized that for the "http://example.com/egypt;p=0/nile;p2=1;p3=2" example, the docs suggest using the urlsplit function instead (which skips parsing the path parameters altogether... kind of a non-solution imo, but at least it's a known limitation)
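For what it's worth, here's a rough sketch of what you'd have to do yourself on top of urlsplit. It assumes Python 3's urllib.parse; split_segment_params is a hypothetical helper I made up, and it does no percent-decoding at all.

```python
from urllib.parse import urlsplit


def split_segment_params(path):
    """Return a (segment, [params]) pair for each path segment.

    Hypothetical helper: splits on "/" first, then on ";" inside
    each segment. No percent-decoding is attempted.
    """
    result = []
    for raw in path.lstrip("/").split("/"):
        name, *params = raw.split(";")
        result.append((name, params))
    return result


parts = urlsplit("http://example.com/egypt;p=0/nile;p2=1;p3=2")

# urlsplit leaves the parameters untouched inside the path.
print(parts.path)  # /egypt;p=0/nile;p2=1;p3=2

segments = split_segment_params(parts.path)
print(segments)  # [('egypt', ['p=0']), ('nile', ['p2=1', 'p3=2'])]
```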



Thanks! I wonder why the post author didn't use/discover it


Maybe they haven't had enough experience with Java, especially the J2EE stack. It's a big, complicated stack, and unless you've really worked with it, you won't know what's in it.

Guava also has tons of good stuff - http://docs.guava-libraries.googlecode.com/git/javadoc/com/g... , and it's ostensibly lighter-weight and more modular too.


urlparse is okay, but it has the limitation that it only supports URIs, not IRIs. werkzeug.urls provides full Unicode support.
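To give an idea of what the IRI gap means in practice, here's a simplified sketch of an IRI-to-URI conversion using only the Python 3 standard library. This is not werkzeug's implementation; iri_to_uri_sketch is a hypothetical name, and it assumes the input has no userinfo, no port, and no existing percent-escapes. Python's built-in "idna" codec follows the older IDNA 2003 rules.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri_sketch(iri):
    """Convert an IRI to an ASCII-only URI (simplified, see assumptions)."""
    parts = urlsplit(iri)
    # Host names are converted with IDNA (punycode), not percent-encoding.
    host = parts.netloc.encode("idna").decode("ascii")
    # Everything else is UTF-8 encoded and then percent-encoded,
    # keeping the delimiters that are meaningful in each component.
    path = quote(parts.path, safe="/")
    query = quote(parts.query, safe="=&")
    fragment = quote(parts.fragment, safe="")
    return urlunsplit((parts.scheme, host, path, query, fragment))


uri = iri_to_uri_sketch("http://bücher.example/café?q=naïve")
print(uri)
```

Note how each component needs its own rule, which is the same "quote each part separately" problem from upthread.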



