REBOL3 tracker
Ticket #0001644

Type: Bug    Status: reviewed    Date: 2-Sep-2010 18:59
Version: alpha 97    Category: Mezzanine    Submitted by: BrianH
Platform: All    Severity: minor    Priority: high

Summary: DECODE-URL and url! syntax don't obey the URL encoding rules
Description: Hex-encoded characters in URLs are supposed to stay encoded until the URL is broken into its component parts. REBOL URLs do not do this correctly: they are decoded too early. This causes a problem when a component part contains characters that would be mistaken for structural characters. This is common nowadays, since many sites use a full email address as a user name or allow passwords with less restricted character sets.

This can probably be solved by fixing DECODE-URL. Note: This problem also exists in R2, and could use a similar solution.
Example code
>> http://user%40rebol.com:blah@www.rebol.com/
== http://user@rebol.com:blah@www.rebol.com/
; should be http://user%40rebol.com:blah@www.rebol.com/
>> decode-url http://user%40rebol.com:blah@www.rebol.com/
== [scheme: 'http user: "user" host: "rebol.com" path: ":blah@www.rebol.com/"]
; should be [scheme: 'http pass: "blah" user: "user@rebol.com" host: "www.rebol.com" path: "/"]
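
The ordering rule from the description (split on structural characters first, hex-decode each field afterwards) can be sketched outside REBOL. The Python below is only an illustration under stated assumptions: `decode_url_sketch` is a hypothetical helper, not REBOL's DECODE-URL, it ignores ports, and it leaves the path (including any query or fragment) encoded.

```python
from urllib.parse import unquote


def decode_url_sketch(url):
    """Hypothetical helper, not REBOL's DECODE-URL: split first, decode second."""
    # 1. Split on structural characters while everything is still hex-encoded.
    scheme, _, rest = url.partition("://")
    authority, slash, path = rest.partition("/")
    userinfo, at, host = authority.rpartition("@")

    parts = {"scheme": scheme}
    if at:
        user, colon, password = userinfo.partition(":")
        # 2. Only now hex-decode the individual user-info fields.
        parts["user"] = unquote(user)
        if colon:
            parts["pass"] = unquote(password)
    parts["host"] = host
    # The path is deliberately left encoded in this sketch.
    parts["path"] = slash + path
    return parts


result = decode_url_sketch("http://user%40rebol.com:blah@www.rebol.com/")
print(result)
```

For the ticket's URL this yields the same fields as the expected DECODE-URL result above: user "user@rebol.com", pass "blah", host "www.rebol.com", path "/".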

Assigned to: n/a    Fixed in: -    Last Update: 5-Apr-2013 03:09


Comments
(0002483)
BrianH
2-Sep-2010 19:20

Gabriele wrote a correct parser here: http://www.rebol.it/power-mezz/parsers/uri-parser.html

Perhaps we can adapt that to R3 style, or at least analyze the code to get the correct rules to follow. We can't use it directly because it decodes the URLs too far, decoding the path and fragment identifier when we don't want that in this case. But it shouldn't be too hard to adapt, as long as it doesn't use too much of the rest of the Power Mezz stuff.
(0002486)
meijeru
7-Sep-2010 10:03

The following tickets also apply: #482, #1327, #1333.
(0002501)
Carl
21-Sep-2010 02:29

This does not seem like a hex decoding problem, but more like a field decoding problem in DECODE-URL. For instance, in the example, I could have used the @ directly, without the %40.
(0002534)
BrianH
23-Sep-2010 09:50

But what if the hex was %2F or some other character? All characters need to be allowed when specified in hex form.
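
A rough illustration of the %2F case, in Python rather than REBOL. `split_url` is a hypothetical helper that splits without decoding, and the URL with "pa%2Fss" as the password is an invented example, not taken from the ticket. Splitting the encoded text and decoding afterwards keeps the slash inside the password; decoding first turns it into a path separator.

```python
from urllib.parse import unquote


def split_url(url):
    """Hypothetical helper: return (user, password, host, path), nothing decoded."""
    _, _, rest = url.partition("://")
    authority, slash, path = rest.partition("/")
    userinfo, _, host = authority.rpartition("@")
    user, _, password = userinfo.partition(":")
    return user, password, host, slash + path


raw = "http://user:pa%2Fss@www.rebol.com/"  # assumed example URL

# Correct order: split the still-encoded URL, then decode each field.
user, password, host, path = split_url(raw)
late = (unquote(user), unquote(password), host, path)

# Premature decoding: %2F becomes a structural "/" before the split.
early = split_url(unquote(raw))

print(late)
print(early)
```

With late decoding the password comes out as "pa/ss" and the host as "www.rebol.com". With early decoding the authority is cut at the injected slash, so the user and password are lost and the rest of the URL lands in the path.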
(0003776)
Ladislav
5-Apr-2013 02:56

"This can probably be solved by fixing DECODE-URL." - this *provably* can't be solved by "fixing" DECODE-URL in any way.
(0003778)
BrianH
5-Apr-2013 03:09

Agreed, the internal treatment of url! values by LOAD and the other url! manipulation functions would need to be changed first, because url! values are getting corrupted long before DECODE-URL ever sees them. Once those other functions are fixed, and possibly a new internal model for the url! type is chosen (see #2014), then DECODE-URL can be fixed to work on the new url! data model.
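
One way to read the "provably" claim, sketched in Python as an illustration only: hex-decoding a whole URL is many-to-one, so once a value has been decoded, a later parser cannot tell which original was meant. The second URL below is an invented counterpart to the ticket's example.

```python
from urllib.parse import unquote

# Ticket example: the first "@" is data (part of the user name).
url_a = "http://user%40rebol.com:blah@www.rebol.com/"
# Assumed counterpart: here the second "@" is the encoded one.
url_b = "http://user@rebol.com:blah%40www.rebol.com/"

decoded_a = unquote(url_a)
decoded_b = unquote(url_b)

# Both collapse to the same text, so the distinction between data and
# structure is gone before any field splitting can happen.
print(decoded_a)
print(decoded_a == decoded_b)
```

Both inputs decode to the string shown in the ticket's first console result, which is why the fix has to happen before the url! value is decoded, not inside a function that only receives the decoded form.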

Date               User      Field              Action    Change
5-Apr-2013 03:11   BrianH    Comment : 0003778  Modified  -
5-Apr-2013 03:09   BrianH    Comment : 0003778  Added     -
5-Apr-2013 02:56   Ladislav  Comment : 0003776  Added     -
23-Sep-2010 09:50  BrianH    Comment : 0002534  Added     -
21-Sep-2010 02:29  carl      Comment : 0002501  Added     -
7-Sep-2010 10:03   meijeru   Comment : 0002486  Modified  -
7-Sep-2010 10:03   meijeru   Comment : 0002486  Added     -
2-Sep-2010 19:21   BrianH    Description        Modified  -
2-Sep-2010 19:21   BrianH    Code               Modified  -
2-Sep-2010 19:21   BrianH    Status             Modified  submitted => reviewed
2-Sep-2010 19:20   BrianH    Comment : 0002483  Added     -
2-Sep-2010 18:59   BrianH    Ticket             Added     -