Author: HZYSoft (pcman.ptt.cc)
Board: PCman
Title: [Clarification] Notes on URLs---PART 2
Time: Fri Jan 16 21:42:00 2004
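For the characters that may be used in a URL, please refer to the excerpt here.
Sorry, the document I consulted earlier while writing the program was unofficial and not accurate enough, and what I wrote was incomplete, so I got a few things wrong myself.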
Today I read the official document, which should ...
Here is another excerpt from the RFC, for anyone who is interested.
Excerpted from: ftp://ftp.rfc-editor.org/in-notes/rfc1738.txt
Berners-Lee, Masinter & McCahill [Page 2]
RFC 1738 Uniform Resource Locators (URL) December 1994
the character which has that octet as its code within the US-ASCII
[20] coded character set.
In addition, octets may be encoded by a character triplet consisting
of the character "%" followed by the two hexadecimal digits (from
"0123456789ABCDEF") which form the hexadecimal value of the octet.
(The characters "abcdef" may also be used in hexadecimal encodings.)
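To make the triplet form above concrete, here is a minimal Python sketch. The helper names encode_octet and decode_triplet are made up for this illustration and do not come from the RFC; the decoder accepts both uppercase and lowercase hex digits, as the text allows.

```python
import string


def encode_octet(octet: int) -> str:
    """Return the "%XX" triplet for one octet (0-255), using uppercase hex."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("octet must be in the range 0-255")
    return "%{:02X}".format(octet)


def decode_triplet(triplet: str) -> int:
    """Return the octet represented by a "%XX" triplet.

    Both "ABCDEF" and "abcdef" are accepted as hex digits.
    """
    if len(triplet) != 3 or triplet[0] != "%":
        raise ValueError("expected a 3-character triplet starting with '%'")
    if any(ch not in string.hexdigits for ch in triplet[1:]):
        raise ValueError("expected two hexadecimal digits after '%'")
    return int(triplet[1:], 16)
```

For example, encode_octet(0x20) gives "%20", and decode_triplet("%2f") and decode_triplet("%2F") both give 0x2F.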
Octets must be encoded if they have no corresponding graphic
character within the US-ASCII coded character set, if the use of the
corresponding character is unsafe, or if the corresponding character
is reserved for some other interpretation within the particular URL
scheme.
No corresponding graphic US-ASCII:
URLs are written only with the graphic printable characters of the
US-ASCII coded character set. The octets 80-FF hexadecimal are not
used in US-ASCII, and the octets 00-1F and 7F hexadecimal represent
control characters; these must be encoded.
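The octet ranges in the paragraph above can be checked with a one-line test. This is only a sketch; the function name has_no_graphic_char is an assumption for illustration.

```python
def has_no_graphic_char(octet: int) -> bool:
    """True if the octet has no graphic printable US-ASCII character.

    Covers the control characters 00-1F and 7F and the non-ASCII
    octets 80-FF. Space (20) is not covered here; the excerpt lists
    it separately under "Unsafe".
    """
    if not 0 <= octet <= 0xFF:
        raise ValueError("octet must be in the range 0-255")
    return octet <= 0x1F or octet == 0x7F or octet >= 0x80
```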
Unsafe:
Characters can be unsafe for a number of reasons. The space
character is unsafe because significant spaces may disappear and
insignificant spaces may be introduced when URLs are transcribed or
typeset or subjected to the treatment of word-processing programs.
The characters "<" and ">" are unsafe because they are used as the
delimiters around URLs in free text; the quote mark (""") is used to
delimit URLs in some systems. The character "#" is unsafe and should
always be encoded because it is used in World Wide Web and in other
systems to delimit a URL from a fragment/anchor identifier that might
follow it. The character "%" is unsafe because it is used for
encodings of other characters. Other characters are unsafe because
gateways and other transport agents are known to sometimes modify
such characters. These characters are "{", "}", "|", "\", "^", "~",
"[", "]", and "`".
All unsafe characters must always be encoded within a URL. For
example, the character "#" must be encoded within URLs even in
systems that do not normally deal with fragment or anchor
identifiers, so that if the URL is copied into another system that
does use them, it will not be necessary to change the URL encoding.
--
※ Posted from: 批踢踢实业坊 (ptt.cc)
◆ From: 140.129.59.3