A. Nature of the data
Consider the web page sample_form.html discussed in a previous handout.
The HTML for the page includes the usual header and ender stuff and also
a <FORM ACTION="get_info.cgi" ...> ... </FORM> block, specifying a web
form. In this block are various tags for text fields, boxes, buttons, etc.
Notice that each field has a field name, given by an attribute NAME=.
When the user fills out the form and clicks on the ``Submit'' button,
the browser sends information to the CGI program get_info.cgi as
``name=value pairs''.
For example, one field is named email and whatever the user fills in is
its value. If the user fills in emilyts@ucla.edu then the information
comes to your CGI program as a string
email=emilyts@ucla.edu
(apart from the encoding of special characters such as @, described in section B).
This string is actually part of a longer string holding all the data from the form. Your CGI program needs to take this longer string apart to get the data.
Point your browser to the sample form at
http://www.math.ucla.edu/~baker/40/sample_form.html
Then fill in some information and click on the ``Submit'' button.
The information is sent to get_info.cgi, which summarizes the
name=value pairs it found. Back up, change the information,
and try again.
B. How does the browser put together the data to send?
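The browser follows these steps:
1. It puts together a name=value pair for each field.
2. It replaces special characters by a percent sign followed by the
   two-digit hexadecimal code of the character, so that a comma becomes
   %2C and so on. (You don't have to know the codes.) A current version
   of Netscape replaces all special characters except underbar, hyphen,
   asterisk, period, and space.
3. It replaces each space by + (in browsers that have not
   already encoded the space characters).
4. It joins the name=value pairs using & as the separator, to make
   one longer combined data string.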
The browser gives the combined data string to the web server, which starts the CGI program and passes the string to it. The CGI program has to undo the steps just listed in order to get the information out.
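The steps just listed can be sketched in Perl. This is only an illustration of the idea, not how any particular browser is implemented. The subroutine names encode_value and encode_form are made up for this example, the set of characters left alone is an assumption (letters and digits plus the exceptions quoted above), and the sketch assumes plain ASCII text.

```perl
use strict;
use warnings;

# Hypothetical helper: encode one field name or value.
sub encode_value {
    my ($text) = @_;
    # Replace every character except letters, digits, underbar, hyphen,
    # asterisk, period, and space by % plus its two-digit hex code.
    $text =~ s/([^A-Za-z0-9_\-*. ])/sprintf("%%%02X", ord($1))/ge;
    # Replace each space by a plus sign.
    $text =~ tr/ /+/;
    return $text;
}

# Hypothetical helper: join [name, value] pairs with & as the separator.
sub encode_form {
    my @pairs = @_;
    return join '&',
        map { encode_value($_->[0]) . '=' . encode_value($_->[1]) } @pairs;
}

# Made-up form data for illustration.
print encode_form([ 'email', 'emilyts@ucla.edu' ],
                  [ 'comment', 'hi, there' ]), "\n";
# prints email=emilyts%40ucla.edu&comment=hi%2C+there
```

Notice that a plus sign typed by the user is itself a special character here and becomes %2B, so it cannot be confused with a plus sign that stands for a space.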
C. What are the name=value pairs for the different kinds of fields?
In every case, the NAME= attribute gives the field name, so
it's just the value that's in question.
For a checkbox such as
<INPUT TYPE="checkbox" NAME="student">
the value is on if the box is checked, but no name=value pair is sent
at all if the box is not checked. You can use the attribute VALUE= to
change on to something else if you wish, for example, VALUE="good".
Different checkboxes should have different names, so your program can tell them apart when it gets the data--or if they have the same name, then they should have different values. Your program will need to see whether any value was sent or not.
For radio buttons in a group, each button has the same NAME= but a
different VALUE=. Having a name in common is how the
browser knows they are in the same group. The values have to be
different so your CGI program can tell which one was checked.
For a selection list, the user may be allowed to select more than one
value (attribute MULTIPLE).
Each value selected results in a separate name=value pair.
For example, if a form asks the user to select US states and
allows multiple choices and if the user selects both CA and TX,
the CGI program will get both state=CA and state=TX.
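For a submit button such as
<INPUT TYPE="submit" VALUE="Submit the form">
the value returned is given by the VALUE= attribute. In other
words the value returned is the same as what is printed on the button.
(As with any field, a name=value pair is sent only if the button also
has a NAME= attribute.)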
VALUE=
attribute.
For a hidden field such as
<INPUT TYPE="hidden" NAME="info" VALUE="student">
the value is whatever is given by the VALUE= attribute, since the user
doesn't have any opportunity to change it!
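To see how these cases look from the program's side, here is a small Perl sketch. The subroutine name collect_values and the sample data string are made up for illustration, and the sketch assumes the values contain no encoded characters (decoding is a separate step). It keeps a list of values for each name, so repeated names such as state are not lost, and it tests for a checkbox by asking whether the name was sent at all.

```perl
use strict;
use warnings;

# Hypothetical helper: collect every value sent for each field name.
# Returns a reference to a hash whose values are array references.
sub collect_values {
    my ($query_string) = @_;
    my %form;
    foreach my $pair (split /&/, $query_string) {
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        push @{ $form{$name} }, $value;
    }
    return \%form;
}

# Made-up data: the checkbox "student" is checked, two states are
# selected, and a hidden field "info" is present.
my $form = collect_values('student=on&state=CA&state=TX&info=student');

print "student box checked\n" if exists $form->{student};
print "states: @{ $form->{state} }\n";
print "no email field was sent\n" unless exists $form->{email};
```

An unchecked checkbox simply never shows up as a key, which is why the test uses exists rather than looking at the value.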
Some points not to get confused about:
A field the user can edit, such as a text field, can have a VALUE=
attribute, which becomes the default value. So it's returned as part of a
name=value pair only if the user doesn't change it to something else.
NAME=name
.
D. How is the combined data string sent?
There are two methods: ``GET'' and ``POST''. The method is specified as
an attribute of the FORM tag, for example,
<FORM ACTION="get_info.cgi" METHOD="GET">
and a CGI Perl script is told the method via $ENV{"REQUEST_METHOD"}.
For ``GET'', the combined data string is appended to the URL, separated
by a question mark. The web server gives it to a CGI Perl script as
$ENV{"QUERY_STRING"}.
For ``POST'', the combined data string is given to the Perl CGI script
as standard input of a certain length. The Perl script can tell the
length from $ENV{"CONTENT_LENGTH"}. There is a Perl command to
read a string of a fixed length into your variable, say $query_string:
read (STDIN, $query_string, $ENV{"CONTENT_LENGTH"});
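Here is one possible way to combine the two cases, as a rough sketch rather than a finished solution. The subroutine name read_query_string is made up, and it takes the environment hash and the input filehandle as arguments (an assumption of this sketch) so that it can be tried outside a web server. An in-memory filehandle stands in for standard input.

```perl
use strict;
use warnings;

# Hypothetical helper: return the combined data string for either method.
# $env is a reference to a hash like %ENV; $in is a filehandle like STDIN.
sub read_query_string {
    my ($env, $in) = @_;
    my $method = $env->{REQUEST_METHOD} || '';
    my $query_string = '';
    if ($method eq 'GET') {
        $query_string = $env->{QUERY_STRING};
        $query_string = '' unless defined $query_string;
    }
    elsif ($method eq 'POST') {
        my $length = $env->{CONTENT_LENGTH} || 0;
        # read() can return fewer bytes than asked for; this sketch
        # does not retry.
        read($in, $query_string, $length) if $length > 0;
    }
    return $query_string;
}

# In a real CGI script the call would be
#   my $query_string = read_query_string(\%ENV, \*STDIN);
# Here a string stands in for standard input.
my $body = 'email=emilyts%40ucla.edu';
open my $fake_stdin, '<', \$body or die "cannot open in-memory handle: $!";
my %fake_env = (REQUEST_METHOD => 'POST', CONTENT_LENGTH => length($body));
print read_query_string(\%fake_env, $fake_stdin), "\n";
```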
Which method is better? That depends on the application. The ``GET'' method is easier to test, since you can just make up data to append directly to the URL and call the CGI program directly. The ``POST'' method is used for larger amounts of data and for data containing passwords, where you don't want them visible in the browser's URL window.
Try this: In your browser, go directly to the URL
http://www.math.ucla.edu/~baker/40/get_info.cgi?XX=hi
instead of filling out the sample form. Does this do what you expect?
In the next assignment, you'll write a Perl subroutine to read the form either way, so you can call the subroutine to get the data without worrying about which way the data was sent. Of course, since you're designing the HTML form page as well, you do control which way the data is sent in the first place.
E. Summary of what your CGI program does
1. Check whether $ENV{"REQUEST_METHOD"} is GET or POST. (Use eq.)
2. If it's GET, use $query_string = $ENV{"QUERY_STRING"};
   If it's POST, use
   read (STDIN, $query_string, $ENV{"CONTENT_LENGTH"});
3. Split the combined data string into pairs:
   @pairs = split /\&/, $query_string;
4. Split each $pair into its name and value:
   ($name,$value) = split /\=/, $pair;
5. In each value, replace + signs by spaces and then any
   %XX codes by the proper special characters.
   One way: Use $value =~ s/\%([\dA-Fa-f]{2})/chr(hex($1))/ge;
   Here [\dA-Fa-f] describes possible hex digits, {2} means
   ``exactly two'' of them, () results in $1, hex() is to convert
   hex to integer, chr finds the character with that integer as its code,
   g means global as usual, and e means ``evaluate the expression''
   instead of putting literally chr(hex...).
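The splitting and decoding steps of this summary might be put together as in the following sketch. The subroutine names decode_value and parse_query and the sample data string are made up for illustration. As a simplification, if a name occurs more than once only the last value is kept.

```perl
use strict;
use warnings;

# Hypothetical helper: undo the browser's encoding of one name or value.
sub decode_value {
    my ($text) = @_;
    $text =~ tr/+/ /;                               # + signs back to spaces first
    $text =~ s/%([\dA-Fa-f]{2})/chr(hex($1))/ge;    # then %XX codes to characters
    return $text;
}

# Hypothetical helper: turn the combined data string into a hash.
# A repeated name keeps only its last value in this simple version.
sub parse_query {
    my ($query_string) = @_;
    my %form;
    foreach my $pair (split /&/, $query_string) {
        next if $pair eq '';
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        $form{ decode_value($name) } = decode_value($value);
    }
    return %form;
}

# Made-up data string for illustration.
my %form = parse_query('email=emilyts%40ucla.edu&comment=hi%2C+there');
print "$_ = $form{$_}\n" foreach sort keys %form;
# prints:
#   comment = hi, there
#   email = emilyts@ucla.edu
```

The order of the two replacements matters: a plus sign typed by the user arrives as %2B, so converting + to space before decoding the % codes leaves it intact.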
F. The ASCII character set in hexadecimal.
You don't have to learn these codes; just get the flavor.
The first group consists of nonprinting characters, with antique names that come from teletype codes decades ago. The ones still occasionally relevant for you are BS=backspace, BEL=bell, HT=tab (i.e., horizontal tab), NL=newline, NP=new page, CR=carriage return, and ESC=escape. The end of line is signaled by NL in UNIX, CR NL in DOS and Windows, and CR on older Macintoshes.
Some of these nonprinting characters are produced directly by keys,
for example tab, backspace, and escape. All the nonprinting
characters can be produced using the ``control key'' CTRL, which sets
the first three bits to 0. For example, backspace can also be produced by
CTRL h, since h is hex 68 = 01101000, which is stripped to 00001000
= hex 08 = backspace.
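The bit-stripping rule is easy to try out in Perl. In this small sketch the subroutine name ctrl_code is made up, and the rule is applied exactly as stated above, by clearing the first three of the eight bits.

```perl
use strict;
use warnings;

# Hypothetical helper: the code produced by holding CTRL with a key,
# following the rule described above.
sub ctrl_code {
    my ($key) = @_;
    return ord($key) & 0x1F;    # 0x1F = 00011111 keeps only the last five bits
}

printf "CTRL %s gives hex %02X\n", $_, ctrl_code($_) foreach qw(h i m);
# prints:
#   CTRL h gives hex 08
#   CTRL i gives hex 09
#   CTRL m gives hex 0D
```

Looking these up in the table below, CTRL i is the same as tab (HT) and CTRL m is the same as carriage return (CR).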
00 NUL 01 SOH 02 STX 03 ETX 04 EOT 05 ENQ 06 ACK 07 BEL
08 BS 09 HT 0A NL 0B VT 0C NP 0D CR 0E SO 0F SI
10 DLE 11 DC1 12 DC2 13 DC3 14 DC4 15 NAK 16 SYN 17 ETB
18 CAN 19 EM 1A SUB 1B ESC 1C FS 1D GS 1E RS 1F US
20 SP 21 ! 22 " 23 # 24 $ 25 % 26 & 27 '
28 ( 29 ) 2A * 2B + 2C , 2D - 2E . 2F /
30 0 31 1 32 2 33 3 34 4 35 5 36 6 37 7
38 8 39 9 3A : 3B ; 3C < 3D = 3E > 3F ?
40 @ 41 A 42 B 43 C 44 D 45 E 46 F 47 G
48 H 49 I 4A J 4B K 4C L 4D M 4E N 4F O
50 P 51 Q 52 R 53 S 54 T 55 U 56 V 57 W
58 X 59 Y 5A Z 5B [ 5C \ 5D ] 5E ^ 5F _
60 ` 61 a 62 b 63 c 64 d 65 e 66 f 67 g
68 h 69 i 6A j 6B k 6C l 6D m 6E n 6F o
70 p 71 q 72 r 73 s 74 t 75 u 76 v 77 w
78 x 79 y 7A z 7B { 7C | 7D } 7E ~ 7F DEL
In this table of 128 codes the first bit is always 0. In Extended ASCII there are 128 more codes for other symbols, with the first bit 1.
A more modern character set is Unicode, which was originally designed around 16 bits per character (it has since grown beyond that) and encodes many more foreign-language characters.