CSE 135: Homework #1 Solutions
  1. Capture the HTTP response headers for the following requests: (10 pts / 1 each)

    You may find network capture tools such as samspade.org useful for this, rather than issuing the requests by hand by telnetting to port 80. (A sample by-hand session appears after the list below.)

    1. GET / HTTP/1.1
      Host: www.google.com
      
      HTTP/1.1 200 OK
      Cache-Control: private
      Content-Type: text/html
      Set-Cookie: PREF=ID=42866e427cf33c4b:TM=1089779847:LM=1089779847:S=GnTZTj8a33LiT0ej; expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.com
      Server: GWS/2.1
      Content-Length: 2017
      Date: Wed, 14 Jul 2004 04:37:27 GMT
      
    2. HEAD / HTTP/1.1
      Host: www.google.com
      
      HTTP/1.1 200 OK
      Cache-Control: private
      Content-Type: text/html
      Set-Cookie: PREF=ID=26055faf16d08c46:TM=1089779909:LM=1089779909:S=io9lk_db1mJFyZ8f; expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.com
      Server: GWS/2.1
      Content-Length: 2017
      Date: Wed, 14 Jul 2004 04:38:29 GMT
      
    3. GET /foo HTTP/1.1
      Host: www.ucsd.edu
      
      HTTP/1.1 404 Not Found
      Date: Wed, 14 Jul 2004 04:39:12 GMT
      Server: Apache/1.3.27 (Unix)
      Connection: close
      Transfer-Encoding: chunked
      Content-Type: text/html; charset=iso-8859-1
      
    4. OPTIONS / HTTP/1.1
      Host: www.google.com
      

      No headers; connection reset by peer.

    5. OPTIONS / HTTP/1.1
      Host: www.w3.org
      
      HTTP/1.1 200 OK
      Date: Wed, 14 Jul 2004 04:40:38 GMT
      Server: Apache/1.3.28 (Unix) PHP/4.2.3
      P3P: policyref="http://www.w3.org/2001/05/P3P/p3p.xml"
      Cache-Control: max-age=600
      Expires: Wed, 14 Jul 2004 04:50:38 GMT
      Content-Length: 0
      Allow: GET, HEAD, POST, PUT, DELETE, CONNECT, OPTIONS, PATCH, PROPFIND, PROPPATCH, MKCOL, COPY, MOVE, LOCK, UNLOCK, TRACE
      Connection: close
      
    6. TRACE / HTTP/1.1
      Host: www.whitehouse.gov
      

      No headers; connection closed.

    7. TRACE / HTTP/1.1
      Host: www.ucla.edu
      
      HTTP/1.1 200 OK
      Date: Wed, 14 Jul 2004 04:41:35 GMT
      Server: Apache/2.0.48 (Unix)
      Connection: close
      Transfer-Encoding: chunked
      Content-Type: message/http
      
    8. POST / HTTP/1.1
      Host: www.pint.com
      
      (You'll need to figure out how to send some data; see the sample session after this list.)
      HTTP/1.1 400 Bad Request
      Content-Type: text/html
      Date: Wed, 14 Jul 2004 04:45:37 GMT
      Connection: close
      Content-Length: 42
      
    9. DELETE / HTTP/1.1
      Host: www.w3.org
      
      HTTP/1.1 405 Method Not Allowed
      Date: Wed, 14 Jul 2004 04:45:58 GMT
      Server: Apache/1.3.28 (Unix) PHP/4.2.3
      Allow: GET, HEAD, POST, PUT, DELETE, CONNECT, OPTIONS, PATCH, PROPFIND, PROPPATCH, MKCOL, COPY, MOVE, LOCK, UNLOCK, TRACE
      Connection: close
      Transfer-Encoding: chunked
      Content-Type: text/html; charset=iso-8859-1
      
    10. DELETE / HTTP/1.1
      Host: www.microsoft.com
      
      HTTP/1.1 400 Bad Request
      Content-Type: text/html
      Date: Wed, 14 Jul 2004 04:45:59 GMT
      Connection: close
      Content-Length: 20
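
    These captures can be produced by hand with telnet or nc. For the POST in item 8, the trick is to send a Content-Length header followed by a blank line and then the body. A sample session (the extra headers and form data here are only illustrative):

    $ printf 'POST / HTTP/1.1\r\nHost: www.pint.com\r\nContent-Type: application/x-www-form-urlencoded\r\nContent-Length: 7\r\nConnection: close\r\n\r\na=b&c=d' | nc www.pint.com 80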
      
  2. Explain what P3P is and how it is used in an HTTP response. Present example(s) of P3P header(s) and explain the compact policy form. Does Internet Explorer support P3P? Discuss. Provide URLs and policy examples from at least two well-known Web sites. (20 pts)

    See the W3C's Platform for Privacy Preferences (P3P) Project page. P3P provides a standardized, machine-readable way for Web sites to publicize their privacy policies to users. An HTTP response may include a P3P header that refers to an XML document describing the policy; the header may also carry a compact encoding of the policy itself. The following header, adapted from the P3P specification's examples, shows the reference form:

    P3P: policyref="http://catalog.example.com/P3P/PolicyReferences.xml"
    

    The optional compact policy form (the CP attribute of the P3P header) is a sequence of short tokens, each summarizing an element of the full policy, so that a client can act on the policy without first fetching the policy document. In the headers below, for example, CUR indicates that data is used to complete the current activity, UNI that unique identifiers are collected, and NAV that navigation and clickstream data are among the data collected.

    According to Microsoft Knowledge Base Article 293222, "Internet Explorer 6 implements advanced cookie filtering based on the Platform for Privacy Preferences (P3P) specification." The P3P specification covers much more than cookie filtering, however, so Internet Explorer's support is only partial.

    Two well-known Web sites that supply P3P headers are Yahoo and Dell:

    $ echo "GET / HTTP/1.0\n\n" | nc yahoo.com 80 | grep P3P
    P3P: policyref="http://p3p.yahoo.com/w3c/p3p.xml", CP="CAO DSP COR CUR ADM DEV TAI PSA PSD IVAi IVDi CONi TELo OTPi OUR DELi SAMi OTRi UNRi PUBi IND PHY ONL UNI PUR FIN COM NAV INT DEM CNT STA POL HEA PRE GOV"
    $ echo "GET / HTTP/1.0\n\n" | nc dell.com 80 | grep P3P
    P3P: CP="BUS CAO CNT COM CUR DEV DSP INT NAV OUR PSA PSD SAM STA TAI UNI"
    P3P: policyref="http://www.dell.com/w3c/p3p.xml", CP="BUS CAO CNT COM CUR DEV DSP INT NAV OUR PSA PSD SAM STA TAI UNI"
    
  3. What does the 204 HTTP Response code mean? Explain how this response code might be useful to a Web programmer to create an interesting type of communication between browser and server. (10 pts)

    See section 10.2.5 of RFC 2616, HTTP/1.1. Most importantly,

    If the client is a user agent, it SHOULD NOT change its document view from that which caused the request to be sent. This response is primarily intended to allow input for actions to take place without causing a change to the user agent's active document view, although any new or updated metainformation SHOULD be applied to the document currently in the user agent's active view.

    This means a Web developer can have a server respond with a 204 to create a kind of one-way communication from client to server, of which the end-user need not be aware. This could be useful for applications such as usage tracking.
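
    For example, a page could reference a tracking URL whose handler logs the request and replies with 204; the browser stays on the current page. The URL and parameter below are invented for illustration:

    GET /track?page=home HTTP/1.1
    Host: www.example.com

    HTTP/1.1 204 No Content
    Connection: close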

  4. What is the robots.txt file? Present an example robots.txt file with a variety of directives in it. Are there HTML equivalents to the features provided by robots.txt? If so, what are they? Provide examples if there are any. (20 pts)

    The robots.txt file specifies an access policy for robots (programs that retrieve many pages from Web servers by following links from page to page). Here is a real example, fetched from Google:

    $ wget -qO- http://google.com/robots.txt
    User-agent: *
    Disallow: /search
    Disallow: /groups
    Disallow: /images
    Disallow: /catalogs
    Disallow: /catalog_list
    Disallow: /news
    Disallow: /pagead/
    Disallow: /relpage/
    Disallow: /imgres
    Disallow: /keyword/
    Disallow: /u/
    Disallow: /univ/
    Disallow: /cobrand
    Disallow: /custom
    Disallow: /advanced_group_search
    Disallow: /advanced_search
    Disallow: /googlesite
    Disallow: /preferences
    Disallow: /setprefs
    Disallow: /swr
    Disallow: /url
    Disallow: /wml
    Disallow: /hws
    Disallow: /bsd?
    Disallow: /linux?
    Disallow: /mac?
    Disallow: /microsoft?
    Disallow: /unclesam?
    Disallow: /answers/search?q=
    Disallow: /local
    Disallow: /froogle?
    Disallow: /froogle_
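
    Google's file is a single record using only User-agent and Disallow lines. A robots.txt file may contain several records, each addressed to particular robots. The following hypothetical file excludes one robot entirely, exempts another (an empty Disallow permits everything), and restricts all other robots:

    User-agent: BadBot
    Disallow: /

    User-agent: GoodBot
    Disallow:

    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /tmp/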
    

    As an alternative to the file, the robots META tag accomplishes a similar goal on a per-page basis rather than for the whole server. For example:

    <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
    
  5. Briefly explain HTTP Delta encoding. (10 pts)

    HTTP delta encoding, specified in RFC 3229, transmits only the differences between the version of a page a client already holds and the current version, much the way compressed digital video encodes most frames as differences from their predecessors. The scheme has not been widely adopted.
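
    RFC 3229 works in terms of entity tags and "instance manipulations": the client names the version it already holds and the delta formats it accepts, and a capable server replies with 226 IM Used, sending only a delta against that base. A sketch of the exchange (the resource and tag values here are invented):

    GET /page.html HTTP/1.1
    Host: www.example.com
    If-None-Match: "abc123"
    A-IM: diffe

    HTTP/1.1 226 IM Used
    ETag: "def456"
    IM: diffe
    Delta-Base: "abc123"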

  6. The User-Agent header is often used to perform browser detection; unfortunately, due to spoofing and other problems, it really isn't that useful. Explain the end-user characteristics that would be useful to detect, and then provide a brief overview of how one would go about detecting these characteristics. If it is not possible to detect something, indicate that. You do not have to code anything, though presenting pseudo-code or code fragments to illustrate your approach may be useful and garner extra points.

    Beware: do not just copy some "sniff code" you find. I am looking for a discussion of the problem with code, not just code which may or may not address the problem properly. (30 pts)

    It would be useful to know what file formats, encodings, and languages a client accepts, all of which can normally be determined from the Accept, Accept-Encoding, and Accept-Language headers the client sends. It would also be useful to know which client-side scripting languages the client supports; this is best determined by including in the response scripts that, when executed, report back to the server. Once client-side scripting is known to be available, it can report other information as well, such as display characteristics (browser window size and color depth, for example). A sketch of this approach follows.
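
    For example, the server could send a page containing a script that reports display characteristics back via an image request; a browser without scripting falls through to the noscript beacon, which tells the server that much as well. This is only a minimal sketch; the /detect URL and its parameter names are hypothetical:

    <script type="text/javascript">
    // Report screen size and color depth back to the (hypothetical) /detect URL.
    var img = new Image();
    img.src = "/detect?w=" + screen.width + "&h=" + screen.height +
              "&d=" + screen.colorDepth;
    </script>
    <noscript>
    <!-- No script support: a plain image request tells the server so. -->
    <img src="/detect?js=0" width="1" height="1" alt="">
    </noscript>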

    This is only a synopsis. A full answer should say more about which characteristics to detect and how that might be done; for example: line speed, screen size, plugin availability, and so on.