Practical 1: HTTP Web Proxy Server Programming Practical
Due Mar 29 by 17:00 Points 24
Adapted from Kurose & Ross - Computer Networking: a top-down approach featuring the Internet
CBOK categories: Abstraction, Design, Data & Information, Networking, Programming
In this practical, you will learn how web proxy servers work and one of their basic functionalities - caching.
Task
Your task is to develop a small web proxy server which is able to cache web pages.
It is a very simple proxy server which only understands simple HTTP/1.1 GET-requests but is able to handle all
kinds of objects - not just HTML pages, but also images.
Introduction
In this practical, you will learn how web proxy servers work and one of their basic functionalities, caching
(https://en.wikipedia.org/wiki/Cache_(computing)). Generally, when a client (e.g. your browser) makes a web request,
the following occurs:
1. The client sends a request to the web server
2. The web server then processes the request
3. The web server sends back a response message to the requesting client
Suppose the entire transaction takes 500 ms.
To improve performance, we place a proxy server between the client and the web server. The proxy
server acts as a middle-man for web transactions. A web request now proceeds in the following
steps:
1. The client sends a request to the proxy server
2. If the proxy server has cached the response, skip to step 7
3. Forward the request to the web server
4. The web server processes the request
5. The web server sends a response back to the proxy server
6. The proxy server caches the response
7. The proxy server returns the cached response to the client
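The seven steps above can be sketched as a single function. This is only an illustration of the control flow, not the template's code: the names handle_request, fake_origin and origin_calls are hypothetical, the cache is a plain dictionary, and the network round trip is replaced by a stand-in function.

```python
def handle_request(url, cache, fetch_from_origin):
    """Return (response, served_from_cache) for one client request."""
    # Step 2: if the response is already cached, skip straight to step 7.
    if url in cache:
        return cache[url], True
    # Steps 3-5: forward the request and wait for the origin's response.
    response = fetch_from_origin(url)
    # Step 6: keep a copy for later requests.
    cache[url] = response
    # Step 7: return the response to the client.
    return response, False


# Stand-in for the real network round trip (hypothetical; no sockets involved).
origin_calls = []

def fake_origin(url):
    origin_calls.append(url)
    return b"HTTP/1.1 200 OK\r\n\r\nhello"

cache = {}
first = handle_request("example.org/index.html", cache, fake_origin)
second = handle_request("example.org/index.html", cache, fake_origin)
print("first from cache: %s, second from cache: %s, origin calls: %d"
      % (first[1], second[1], len(origin_calls)))
```

Only the first request reaches the stand-in origin; the second is answered from the dictionary.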
On the first request, the transaction may take a fraction longer than in the previous example. However, subsequent
requests for the same resource will be significantly faster (sometimes less than 10 ms), because the proxy answers
from its cache, avoiding the network latency to the web server and reducing server load.
The mechanism for caching can be as simple as storing a copy of a resource on the proxy server's file system.
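A minimal sketch of such a file-based cache is shown here. It assumes each response is stored whole in one file and that the file name is a hash of the requested URL; both choices, and the names cache_path, store and load, are illustrative assumptions rather than what the template does.

```python
import hashlib
import os
import tempfile

def cache_path(cache_dir, url):
    # Hash the URL so that characters such as '/' and ':' never reach the file name.
    name = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return os.path.join(cache_dir, name)

def store(cache_dir, url, response_bytes):
    # Write the raw response bytes exactly as they were received.
    with open(cache_path(cache_dir, url), "wb") as f:
        f.write(response_bytes)

def load(cache_dir, url):
    """Return the cached bytes, or None if nothing is cached for this URL."""
    try:
        with open(cache_path(cache_dir, url), "rb") as f:
            return f.read()
    except IOError:
        return None

cache_dir = tempfile.mkdtemp()
url = "example.org/index.html"
miss = load(cache_dir, url)          # nothing stored yet
store(cache_dir, url, b"HTTP/1.1 200 OK\r\n\r\n<html></html>")
hit = load(cache_dir, url)           # now served from disk
```

Opening the files in binary mode matters because the proxy must also handle images, not just HTML text.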
Steps for approaching the practical
Step 1
Understand the HTTP/1.1 requests/responses of the proxy. Your proxy MUST be able to handle GET requests from
a client. Your proxy may handle other request types (such as POST) but this is not required.
You need to know the following:
1. What HTTP request will the browser send to the proxy?
2. What will the HTTP response look like?
3. In what ways will the response look different if it comes from the proxy rather than from the origin server
(i.e. the server where the original page is stored)? You will not be able to test this yet, but what do you think
would happen?
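When you start looking at real requests, it can help to pull the first line apart. The sketch below splits a request line into its three parts. The sample request is made up for illustration, so compare it with what your own browser actually sends; parse_request_line is a hypothetical helper name.

```python
def parse_request_line(raw_request):
    """Split the first line of an HTTP request into (method, uri, version)."""
    first_line = raw_request.split(b"\r\n", 1)[0].decode("ascii")
    method, uri, version = first_line.split(" ")
    return method, uri, version

# A made-up request: a request line, one header, then the blank line that ends the headers.
sample = (b"GET /www.example.org/index.html HTTP/1.1\r\n"
          b"Host: localhost:8888\r\n"
          b"\r\n")
method, uri, version = parse_request_line(sample)
```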
Step 2
Understand the socket connections:
1. How will the client know to talk to the proxy?
2. What host and port number will the proxy be on?
3. The proxy may need to connect to the origin server (if the web object is not in the cache). What host and port
number will the proxy connect to?
4. Where will the proxy get the web object information?
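One way to explore question 3 is to split the requested URI into a host, a port and a path. The sketch below is a rough illustration under several assumptions: the URI arrives with a leading slash, a scheme prefix may or may not be present, and port 80 is used when none is given. split_target is a hypothetical name, and IPv6 literals are not handled.

```python
def split_target(uri):
    """Return (host, port, path) for a URI such as '/http://example.org:8080/a/b'."""
    target = uri.lstrip("/")
    # Drop a scheme prefix if the client included one.
    if "://" in target:
        target = target.split("://", 1)[1]
    # Everything before the first '/' names the server; the rest is the path.
    host_port, _, rest = target.partition("/")
    path = "/" + rest
    host, _, port = host_port.partition(":")
    # Assumption: fall back to port 80, the default for plain HTTP.
    return host, int(port) if port else 80, path

print(split_target("/www.example.org/index.html"))
print(split_target("/http://example.org:8080/a/b"))
```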
Step 3: Checkpoint
Make sure you have the answers to steps 1 & 2 before you go any further.
You can't code it if you don't know what should happen.
Ask questions on the discussion forum if you are unsure about the above.
Step 4: Python Code
Now that you know what should be happening, have a look at the code here.
Look at the interactions you identified in steps 1 & 2 and see where they would occur in the code.
Review the Python code presented in the sockets lecture. You'll find details of the socket calls in the Python socket
library documentation: https://docs.python.org/2.7/library/socket.html
You won't be able to simply copy the lecture code, but it shows
you the key steps: creating sockets, connecting sockets, sending data on sockets and receiving data on sockets.
Your task is to make the correct socket calls and supply the correct arguments for those calls.
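As a reminder of what those calls look like together, here is a small loopback sketch that is independent of the template. It starts a one-shot server on 127.0.0.1 in a background thread and talks to it from a client socket. The upper-casing behaviour, the 5-second timeouts and the name serve_once are arbitrary choices made for this illustration.

```python
import socket
import threading

def serve_once(server_socket):
    # accept() blocks until a client connects, then returns a new socket for that client.
    connection_socket, addr = server_socket.accept()
    data = connection_socket.recv(1024)
    connection_socket.sendall(data.upper())
    connection_socket.close()

# Create a listening socket; port 0 asks the OS for any free port.
server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_socket.settimeout(5)
server_socket.bind(("127.0.0.1", 0))
server_socket.listen(1)
server_port = server_socket.getsockname()[1]

worker = threading.Thread(target=serve_once, args=(server_socket,))
worker.start()

# Create a client socket, connect it, send data and receive the reply.
client_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client_socket.settimeout(5)
client_socket.connect(("127.0.0.1", server_port))
client_socket.sendall(b"hello proxy")
reply = client_socket.recv(1024)
client_socket.close()

worker.join()
server_socket.close()
```

A proxy plays both roles: it is the server towards the browser and the client towards the origin server.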
You will only need to fill in the code between the comment lines:

    # ~~~~ INSERT CODE ~~~~
    ...
    # ~~~~ END CODE INSERT ~~~~

The comments above the comment lines give you hints about the code to insert.
Step 5: Differences in Python and C
If you are new to Python, look at the code structure. Most of the code is given to you, so you can focus on adding
the networking code. A couple of things are different in Python compared with the C-derived languages:
1. Python uses whitespace (indentation and newlines) to mark code blocks and the ends of lines. Note there are no
brackets or braces { } around code blocks and no semi-colons ; at the end of lines. The indentation isn't just
important for readability in Python, it affects how your code runs. Make sure you indent code blocks correctly.
2. Python has a tuple data structure. So functions can return more than one value, or more precisely return one
tuple with multiple values, but the syntax allows the parentheses to be left off. Tuples appear in the lecture slides
in the line:

    clientSocket.connect((serverName, serverPort))

(serverName, serverPort) is a tuple of two values that is passed as the argument to the connect() function.

    connectionSocket, addr = serverSocket.accept()

The accept() call returns a tuple of two values: the first value is the new socket and the second is the address
information. This is the same as:

    (connectionSocket, addr) = serverSocket.accept()

The use of the ( ) around the tuple is optional.
Step 6
Start by getting the proxy working when a cached file is requested. In this case the proxy returns the response
itself and does not have to contact the origin server.
Step 7
Once that is working, add the code to handle the case where the file is not cached by the proxy and the proxy must
request it from the origin server.
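For the uncached case, the proxy has to build its own request to the origin server and read back the reply. The sketch below shows one possible shape for those two pieces. The helper names build_origin_request and read_all are hypothetical, and sending "Connection: close" so that the end of the response can be detected by the server closing the connection is an assumption of this sketch, not a requirement of the practical. The Host header itself is mandatory in HTTP/1.1.

```python
def build_origin_request(host, path):
    """Build a minimal HTTP/1.1 GET request for the origin server, as bytes."""
    lines = [
        "GET %s HTTP/1.1" % path,
        "Host: %s" % host,
        # Assumption: ask the server to close the connection after responding.
        "Connection: close",
        "",
        "",
    ]
    # Joining with CRLF leaves the blank line that terminates the headers.
    return "\r\n".join(lines).encode("ascii")

def read_all(sock):
    """Keep calling recv() until the peer closes the connection."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)

request = build_origin_request("example.org", "/index.html")
```

read_all accepts any object with a recv() method, so it can be tried out without a network connection.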
Step 8
In both steps 6 & 7, use both the telnet utility and a browser to test your proxy server. You can also use
Wireshark to capture what is being sent to and received from the origin server.
Running your proxy in the labs and Websub
The labs and test machines are running Python 2.7. Make sure that your submission is compatible with Python 2
and does not use Python 3 features that are not backward compatible with Python 2.7.
The University proxy will intercept any attempt to connect to a host outside the University and require
authentication (which is outside the scope of this practical). As a result, your proxy will not be able to connect to
origin servers outside the University when running on computers inside the University. When testing inside the
University, use URLs within the University infrastructure. This should not affect you when running your proxy
outside the University.
Running the Proxy Server
Download the template file and save it as Proxy.py.
Run the following command in a terminal to start the proxy server:

    $ python Proxy.py localhost 8888

You can change localhost to listen on another IP address for requests; localhost is the address of your own machine.
You can change 8888 to another port to listen on.
When you run the proxy server for the first time, it will loop forever, failing to connect to a server. That's OK for now.
Press Ctrl+C to terminate the program.
When you are ready to test your proxy server, open a web browser and navigate to
http://localhost:8888/https://ecms.adelaide.edu.au
Note that when typing the proxy into the browser request in this way, the browser may only display the main page
and may fail to download associated files such as images and style sheets. This is due to a difference in URI
requirements when sending requests to a proxy vs sending to an origin web server (we'll look at this in the
tutorials).
Configuring your Browser (optional)
Instead of putting the proxy into the URI, you can configure your web browser to use your proxy directly. How you do
this depends on your browser.
In Internet Explorer, you can set the proxy in Tools > Internet Options > Connections tab > LAN Settings.
In Netscape (and derived browsers such as Mozilla), you can set the proxy in Tools > Options > Advanced tab >
Network tab > Connection Settings.
In both cases, you need to give the address of the proxy and the port number that you set when you ran the proxy
server. You should be able to run the proxy and the browser on the same computer without any problem. With this
approach, to get a web page using the proxy server, you simply provide the URL of the page you want.
For example, requesting https://ecms.adelaide.edu.au in a browser configured this way would be the same as requesting
http://localhost:8888/https://ecms.adelaide.edu.au
in a browser that is not configured to use the proxy.
Set up like this, the browser should successfully load both the main web page and all associated files, for
reasons we'll discuss in tutorials.
Testing your code
When it comes time to test your code, cURL is a useful tool.
The commands below assume the proxy is listening on port 8080; substitute the port you chose when you started it.
Let's have a look at these two tests:
Obtained remote file

    $ curl -sI localhost:8080/https://ecms.adelaide.edu.au | head -n 1
    HTTP/1.1 200 OK

The above command requests https://ecms.adelaide.edu.au via the web
proxy. -I asks for the response headers only, -s suppresses additional output, and head -n 1 extracts the first line
of cURL's output.
The result is the first line of the response from the proxy. It should match what you get when talking directly to
the origin server, like so:

    $ curl -sI https://ecms.adelaide.edu.au | head -n 1
    HTTP/1.1 200 OK

Handle page that does not exist

    $ curl -sI localhost:8080/https://ecms.adelaide.edu.au/fakefile.html | head -n 1
    HTTP/1.1 404 Not Found

The response for a path that doesn't exist shows the above status code. To work correctly, your proxy will also need
to handle this case.
Submission
Your proxy server must be contained in the Proxy.py file only.
Your work will need to be submitted to Websub (https://cs.adelaide.edu.au/services/websubmission/) with the
assignment path /2019/s1/cna/webproxy
The Web submission system will perform a static analysis of your code. Your code will be run by one of the
teachers.
Criteria                                                    Full Marks   No Marks
Proxy server started                                        0.0 pts      0.0 pts
Connected to Proxy server                                   10.0 pts     0.0 pts
Obtained remote homepage                                    2.0 pts      0.0 pts
Obtained remote file                                        2.0 pts      0.0 pts
Handle page that does not exist                             2.0 pts      0.0 pts
Cache requested webpages                                    2.0 pts      0.0 pts
Read from a cached file                                     1.0 pts      0.0 pts
Redownloaded the file from server after file was removed    1.0 pts      0.0 pts
Handles internal server error                               4.0 pts      0.0 pts
Total Points: 24.0
Original source: https://www.cnblogs.com/whltay/p/10548233.html