"RFC" stands for "Request For Comments". "STD" is short for "Standard". RFC and STD documents are technical memos which detail specific information about how some part of the Internet works. For example, HTTP is specified in several documents starting with RFC 7230.
The Internet Engineering Task Force (IETF) is the non-profit organization that writes, edits, and publishes these documents.
As the name suggests, a "Request for Comments" is not a full-fledged formal standard (a "STD"), but in practice they are often treated like standards anyway.
RFC 20 was issued in 1969, before the Internet, the IETF, or the "STD" label existed, and describes a basic network technology that literally every Internet-aware device implements. This elevation to STD 80 is mostly a symbolic acknowledgement of that.
Yes, and a server that accepts 0 length URIs is also perfectly valid according to the spec... (See RFC 2616 or the revised RFC 7230)
This sort of flexibility is usually a good thing. If you write a shitty parser (or server), no one will use it. If you understand that memory is limited and that supporting a wide variety of devices requires letting those devices be flexible, you make a recommendation and leave the implementation to the folks who are trying to make useful things rather than snarky comments :D
To be honest, I'm kind of confounded by the question. HTTP is built around URIs. It has always been intended as a protocol for requesting and sending data at any given URI. To propose that URIs are eliminated and replaced with content bodies is nearly the same as creating a new protocol.
If you won't take my word for it: https://tools.ietf.org/html/rfc7230#section-2.7
> Uniform Resource Identifiers (URIs) [RFC3986] are used throughout HTTP as the means for identifying resources (Section 2 of [RFC7231]).
Now consider at least one practical example. Many existing technologies use URIs to handle various types of caching. A request for /doctor/1 is easy to ship off to a server that has the cached data. If the request went to / instead, then no processing on the URL can happen: all headers must be processed, and the request body must be parsed before anything else can happen. You've effectively tied your application logic to your infrastructure.
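To make the routing point concrete, here's a rough sketch (the backend names are made up) of a front end picking a destination from nothing but the request-target, before it has to touch headers or a body:

    // Hypothetical sketch: route a request to a backend using only the
    // request-target, before any headers or body are parsed.
    fn pick_backend(request_target: &str) -> &'static str {
        if request_target.starts_with("/doctor/") {
            "doctor-cache.internal:8080" // backend holding cached /doctor/* data
        } else if request_target.starts_with("/static/") {
            "cdn-origin.internal:8080"
        } else {
            "app-server.internal:8080" // everything else goes to the app
        }
    }

    fn main() {
        assert_eq!(pick_backend("/doctor/1"), "doctor-cache.internal:8080");
        assert_eq!(pick_backend("/"), "app-server.internal:8080");
    }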
In short: this isn't how HTTP works, so don't do it.
If it's HTTP 1.1 it should be case insensitive, so instead of making workarounds, pestering Amazon to fix it would be the "correct" way to handle it.
> not everything can be boiled down to a simple question without any context required
Then how can you expect strangers on the internet to answer it?
> I get it
No, you don't. I am not defending every "but why?" comment, and I'll gladly acknowledge that plenty of those commenters know one trick and that one trick only, so they can't even figure out how to answer a question that says "I want to select an element with this id, but without using jQuery".
That's not the case where I see such comments, though. This very abstract discussion, which you seem to take way too personally, would be helped by a link to an actual example, if you can provide one.
For example, paraphrased, the last time I remember seeing this dreaded "why" question was when someone asked how to find out the content-length from an HTTP response when there was no such header. Their responses indicated they were writing an HTTP client from scratch and wanted to know when they could safely close the connection. So the real answer to their actual question, which was not "how do I find a header that does not exist" but rather "how do I know I received an entire HTTP response", was "see section 3.3.3 (Message Body Length) of the relevant RFC".
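For anyone curious, that 3.3.3 logic boils down to roughly this (a simplified, response-only sketch; the names are mine, not from any library):

    // Greatly simplified version of RFC 7230 section 3.3.3, response side only.
    // `headers` is assumed to hold already-parsed (lowercased name, value) pairs.
    enum BodyLength {
        Chunked,    // Transfer-Encoding: chunked -> read chunk by chunk
        Bytes(u64), // Content-Length: N -> read exactly N bytes
        UntilClose, // neither -> the body ends when the server closes the connection
    }

    fn response_body_length(headers: &[(String, String)]) -> BodyLength {
        // Transfer-Encoding takes precedence over Content-Length.
        if headers.iter().any(|(name, value)| {
            name == "transfer-encoding" && value.to_ascii_lowercase().contains("chunked")
        }) {
            return BodyLength::Chunked;
        }
        if let Some((_, value)) = headers.iter().find(|(name, _)| name == "content-length") {
            if let Ok(n) = value.trim().parse::<u64>() {
                return BodyLength::Bytes(n);
            }
        }
        BodyLength::UntilClose
    }

    fn main() {
        let headers = vec![("content-length".to_string(), "37".to_string())];
        assert!(matches!(response_body_length(&headers), BodyLength::Bytes(37)));
    }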
That is the kind of "why" that gets you the answer you're looking for, and that's why I don't fully understand the hatred there seems to be against such comments.
So is your goal to serve files via HTTP, or is your goal to build a web server?
If you want to just serve files from a Rust web server, have a look at Hyper or Rocket.
If you want to write your own HTTP server to serve files, yes, you need to start with a TCP stream to receive data from the client, but then you need to parse an HTTP request out of that using an HTTP parser. After that, you need to generate an HTTP response, and send that over the socket. That response will have details in it which indicate that you'll be sending a binary file of a specific size.
Following the header, you can write the file directly to the stream using something like io::copy(). An opened File implements the Read trait, and a TcpStream implements the Write trait; io::copy copies everything from a reader into a writer.
I'm glossing over a lot of stuff here.
Some references : https://tools.ietf.org/html/rfc7230
An example web server in Rust: https://gist.github.com/mjohnsullivan/e5182707caf0a9dbdf2d
Note that this example hard-codes the HTTP response it sends back to the client. You'll probably want something that can dynamically indicate the size of the payload you're sending.
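To make the io::copy bit concrete, here's a minimal sketch of the whole flow. It's deliberately naive: it ignores what the client actually asked for and always sends a hard-coded file (payload.bin is just a placeholder name), so treat it as an illustration rather than a real server:

    use std::fs::File;
    use std::io::{self, BufRead, BufReader, Write};
    use std::net::TcpListener;

    fn main() -> io::Result<()> {
        let listener = TcpListener::bind("127.0.0.1:8080")?;
        for stream in listener.incoming() {
            let mut stream = stream?;

            // Read and throw away the request head (request line + headers);
            // a real server would parse this to decide which file to send.
            {
                let mut reader = BufReader::new(&stream);
                loop {
                    let mut line = String::new();
                    if reader.read_line(&mut line)? == 0 || line == "\r\n" {
                        break; // EOF or the blank line that ends the headers
                    }
                }
            }

            // Hard-coded file name, purely for illustration.
            let mut file = File::open("payload.bin")?;
            let len = file.metadata()?.len();

            // Status line and headers announce a binary body of `len` bytes.
            write!(
                stream,
                "HTTP/1.1 200 OK\r\nContent-Type: application/octet-stream\r\nContent-Length: {}\r\nConnection: close\r\n\r\n",
                len
            )?;

            // File implements Read, TcpStream implements Write, so io::copy
            // can stream the body straight onto the socket.
            io::copy(&mut file, &mut stream)?;
        }
        Ok(())
    }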
Relevant part of RFC7230. https://tools.ietf.org/html/rfc7230#section-5.4
> When a proxy receives a request with an absolute-form of request-target, the proxy MUST ignore the received Host header field (if any) and instead replace it with the host information of the request-target. A proxy that forwards such a request MUST generate a new Host field-value based on the received request-target rather than forward the received Host field-value.
So HTTP is just a protocol for communicating between a client and a server. Creating a server that handles HTTP isn't too tough: you'll need your server to continuously listen for new connections on a port and respond appropriately. Start small by handling GET requests; you're not trying to make a browser, so all you need to do for responses is format them appropriately and fill the body with your HTML string. For the technical specification check out https://tools.ietf.org/html/rfc7230
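Something like this is about as small as it gets (no real parsing, every request gets the same hard-coded HTML back), just to show the listen-and-respond loop:

    use std::io::{Read, Write};
    use std::net::TcpListener;

    fn main() -> std::io::Result<()> {
        let listener = TcpListener::bind("127.0.0.1:8080")?;
        for stream in listener.incoming() {
            let mut stream = stream?;

            // Read whatever the client sent; a real server would parse the
            // request line and headers out of this buffer.
            let mut buf = [0u8; 4096];
            let _ = stream.read(&mut buf)?;

            // Answer every request with the same tiny HTML page.
            let body = "<html><body><h1>hello</h1></body></html>";
            let response = format!(
                "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
                body.len(),
                body
            );
            stream.write_all(response.as_bytes())?;
        }
        Ok(())
    }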
I like it. Great point.
You've caused me to dig up some RFCs related to HTTP.
It seems to me that application servers could better adhere to the letter and spirit of the relevant RFCs by acting more logically on requests that include query arguments to pages/functions that shouldn't get them. In other words, provide an error rather than make a naive guess about the desired result of a syntactically correct but semantically invalid request.
References:
edit: Seems like it's worth articulating that the vulnerability in the article is based on functionality being exposed without authentication and authorization, and the workaround for the proposed browser-level mitigation of the proposed attack is based on exploiting a problem with inadequate sanitizing and sanity-checking of requests. Also, spelling.
It already is, in RFC 7230 section 3.3.2 (which is the current standard):
>A sender MUST NOT send a Content-Length header field in any message that contains a Transfer-Encoding header field.
As a result, a message containing both is suspect; per section 3.3.3, the Transfer-Encoding overrides and the Content-Length has to be ignored.
That header contains multiple values that are not allowed by the spec. This is why Apache rejects it
https://tools.ietf.org/html/rfc7230#section-3.2.6
It contains characters that are illegal there and should be URL encoded: the double quote (") and the other delimiter characters listed in that section.
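As a rough illustration of what section 3.2.6 allows in a field value (visible ASCII plus space and tab; obs-text and obs-fold ignored here), a check could look something like this (the function name is made up):

    // Rough check of a header field value against RFC 7230 section 3.2.6:
    // visible ASCII characters (VCHAR), plus space and horizontal tab.
    fn field_value_is_valid(value: &str) -> bool {
        value.bytes().all(|b| b == b'\t' || b == b' ' || (0x21..=0x7e).contains(&b))
    }

    fn main() {
        assert!(field_value_is_valid("text/html; charset=utf-8"));
        assert!(!field_value_is_valid("evil\r\nInjected-Header: x")); // CR/LF not allowed
    }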
Just to follow up on this. I figured it out. Most modern webservers on the Internet will require a Host header for the request to go through. It just so happens that the service I was using for testing (https://webhook.site/), does not require a Host header, but this is not the norm. That is what got me so confused.
If you use the Arduino WifiClient class (https://www.arduino.cc/en/Reference/WiFiClient) per the example in the Arduino reference documentation, it will likely fail on most modern websites. This is because the example does not pass the Host request header in the GET request, so many sites will return an HTTP 400 Bad Request error instead of the expected 200 OK. The HTTP 1.1 specification (and most HTTP 1.0 requests are upgraded to 1.1) explicitly requires the Host header to be passed: https://tools.ietf.org/html/rfc7230#section-5.4
Examples with CURL:
    curl iot.mydomain.com?device_id=curlTest -v -H 'User-Agent:' -H 'Accept:' -H 'Host:' --http1.0

    * Connected to iot.mydomain.com (54.192.73.32) port 80 (#0)
    > GET /?device_id=curlTest HTTP/1.0
    >
    < HTTP/1.1 400 Bad Request
    < Server: CloudFront
---
    curl iot.mydomain.com?device_id=curlTest -v -H 'User-Agent:' -H 'Accept:' --http1.0

    * Connected to iot.mydomain.com (54.192.73.97) port 80 (#0)
    > GET /?device_id=curlTest HTTP/1.0
    > Host: iot.mydomain.com
    < HTTP/1.1 200 OK
    < Content-Type: application/json
    < Content-Length: 37
    < Connection: close
So the fix is to simply pass the Host header in your GET request code, like this:
    // Make GET Request with Headers (CloudFront requires Host header)
    if (client.connected()) {
      client.println((String)"GET /?device_id=" + device_id + " HTTP/1.0");
      client.println("Host: iot.mydomain.com");
      client.println();
    }
I'll also post on the Arduino forums to try to get the documentation corrected (or at least a comment added) to help people not get confused like I did.
According to RFC 7230, it is RECOMMENDED that all HTTP senders and recipients support, at a minimum, request-line lengths of 8000 octets. Popular browser implementations today, however, are limited to <2000 characters.
In HTTP, the order of headers with the same name is significant (RFC 7230, section 3.2.2), so this mandates an ordered collection, such as a vector. Conversely, the order of headers with differing names is not significant, so we can use an unordered map for lookup.
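Something along these lines (a sketch, not a battle-tested type) captures both requirements: a Vec keeps the arrival order, and a case-insensitively keyed index handles lookup:

    use std::collections::HashMap;

    // Headers stored in arrival order; the index maps a lowercased name to the
    // positions of every header with that name, preserving their relative order.
    struct Headers {
        entries: Vec<(String, String)>,     // (original name, value), in order
        index: HashMap<String, Vec<usize>>, // lowercased name -> entry positions
    }

    impl Headers {
        fn new() -> Self {
            Headers { entries: Vec::new(), index: HashMap::new() }
        }

        fn append(&mut self, name: &str, value: &str) {
            let pos = self.entries.len();
            self.entries.push((name.to_string(), value.to_string()));
            self.index.entry(name.to_ascii_lowercase()).or_default().push(pos);
        }

        // All values for a name, in the order they appeared (significant per 3.2.2).
        fn get_all(&self, name: &str) -> Vec<&str> {
            self.index
                .get(&name.to_ascii_lowercase())
                .map(|positions| positions.iter().map(|&i| self.entries[i].1.as_str()).collect())
                .unwrap_or_default()
        }
    }

    fn main() {
        let mut h = Headers::new();
        h.append("Set-Cookie", "a=1");
        h.append("Content-Type", "text/plain");
        h.append("Set-Cookie", "b=2");
        assert_eq!(h.get_all("set-cookie"), vec!["a=1", "b=2"]);
    }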
Per https://tools.ietf.org/html/rfc7230#section-3.5 it's not strictly mandatory: a recipient MAY recognize a single LF as a line terminator and ignore any preceding CR, so terminating lines with just an LF works in practice.
HTTP/2 doesn't use them at all since it's a binary protocol: https://tools.ietf.org/html/rfc7540
IRC uses either CR or LF, or CRLF: https://tools.ietf.org/html/rfc1459
FTP, SMTP, IMAP, and POP all remain CRLF as they’re remnants from the early 1980s
From RFC 7230, here is the grammar for an HTTP/1.1 message:
    HTTP-message = start-line
                   *( header-field CRLF )
                   CRLF
                   [ message-body ]
As you can see, the body does not end in a CRLF, so readLine will hang there until either it receives a newline or the connection is closed. Hence it only appears to send when the connection is terminated.
Running your command sends the following message:
> POST HTTP://www.lepel.nl/ HTTP/1.1
> User-Agent: curl/7.35.0
> Host: www.lepel.nl
> Accept: */*
> Proxy-Connection: Keep-Alive
> Content-Length: 11
> Content-Type: application/x-www-form-urlencoded
>
> name=reddit
So using the Content-Length header should be good enough.
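For example, instead of readLine-ing the body, something along these lines reads exactly Content-Length bytes and returns immediately (a rough sketch; chunked encoding and error handling are ignored, and the byte slice just stands in for a real socket):

    use std::io::{self, Read};

    // Read exactly `content_length` bytes of message body from the stream.
    // This returns as soon as the body has arrived, without waiting for the
    // peer to close the connection or send a trailing newline.
    fn read_body<R: Read>(stream: &mut R, content_length: usize) -> io::Result<Vec<u8>> {
        let mut body = vec![0u8; content_length];
        stream.read_exact(&mut body)?;
        Ok(body)
    }

    fn main() -> io::Result<()> {
        // Stand-in for a TcpStream: the 11-byte body from the example above.
        let mut fake_stream: &[u8] = b"name=reddit";
        let body = read_body(&mut fake_stream, 11)?;
        assert_eq!(&body[..], &b"name=reddit"[..]);
        Ok(())
    }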
Indeed, it's not only replaceable by local equivalents, it's completely optional, clients should ignore it: https://tools.ietf.org/html/rfc7230#section-3.1.2
> The reason-phrase element exists for the sole purpose of providing a textual description associated with the numeric status code, mostly out of deference to earlier Internet application protocols that were more frequently used with interactive text clients. A client SHOULD ignore the reason-phrase content.
This is not a bug. The reason for this behavior should be obvious: a web server cannot blindly buffer arbitrarily long request URIs because that would make it trivial to DoS the server. Why? Because the server has to keep the full request URI in memory while parsing. An attacker would only need a handful of clients sending request lines of excessive length to use all of the server's memory and crash the application.
Neither the original RFC 2616 nor the updated RFC 7230 HTTP specification specifies a maximum allowed request-line length:
> HTTP does not place a predefined limit on the length of a request-line, as described in Section 2.5. A server that receives a method longer than any that it implements SHOULD respond with a 501 (Not Implemented) status code. A server that receives a request-target longer than any URI it wishes to parse MUST respond with a 414 (URI Too Long) status code
My humble advice? Any application sending URI query strings in excess of the built-in server's 1375-byte buffer is seriously Doing It Wrong™. Use the HTTP message entity body if you're submitting data of that size. Request URIs are designed to denote resources. HTTP entity bodies are designed to carry payloads.