Last updated: 2022-10-16

Exploring OAuth 2.0 on the Server for Foundkey

On this page I'll try to document the server side of OAuth 2.0. (If I say OAuth anywhere in the following, I mean OAuth 2.0.) OAuth is defined by RFC 6749 and a ton of other documents. I'm writing this down because, y'know, this might not be the last time I want or need to implement OAuth on a server, and also for anyone else from the Foundkey team who might be reading this. Throughout the page I'll use Foundkey as an example, as it is the project I am currently trying to add OAuth to. Let's start right away with some definitions for the people that are part of our nice play ("roles"):

term | description | in example
resource owner | The actual human or whatever the resource/data belongs to. | Foundkey user
resource server | The server that keeps the resource/data. | Foundkey
client | whoever is trying to access the resource (might also be a "server"!) | a 3rd party app or service
authorization server | the server that says who may or may not access a resource | Foundkey (can be another server in theory)

The general idea is that the client wants to act on behalf of the resource owner, but the resource owner doesn't want them to do just anything. OAuth provides a way to do that. Instead of the user's personal username and password, the application will receive an access token. In general, an access token can be anything the authorization server wants. There are some fancy things you can do with access tokens like JSON Web Tokens (JWT), but that is another rabbit hole that we're not getting into here. Of course the authorization server needs to agree with the resource server on what a token looks like, but here those two are the same. Abstractly, the process goes like this:

  (A) client asks resource owner for authorization
  (B) resource owner authorizes client
  (C) client uses the authorization to get an access token from authorization server
  (D) authorization server gives an access token to client
  (E) client uses the access token to get a resource from resource server
  (F) resource server gives a resource to client

You'll notice it's all a dance between the different actors, specifically always the client asking and then getting something from one of the other actors.

Now, the interesting part is how exactly these abstract steps are executed. I have to confess, I've cheated a bit, because some parts of this abstract process are already implemented in Foundkey. More specifically, the last two steps, (E) and (F), are already implemented. But the OAuth RFC says anyway that …

Access token attributes and the methods used to access protected resources are beyond the scope of this specification and are defined by companion specifications such as [RFC 6750].
RFC 6749 § 1.4

Security Stuff

Refresh Tokens

In step (D), the authorization server can optionally give the client a "refresh token" in addition to the access token. This refresh token can then be used to obtain another access token at a later point, although that access token may have different properties, access rights, etc.

If a refresh token is issued, this will most likely mean that the access token expires at some point. This measure seems to be a bit overkill in Foundkey's case, so we'll ignore it here.

Expiring Tokens

In a similar vein to refresh tokens, maybe even as a companion to them, is the expiry of tokens. Both access tokens and refresh tokens can be made to expire.

However, this seems to not be necessary here so it will mostly be ignored.

TLS

Since an access token, just like a username and password, can be abused if someone manages to exfiltrate it, you should definitely be using some kind of encrypted transport. The currently obvious choice would be HTTP/1.1 over TLS ("HTTPS"), or maybe HTTP/3. Please make sure that you use an appropriate TLS version, at time of writing probably at least TLS 1.2, better TLS 1.3. In Foundkey's case, this has to be configured on a reverse proxy (e.g. nginx), so it is out of scope here.

Client registration

The RFC says that a client has to be registered. Big centralized resource/authorization servers like GitHub or Twitter usually have some kind of form a human should fill in to do this. But, in case you didn't know: Foundkey is part of the Fediverse and is supposed to be rolled out as multiple instances. This makes it very unlikely that an app developer would want or have the time to register the app by hand on every instance. So, we'll have to figure out a way to do automatic client registration. At the moment, Foundkey has a few API endpoints regarding this, but since the Fediverse has a variety of servers, it might be a good idea to implement RFC 7591 (dynamic client registration). And for good measure maybe also RFC 8414 (authorization server metadata).

For now though, just take it for granted that a client does some API call where it gives its details to register. To be a bit more specific about what details the client gives, the RFC mandates the client type, redirection URI(s) as well as any other information the server requires. In Foundkey's case we require a few more details like the client's name and a short description.

The client type can either be "public" or "confidential", and the distinction is based on whether the client is able to keep its client secret confidential. For example, if you were to build a Twitter app, you would register your app beforehand and then have to include the client secret in the app itself. Since it is technically possible to extract this client secret from the app, this is a public client. In contrast, if you are building a web service where the user needs to go to your own server to use it, the client secret will never get to the user's computer, hence it is a confidential client.

In our case, since we offer or even require dynamic client registration, we will consider all clients confidential clients. Yes, this is technically against what the RFC says …

The client type designation is based on the authorization server's definition of secure authentication and its acceptable exposure levels of client credentials. The authorization server SHOULD NOT make assumptions about the client type.
RFC 6749 § 2.1

… but our (i.e. the authorization server's) definition of secure authentication is met by this process, so we are happy.

As a result of the registration, the client receives back a client ID and client secret. Note that a "public" client would not get a client secret, since... it would not be much of a secret, right?
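
To make the registration details a bit more concrete, here is a minimal sketch of how a server might check a registration request before issuing credentials. The ClientRegistration shape and the validateRegistration function are made up for illustration and are not Foundkey's actual API. The two rules for redirect URIs (absolute URI, no fragment) come from RFC 6749 § 3.1.2; everything else is an assumption.

```typescript
// Hypothetical shape of the details a client submits when registering.
interface ClientRegistration {
	name: string;
	description: string;
	redirectUris: string[];
}

// Returns a list of problems; an empty list means the registration looks acceptable.
function validateRegistration(reg: ClientRegistration): string[] {
	const problems: string[] = [];
	if (reg.name.trim() === '') problems.push('name must not be empty');
	if (reg.redirectUris.length === 0) problems.push('at least one redirect URI is required');

	for (const uri of reg.redirectUris) {
		try {
			// new URL() throws for relative references, so this enforces an absolute URI
			new URL(uri);
		} catch {
			problems.push(`redirect URI is not an absolute URI: ${uri}`);
			continue;
		}
		// RFC 6749 § 3.1.2: the redirection URI must not include a fragment component
		if (uri.includes('#')) problems.push(`redirect URI must not contain a fragment: ${uri}`);
	}
	return problems;
}
```

Only if the list of problems is empty would the server go on to generate a client ID and client secret, which is left out here.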

Grant Types

At this point, I'll skip around the RFC a bit, and skip over the definitions of the endpoints. Explaining them without explaining what they do is a bit difficult.

So, this is now getting more into the technical implementation of steps (A) to (D) from the abstract process. In OAuth spec speak, these different methods are called grant types. And the following grant types are defined by OAuth:

  1. Authorization Code Grant
  2. Implicit Grant
  3. Client Credentials Grant
  4. Resource Owner Password Credentials Grant

Note that in the following, while the RFC distinguishes between the resource owner and their user agent (e.g. web browser), we're not making that distinction here. We are looking at this from the server's perspective and thus are not really too interested in the client's and resource owner's perspectives, right?


Authorization Code Grant

This is the most commonly found grant type, maybe even the thing that people think of when they hear OAuth. And for good reason, as you will notice when taking a look at the other grant types. It is the most involved of the processes but in exchange should offer the most security. Here is an outline of the steps in this grant:

  (a) client sends resource owner to a webpage of authorization server

    There is some data being sent along here, for example which of the multiple previously registered redirect URLs to use (if more than one redirect URL is supported).

  (b) resource owner authenticates to authorization server and tells it whether or not to grant access

    The resource owner authentication could for example be an existing login session or asking the resource owner to sign in.

  (c) authorization server redirects resource owner to the indicated redirect URL

    Usually, the redirect URL is a page provided by the client. Attached to the redirect URL is some data from the authorization server, usually the authorization code that gives this grant type its name. However, there is also the possibility that instead an error code will be attached.

  (d) through following the redirect, resource owner will forward the attached data to client

    If the attached data is an error code or otherwise not an authorization code, the client will now have to deviate from the rest of this process for error handling.

  (e) client uses the authorization code and client secret to request an access token from authorization server
  (f) authorization server checks the credentials provided by client and if they are okay, returns an access token

Now, let's take a more in depth look at these.

Authorization Request

The authorization request is made in step (a). It is directed at a URL provided by the documentation. Alternatively, as said previously, you could use RFC 8414. In our case, the documentation is an OpenAPI 3 specification. As I mentioned before, we had already set up Bearer token authentication, so previously the documentation had this snippet in it:

securitySchemes: {
	// legacy authentication...
	
	// the current authentication we are trying to convert to OAuth
	Bearer: {
		type: 'http',
		scheme: 'bearer',
	},
},

To convert this part to OAuth we have to know the endpoints. Although we are in the process of implementing OAuth, there also is already a 3rd party authentication API that we are basically going to "upgrade". It is already pretty close to this flow. If you are implementing something from scratch, assume for now you already wrote those endpoints. In our case the new documentation looks something like this:

securitySchemes: {
	// legacy authentication...

	Bearer: {
		type: 'oauth2',
		flows: {
			authorizationCode: {
				authorizationUrl: `${baseUrl}/auth`,
				tokenUrl: `${baseUrl}/api/auth/session/userkey`,
				scopes: {
					'read:example': 'Read example data',
					'write:example': 'Write example data',
					// ...
				},
			},
		},
	},
},

You may notice that this structure already has room for other grant flow types, and that I already put in something we are going to cover in a later step (the tokenUrl). But the important part is that a client can now look at the authorizationUrl value to find out where to go. In practice, I would expect the developer of a client to kinda hard-code that URL into the client, but they could in theory also look it up dynamically. Note that, unlike the tokenUrl, the authorizationUrl is a path in the frontend, because the client will direct the resource owner to go there.

Okay, now that we know where the authorization URL is, we can actually prepare the URL the resource owner needs to go to. It is an HTTP GET request and should use the following URL parameters (i.e. an application/x-www-form-urlencoded query):

response_type
for this grant flow type, must always be "code"

"code" indicates that the requested response is an authorization code.

client_id
client ID from the Client Registration
redirect_uri
redirect URI (usually a URL) used in a later step as pointed out before

It is not mandatory to provide this, as long as there is exactly one redirect URL registered for the client. The client can of course be found based on its client_id that is also provided.

scope
requested scopes/permissions the client wants to be able to use

This parameter is also not mandatory. For the case where it is missing, the RFC says:

If the client omits the scope parameter when requesting authorization, the authorization server MUST either process the request using a pre-defined default value or fail the request indicating an invalid scope. The authorization server SHOULD document its scope requirements and default value (if defined).
RFC 6749 § 3.3

Due to the design of the old API, applications will have to provide a set of permissions when registering the app. Because we will have this data, we can use it as a fallback.

state
optional data used by the client to track some state

Similarly to how the access token is gibberish to the client, this parameter is completely gibberish to the authorization server. The authorization server will just pass it back later, along with the authorization code.

With that all added to a URL the client can tell the resource owner to go there.
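
As a rough illustration of the parameters above, here is a sketch of both sides: the client putting the URL together, and the server deciding which scopes were requested, with the fallback to the permissions given at registration. The interface, the function names and the example URL are all made up for this sketch; only the parameter names come from the RFC.

```typescript
// Hypothetical description of what the client wants to request.
interface AuthorizationRequest {
	clientId: string;
	redirectUri?: string; // may be omitted if exactly one redirect URI is registered
	scope?: string[];     // may be omitted, the server then falls back to a default
	state?: string;       // opaque to the authorization server
}

// Client side: build the URL the resource owner is sent to.
function buildAuthorizationUrl(authorizationUrl: string, req: AuthorizationRequest): string {
	const url = new URL(authorizationUrl);
	url.searchParams.set('response_type', 'code');
	url.searchParams.set('client_id', req.clientId);
	if (req.redirectUri !== undefined) url.searchParams.set('redirect_uri', req.redirectUri);
	// RFC 6749 § 3.3: the scope parameter is a space-delimited list
	if (req.scope !== undefined) url.searchParams.set('scope', req.scope.join(' '));
	if (req.state !== undefined) url.searchParams.set('state', req.state);
	return url.toString();
}

// Server side: use the requested scopes, or fall back to the ones given at registration.
function resolveScopes(query: URLSearchParams, registeredScopes: string[]): string[] {
	const requested = query.get('scope');
	if (requested === null || requested.trim() === '') return registeredScopes;
	return requested.split(' ').filter((s) => s !== '');
}
```

A client would call something like buildAuthorizationUrl('https://foundkey.example/auth', { clientId: '…', state: '…' }) and then open the result in the resource owner's browser.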

Authorization Response

Now comes the interesting part, where we actually have to do something again. Since as I said we are "upgrading" an older API, we have the web UI do some API calls that previously the client would have had to perform. One way or the other, the end result is that the web UI (a.k.a. authorization server) first of all makes sure that the resource owner is logged in. After that they are presented with a form to accept or deny the request for permissions. If they decide to accept, the resource owner will be redirected to the redirect URI given earlier. To transfer the data, the following parameters are added to the URL query part (as before):

code
actual authorization code the client wanted to get
state
value of state provided in the request (if any)
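
Attaching these parameters to the redirect URI could look roughly like the following sketch. The function name is made up. One detail from RFC 6749 § 3.1.2 is worth showing: if the registered redirect URI already has a query part, it has to be kept when the new parameters are added.

```typescript
// Append the authorization code (and the state, if the client sent one) to the redirect URI.
function buildSuccessRedirect(redirectUri: string, code: string, state?: string): string {
	const url = new URL(redirectUri);
	// searchParams.set() only touches the named parameter,
	// so query parameters that were already part of the redirect URI are kept
	url.searchParams.set('code', code);
	if (state !== undefined) url.searchParams.set('state', state);
	return url.toString();
}
```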

Error Response

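Now, what happens if they decide to not accept, or something else goes wrong? That is where the Error Response comes in. Instead of the Authorization Response, different parameters are added, but the resource owner is still redirected to the redirect URI given above. One exception: if the problem is with the redirect URI or the client ID itself (missing, invalid or not matching), the authorization server must not redirect and should instead tell the resource owner about the error directly (RFC 6749 § 4.1.2.1). In all other cases, these parameters will be added: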

error
indicates that an error happened and which error it was, one of the following:
invalid_request
The request is missing a required parameter, includes an invalid parameter value, includes a parameter more than once, or is otherwise malformed.
unauthorized_client
The client is not authorized to request an authorization code using this method.
access_denied
The resource owner or authorization server denied the request.
unsupported_response_type
The authorization server does not support obtaining an authorization code using this method.
invalid_scope
The requested scope is invalid, unknown, or malformed.
server_error
The authorization server encountered an unexpected condition that prevented it from fulfilling the request. (This error code is needed because a 500 Internal Server Error HTTP status code cannot be returned to the client via an HTTP redirect.)
temporarily_unavailable
The authorization server is currently unable to handle the request due to a temporary overloading or maintenance of the server. (This error code is needed because a 503 Service Unavailable HTTP status code cannot be returned to the client via an HTTP redirect.)

RFC 6749 § 4.1.2.1
error_description
optional human readable description of the specific error, potentially more detailed than error
error_uri
optional webpage that describes the error
state
value of state provided in the request (if any)

Note that if the client receives an Error Response, it has to exit this process and try to do something else to remedy the error etc. The following steps should not be used if there is an error.
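
For completeness, a sketch of the error redirect, assuming the redirect URI itself has already been checked and is safe to use. The type and function names are made up; the error codes are the ones listed above.

```typescript
// Error codes for the authorization endpoint, RFC 6749 § 4.1.2.1
type AuthorizationErrorCode =
	| 'invalid_request'
	| 'unauthorized_client'
	| 'access_denied'
	| 'unsupported_response_type'
	| 'invalid_scope'
	| 'server_error'
	| 'temporarily_unavailable';

function buildErrorRedirect(
	redirectUri: string,
	error: AuthorizationErrorCode,
	options: { description?: string; errorUri?: string; state?: string } = {},
): string {
	const url = new URL(redirectUri);
	url.searchParams.set('error', error);
	if (options.description !== undefined) url.searchParams.set('error_description', options.description);
	if (options.errorUri !== undefined) url.searchParams.set('error_uri', options.errorUri);
	// the state is passed back unchanged, exactly like in the successful case
	if (options.state !== undefined) url.searchParams.set('state', options.state);
	return url.toString();
}
```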

Access Token Request

Now the client finally has its authorization code (if it was successful). But with the authorization code it can still not access any resources. For that it needs the access token. And to get that access token it has to make a separate request to the authorization server. This time, since it doesn't have to go through the resource owner (or their user agent), it is an HTTP POST request. The URL can again be gotten from the documentation. (Remember tokenUrl from earlier?) However, the body is still application/x-www-form-urlencoded, with these parameters:

grant_type
for this grant flow type, must always be "authorization_code"
code
the authorization code
redirect_uri
must be identical to redirect_uri from the Authorization Request (if provided)
client_id
client ID from the Client Registration, if not authenticated

Since we assume that all clients are confidential clients here, the client must also authenticate itself. It can do this by using the client secret it was issued in the Client Registration. Because this authentication already identifies the client, the client_id parameter is not required in our case. The client ID and client secret are used in HTTP Basic authentication, i.e. a header like this:

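Authorization: Basic <secret>

<secret> has to be replaced by the base64 encoding of the client ID, a colon and the client secret concatenated together. (Strictly speaking, RFC 6749 § 2.3.1 wants the client ID and the client secret to each be form-urlencoded before they are joined with the colon.)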

Now again, on to what the authorization server has to do: First of all check the invariants given by the field definitions above. Also ensure that the client authentication/client secret is correct and fits together with the provided client ID. And check the provided redirect URI. Although in our case, because the legacy API does not check this, OAuth apps will get a pass on this one, at least for now.
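
Here is a sketch of what these checks could look like, under a few assumptions: authorization codes are kept in some store (a Map stands in for it here), every name in the code is made up, and the actual comparison of the client secret is left out. Unlike what Foundkey does for now, the sketch does check the redirect URI as the RFC asks.

```typescript
// Encode/decode a single value as application/x-www-form-urlencoded.
function formEncode(value: string): string {
	return new URLSearchParams({ v: value }).toString().slice('v='.length);
}
function formDecode(value: string): string {
	return new URLSearchParams(`v=${value}`).get('v') ?? '';
}

// Client side: build the Authorization header from client ID and client secret.
function buildBasicAuthHeader(clientId: string, clientSecret: string): string {
	return `Basic ${btoa(`${formEncode(clientId)}:${formEncode(clientSecret)}`)}`;
}

// Server side: returns null if the header is not well-formed Basic authentication.
function parseBasicAuthHeader(header: string): { clientId: string; clientSecret: string } | null {
	const match = /^Basic ([A-Za-z0-9+\/]+=*)$/i.exec(header);
	if (match === null) return null;
	let decoded: string;
	try {
		decoded = atob(match[1] ?? '');
	} catch {
		return null; // not valid base64
	}
	const colon = decoded.indexOf(':');
	if (colon === -1) return null;
	return {
		clientId: formDecode(decoded.slice(0, colon)),
		clientSecret: formDecode(decoded.slice(colon + 1)),
	};
}

// Hypothetical record the server stored when it issued an authorization code.
interface IssuedCode {
	clientId: string;
	redirectUri?: string; // only set if the authorization request included one
	scope: string[];
	used: boolean;
}

type TokenRequestResult =
	| { ok: true; scope: string[] }
	| { ok: false; error: 'invalid_request' | 'invalid_grant' | 'unsupported_grant_type' };

// Check the form-encoded body of a token request made by an already authenticated client.
function checkTokenRequest(
	body: URLSearchParams,
	authenticatedClientId: string,
	codes: Map<string, IssuedCode>,
): TokenRequestResult {
	const grantType = body.get('grant_type');
	if (grantType === null) return { ok: false, error: 'invalid_request' };
	if (grantType !== 'authorization_code') return { ok: false, error: 'unsupported_grant_type' };
	const code = body.get('code');
	if (code === null) return { ok: false, error: 'invalid_request' };

	const issued = codes.get(code);
	// unknown code, already used code, or code issued to a different client
	if (issued === undefined || issued.used || issued.clientId !== authenticatedClientId) {
		return { ok: false, error: 'invalid_grant' };
	}
	// redirect_uri must be identical if one was part of the authorization request
	if (issued.redirectUri !== undefined && body.get('redirect_uri') !== issued.redirectUri) {
		return { ok: false, error: 'invalid_grant' };
	}
	issued.used = true; // an authorization code may only be used once
	return { ok: true, scope: issued.scope };
}
```

The single-use check is not mentioned in the text above, but RFC 6749 § 4.1.2 requires that a client cannot use an authorization code more than once.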

Access Token Response

Again, skipping a bit through the RFC, there are now again two ways to respond. One if everything went well and one if there was an error.

Successful Response

As described above, this is when the client finally gets their access token to play with the API. As mentioned before, there is also the possibility to issue a refresh token with it, but we are not doing that here. The token also doesn't expire. Although we have seen a lot of application/x-www-form-urlencoded until now, this time the response is JSON. The authorization server will return an HTTP 200 OK response with a JSON object containing the following keys: (I'm omitting the refresh token and expiration time.)

access_token
what the client has wanted all along
token_type
in which way the access token should be used

In our case, as mentioned before, the API uses bearer tokens. So for us, this will always be "bearer".

scope
which scope/permissions this token grants

If everything went well for the client these are the same permissions that were requested before. However, in theory the authorization server or resource owner could decide against giving the client a specific permission while still granting other permissions. As long as the scope is the same as requested, this is optional. But otherwise, it is of course required so the client knows what's up.
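
A sketch of such a response, independent of any web framework. The HttpResponse shape and the function name are made up for illustration. The two caching headers are not mentioned above, but RFC 6749 § 5.1 requires them on any response that contains a token.

```typescript
// Minimal, framework-independent stand-in for an HTTP response.
interface HttpResponse {
	status: number;
	headers: Record<string, string>;
	body: string;
}

function tokenSuccessResponse(accessToken: string, grantedScope: string[]): HttpResponse {
	return {
		status: 200,
		headers: {
			'Content-Type': 'application/json;charset=UTF-8',
			// responses containing tokens must not be cached
			'Cache-Control': 'no-store',
			'Pragma': 'no-cache',
		},
		body: JSON.stringify({
			access_token: accessToken,
			token_type: 'bearer',
			// space-delimited list; always included here, even where it would be optional
			scope: grantedScope.join(' '),
		}),
	};
}
```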

Error Response

If something about the request or its handling didn't quite work out, there will be an error response. The error response will usually be an HTTP 400 Bad Request response (invalid_client may use 401, see below) and is similar in content to the other Error Response from earlier. However, there are of course different error types this time. And similar to the Successful Response above, this time it is formatted as a JSON object, with the following keys:

error
indicates that an error happened and which error it was, one of the following:
invalid_request
The request is missing a required parameter, includes an unsupported parameter value (other than grant type), repeats a parameter, includes multiple credentials, utilizes more than one mechanism for authenticating the client, or is otherwise malformed.
invalid_client
Client authentication failed (e.g., unknown client, no client authentication included, or unsupported authentication method). The authorization server MAY return an HTTP 401 (Unauthorized) status code to indicate which HTTP authentication schemes are supported. If the client attempted to authenticate via the "Authorization" request header field, the authorization server MUST respond with an HTTP 401 (Unauthorized) status code and include the "WWW-Authenticate" response header field matching the authentication scheme used by the client.
invalid_grant
The provided authorization grant (e.g., authorization code, resource owner credentials) or refresh token is invalid, expired, revoked, does not match the redirection URI used in the authorization request, or was issued to another client.
unauthorized_client
The authenticated client is not authorized to use this authorization grant type.
unsupported_grant_type
The authorization grant type is not supported by the authorization server.

RFC 6749 § 5.2
error_description
optional human readable description of the specific error, potentially more detailed than error
error_uri
optional webpage that describes the error
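
Put together, the status code logic described above might be sketched like this. The names are again made up, the realm value is just a placeholder, and adding the caching headers to error responses is my own choice for the sketch rather than something the text above requires.

```typescript
// Error codes for the token endpoint as listed above (RFC 6749 § 5.2)
type TokenErrorCode =
	| 'invalid_request'
	| 'invalid_client'
	| 'invalid_grant'
	| 'unauthorized_client'
	| 'unsupported_grant_type';

// Minimal, framework-independent stand-in for an HTTP response.
interface HttpResponse {
	status: number;
	headers: Record<string, string>;
	body: string;
}

function tokenErrorResponse(
	error: TokenErrorCode,
	description?: string,
	usedAuthorizationHeader = false,
): HttpResponse {
	const headers: Record<string, string> = {
		'Content-Type': 'application/json;charset=UTF-8',
		'Cache-Control': 'no-store',
		'Pragma': 'no-cache',
	};
	let status = 400;
	// A client that tried to authenticate via the Authorization header
	// has to get a 401 with a matching WWW-Authenticate header.
	if (error === 'invalid_client' && usedAuthorizationHeader) {
		status = 401;
		headers['WWW-Authenticate'] = 'Basic realm="oauth"'; // placeholder realm
	}
	const payload: Record<string, string> = { error };
	if (description !== undefined) payload['error_description'] = description;
	return { status, headers, body: JSON.stringify(payload) };
}
```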

Implicit Grant

Okay, as said before, the Authorization Code Grant is the most involved, so from now on it's only getting easier. The Implicit Grant is mainly rehashing the Authorization Code Grant, but leaving out parts of it. The biggest difference is that instead of the redirect providing an authorization code and thus requiring another request, the access token is provided immediately. The downside is, of course, that someone or something else may see the access token. This includes the resource owner: we do not want them doing something stupid with an access token, like accidentally sharing a URL containing one. Since there is no step in which the client authenticates with a client secret, the implicit grant was intended for public clients.

However, it should be noted that this flow is no longer recommended. Instead, public clients should use the authorization code flow with the PKCE extension (Proof Key for Code Exchange). PKCE is not covered here. Also, the implicit grant will not be specially implemented in Foundkey but I still want to cover it here.

Here are the necessary steps. They start out pretty similar to the Authorization Code Grant, so I've tried to highlight differences.

  1. client sends resource owner to a webpage of authorization server

    There is some data being sent along here, for example which of the multiple previously registered redirect URLs to use (if more than one redirect URL is supported).

  2. resource owner tells authorization server its own authentication and whether or not to grant access

    The resource owner authentication could for example be an existing login session or asking the resource owner to sign in.

  3. authorization server redirects resource owner to the indicated redirect URL

    Usually, the redirect URL is a page provided by the client. Attached to the redirect URL is some data from the authorization server, usually the access token. However, there is also the possibility that instead an error code will be attached.

  4. through following the redirect, resource owner will forward the attached data to client

    If the attached data is an error code or otherwise not an access token, the client will now have to deviate from the rest of this process for error handling.

Steps (e) and (f) in the Authorization Code Grant are not necessary for an Implicit Grant.

Authorization Request

The authorization request is mostly the same as the authorization request from the authorization code grant. The difference is that the response_type value must always be "token" to differentiate it.

Access Token Response

As said before, there are no intermediate steps and the access token is issued immediately. However, this response must still use the redirect URI, so it is packaged as parameters in the fragment component of the redirect URI (the part after the #), formatted the same way as a query. The provided parameters may somewhat remind you of the successful response of the authorization code grant. Since it is still the redirect, it also contains the state parameter again.

This time around the parameters are (again omitting the expiration):

access_token
what the client has wanted all along
token_type
in which way the access token should be used

In our case, as mentioned before, the API uses bearer tokens. So for us, this will always be "bearer".

scope
which scope/permissions this token grants

If everything went well for the client these are the same permissions that were requested before. However, in theory the authorization server or resource owner could decide against giving the client a specific permission while still granting other permissions. As long as the scope is the same as requested, this is optional. But otherwise, it is of course required so the client knows what's up.

state
value of state provided in the request (if any)
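
Even though this grant will not be implemented in Foundkey, a sketch may make the difference to the authorization code grant clearer. Per RFC 6749 § 4.2.2 the parameters go into the fragment of the redirect URI, not into the query. The function names are made up for illustration.

```typescript
// Authorization server side: put the token response into the fragment of the redirect URI.
function buildImplicitRedirect(
	redirectUri: string,
	accessToken: string,
	grantedScope: string[],
	state?: string,
): string {
	const params = new URLSearchParams({
		access_token: accessToken,
		token_type: 'bearer',
		scope: grantedScope.join(' '),
	});
	if (state !== undefined) params.set('state', state);
	const url = new URL(redirectUri);
	url.hash = params.toString(); // the fragment, not the query
	return url.toString();
}

// Client side: browsers do not send the fragment to a server,
// so it has to be read inside the user agent.
function parseImplicitRedirect(redirected: string): URLSearchParams {
	return new URLSearchParams(new URL(redirected).hash.slice(1));
}
```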

Error Response

The error process is again similar to the error process of the authorization request in the authorization code grant. But just like the access token response above, the data is provided in the fragment component of the redirect URI instead of the query. Also as a side effect of using a redirect, the state parameter must be present if it was supplied. Otherwise it is the same.


Client Credentials Grant

The idea of this grant type is quite a bit different from the previous ones. Depending on how you want to look at it, this grant does not even involve any resource owner. Or maybe you could say the client itself is the resource owner in this case. In any case, this means that the client is not looking to access resources of a particular "user", just some general resources. This might be useful for example if there are rate limited resources which are not rate limited for registered clients. The process outline is as follows:

  1. client requests an access token from the authorization server

    The client needs to authenticate with the client credentials (i.e. client ID and client secret), hence the name.

  2. authorization server issues the requested token to client

And that is already it. If you want to reimagine it as rehashing parts of the authorization code grant, it's only the last two steps.

Access Token Request

Since the client must authenticate itself, HTTP Basic authentication is used, using the client ID and client secret. The request is an HTTP POST request to the token endpoint specified by the documentation, with a form encoded body containing the following parameters:

grant_type
for this grant flow type, must always be "client_credentials"
scope
optionally the requested scopes/permissions the client wants to be able to use
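
From the client's point of view, the whole request could be put together like this. It is a sketch with made-up names that only describes the request and leaves sending it to whatever HTTP library is used. To keep it short, it assumes the client ID and secret contain no characters that would need form-encoding before the base64 step.

```typescript
// Plain description of an HTTP request.
interface TokenRequest {
	method: 'POST';
	url: string;
	headers: Record<string, string>;
	body: string;
}

function buildClientCredentialsRequest(
	tokenUrl: string,
	clientId: string,
	clientSecret: string,
	scope?: string[],
): TokenRequest {
	const body = new URLSearchParams({ grant_type: 'client_credentials' });
	if (scope !== undefined && scope.length > 0) body.set('scope', scope.join(' '));
	return {
		method: 'POST',
		url: tokenUrl,
		headers: {
			// simplification: ID and secret are assumed to need no extra encoding
			'Authorization': `Basic ${btoa(`${clientId}:${clientSecret}`)}`,
			'Content-Type': 'application/x-www-form-urlencoded',
		},
		body: body.toString(),
	};
}
```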

Access Token Response

The response process is identical to the access token response of the authorization code grant.


Resource Owner Password Credentials Grant

The name of this grant stems from the fact that the credentials of the resource owner (i.e. username and password) are used by the client to obtain the access token. Because this means the client gets to see the resource owner's password, this grant is usually not recommended.

Its execution is similar to the client credentials grant. These are the steps, differences are highlighted.

  1. client requests an access token from the authorization server

    The client needs to authenticate with the client credentials (i.e. client ID and client secret). The client also needs to provide the resource owner's credentials (i.e. username and password).

  2. authorization server issues the requested token to client

Access Token Request

Since the client must authenticate itself, HTTP Basic authentication is used, using the client ID and client secret. The request is an HTTP POST request to the token endpoint specified by the documentation, with a form encoded body containing the following parameters:

grant_type
for this grant flow type, must always be "password"
username
the username of the resource owner
password
the password of the resource owner
scope
optionally the requested scopes/permissions the client wants to be able to use
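
On the server side, picking these parameters out of the request body might look like the following sketch. The names are made up, and actually verifying the username and password against the user database is deliberately left out.

```typescript
type PasswordGrantResult =
	| { ok: true; username: string; password: string; scope: string[] }
	| { ok: false; error: 'invalid_request' | 'unsupported_grant_type' };

// Extract the parameters of a password grant request from the form-encoded body.
function parsePasswordGrantRequest(body: URLSearchParams): PasswordGrantResult {
	const grantType = body.get('grant_type');
	if (grantType === null) return { ok: false, error: 'invalid_request' };
	if (grantType !== 'password') return { ok: false, error: 'unsupported_grant_type' };

	const username = body.get('username');
	const password = body.get('password');
	// both credentials of the resource owner are required
	if (username === null || password === null) return { ok: false, error: 'invalid_request' };

	// scope is optional; a missing scope becomes an empty list here
	const scope = (body.get('scope') ?? '').split(' ').filter((s) => s !== '');
	return { ok: true, username, password, scope };
}
```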

Access Token Response

The response process is identical to the access token response of the authorization code grant.