How to configure a Flask app to log user id and session id in nginx logs
Nginx access logs are an important data source for web traffic analysis. Analysing them tells us things like how often specific URLs are hit and how much traffic comes from specific IPs. While this is useful by itself, a proper analytics data source should also let us analyse user behaviour over time, for example when a user signs up or converts to a paying customer, using the log entries themselves. To do this, we can configure our application server to send the user id and session id in every response, and then configure nginx to record those details in its logs.
Since we want this information sent with every request, the preferable way is to include it in the headers of every response generated by the server. Different frameworks have different ways of customising response headers. Here we will see how to do it in Flask.
Configuring the application server to set user id and session id in header
Flask provides an after_request decorator: a function registered with it is called after every request and receives the response object, which it can modify before returning it. We will use this feature to set the response headers.
For custom headers, the convention is to prefix the name with X-, so we can call our headers X-Userid and X-Sessionid.
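A minimal sketch of such a handler is shown here. It assumes the identifiers live in Flask's built-in cookie session under the hypothetical keys email and sid, and it uses a hypothetical /login/<email> route as a stand-in for real authentication. Flask's built-in session does not expose an id of its own, so the sketch generates one with uuid; with a server-side session library you would read whatever identifier that library provides instead.

```python
import uuid

from flask import Flask, session

app = Flask(__name__)
app.secret_key = "replace-me"  # placeholder; Flask needs a key to sign the session cookie


@app.route("/login/<email>")
def login(email):
    # Hypothetical login route: a real app would verify credentials first.
    session["email"] = email
    session["sid"] = uuid.uuid4().hex  # assumption: we generate our own session id
    return "logged in"


@app.route("/")
def index():
    return "hello"


@app.after_request
def add_tracking_headers(response):
    # Called after every request; copy the identifiers into custom headers.
    email = session.get("email")
    sid = session.get("sid")
    if email:
        response.headers["X-Userid"] = email
    if sid:
        response.headers["X-Sessionid"] = sid
    return response
```

Anonymous requests simply get no X-Userid or X-Sessionid header, so the handler is safe to register globally.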
For X-Userid I set the user's email, though any other unique identifier will do as well. For the session id, I use the unique identifier set by the session handling library.
Now that our application is setting these headers in the response, how do we configure nginx to read and log them?
By default this is what an nginx access log entry looks like.
100.123.123.123 - - [10/Apr/2020:13:07:13 +0000] "GET /api/posts/1?user=2343&utm_source=ads HTTP/1.1" 200 75612 "https://example.com/posts" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36"
The default format shown above is called the combined log format. It is defined as follows:
$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"
We can modify this log format using the log_format directive, which is documented in the nginx documentation for the ngx_http_log_module module. As mentioned there, a number of variables are available at log writing time, and each can be used to record a different piece of information about the request.
In our case, we need the variables that hold the two response headers, X-Userid and X-Sessionid, generated by the application server. Typically, request headers are available as variables of the form $http_<insert_request_header_name>, and the headers in the nginx response are available as variables of the form $sent_http_<insert_response_header_name>. However, the headers we need are response headers set not by nginx but by the Flask application server. How do we access those headers?
Since nginx acts as the reverse proxy, the application server (Flask in our case) is an upstream server in nginx terminology. The documentation for upstream servers mentions that these headers are made available as embedded variables named $upstream_http_<insert_header_name_here>. The documentation also mentions that upper case letters in the header name are converted to lower case and dashes are converted to underscores. So X-Userid becomes available as $upstream_http_x_userid and X-Sessionid as $upstream_http_x_sessionid.
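The naming rule is mechanical enough to sketch as a small helper. The function name upstream_variable is made up for illustration and is not part of nginx or Flask; it only applies the lower-casing and dash-to-underscore rule described above.

```python
def upstream_variable(header_name: str) -> str:
    """Return the nginx variable that holds a response header sent by the upstream."""
    # Lower-case the header name and replace dashes with underscores.
    return "$upstream_http_" + header_name.lower().replace("-", "_")


print(upstream_variable("X-Userid"))     # $upstream_http_x_userid
print(upstream_variable("X-Sessionid"))  # $upstream_http_x_sessionid
```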
Armed with this info, we can now modify the log_format directive so that these two values are written to the logs as well. Open /etc/nginx/nginx.conf and replace the log_format directive there with the following:
log_format extended '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" '
                    'userid="$upstream_http_x_userid" sessionid="$upstream_http_x_sessionid"';
Also make sure to modify the access_log directive to use this modified format as follows
access_log /var/log/nginx/access.log extended;
After making these two changes in the http section of nginx.conf, if we restart the nginx server (and the application server as well), we will see that the log entries are recorded with the user information like this
100.123.123.123 - - [10/Apr/2020:13:07:13 +0000] "GET /api/posts/1?user=2343&utm_source=ads HTTP/1.1" 200 75612 "https://example.com/posts" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36" userid="email@example.com" sessionid="4f23b99ce5c298065aeaac569b3a22854a00eb2"
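As a quick check that the new fields are easy to consume, here is a rough sketch that parses one line written in the extended format. The regular expression and the parse_line helper are illustrative assumptions, not part of nginx. They expect exactly the field order defined in the log_format directive above, and the sample line uses a shortened user agent.

```python
import re

# One named group per field of the "extended" log format.
LOG_PATTERN = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)" '
    r'userid="(?P<userid>[^"]*)" sessionid="(?P<sessionid>[^"]*)"'
)


def parse_line(line):
    """Return the log fields as a dict, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


sample = (
    '100.123.123.123 - - [10/Apr/2020:13:07:13 +0000] '
    '"GET /api/posts/1?user=2343&utm_source=ads HTTP/1.1" 200 75612 '
    '"https://example.com/posts" "Mozilla/5.0 (X11; Linux x86_64)" '
    'userid="email@example.com" sessionid="4f23b99ce5c298065aeaac569b3a22854a00eb2"'
)

entry = parse_line(sample)
print(entry["userid"], entry["sessionid"])
```

For requests where the application did not set the headers, nginx writes a hyphen in place of the value, so a userid of "-" can be treated as an anonymous hit.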
Now with this extra information recorded as a part of the nginx logs, we can incorporate this into our analytics queries in order to obtain various behavioral insights. We will see some examples of such queries in a later post.
Originally published at https://techonometrics.com.