...

The ID Service stores metrics data in an InfluxDB service with a default retention policy of 24 hours, which may not be long enough for your use cases. This document shows, with examples, how to store metrics data for a longer period and how to configure Grafana to display it.

The theory

To keep measurements for longer, they must be stored with a Retention Policy (RP) of the desired length. It does not make sense to store all the raw data for that whole period: when Grafana graphs a month of values, it physically cannot display data at 50-millisecond resolution, and large volumes of data at very short intervals cost a lot of processing, bandwidth and disk space. So you should also downsample the data.
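To put rough numbers on the argument above, here is a minimal arithmetic sketch. The 50 ms raw interval, the 30-day month and the helper name points_per_series are assumptions chosen for illustration; they are not measured properties of the ID Service.

```python
# Illustrative arithmetic only: the 50 ms raw interval and the 30-day month
# are assumptions, not measured properties of the ID Service.
MS_PER_DAY = 24 * 60 * 60 * 1000


def points_per_series(retention_days, interval_ms):
    """Number of points a single series holds over the retention period."""
    return retention_days * MS_PER_DAY // interval_ms


raw_points = points_per_series(30, 50)                     # one point every 50 ms
downsampled_points = points_per_series(30, 5 * 60 * 1000)  # one point every 5 minutes

print(raw_points)                        # 51840000
print(downsampled_points)                # 8640
print(raw_points // downsampled_points)  # 6000
```

Under these assumptions, a 5-minute downsample keeps 6000 times fewer points per series than the raw data would.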

...

Let’s create a CQ that collects everything usable from the default mgmt_api_request measurement. The tag and field names can be found with a command like SELECT * FROM mgmt_api_request LIMIT 1.

...

...

CREATE CONTINUOUS QUERY "mgmt_api_requests_5min_for_1month" ON "oneportal" BEGIN
  SELECT count(auth_header_exists) AS "count_requests", sum("request_duration") AS "sum_request_duration", mean("request_duration") AS "mean_request_duration"
  INTO "one_month"."mgmt_api_requests"
  FROM "mgmt_api_request"
  GROUP BY time(5m), "path", "method", "api_client_id", "oauth2_client_id"
END

We could have created a CQ that collects only the number of requests and groups them by path, but in anticipation of other needs, we collect more data and more tags.

...

  • The data will be stored with the RP one_month under the measurement name mgmt_api_requests.

  • It will have the following values:

    • count_requests: The number of requests made in each 5-minute interval. This is needed in our example.

    • sum_request_duration: The sum of the request processing durations in each 5-minute interval. This can be used to show which requests take the most processing time in total and might be causing performance problems.

    • mean_request_duration: The mean request processing duration in each 5-minute interval. This can be used to show how long requests take on average.

    • Tags path, method, api_client_id and oauth2_client_id. Of these, path is needed for our example; the others can be used in other graphs.

  • response_code and request_uri are fields, not tags, and InfluxQL can group only by tags (and time), so they are not included.
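As a rough illustration of what the continuous query described above computes, the following sketch groups raw points into 5-minute buckets per path and method and derives the same three values. It is a simplified model, not how InfluxDB is implemented: the function name downsample, the tuple layout and the raw sample durations are hypothetical, and only two of the four tags are modelled.

```python
from collections import defaultdict

BUCKET_NS = 5 * 60 * 1_000_000_000  # 5 minutes in nanoseconds


def downsample(points):
    """Group (time_ns, path, method, request_duration) tuples into 5-minute buckets.

    Buckets are aligned to the epoch, so a point belongs to the bucket that
    starts at the largest multiple of BUCKET_NS not after its timestamp.
    """
    groups = defaultdict(list)
    for time_ns, path, method, duration in points:
        bucket_start = time_ns - time_ns % BUCKET_NS
        groups[(bucket_start, path, method)].append(duration)
    return {
        key: {
            "count_requests": len(durations),
            "sum_request_duration": sum(durations),
            "mean_request_duration": sum(durations) / len(durations),
        }
        for key, durations in groups.items()
    }


# Hypothetical raw data: six GET requests to "version" within one bucket.
START_NS = 1608641100 * 1_000_000_000
SECOND_NS = 1_000_000_000
raw_points = [
    (START_NS + i * 10 * SECOND_NS, "version", "GET", duration)
    for i, duration in enumerate([1, 2, 2, 2, 2, 2])
]

result = downsample(raw_points)
print(result[(START_NS, "version", "GET")])
```

Six raw points collapse into a single row per bucket and tag combination, which is why the downsampled measurement is so much cheaper to store and query.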

Test the Continuous Query

Make a few Management API calls to add data.

Code Block
curl -X GET "https://your-id-server.com/api/rest/v1/version"

Wait 5 minutes for the CQ to run. Then run the query:

Code Block
> SELECT * FROM one_month.mgmt_api_requests
name: mgmt_api_requests
time                api_client_id    count_requests mean_request_duration method path                       sum_request_duration
----                -------------    -------------- --------------------- ------ ----                       --------------------
1608640500000000000 1248769513590337 1              14                    GET    user/{userId}/customfields 14
1608640500000000000 1248769513590337 2              48.5                  POST   user/{userId}/customfields 97
1608640500000000000                  5              2                     GET    example                    10
1608641100000000000                  6              1.8333333333333333    GET    version                    11

The last row shows that 6 requests to version were made in that 5-minute interval, with a mean processing time of 1.83 ms.

Update the Grafana panel

Make a duplicate of the original panel as a backup. Update the panel query configuration:

  • In FROM, use the one_month RP and the mgmt_api_requests measurement.

  • In SELECT, use the field count_requests and change the function to sum(), since the per-interval counts now need to be added up.

...

The equivalent query is: SELECT sum("count_requests") FROM "one_month"."mgmt_api_requests" WHERE $timeFilter GROUP BY time($__interval), "path" fill(0).

The graph now shows the expected values. Set Min interval to 5m to match the CQ, so that each point in the graph represents at least one full 5-minute period.
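The reason for switching to sum() can be sketched as follows: when Grafana picks an interval longer than 5 minutes, several downsampled rows fall into one graph bucket, and their count_requests values must be added together. Counting the rows would only tell how many 5-minute buckets contained data. The function name rollup_counts, the one-hour interval and the second "example" row are assumptions made for illustration.

```python
from collections import defaultdict

HOUR_NS = 60 * 60 * 1_000_000_000  # assumed graph interval: 1 hour


def rollup_counts(rows, interval_ns=HOUR_NS):
    """Sum 5-minute count_requests values into larger buckets, per path."""
    totals = defaultdict(int)
    for time_ns, path, count_requests in rows:
        bucket_start = time_ns - time_ns % interval_ns
        totals[(bucket_start, path)] += count_requests
    return dict(totals)


# (time_ns, path, count_requests); the second row is hypothetical.
rows = [
    (1608640500000000000, "example", 5),
    (1608640800000000000, "example", 3),
    (1608641100000000000, "version", 6),
]

hourly = rollup_counts(rows)
hour_start = 1608638400000000000  # start of the hour containing all three rows

print(hourly[(hour_start, "example")])  # 8 requests, although only 2 rows exist
print(hourly[(hour_start, "version")])  # 6
```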

...

More CQ examples

Number of Management API requests that ended in an error code (>= 400):

CREATE CONTINUOUS QUERY mgmt_api_request_errors_5min_for_1month ON oneportal BEGIN SELECT count(auth_header_exists) AS count_requests INTO oneportal.one_month.mgmt_api_request_errors FROM oneportal.one_day.mgmt_api_request WHERE response_code >= 400 GROUP BY time(5m), api_client_id, oauth2_client_id, path END

Number of Management API requests that ended in a success code (200 to 399):

CREATE CONTINUOUS QUERY mgmt_api_request_successes_5min_for_1month ON oneportal BEGIN SELECT count(auth_header_exists) AS count_requests INTO oneportal.one_month.mgmt_api_request_successes FROM oneportal.one_day.mgmt_api_request WHERE response_code >= 200 AND response_code < 400 GROUP BY time(5m), api_client_id, oauth2_client_id, path END

Number of OpenID API requests, similar to number of Management API requests in the above example:

CREATE CONTINUOUS QUERY "openid_api_requests_5min_for_1month" ON "oneportal" BEGIN SELECT count(auth_header_exists) AS "count_requests" INTO "one_month"."openid_api_requests" FROM "openid_api_request" GROUP BY time(5m), "path", "method" END

Number of log entries:

CREATE CONTINUOUS QUERY "log_entries_5min_for_1month" ON "oneportal" BEGIN SELECT count("event_id") AS "count_entries" INTO "one_month"."log_entries" FROM "log_entry" GROUP BY time(5m), "source" END