MSR metrics exposed for Prometheus
This section describes all of the metrics that MSR exposes for Prometheus. For key metrics, the Usage information explains how to interpret the data and how to use it to troubleshoot your MSR deployment.
Registry metrics
Registry metrics capture essential MSR functionality, such as repository count, tag count, push events, and pull events.
Metrics often incorporate labels to differentiate specific attributes of the measured item. The table below provides a list of possible values for the labels associated with registry metrics:
| Label | Possible values |
|---|---|
| | Namespace name |
| | Repository name |
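Labels appear in the Prometheus text exposition format as key="value" pairs in braces after the metric name. The minimal sketch below, assuming the scrape output uses that standard text format, parses sample lines and sums one metric by a single label. The label keys `namespace` and `repository` and the unprefixed metric names are hypothetical placeholders, because the exact label keys and any metric name prefix are not stated here. Timestamps, escaped quotes, and other less common syntax are not handled.

```python
import re
from collections import defaultdict

# Simple sample line: metric_name{key="value",...} number
# Hypothetical simplification: lines with timestamps or escaped quotes are skipped.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_samples(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def sum_by_label(samples, metric, label):
    """Sum one metric's values, grouped by the value of a single label."""
    totals = defaultdict(float)
    for name, labels, value in samples:
        if name == metric:
            totals[labels.get(label, "")] += value
    return dict(totals)


# Hypothetical scrape output; real label keys and names may differ.
text = '''
# TYPE pull_count_per_repo counter
pull_count_per_repo{namespace="team-a",repository="api"} 12
pull_count_per_repo{namespace="team-a",repository="web"} 8
pull_count_per_repo{namespace="team-b",repository="db"} 5
'''
print(sum_by_label(parse_samples(text), "pull_count_per_repo", "namespace"))
# {'team-a': 20.0, 'team-b': 5.0}
```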
repos
| Description | Current number of repositories |
|---|---|
| Metric type | Gauge |
| Labels | None |
public_repos
| Description | Current number of public repositories |
|---|---|
| Metric type | Gauge |
| Labels | None |
private_repos
| Description | Current number of private repositories |
|---|---|
| Metric type | Gauge |
| Labels | None |
pull_count
| Description | Running total of image pulls |
|---|---|
| Metric type | Counter |
| Labels | None |
pull_count_per_repo
| Description | Running total of image pulls per repository |
|---|---|
| Metric type | Counter |
| Labels | |
push_count
| Description | Running total of image pushes |
|---|---|
| Metric type | Counter |
| Labels | None |
push_count_per_repo
| Description | Running total of image pushes per repository |
|---|---|
| Metric type | Counter |
| Labels | |
pruning_policy_enabled_repos
| Description | Current number of repositories for which at least one pruning policy is enabled |
|---|---|
| Metric type | Gauge |
| Labels | None |
| Usage | To assess whether pruning policy usage should be increased across your cluster, compare this number with the total number of repositories reported by repos. |
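The comparison in the Usage row amounts to a ratio of two gauges. A minimal sketch, assuming both values were read from the same scrape; in PromQL the equivalent would be pruning_policy_enabled_repos / repos, adjusted for any prefix your MSR version adds to metric names.

```python
def pruning_coverage(pruning_policy_enabled_repos: float, repos: float) -> float:
    """Return the fraction of repositories with at least one enabled pruning policy."""
    if repos <= 0:
        # No repositories yet, so there is nothing to cover.
        return 0.0
    return pruning_policy_enabled_repos / repos


# Hypothetical values read from the two gauges at the same scrape.
coverage = pruning_coverage(pruning_policy_enabled_repos=30, repos=120)
print(f"{coverage:.0%} of repositories have pruning enabled")  # prints "25% of repositories have pruning enabled"
```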
Mirroring metrics
Mirroring metrics track the number of push and pull mirroring jobs, categorized by job status.
Considered as a whole, these metrics offer real-time insight into the performance of your mirroring jobs. For example, a simultaneous decrease in poll_mirror_running and increase in poll_mirror_done indicates that poll mirroring jobs are completing and that your poll mirroring configuration is functioning properly.
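The check described above compares two consecutive readings. A minimal sketch, assuming the metric values from each scrape are available in a dict keyed by the unprefixed metric name (a hypothetical input shape):

```python
def poll_mirroring_progressing(previous: dict, current: dict) -> bool:
    """Return True if running poll mirroring jobs fell while completed jobs rose.

    `previous` and `current` map metric names to values from two
    consecutive scrapes (hypothetical input shape for this sketch).
    """
    running_fell = current["poll_mirror_running"] < previous["poll_mirror_running"]
    # poll_mirror_done is a counter, so it only decreases if it was reset,
    # for example after a restart; a reset reads as "not progressing" here.
    done_rose = current["poll_mirror_done"] > previous["poll_mirror_done"]
    return running_fell and done_rose


before = {"poll_mirror_running": 5, "poll_mirror_done": 40}
after = {"poll_mirror_running": 2, "poll_mirror_done": 43}
print(poll_mirroring_progressing(before, after))  # True
```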
poll_mirror_waiting
| Description | Current number of poll mirroring jobs with a ‘waiting’ status |
|---|---|
| Metric type | Gauge |
| Labels | None |
| Usage | If there is a significant number of poll mirroring jobs in the |
poll_mirror_running
| Description | Current number of poll mirroring jobs with a ‘running’ status |
|---|---|
| Metric type | Gauge |
| Labels | None |
poll_mirror_done
| Description | Running total of poll mirroring jobs with a ‘done’ status |
|---|---|
| Metric type | Counter |
| Labels | None |
poll_mirror_errored
| Description | Running total of poll mirroring jobs with an ‘errored’ status |
|---|---|
| Metric type | Counter |
| Labels | None |
| Usage | If there is a sudden surge in the number of poll mirroring jobs in the |
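Because poll_mirror_errored and push_mirror_errored are counters, a surge shows up as a larger increase over a time window rather than as a high raw value. A minimal sketch, assuming you have ordered samples from two equal-length windows; the factor of 3 and the floor of 1 are arbitrary illustrative thresholds, and the reset handling is similar to, but simpler than, Prometheus's rate() and increase() functions (no extrapolation).

```python
def counter_increase(values: list) -> float:
    """Total increase of a counter across samples ordered by time.

    A drop between consecutive samples is treated as a counter reset
    (for example, after a restart): the new value is counted as the
    increase since the reset.
    """
    total = 0.0
    for prev, cur in zip(values, values[1:]):
        total += cur - prev if cur >= prev else cur
    return total


def is_surge(baseline: list, recent: list, factor: float = 3.0, floor: float = 1.0) -> bool:
    """Flag a surge when the recent increase exceeds `factor` times the baseline.

    `floor` keeps a quiet baseline (zero errors) from turning a single
    new error into a surge.
    """
    return counter_increase(recent) > factor * max(counter_increase(baseline), floor)


# Hypothetical poll_mirror_errored samples from two equal-length windows.
print(is_surge(baseline=[100, 101, 102], recent=[102, 110, 120]))  # True
```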
push_mirror_waiting
| Description | Current number of push mirroring jobs with a ‘waiting’ status |
|---|---|
| Metric type | Gauge |
| Labels | None |
| Usage | If there is a significant number of push mirroring jobs in the |
push_mirror_running
| Description | Current number of push mirroring jobs with a ‘running’ status |
|---|---|
| Metric type | Gauge |
| Labels | None |
push_mirror_done
| Description | Running total of push mirroring jobs with a ‘done’ status |
|---|---|
| Metric type | Counter |
| Labels | None |
push_mirror_errored
| Description | Running total of push mirroring jobs with an ‘errored’ status |
|---|---|
| Metric type | Counter |
| Labels | None |
| Usage | If there is a sudden surge in the number of push mirroring jobs in the |
Authentication metrics
Authentication metrics monitor the count of CLI logins and active web UI sessions.
cli_login_count
| Description | Running total of CLI logins made |
|---|---|
| Metric type | Counter |
| Labels | None |
| Usage | If you observe a sharp decline in CLI logins, investigate the Garant logs to troubleshoot the issue. |
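Because cli_login_count is a running total, a decline in logins means that the counter grows more slowly, not that its value falls. A minimal sketch, assuming three readings taken at equal intervals with no counter reset in between; the one-hour interval and the 50% threshold are arbitrary illustrative choices.

```python
def sharp_decline(previous_increase: float, current_increase: float, threshold: float = 0.5) -> bool:
    """Return True if the current window's increase dropped by more than
    `threshold` (a fraction) compared with the previous window."""
    if previous_increase <= 0:
        return False  # no earlier activity to compare against
    drop = (previous_increase - current_increase) / previous_increase
    return drop > threshold


# Hypothetical cli_login_count readings taken one hour apart.
readings = [1000, 1200, 1230]
previous_hour = readings[1] - readings[0]  # 200 logins
current_hour = readings[2] - readings[1]   # 30 logins
print(sharp_decline(previous_hour, current_hour))  # True
```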
ui_sessions
| Description | Current number of active user interface sessions |
|---|---|
| Metric type | Gauge |
| Labels | None |
| Usage | If you observe a sharp decline in active UI sessions, investigate the eNZi logs to troubleshoot the issue. |
RethinkDB metrics
The metrics for RethinkDB are extracted from the system statistics and current issues tables, providing a broad range of information about your RethinkDB deployment.
Metrics often incorporate labels to differentiate specific attributes of the measured item. The table below provides a list of possible values for the labels associated with RethinkDB metrics:
| Label | Possible values |
|---|---|
| | Database name |
| | Table name |
| | Server name |
| | |
cluster_client_connections
| Description | Current number of connections from the cluster |
|---|---|
| Metric type | Gauge |
| Labels | None |
cluster_docs_per_second
| Description | Current number of document reads and writes per second from the cluster |
|---|---|
| Metric type | Gauge |
| Labels | |
server_client_connections
| Description | Current number of client connections to the server |
|---|---|
| Metric type | Gauge |
| Labels | |
server_queries_per_second
| Description | Current number of queries per second from the server |
|---|---|
| Metric type | Gauge |
| Labels | |
server_docs_per_second
| Description | Current number of document reads and writes per second from the server |
|---|---|
| Metric type | Gauge |
| Labels | |
table_docs_per_second
| Description | Current number of document reads and writes per second from the table |
|---|---|
| Metric type | Gauge |
| Labels | |
| Usage | If certain tables have a high volume of reads or writes, distribute the primary replicas of those tables evenly across the RethinkDB servers. This balances the cluster load and improves performance across the system. |
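To find the tables whose primary replicas are worth redistributing, you can rank tables by their read and write rate. A minimal sketch, assuming samples are (labels, value) pairs; the label keys `db` and `table` are hypothetical placeholders for the database-name and table-name labels listed above.

```python
from collections import defaultdict


def hottest_tables(samples, top_n=3):
    """Rank tables by total table_docs_per_second.

    `samples` is a list of (labels, value) pairs. The "db" and "table"
    label keys are hypothetical placeholders.
    """
    totals = defaultdict(float)
    for labels, value in samples:
        # Sum in case one table is reported as several series (other labels differ).
        totals[(labels.get("db"), labels.get("table"))] += value
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:top_n]


# Hypothetical samples.
samples = [
    ({"db": "db1", "table": "tags"}, 50.0),
    ({"db": "db1", "table": "tags"}, 20.0),
    ({"db": "db1", "table": "events"}, 5.0),
]
print(hottest_tables(samples, top_n=1))  # [(('db1', 'tags'), 70.0)]
```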
table_rows_count
| Description | Current number of rows in the table |
|---|---|
| Metric type | Gauge |
| Labels | |
tablereplica_docs_per_second
| Description | Current number of document reads and writes per second from the table replica |
|---|---|
| Metric type | Gauge |
| Labels | |
tablereplica_cache_bytes
| Description | Table replica cache size, in bytes |
|---|---|
| Metric type | Gauge |
| Labels | |
tablereplica_io
| Description | Table replica byte reads and writes per second |
|---|---|
| Metric type | Gauge |
| Labels | |
tablereplica_data_bytes
| Description | Table replica size, in stored bytes |
|---|---|
| Metric type | Gauge |
| Labels | |
log_write_issues
| Description | Current number of log write issues |
|---|---|
| Metric type | Gauge |
| Labels | None |
| Usage | Log write issues occur when RethinkDB fails to write to its log file. Refer to the System current issues table in the official RethinkDB documentation for more information. |
name_collision_issues
| Description | Current number of name collision issues |
|---|---|
| Metric type | Gauge |
| Labels | None |
| Usage | Name collision issues arise when multiple servers, multiple databases, or multiple tables within the same database are assigned identical names. Refer to the System current issues table in the official RethinkDB documentation for more information. |
outdated_index_issues
| Description | Current number of outdated index issues |
|---|---|
| Metric type | Gauge |
| Labels | None |
| Usage | Outdated index issues occur when indexes created with an older version of RethinkDB must be rebuilt because of changes in how the RethinkDB Query Language (ReQL) handles indexing. Refer to the System current issues table in the official RethinkDB documentation for more information. |
total_availability_issues
| Description | Current number of total availability issues |
|---|---|
| Metric type | Gauge |
| Labels | None |
| Usage | Total availability issues occur when a table within the RethinkDB cluster is missing at least one replica. Refer to the System current issues table in the official RethinkDB documentation for more information. |
memory_availability_issues
| Description | Current number of memory availability issues |
|---|---|
| Metric type | Gauge |
| Labels | None |
| Usage | Memory availability issues arise when a page fault occurs on a RethinkDB server and the system starts using swap space. Refer to the System current issues table in the official RethinkDB documentation for more information. |
connectivity_issues
| Description | Current number of connectivity issues |
|---|---|
| Metric type | Gauge |
| Labels | None |
| Usage | Connectivity issues occur when certain servers within a RethinkDB cluster are unable to establish a connection or communicate with all other servers in the cluster. Refer to the System current issues table in the official RethinkDB documentation for more information. |
other_issues
| Description | Current number of uncategorized issues |
|---|---|
| Metric type | Gauge |
| Labels | None |
| Usage | Refer to your RethinkDB logs to diagnose the issue. Note: If the number of |
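Each issue gauge counts current problems from the RethinkDB current issues table, so a healthy cluster normally reports none. A minimal sketch for watching them together, assuming the latest values are available in a dict keyed by the unprefixed metric names listed in this section:

```python
ISSUE_METRICS = (
    "log_write_issues",
    "name_collision_issues",
    "outdated_index_issues",
    "total_availability_issues",
    "memory_availability_issues",
    "connectivity_issues",
    "other_issues",
)


def open_issues(values: dict) -> dict:
    """Return {metric: count} for every issue gauge that is above zero.

    `values` maps metric names to their latest scraped value; metrics
    missing from `values` are treated as zero.
    """
    return {
        name: values.get(name, 0)
        for name in ISSUE_METRICS
        if values.get(name, 0) > 0
    }


print(open_issues({"connectivity_issues": 2, "log_write_issues": 0}))  # {'connectivity_issues': 2}
```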
table_size
| Description | Table size, in MB |
|---|---|
| Metric type | Gauge |
| Labels | |
| Usage | When a specific table in your MSR deployment grows unchecked, it may indicate a problem with the corresponding functionality. For instance, if the size of the |
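Unchecked growth is easier to spot as a growth rate than as an absolute size. A minimal sketch, assuming two sets of table_size readings (in MB) taken a known number of hours apart and keyed by a table identifier; the per-hour limit is an arbitrary illustrative threshold.

```python
def growing_tables(earlier: dict, later: dict, hours: float, max_mb_per_hour: float) -> dict:
    """Return {table: growth in MB per hour} for tables growing faster than the limit.

    `earlier` and `later` map a table identifier to its table_size (MB)
    at two points in time that are `hours` apart. Tables absent from
    `earlier` are treated as having no growth.
    """
    flagged = {}
    for table, size_now in later.items():
        growth = (size_now - earlier.get(table, size_now)) / hours
        if growth > max_mb_per_hour:
            flagged[table] = growth
    return flagged


# Hypothetical table_size readings taken two hours apart.
print(growing_tables({"events": 100, "tags": 50}, {"events": 160, "tags": 52}, hours=2, max_mb_per_hour=10))
# {'events': 30.0}
```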
Prometheus scrape metrics
Prometheus scrape metrics capture the duration of each metrics scrape and the number of errors returned during the process.
scrape_latency
| Description | Duration of metrics collection |
|---|---|
| Metric type | Gauge |
| Labels | None |
| Usage | Elevated metrics scrape latency can indicate that additional resources should be allocated to your Prometheus server. |
scrape_errors
| Description | Current number of errors that occurred during metrics collection |
|---|---|
| Metric type | Gauge |
| Labels | None |
| Usage | Because MSR metrics depend heavily on RethinkDB, scrape errors are most likely caused by issues with RethinkDB itself. To diagnose and troubleshoot the problem, refer to the logs of your RethinkDB deployment. |
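The two scrape metrics can be checked together after each collection. A minimal sketch, assuming scrape_latency is reported in seconds (the unit is not stated above) and using the Prometheus scrape timeout as the reference point for "elevated"; 10 s is the Prometheus default scrape_timeout, and the 80% margin is an arbitrary illustrative choice.

```python
def scrape_warnings(scrape_latency: float, scrape_errors: float,
                    scrape_timeout: float = 10.0, margin: float = 0.8) -> list:
    """Return human-readable warnings for one MSR metrics collection.

    Assumes scrape_latency is in seconds. 10 s is the Prometheus default
    scrape_timeout; pass your configured value instead if it differs.
    """
    warnings = []
    if scrape_errors > 0:
        warnings.append(f"{scrape_errors:g} scrape error(s): check the RethinkDB logs")
    if scrape_latency > margin * scrape_timeout:
        warnings.append(
            f"scrape latency {scrape_latency:g}s is close to the {scrape_timeout:g}s timeout"
        )
    return warnings


print(scrape_warnings(scrape_latency=9.0, scrape_errors=2))
```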
See also
Official Prometheus documentation: Metric Types
Official RethinkDB documentation: System statistics table
Official RethinkDB documentation: System current issues table