Prometheus generic HTTP service discovery

Prometheus recently added http_sd, a generic service discovery mechanism that works over HTTP. This blog post explains why and how it was implemented, and walks through the functionality itself.

Service Discovery

Service discovery is one of the major features of Prometheus. For a monitoring tool, staying up to date with the infrastructure is critical. Configuring targets manually and keeping the list current is tedious work; not doing it leads to a partial view of the infrastructure and to alert fatigue.

That is why Prometheus can take away this task and use sources of truth instead. At the time of writing this blog post, it supports 22 service discovery mechanisms.

Over the last year, 10 new service discovery mechanisms were added. We keep adding new ones, subject to a few criteria. When accepting a new service discovery, we look at community interest, technical details, and how we, as Prometheus maintainers, can develop and support it in the future.

We cannot support every service discovery mechanism. In particular, we can’t maintain service discoveries we don’t have access to. That category includes your in-house closed-source CMDB, and cloud providers who do not provide an open-source tier.

Technical difficulty is another thing that can block us from merging a new service discovery. NetBox is an example. We have received multiple requests for a NetBox service discovery, but when I looked into the implementation, the Go bindings were not recommended by the NetBox community. A native implementation was therefore quite difficult.

File-based service discovery

For the service discovery mechanisms that are not integrated natively in Prometheus, the solution has always been file_sd. File SD is a service discovery based on files.

Prometheus can read YAML and JSON files and update its targets accordingly. This is a powerful way to configure Prometheus. This method has a few advantages:

  • You can generate the file the way you want, e.g. with configuration management software, or with dedicated sidecars.

  • Inotify makes this approach event-based. As soon as the file changes, we can pick up the changes.

Over the years, many people have developed sidecars that enable a rich SD integration with Prometheus.
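
To illustrate the sidecar approach, here is a minimal Python sketch that writes a file_sd JSON file atomically, so that a reader never sees a half-written file. The helper name write_file_sd and the file permissions are assumptions made for this example; only the JSON layout (a list of groups with "targets" and "labels") comes from file_sd itself.

```python
import json
import os
import tempfile


def write_file_sd(path, target_groups):
    """Atomically write target groups to a file_sd JSON file (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then rename it over the
    # destination, so readers never observe a partially written file.
    # The ".tmp" suffix keeps the temporary file out of a "*.json" glob.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(target_groups, tmp, indent=2)
        # mkstemp creates the file readable by its owner only; relax that so
        # a Prometheus process running as another user can read it.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

A sidecar would call write_file_sd("targets.json", groups) each time its source of truth changes.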

Drawbacks of file service discovery

The file-based service discovery mechanism has two major drawbacks: the first one is that in most cases you need an extra process running next to your Prometheus server to generate the file.

The second one is that this sidecar must share a filesystem with the Prometheus server.

Those drawbacks did not stop people from implementing service discoveries, but in some cases we have seen people mock the API of another network-based service discovery just to avoid this setup. This is obviously not recommended.

As Prometheus matured, it was time to add a generic network-based service discovery.

http_sd

The HTTP Service Discovery enables the discovery of targets over the HTTP protocol. The discovery source has to expose its targets at an HTTP endpoint.

This approach lifts some of the limitations of file_sd. The source of truth does not need to share a filesystem with Prometheus, so sidecars are not needed.

It also uses the Prometheus HTTP client, which means that we can use features like authentication (Basic, Bearer Token, OAuth2, Client certificate), TLS and HTTP proxy.

Format

The HTTP Service Discovery body format is the same as the file_sd JSON format:

[
  {
    "targets": [ "<host>", ... ],
    "labels": {
      "<labelname>": "<labelvalue>", ...
    }
  },
  ...
]

Which translates to:

[
    {
        "targets": ["10.0.10.2:9100", "10.0.10.3:9100"],
        "labels": {
            "__meta_datacenter": "london"
        }
    },
    {
        "targets": ["10.0.40.2:9100", "10.0.40.3:9100"],
        "labels": {
            "__meta_datacenter": "london"
        }
    }
]

This is actually a list of groups, and you can have multiple groups of one item each if you want more detailed labels:

[
    {
        "targets": ["10.0.10.2:9100"],
        "labels": {
            "datacenter": "london",
            "hostname": "frontend03"
        }
    },
    {
        "targets": ["10.0.10.4:9100"],
        "labels": {
            "datacenter": "london",
            "hostname": "frontend04"
        }
    }
]

Note that this last example does not have the __meta prefix on the labels. The prefix is not mandatory, but I would recommend using it if you publish software for others to use, and letting users extract the labels they need with relabeling. For internal projects, it’s fine to drop the __meta prefix.

If no targets are discoverable, you can return an empty JSON list: [].
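
As a rough illustration of producing this format, the following Python sketch groups targets that share the same labels and serializes the result. The function name build_target_groups, the inventory data and the "newyork" label value are made up for this example.

```python
import json
from collections import defaultdict


def build_target_groups(hosts):
    """Group (address, labels) pairs into the http_sd / file_sd JSON structure."""
    groups = defaultdict(list)
    for address, labels in hosts:
        # Targets that share exactly the same labels go into the same group.
        key = tuple(sorted(labels.items()))
        groups[key].append(address)
    return [
        {"targets": targets, "labels": dict(key)}
        for key, targets in groups.items()
    ]


# Hypothetical inventory: (address, labels) pairs.
inventory = [
    ("10.0.10.2:9100", {"__meta_datacenter": "london"}),
    ("10.0.10.3:9100", {"__meta_datacenter": "london"}),
    ("10.0.40.2:9100", {"__meta_datacenter": "newyork"}),
]

body = json.dumps(build_target_groups(inventory))
# With nothing to discover, the result is the empty JSON list "[]".
empty_body = json.dumps(build_target_groups([]))
```

Grouping by identical label sets keeps the payload compact; emitting one group per target, as in the last example above, is equally valid.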

Prometheus configuration

On the Prometheus server, you only need to specify the URL of your endpoint in an http_sd_configs section:

scrape_configs:
- job_name: mycmdb
  http_sd_configs:
  - url: http://mycmdb.internal/prometheus-sd-targets

You can easily add authentication and use TLS:

scrape_configs:
- job_name: mycmdb
  http_sd_configs:
  - url: https://mycmdb.internal/prometheus-sd-targets
    basic_auth:
      username: prometheus
      password: changeme

Implementation

When implementing such a service discovery, there are a few things you should know.

You must send the Content-Type: application/json HTTP Header. This ensures that we do not try to decode arbitrary endpoints.

Every response should contain all the targets, and should use the HTTP status code 200. We do not cache targets across restarts. If recomputing the targets on each request is too expensive, you can cache them in your endpoint.
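
The requirements so far (JSON content type, the full target list in every response, status 200) can be sketched with the Python standard library. This is only an illustrative endpoint under assumed names: SDHandler, make_server, the static TARGET_GROUPS data and port 8000 are all hypothetical, and the path is borrowed from the configuration example above.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical static data; a real endpoint would query its source of truth.
TARGET_GROUPS = [
    {"targets": ["10.0.10.2:9100"], "labels": {"__meta_datacenter": "london"}},
]


class SDHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/prometheus-sd-targets":
            self.send_error(404)
            return
        # Always send the complete list of targets, never a partial update.
        body = json.dumps(TARGET_GROUPS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def make_server(host="127.0.0.1", port=8000):
    """Create (but do not start) the server; call serve_forever() to run it."""
    return ThreadingHTTPServer((host, port), SDHandler)
```

Calling make_server().serve_forever() starts the endpoint; a production service would add TLS and authentication in front of it.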

As explained above, we support multiple authentication mechanisms: OAuth 2, Basic Authentication, Bearer token and Client certificate. The URL of the http_sd, however, is not considered secret, so do not rely on it to protect your endpoint. Please use one of these authentication mechanisms instead.
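
On the endpoint side, checking Basic Authentication could look like the following Python sketch. The function name is_authorized is hypothetical, and in practice you would more likely rely on the authentication support of your web framework or reverse proxy.

```python
import base64
import hmac


def is_authorized(authorization_header, username, password):
    """Return True if the Authorization header carries the expected Basic credentials."""
    if not authorization_header:
        return False
    scheme, _, encoded = authorization_header.partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True)
    except ValueError:
        # Not valid base64, so it cannot be a well-formed Basic header.
        return False
    expected = f"{username}:{password}".encode("utf-8")
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(decoded, expected)
```

A handler would call this with the value of the request's Authorization header and answer with HTTP 401 when it returns False.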

Differences with file_sd

There are a few differences between http_sd and file_sd. Both service discoveries are supported and valid.

file_sd supports YAML in addition to JSON. http_sd is limited to JSON.

Inotify makes file_sd “event-based”, meaning we would update the targets as soon as the file has changed. http_sd just polls at regular intervals.

When should you use http_sd or file_sd?

I expect that http_sd will at some point become a new point of Prometheus integration for third-party software. It is convenient, and does not require extra binaries to share a filesystem with Prometheus.

It is a great option when your service discovery has not been accepted in Prometheus itself. We can’t have every service discovery in Prometheus, because each one creates a lot of work, and we need access to the supported service discoveries to maintain them in the long term.

It’s also a viable option if your service discovery source is not written in Go. Adding an HTTP endpoint directly within your application can be a lot better than trying to create mappers in Go.

The last use case is when you want to combine exporters and service discovery. This is a new idea that is likely to be experimented with soon.

Community showcase

Here are some early implementations I’ve found on GitHub.

Prometheus vCD SD is a community-built service discovery mechanism for VMware vCloud Director. This is an interesting one because it supports both file_sd and http_sd. Because we made the file_sd and http_sd formats the same, supporting HTTP SD only required adapting the discovery to serve its file_sd files over HTTP.
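
The general idea of reusing a file_sd JSON file as an http_sd response can be sketched in Python as follows. This is not code from the vCD SD project; the function name file_sd_to_http_body and the sanity checks it performs are choices made for this example.

```python
import json


def file_sd_to_http_body(path):
    """Read a file_sd JSON file and return it as an http_sd response body (bytes)."""
    with open(path, encoding="utf-8") as f:
        groups = json.load(f)
    # Basic shape checks chosen for this sketch: a list of groups, each with
    # a "targets" list and, optionally, a "labels" object.
    if not isinstance(groups, list):
        raise ValueError("expected a JSON list of target groups")
    for group in groups:
        if not isinstance(group, dict) or not isinstance(group.get("targets"), list):
            raise ValueError("each group needs a 'targets' list")
        if not isinstance(group.get("labels", {}), dict):
            raise ValueError("'labels' must be a JSON object")
    return json.dumps(groups).encode("utf-8")
```

The returned bytes can be sent as the body of a response with the Content-Type: application/json header.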

netbox-plugin-prometheus-sd is a plugin that you can install in your NetBox installation to add an SD endpoint to NetBox. This is much more convenient for the NetBox community than a file_sd sidecar, as plugins are quite the norm in the NetBox ecosystem.

fastly-exporter might gain the ability to tell Prometheus its targets of interest. This would help to spread the queries of the different services, rather than doing one big scrape.

Conclusion

http_sd is mainly the JSON file_sd over HTTP. We did not create a new, fancy service discovery mechanism. Instead, we have built a pragmatic solution, which is easy to use and understand. It will help a lot of people, and open new possibilities for Prometheus users.