Logging

This page describes the information that your code should provide in all log entries it generates, and some tools fybrik provides to ensure consistency across components.

Background

Log entries should be written to stdout and stderr.
Fybrik neither collects nor aggregates logs. This may be done by external tools (e.g., logstash, fluentd).
A globally unique identifier for each FybrikApplication instance is passed to all control plane and data plane components to be included in log entries. This enables correlation of log entries across different logs and clusters for the specific instance, even if the name of the FybrikApplication is reused over time.

Log Entry Contents

All fybrik components, whether control plane or data plane components, should write log entries to stdout and stderr in json format. The contents of the log entries are detailed in fybrik.io/pkg/logging/logging.go.

The fybrik control plane uses zerolog for its golang components, and provides a library of fybrik specific helper functions to be used with it. Examples of how to use zerolog: https://github.com/rs/zerolog/blob/master/log_example_test.go

TBD - fybrik logging helper functions for python and java.

Log Entry Verbosity

The choice of a log level should take into account in which environments the logged information is relevant: production, testing, or development. Although the administrator can configure the verbosity as desired, the following are typical configurations for the different environments.

All environments

Errors should always be logged, preferably with as much information as possible. To this end, the function LogStructure in pkg/logging/logging.go converts golang structures to json for inclusion in the log. Please note that panic and fatal should be used sparingly.

panic (zerolog.PanicLevel, 5) - Errors that prevent the component from operating correctly and handling requests. Examples: the fybrik control plane did not deploy correctly; a data plane component crashed and cannot handle requests.
fatal (zerolog.FatalLevel, 4) - Errors that prevent the component from successfully completing a particular task. Examples: the fybrikapplication controller cannot generate a plotter; the Arrow/Flight server used to read data cannot access the data store.
error (zerolog.ErrorLevel, 3) - Errors that are neither fatal nor panic, but that the user / request initiator is made aware of (typical production setting for a stable solution). Examples: the dataset requested in fybrikapplication.spec is not allowed to be used; a query to the Arrow/Flight server used to read data returns an error because of an incorrect dataset ID.
warn (zerolog.WarnLevel, 2) - Errors not shared with the user / request initiator, typically ones from which the component recovers on its own.

Production

All of the previous plus:
info (zerolog.InfoLevel, 1) - High level health information that makes the overall status clear, but without much detail (highest level used in production)

Testing

All of the previous plus:
debug (zerolog.DebugLevel, 0) - Additional information needed to help identify problems (typically used during testing)

Development

All of the previous plus:
trace (zerolog.TraceLevel, -1) - For tracing step by step flow of control (typically used during development)
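
To make the cumulative nature of these configurations concrete, here is a small standalone Go sketch that uses the numeric values listed above. It does not use zerolog; the level type and the enabled function are hypothetical stand-ins. An entry is written when its level is at or above the configured threshold.

```go
package main

import "fmt"

// level mirrors the numeric values listed in this section. It is a
// hypothetical stand-in for zerolog.Level, used only for illustration.
type level int

const (
	trace    level = -1
	debug    level = 0
	info     level = 1
	warn     level = 2
	errorLvl level = 3
	fatal    level = 4
	panicLvl level = 5
)

// enabled reports whether an entry at lvl is written when the configured
// verbosity is threshold.
func enabled(threshold, lvl level) bool {
	return lvl >= threshold
}

func main() {
	// Typical thresholds from this section.
	envs := []struct {
		name      string
		threshold level
	}{
		{"production", info},
		{"testing", debug},
		{"development", trace},
	}
	for _, e := range envs {
		fmt.Printf("%-12s trace=%-5v debug=%-5v info=%v\n",
			e.name,
			enabled(e.threshold, trace),
			enabled(e.threshold, debug),
			enabled(e.threshold, info))
	}
}
```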

JSON Logging Standard Format

All Fybrik components should generate logging information in a standard format. This information will be used by different actors for different purposes, so as much relevant information as possible needs to be captured in a consistent format.

We list all mandatory and optional fields to be used by all Fybrik components. In addition to the fields we list, Fybrik components may include extra fields as needed.

Mandatory Fields

The fields in this section are typically generated by the logging libraries.

level - log level (‘panic’, ‘fatal’, ‘error’, ‘warn’, ‘info’, ‘debug’, or ‘trace’)
time - timestamp of the log event. Timestamps should be in ISO8601 format with time offset from UTC or timezone. Example: ‘2022-02-16T10:46:21+02:00’
caller - the line of code that generated the log entry (file name + line number). Example: manager/main.go:319

Optional Fields

app.fybrik.io/app-uuid - unique identifier for kubernetes FybrikApplication, used to correlate log messages across components for a particular FybrikApplication instance. It is also unique over time so one may differentiate between FybrikApplication instances with the same name created at different times
message - string message for the log entry. Either this field or message_id must be included
message_id - unique identifier indicating the message string that should be used. This is used instead of a message string for messages that need to support internationalization, such as those that go to users
funcName - method or function in which the error occurred
DataSetID - unique identifier for the data set
ForUser - True if this should be shared with the end user in fybrikapplication status or events. False otherwise
ForArchive - True if this should be archived long term. For example, if it contains full contents of FybrikApplication and its status and should be stored for auditing purposes
cluster - cluster name on which the process generating the entry ran
component - name of the component generating the log entry
action - current operation being called. For example, “create_catalog” or “update_asset”
response_time - response time of the current operation in milliseconds. Can be used in monitoring dashboards such as Kibana
error - the error code or message returned to the fybrik component upon an unsuccessful action. Additional context should usually be provided in the accompanying message field
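
The rule that either message or message_id must be present can be checked mechanically. The standalone Go sketch below models a subset of the fields above and validates that rule; the entry type, the validate function, and all field values are hypothetical and only illustrate the field names from this list.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// entry is a hypothetical Go view of a few of the fields listed above.
// omitempty drops optional fields that were not set.
type entry struct {
	AppUUID      string `json:"app.fybrik.io/app-uuid,omitempty"`
	Message      string `json:"message,omitempty"`
	MessageID    string `json:"message_id,omitempty"`
	Component    string `json:"component,omitempty"`
	Action       string `json:"action,omitempty"`
	ResponseTime int64  `json:"response_time,omitempty"` // milliseconds
}

// validate enforces that either message or message_id is included.
func validate(e entry) error {
	if e.Message == "" && e.MessageID == "" {
		return errors.New("log entry needs message or message_id")
	}
	return nil
}

func main() {
	// All values below are made up for the example.
	e := entry{
		AppUUID:      "123e4567-e89b-12d3-a456-426614174000",
		Message:      "asset updated",
		Component:    "example-connector",
		Action:       "update_asset",
		ResponseTime: 42,
	}
	if err := validate(e); err != nil {
		panic(err)
	}
	out, err := json.Marshal(e)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```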

Environment Variables

LOGGING_VERBOSITY - should be set to one of the levels described in the previous section.
PRETTY_LOGGING - If true, log entries are in human-readable format. If false, they are in json. Should only be true during development, since json is preferred to enable easy parsing by aggregator tools.
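
A component might read these two variables roughly as in the standalone Go sketch below. This page does not say whether LOGGING_VERBOSITY holds a level name or its number, so the sketch accepts both; that choice, the fallback to info for unset or unknown values, and the parseVerbosity and parsePretty helpers are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// levels maps the level names from this page to their numeric values.
var levels = map[string]int{
	"trace": -1, "debug": 0, "info": 1, "warn": 2,
	"error": 3, "fatal": 4, "panic": 5,
}

// parseVerbosity accepts a level name or a number in [-1, 5].
// Assumption: anything else falls back to info.
func parseVerbosity(s string) int {
	s = strings.ToLower(strings.TrimSpace(s))
	if n, err := strconv.Atoi(s); err == nil && n >= -1 && n <= 5 {
		return n
	}
	if n, ok := levels[s]; ok {
		return n
	}
	return levels["info"]
}

// parsePretty treats unset or unparsable values as false, so JSON output
// stays the default.
func parsePretty(s string) bool {
	p, err := strconv.ParseBool(strings.TrimSpace(s))
	if err != nil {
		return false
	}
	return p
}

func main() {
	lvl := parseVerbosity(os.Getenv("LOGGING_VERBOSITY"))
	pretty := parsePretty(os.Getenv("PRETTY_LOGGING"))
	fmt.Printf("verbosity=%d pretty=%v\n", lvl, pretty)
}
```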

Logging of Structures

Fybrik provides a helper function called LogStructure in pkg/logging/logging.go for writing Go structures in json format to the log. It supports different verbosity levels, and thus can be used in production, testing and development environments.
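
The idea of embedding a Go structure in a log entry can be sketched with the standard library alone. The structEntry function below is a hypothetical stand-in: it is not fybrik's implementation and does not mirror the signature of LogStructure, which is not shown on this page. The plotterSummary type is likewise made up for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// plotterSummary is a made-up structure used only for this example.
type plotterSummary struct {
	Name     string   `json:"name"`
	Clusters []string `json:"clusters"`
}

// structEntry serializes v to JSON and embeds it under key in a log entry
// that also carries the level. It is a hypothetical stand-in for the idea
// behind LogStructure, not fybrik's code.
func structEntry(level, key string, v interface{}) ([]byte, error) {
	raw, err := json.Marshal(v)
	if err != nil {
		return nil, err
	}
	entry := map[string]interface{}{
		"level": level,
		key:     json.RawMessage(raw), // embed as nested JSON, not as a quoted string
	}
	return json.Marshal(entry)
}

func main() {
	out, err := structEntry("debug", "plotter", plotterSummary{
		Name:     "example-plotter",
		Clusters: []string{"cluster-a", "cluster-b"},
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Fprintln(os.Stdout, string(out))
}
```

Embedding the structure as nested JSON rather than as an escaped string keeps its fields individually searchable in aggregator tools.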