Nagios Plugin State Retention Routines

Posted by tonvoon on 16 June 2010 - 2:42pm

The aim is to create a set of library routines that can be used for saving state information between invocations of a plugin. This way, it is possible to calculate the rate of change and provide threshold calculations on this, rather than just the current state.
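As an illustration of what saved state makes possible, the sketch below computes a per-second rate from a previously saved sample and the current one. The helper name `rate_per_second` and the choice of -1.0 as the "no usable rate" value are assumptions made for this example, not part of the proposal.

```c
#include <time.h>

/* Hypothetical helper: per-second rate of change between a saved counter
 * sample and the current one. Returns -1.0 if no time has elapsed or the
 * counter went backwards (for example after a reset). */
double rate_per_second(unsigned long prev_value, time_t prev_time,
                       unsigned long cur_value, time_t cur_time)
{
    double elapsed = difftime(cur_time, prev_time);

    if (elapsed <= 0.0 || cur_value < prev_value)
        return -1.0;
    return (double)(cur_value - prev_value) / elapsed;
}
```

A plugin would then apply its warning and critical thresholds to the returned rate instead of to the raw counter.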

This is based on a patch submitted by Alain Williams, Nagios::Plugin::Differences by Jose Luis Martinez, and comments on the mailing list (see references).

Lots of discussion between Holger and me ended up with this.

Terms

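  • Location - The directory where state files are kept. Set at build time with ./configure --sharedstatedir (default $PREFIX/var). If the NAGIOS_PLUGIN_STATE_DIRECTORY environment variable is set at runtime, it overrides the build-time value. The plugin name is appended to the end of the path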

  • Key - Used as the filename of the store. Defaults to state.dat. We recommend setting it to the string returned by np_state_generate_key(), so that it is unique per plugin call. A key can only consist of alphanumerics and underscores
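The character restriction on keys can be checked with a small helper. This is a minimal sketch: the name `key_is_valid` is hypothetical and not part of the proposed API. Note that the default name state.dat contains a dot, so the sketch assumes the check applies only to keys supplied by the caller.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not part of the proposed API): returns 1 if key is
 * non-empty and contains only alphanumerics and underscores, else 0. */
int key_is_valid(const char *key)
{
    if (key == NULL || *key == '\0')
        return 0;
    for (const char *p = key; *p != '\0'; p++) {
        if (!isalnum((unsigned char)*p) && *p != '_')
            return 0;
    }
    return 1;
}
```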

Format

Example format:

# NP state file
1 [state file version number]
{state data version number}
{time}
{data}

Structs

np_state_key

char *name
char *plugin_name
int data_version
char *_filename

np_state_data

time_t time
void *data
int length (of binary data)
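
Written out as C declarations, the two structs might look like the following. The field names and types are taken from the lists above; the typedef style and the comments are assumptions.

```c
#include <time.h>

/* Sketch of the key struct; field names follow the list above. */
typedef struct {
    char *name;          /* key name, used as the state file name */
    char *plugin_name;   /* appended to the state directory */
    int   data_version;  /* version of the plugin's own state data */
    char *_filename;     /* full path, generated by np_state_init() */
} np_state_key;

/* Sketch of the data struct returned by np_state_read(). */
typedef struct {
    time_t time;    /* time stored with the state */
    void  *data;    /* string or binary payload */
    int    length;  /* length of binary data */
} np_state_data;
```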

Calls

(char *) np_state_generate_key(argv)

Returns a string to use as a key name, based on an MD5 hash of argv, so that each service/plugin invocation should get its own key. The argv used is the one produced after extra-opts parsing, so that differences in parameters are reflected in the key.
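
The idea of deriving a key from argv can be sketched without an MD5 library. The example below deliberately swaps in 64-bit FNV-1a for MD5 so that it stays self-contained; the function name `sketch_generate_key` and the 16-character hex output are assumptions. The terminating NUL of each argument is hashed as well, so that {"ab", "c"} and {"a", "bc"} do not produce the same key.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for np_state_generate_key(): hashes argv with
 * 64-bit FNV-1a (NOT md5) and writes 16 hex characters plus a NUL to out. */
void sketch_generate_key(int argc, char **argv, char out[17])
{
    uint64_t h = 14695981039346656037ULL;      /* FNV-1a offset basis */

    for (int i = 0; i < argc; i++) {
        const unsigned char *p = (const unsigned char *)argv[i];
        do {                                   /* includes the NUL byte */
            h ^= *p;
            h *= 1099511628211ULL;             /* FNV-1a prime */
        } while (*p++ != '\0');
    }
    snprintf(out, 17, "%016llx", (unsigned long long)h);
}
```

The hex output contains only alphanumerics, so it satisfies the restriction on key characters.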

(np_state_key *) np_state_init(pluginname, keyname, state data version)

Sets the variables and generates the filename. Returns an np_state_key. Dies with UNKNOWN if an exception occurs.
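
Filename generation could look roughly like this, combining the location rules from the Terms section with the plugin name and key. The helper name `build_state_path` and its parameters are hypothetical. The caller is assumed to pass the result of getenv("NAGIOS_PLUGIN_STATE_DIRECTORY") as `override_dir` and the compiled-in --sharedstatedir value as `default_dir`.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: builds "<dir>/<plugin_name>/<key>" into out.
 * override_dir is the runtime override (NULL or "" if unset),
 * default_dir is the build-time directory.
 * Returns 0 on success, -1 if out is too small. */
int build_state_path(const char *override_dir, const char *default_dir,
                     const char *plugin_name, const char *key,
                     char *out, size_t out_size)
{
    const char *dir = (override_dir != NULL && *override_dir != '\0')
                          ? override_dir
                          : default_dir;
    int n = snprintf(out, out_size, "%s/%s/%s", dir, plugin_name, key);

    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```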

np_state_read(np_state_key)

Returns np_state_data, or NULL if no data is available (for example on the first run). If a state file for the key exists, the data is read from it. If the state file format version is not the expected one, returns as if there were no data. The state data version number is then compared with the expected one; if it is numerically lower, returns as if there were no previous state. Dies with UNKNOWN on an exceptional error.
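
The version checks can be sketched as a small parser for the text format shown in the Format section. This is an illustration under several assumptions: the name `sketch_parse_state` is hypothetical, the data is a single line of text, and the function returns 0 for "no usable state" where the real routine would return NULL or die with UNKNOWN.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_FILE_FORMAT_VERSION 1   /* first number in the state file */

/* Reads one line into buf and strips the newline; returns 0 at EOF. */
static int read_line(FILE *fp, char *buf, size_t size)
{
    if (fgets(buf, (int)size, fp) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';
    return 1;
}

/* Returns 1 and fills saved_time and data if usable state was found,
 * 0 if the caller should behave as if there were no previous state. */
int sketch_parse_state(FILE *fp, int expected_data_version,
                       long *saved_time, char *data, size_t data_size)
{
    char line[256];

    /* Skip leading comment lines such as "# NP state file". */
    do {
        if (!read_line(fp, line, sizeof line))
            return 0;
    } while (line[0] == '#');

    if (atoi(line) != SKETCH_FILE_FORMAT_VERSION)
        return 0;                        /* unexpected file format */
    if (!read_line(fp, line, sizeof line) ||
        atoi(line) < expected_data_version)
        return 0;                        /* state data version too old */
    if (!read_line(fp, line, sizeof line))
        return 0;
    *saved_time = strtol(line, NULL, 10);
    return read_line(fp, data, data_size);
}
```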

np_state_write_string(np_state_key,time,string)

If time is NULL, the current time is used. Creates the state file with the state file format version and the default header text, then writes the data version, time, and data. To avoid locking problems, the data is written to a temporary file, which is then moved (mv) over the real one. State data may still be lost if two processes write to the same key at the same time.
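
The write-then-swap approach can be sketched with mkstemp() and rename() on a POSIX system. The function name `sketch_write_string`, its parameters, and the use of 0 (rather than NULL) to mean "current time" are assumptions for this example. The temporary file is created next to the target so that rename() stays on one filesystem.

```c
#define _POSIX_C_SOURCE 200809L   /* for mkstemp() and fdopen() */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical sketch: writes the state to a temporary file beside path,
 * then renames it over path. when == 0 means "use the current time".
 * Returns 0 on success, -1 on error. */
int sketch_write_string(const char *path, int data_version,
                        time_t when, const char *data)
{
    char tmp[4096];
    int n, fd, ok;
    FILE *fp;

    n = snprintf(tmp, sizeof tmp, "%s.XXXXXX", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;
    fd = mkstemp(tmp);                 /* unique name, avoids clashes */
    if (fd == -1)
        return -1;
    fp = fdopen(fd, "w");
    if (fp == NULL) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (when == 0)
        when = time(NULL);
    /* Header, file format version 1, data version, time, data. */
    ok = fprintf(fp, "# NP state file\n1\n%d\n%ld\n%s\n",
                 data_version, (long)when, data) >= 0;
    if (fclose(fp) != 0)
        ok = 0;
    if (!ok || rename(tmp, path) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

Readers see either the old file or the complete new one, never a partial write. If two writers race, the last rename wins, which matches the possible loss of state noted above.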

np_state_write_binary(np_state_key,time,start,length)

Same as np_state_write_string, but writes length bytes of binary data beginning at start.

np_state_data_cleanup(np_state_data)

Cleans up (frees) an np_state_data.

np_state_key_cleanup(np_state_key)

Cleans up (frees) an np_state_key.

Notes

  • All file opens and closes happen within these functions, which keeps each operation atomic
  • Add libtap tests for the library
  • Update the dev guidelines with library usage
  • This has problems if a remote host is checked from different Nagios instances
  • Binary data may not restore correctly in a program compiled with different options from the program that saved it, e.g. 32-bit versus 64-bit
  • Binary data may include a structure containing a pointer. Saved pointer values must not be used in the reading program, i.e. you need to overwrite the value with something malloced in the current run of the program
  • State files could be left lying around. We recommend you run a regular job to remove unmodified state files older than 1 week
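
The two notes on binary data can be illustrated with a short sketch. The struct `sample_state` and the helper `restore_sample_state` are hypothetical. The size check catches only the most obvious layout mismatch, and the stale pointer is replaced with memory allocated in the current run.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical state struct that contains a pointer. */
struct sample_state {
    long  counter;
    char *label;    /* saved pointer value is meaningless after restore */
};

/* Copies a saved binary blob into *out and replaces the stale pointer
 * with freshly allocated memory. Returns 0 on success, -1 on error. */
int restore_sample_state(const void *blob, size_t length,
                         const char *label, struct sample_state *out)
{
    if (length != sizeof *out)
        return -1;      /* e.g. saved by a build with a different layout */
    memcpy(out, blob, sizeof *out);
    out->label = malloc(strlen(label) + 1);
    if (out->label == NULL)
        return -1;
    strcpy(out->label, label);
    return 0;
}
```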

References