Simple data logger under Linux

This project is about a simple data-logging engine which stores values from sensors at regular intervals for monitoring purposes. The project originated in an application for measuring the geometry of railway tracks. Along the way it turned out that the code could be reused for monitoring a laboratory at my workplace. This latter application is less complex and therefore better suited for demonstrating how the logger works.

This page does not present all the details. It's rather about the general ideas. The rest can be taken from the code.

What it is good for

Suppose a laboratory equipped with a couple of sensors which monitor important stability parameters. Here we'll be talking about, among others, a couple of Thermotrack sensors and a Thorlabs PM100 powermeter.

The Thermotrack sensors are hooked up via the network. All other devices are connected by USB to the machine on which the data logger runs. These devices are to be read out at regular intervals, and the obtained data is to be stored for monitoring purposes and for analysis in case of instabilities.

Some basics

The idea is to keep the code as short as possible without having to rewrite half of it whenever a new sensor is added. This leads to a couple of demands, most importantly that the logger core remain completely independent of the individual devices and that all device-specific code be confined to one module per device.

All devices either came directly with a Python interface or one could easily be written. The decision to write the logger in Python was therefore a straightforward one, though Python experts may well see from my code that I am not at all an expert in coding Python.

What you need to know to run it yourself

To run this logger yourself and on your own devices you need to know how to do a couple of things. These are essentially: setting up a MySQL database with the required tables, writing a small Python module for each of your devices, and adapting the two configuration files described below.

The implementation

Modules

Each device is accessed by its own module, residing in a separate directory. Let's say we want to read our PM100 powermeter: the module is called PM100, residing in PM100/PM100.py. The name itself can be chosen freely; the directory and the file, however, must carry exactly that name. The file PM100.py contains all the code necessary for reading out the device and storing the obtained values in the database. Any preprocessing before storage should happen within the module.

Each module must provide a method for acquisition and storage. The choice of its name is subject to no restrictions other than the general ones for Python. A module may further provide methods for initialisation, clean-up and activity checking. How a module is hooked into the logger is explained further down. Let's have a look at the code for reading out the Thorlabs powermeter.

import time
from ThorlabsPM100 import ThorlabsPM100, USBTMC

def init_pm100():
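    # Attach the Thorlabs driver to the USBTMC device file.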
    global power_meter1
    inst1 = USBTMC(device="/dev/usbtmc0")
    power_meter1 = ThorlabsPM100(inst=inst1)

def read_pm100(db):
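    # Take a single reading and insert it, time-stamped, into the `power` table.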
    power1 = power_meter1.read
    query = ("INSERT INTO `power` (`time-of-measurement`,`power`, `location`)" +
             " VALUES (%s,%s,'800/400')")
    values = (time.strftime("%Y-%m-%d %H:%M:%S"), power1)
    cursor = db.cursor()
    cursor.execute(query, values)
    db.commit()

The code contains two functions, one for initialisation and one for read-out. The latter inserts the acquired value into a table of the SQL database. The link to the database (the return value of mysql.connector.connect()) is provided by the main script. All device- and data-specific database code resides in the module's code. This means in turn that the logger itself is completely independent of the structure of the database: all data may go into a single table, or each device may have its own table. There is, however, one important restriction. When modules are scheduled for the same time, the order in which they are called is undefined, so no module can rely on data from other modules called 'at the same time'.

Configuration

The logger is configured through two configuration files. db.conf contains the database-related information:

[database]
  db-name = sampledb
  user = sample-logger
  pass = super-secret-password
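
For illustration, this is roughly how the main script can turn db.conf into the database link mentioned above. A minimal sketch, assuming Python's configparser and the mysql-connector-python package; logger-main.py may differ in detail.

import configparser
import mysql.connector

def connect_db(conf_file="db.conf"):
    # Read the [database] section and open the connection which is
    # later passed on to every module's acquisition method.
    conf = configparser.ConfigParser()
    conf.read(conf_file)
    section = conf["database"]
    return mysql.connector.connect(database=section["db-name"],
                                   user=section["user"],
                                   password=section["pass"])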
  

logger.conf defines all modules and the necessary parameters. It is slightly more complicated; let's have a look at it.

[devices]
  names=device-one,device-two,external-device

[activity-check]
  module=device_one
  method=read
  period=10min
  performs-acquisition=true

[device-one]
  module=device_one
  script=read
  period=1sec
  init=init_device_one
  terminate=terminate_device_one

[device-two]
  module=device_two
  scripts=read,acquire
  periods=1sec,1min
  init=init
  stop=stop
  resume=resume

[external-device]
  path=/path/to/external
  program=read-external
  arguments=-o some-argument -f some-other-argument
  

The devices section lists the names of all devices to be considered. There is a slight redundancy here, since the device names could also be recovered from the remaining sections, i.e. those without a predefined name. The chosen approach, however, makes it easy to take out devices which are temporarily unavailable: any section without a predefined name which is not listed in the devices section is simply ignored.

The section activity-check permits one to define a module which checks whether logging should be active at all. The module must be associated with one of the devices defined before. The first three lines should be self-explanatory, while the last one deserves a comment. Suppose the logger is currently inactive. Then device_one.read() will be called every ten minutes; as soon as it returns True, logging is started. The fourth parameter tells whether, upon a positive activity check, the required data set has already been entered into the database. In that case the first acquisition round leaves out the corresponding method call.

The meaning of the entries for the two devices should be fairly clear. Note that an arbitrary number of methods can be called per device. The keywords stop and resume assign methods for interrupting and resuming the logging process.
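
To make the module contract concrete, here is a hypothetical skeleton matching the [device-one] and [activity-check] sections above. The table and column names as well as the placeholder value are mine, not part of the project.

# device_one/device_one.py
import time

def init_device_one():
    # Open the device here and keep the handle in a module-level variable.
    pass

def read(db):
    # Acquire one value and store it with a timestamp. Since this method
    # is also named in [activity-check], its return value decides whether
    # logging should be active.
    value = 0.0  # placeholder for the actual device read-out
    cursor = db.cursor()
    cursor.execute("INSERT INTO `device-one` (`time-of-measurement`,`value`)" +
                   " VALUES (%s,%s)",
                   (time.strftime("%Y-%m-%d %H:%M:%S"), value))
    db.commit()
    return True

def terminate_device_one():
    # Release the device on shutdown.
    pass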

The section external-device points to an external executable which is started in a separate process whenever the logger is started. Such programs are meant to record data independently when Python's overhead slows down the acquisition process too much. They have to implement handlers for the signals SIGUSR1, SIGUSR2 and SIGTERM. The logger sends the first two when halting and resuming the logging, respectively; the last one is sent before quitting and permits the external processes to terminate gracefully.
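
The signal contract itself is easy to illustrate. The following sketch is written in Python for brevity; a real external recorder would more likely be a compiled program, precisely to avoid Python's overhead.

import signal
import sys
import time

paused = False

def on_sigusr1(signum, frame):
    # The logger halts: pause acquisition.
    global paused
    paused = True

def on_sigusr2(signum, frame):
    # The logger resumes: continue acquisition.
    global paused
    paused = False

def on_sigterm(signum, frame):
    # The logger quits: clean up and terminate gracefully.
    sys.exit(0)

signal.signal(signal.SIGUSR1, on_sigusr1)
signal.signal(signal.SIGUSR2, on_sigusr2)
signal.signal(signal.SIGTERM, on_sigterm)

while True:
    if not paused:
        pass  # acquire and record data here
    time.sleep(0.1)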

Main file

The gory details of scheduling, stopping and resuming are implemented in logger-main.py. I'm not going to walk through the code; the details of the implementation matter neither for configuration nor for extension with new modules.
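
One detail may nevertheless be worth sketching for module authors: since directory and file carry the module's name, the modules listed in logger.conf can be imported dynamically. A minimal sketch, assuming importlib and a working directory at the project root; not necessarily how logger-main.py actually does it.

import configparser
import importlib
import sys

def load_modules(conf_file="logger.conf"):
    # Import every module listed in the devices section. Entries which
    # point to an external program carry no module key and are skipped.
    conf = configparser.ConfigParser()
    conf.read(conf_file)
    modules = {}
    for name in conf["devices"]["names"].split(","):
        section = conf[name]
        if "module" in section:
            # The module file lives in a directory of the same name.
            sys.path.insert(0, section["module"])
            modules[name] = importlib.import_module(section["module"])
    return modules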

Note that more than one data logger instance may access the same database in parallel. Likewise, the database can be accessed by any other client while the logger is running.
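
For instance, a hypothetical read-only client could fetch the latest PM100 readings like this, using the credentials from db.conf and the table from the PM100 example:

import mysql.connector

db = mysql.connector.connect(database="sampledb", user="sample-logger",
                             password="super-secret-password")
cursor = db.cursor()
cursor.execute("SELECT `time-of-measurement`, `power` FROM `power`" +
               " ORDER BY `time-of-measurement` DESC LIMIT 10")
for timestamp, power in cursor:
    print(timestamp, power)
db.close()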

Getting it up and running

You'll need a Linux machine with some flavour of MySQL and Python installed. Most of the code will also run as-is on a Windows machine, except the external programs, which rely on Unix-style inter-process communication. The packages for accessing the devices have to be downloaded from the providers' websites. If no driver is provided, provide it yourself.

Running logger-main.py from the command line or from Python's console will throw a lot of debugging messages at you. When started by calling loggerd, the logger process detaches from the shell and continues to run as an independent process after logout. It can be stopped by sending it a SIGTERM signal or by calling loggerd stop. A configuration reload without stopping the logger is not possible. The logger writes a message to syslog when starting, halting, resuming or stopping activity.

And now what?

Now we've filled our database with lots of data. To make use of it, some other piece of software is needed: anything from a home-brew visualisation to an existing data-treatment package is possible. In a further chapter I'll show how to browse through and play with the data in an online dashboard application.
