WebFaction

Based on this example I've set up a cron job as follows:

*/5 * * * * $HOME/cron/watchdog.sh > $HOME/cron/watchdog.log 2>&1

It works, running the following script every 5 minutes:

    #!/bin/bash
    mkdir -p "$HOME/tmp"
    PIDFILE="$HOME/tmp/sixty.pid"
    if [ -e "${PIDFILE}" ] && (ps -u $(whoami) -opid= | grep -P "^\s*$(cat ${PIDFILE})$" &> /dev/null); then
      echo "Already running."
      exit 99
    fi
    $HOME/gosrc/src/sixty/sixty > $HOME/tmp/sixty.log &
    echo $! > "${PIDFILE}"
    chmod 644 "${PIDFILE}"

There is no extraneous white space and each line is terminated with \n (LF).

The problem is that the if condition never evaluates to true. An empty watchdog.log file is created every five minutes, which I assume indicates that the job was launched successfully and that "Already running." was never echoed.

New sixty.pid and sixty.log files are created in $HOME/tmp, indicating that the if condition evaluated to false. There is a different value in the .pid file every time. It is never the PID of the running sixty application. The .log file is always the same except for the timestamps:

    2014/11/27 04:50:02 [I] Running on :12491
    2014/11/27 04:50:02 [C] Admin ListenAndServe: %!(EXTRA *net.OpError=listen tcp 127.0.0.1:8088: bind: address already in use)
    2014/11/27 04:50:02 [C] ListenAndServe: %!(EXTRA *net.OpError=listen tcp :12491: bind: address already in use)

Is there a way to get the if condition to evaluate to true? If not, is it OK to leave it as is? I assume that should the application stop running, the script would restart it as intended.

asked 27 Nov '14, 06:10

emadera52

edited 27 Nov '14, 06:15

Whether or not it's actually OK to leave it as-is depends on how the application itself handles the situation when it's started and another instance of itself is already running.

Perhaps it tries to bind to a port, which is in use, and therefore fails -- exiting gracefully. If that's true, then sure, it's okay to leave as-is: the watchdog script will start it when it's not running. But in that case, it's extraneous to even use the watchdog script at all: just start the application every 5 minutes from cron instead, and rely on its graceful failure.

If instead it really is launching a new process every five minutes, so that you have many of these processes piling up over time, then that's a serious problem, and you'll want to fix it. For hints on how to do that, see my Answer below.

(27 Nov '14, 06:56) ryans ♦♦

The way that the watchdog script works is that, every time it is run, it looks at your process listing and sees if you have a running process with the same PID as recorded in the "PIDFILE" file. If you have a matching running process, it concludes that the process is already running. Otherwise, it starts a new process.
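A minimal sketch of that "is the recorded PID still alive?" check follows. It uses "kill -0" rather than parsing "ps" output, which is a swapped-in technique and not what the script in the question does. The function name pidfile_is_live is made up for illustration.

```shell
#!/bin/bash
# pidfile_is_live PIDFILE
# Succeeds only if PIDFILE holds a numeric PID that refers to a process
# we are allowed to signal. (Hypothetical helper, not part of the original script.)
pidfile_is_live() {
  local pidfile="$1" pid
  [ -r "$pidfile" ] || return 1          # no readable PID file: not running
  pid="$(cat "$pidfile")"
  case "$pid" in
    ''|*[!0-9]*) return 1 ;;             # empty or non-numeric content
  esac
  # Signal 0 delivers nothing; it only performs the existence/permission check.
  kill -0 "$pid" 2>/dev/null
}
```

In a watchdog this could be used as: if pidfile_is_live "$PIDFILE"; then echo "Already running."; exit 99; fi. Note that "kill -0" also fails for a process that exists but belongs to another user, so it answers "can I signal this PID?" rather than "does this PID exist anywhere?".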

Therefore, the intended behavior is that the if condition evaluates to true when the process is already running. If that's not happening, there could be a couple of reasons:

  • Is the process you're starting a SUID/SGID executable ("ssh-agent" is a common example)? If it is, then that process won't show up in the "ps -u $(whoami)" process listing, and the watchdog script will never find it when it checks whether it's already running. (This is intended behavior of the hidepid mount option for /proc in the Linux kernel.)

  • Does the process you're starting itself fork and exec a new process? If so, then the PID of the process that keeps running won't be the PID that the watchdog script recorded in PIDFILE. In other words, the real PID of the running process can't be found by simply looking at $!.
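The second case can be reproduced with a small stand-in. In this sketch the "launcher" is just "sh -c" starting a "sleep" worker and exiting; it is an illustration of the fork behavior, not a model of how sixty actually starts. The names fork_demo, recorded_pid and real_pid are made up.

```shell
#!/bin/bash
# fork_demo: shows that $! names the launcher, not the worker it forks.
# Sets two global variables: recorded_pid (from $!) and real_pid (the worker).
fork_demo() {
  local worker_file
  worker_file="$(mktemp)"
  # The launcher starts a worker (sleep), saves the worker's PID, then exits.
  sh -c 'sleep 30 >/dev/null 2>&1 & echo "$!" > "$0"' "$worker_file" &
  recorded_pid=$!            # what a watchdog would write to its PID file
  wait "$recorded_pid"       # the launcher has exited once this returns
  real_pid="$(cat "$worker_file")"
  rm -f "$worker_file"
  echo "recorded by \$!: $recorded_pid"
  echo "actual worker:   $real_pid"
}
```

After calling fork_demo, "kill -0 $recorded_pid" should fail while "kill -0 $real_pid" succeeds, which is the mismatch described above. The worker can be cleaned up afterwards with kill "$real_pid".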

In either of these cases, the solution is to rely on the real subprocess to manage its own PID file, and then have the watchdog script read that file instead of making its own PIDFILE from "$!".
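Here is a rough sketch of a watchdog along those lines. It assumes the application writes its own PID to a file at a known path; whether sixty can be configured to do that is not something established here, so treat the PID file path and the function name watchdog as placeholders.

```shell
#!/bin/bash
# watchdog PIDFILE COMMAND [ARGS...]
# Starts COMMAND in the background unless the PID that the application itself
# wrote to PIDFILE refers to a live process. The watchdog never writes PIDFILE.
watchdog() {
  local pidfile="$1" pid
  shift
  if [ -r "$pidfile" ]; then
    pid="$(cat "$pidfile")"
    case "$pid" in
      ''|*[!0-9]*) ;;                    # unusable content: treat as not running
      *)
        if kill -0 "$pid" 2>/dev/null; then
          echo "Already running."
          return 99
        fi
        ;;
    esac
  fi
  "$@" &                                 # the application records its own PID
}
```

A cron-driven script could then end with something like: watchdog "$HOME/tmp/sixty.pid" "$HOME/gosrc/src/sixty/sixty" > "$HOME/tmp/sixty.log" 2>&1, again assuming the application writes that PID file itself.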

Another (slightly more advanced) approach is to record the current process's PID from "$$" to PIDFILE, and then directly "exec" the child process, so that the child runs under the PID that was just recorded.
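A sketch of the "$$" plus "exec" idea, wrapped in a function so it can be tried with any command. The name run_with_pidfile is hypothetical. It uses "sh -c" because inside a plain "( ... )" subshell "$$" still expands to the parent shell's PID.

```shell
#!/bin/bash
# run_with_pidfile PIDFILE COMMAND [ARGS...]
# Starts a new shell in the background, has it write its own PID ($$) to
# PIDFILE, and then exec COMMAND so that COMMAND keeps that same PID.
run_with_pidfile() {
  local pidfile="$1"
  shift
  # Inside sh -c: $0 is the PID file path, "$@" is COMMAND and its arguments.
  sh -c 'echo "$$" > "$0"; exec "$@"' "$pidfile" "$@" &
}
```

For example, run_with_pidfile "$HOME/tmp/sixty.pid" sleep 60 leaves the PID of the sleep process in the file. This only helps when the command stays in the foreground; if the command forks a worker and exits, the recorded PID goes stale just as it does with "$!".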

In summary, the way to debug this is to start by looking at the PIDFILE and seeing if the PID recorded there does indeed match the one you see for the running process in the "ps -u $(whoami)" process listing. Hope that helps!
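That comparison could be scripted roughly as below. The helper names pid_in_listing and report_pidfile are invented for this sketch; the "ps -u $(whoami) -o pid=" call is the same listing the watchdog script uses.

```shell
#!/bin/bash
# pid_in_listing PID: succeed if PID appears among the current user's processes.
pid_in_listing() {
  ps -u "$(whoami)" -o pid= | tr -d ' ' | grep -qxF -- "$1"
}

# report_pidfile PIDFILE: print the recorded PID and whether ps can see it.
report_pidfile() {
  local pidfile="$1" pid
  pid="$(cat "$pidfile" 2>/dev/null)" || pid=""
  echo "Recorded PID: ${pid:-<none>}"
  if [ -n "$pid" ] && pid_in_listing "$pid"; then
    echo "Recorded PID is in the process listing."
  else
    echo "Recorded PID is NOT in the process listing."
  fi
}
```

Running report_pidfile "$HOME/tmp/sixty.pid" right after the cron job fires, and comparing the result with the PID shown for the application in the full "ps -u $(whoami)" output, should show which of the two cases above applies.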


answered 27 Nov '14, 06:52

ryans ♦♦

edited 27 Nov '14, 07:06

Thanks Ryan. The problem is a result of the second reason you mention. The process does show up in the "ps -u $(whoami)" process listing, but not with the PID that the watchdog script puts in PIDFILE.

The solution will be 'to rely on the real subprocess to manage its own PID file, and then have the watchdog script read that file instead of making its own PIDFILE from "$!".' In the meantime, the existing script does fail gracefully, in that my process won't be started again if it's already running. I suspect that it will also restart OK after a server restart... time will tell. :)

This is my first serious exposure to shell scripting, so I have a lot to learn.

(27 Nov '14, 15:06) emadera52
question asked: 27 Nov '14, 06:10

question was seen: 2,359 times

last updated: 27 Nov '14, 15:06
