Commit b55b3e2

devkapilbansal authored and nemesifier committed
[feature] Resilient sending: store data for later sending if offline #29
Closes #29
1 parent 234dfc2 commit b55b3e2

5 files changed

Lines changed: 221 additions & 61 deletions

README.rst

Lines changed: 43 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -40,6 +40,48 @@ UCI configuration options must go in ``/etc/config/monitoring``.
4040
- ``monitored_interfaces``: interfaces that need to be monitored, defaults to ``*`` for all interfaces.
4141
- ``interval``: time after which device data should be sent to server, defaults to ``300``.
4242
- ``verbose_mode``: can be enabled (set to ``1``) to ease `debugging <#debugging>`__ in case of issues, defaults to ``0`` (disabled).
43+
- ``required_memory``: fraction of total memory that must be available before data is saved temporarily, defaults to ``0.05`` (5 percent).
44+
- ``max_retries``: maximum number of retries when sending data to the server fails, defaults to ``5``.
45+
46+
If the `maximum number of retries is reached <#send-mode>`_, the agent will try sending the data again in the next cycle.
47+
48+
Collecting vs Sending
49+
---------------------
50+
51+
We use two procd services in the `monitoring agent <https://github.com/openwisp/openwrt-openwisp-monitoring/blob/master/openwrt-openwisp-monitoring/files/monitoring.agent>`_, one for collecting the data and the other for sending it.
52+
53+
This makes it possible to handle failures in sending the data more flexibly. Old data saved during network connectivity issues can be sent while new data is being collected. If old data has piled up and takes several minutes to upload, new data is still collected without waiting for the sending to complete.
54+
55+
The monitoring agent uses two modes to handle this: ``send`` and ``collect``.
56+
57+
Collect Mode
58+
~~~~~~~~~~~~
59+
60+
If the openwisp_monitoring agent is called with this mode, it takes charge of collecting and saving data.
61+
62+
The agent periodically checks whether enough memory is available. If so, data is collected and saved in temporary storage with a timestamp (in UTC).
63+
64+
Once the data is saved, a signal will be sent to the other agent to ensure data is sent as soon as it is collected.
65+
66+
**Note:** The date and time on the device should be set correctly. Otherwise, data will be saved with the wrong timestamp in the timeseries database.
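The memory check described above can be sketched as follows. This is a minimal sketch rather than the agent's own code: ``has_enough_memory`` is a hypothetical helper that takes the total memory, the available memory and the required fraction as arguments instead of querying ``ubus``, so that it runs on any POSIX shell.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when `available` is at least `percent` of `total`.
# Arguments: total memory, available memory, required fraction (e.g. 0.05).
has_enough_memory() {
    total="$1"
    available="$2"
    percent="$3"
    # awk does the fractional multiplication that sh arithmetic cannot do
    required=$(awk -v p="$percent" -v t="$total" 'BEGIN { printf "%.0f", p * t }')
    [ "$available" -ge "$required" ]
}

# Example: 128 MiB total, 4 MiB available, 5 percent required.
# 5 percent of 128 MiB is about 6.4 MiB, so 4 MiB is not enough.
if has_enough_memory 134217728 4194304 0.05; then
    echo "enough memory"
else
    echo "not enough memory"
fi
```

Returning an exit status instead of echoing ``0`` or ``1`` is a design choice of this sketch; it lets the caller use the helper directly in an ``if``.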
67+
68+
Send Mode
69+
~~~~~~~~~
70+
71+
If the openwisp_monitoring agent is called with this mode, it takes charge of sending data.
72+
73+
The agent checks whether any data file is available in temporary storage.
74+
75+
If there is no data file, the agent sleeps for the configured interval and then checks again, repeating until a data file is found.
76+
If a signal is received from the other agent, the sleep is interrupted and the agent starts sending data.
77+
78+
If the agent fails to send data to the server, an exponential backoff will be used to retry until ``max_retries`` is reached.
79+
If all attempts to send the data fail, the agent will try to send it again in the next cycle.
80+
81+
If the data is sent successfully, the data file is deleted and the agent looks for another file.
82+
83+
**SIGUSR1** signals are used to instantly send the data when collected. However, the service will keep trying
84+
to send data periodically.
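The retry behaviour described in this section can be sketched as a small helper. This is an illustration under assumptions, not the agent's implementation: ``retry_with_backoff`` and ``SLEEP_CMD`` are hypothetical names, and the command to retry is passed as arguments instead of being a hard-coded ``curl`` call. The delay starts at 1 second and is doubled after every failure, so the waits are 2, 4, 8, ... seconds.

```shell
#!/bin/sh
# SLEEP_CMD is an assumption of this sketch so that the delay can be stubbed out.
SLEEP_CMD=${SLEEP_CMD:-sleep}

# Hypothetical helper: run a command until it succeeds or max_retries failures occur.
# Usage: retry_with_backoff MAX_RETRIES COMMAND [ARGS...]
retry_with_backoff() {
    max_retries="$1"; shift
    failures=0
    timeout=1
    while [ "$failures" -lt "$max_retries" ]; do
        # stop as soon as the command succeeds
        "$@" && return 0
        # double the delay after each failure
        timeout=$((timeout * 2))
        failures=$((failures + 1))
        echo "attempt $failures failed, retrying in $timeout seconds" >&2
        $SLEEP_CMD "$timeout"
    done
    # all attempts failed; the caller can try again in the next cycle
    return 1
}
```

For example, ``retry_with_backoff 5 false`` gives up after five failed attempts, while ``retry_with_backoff 5 true`` returns immediately.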
4385

4486
Compiling openwrt-openwisp-monitoring
4587
-------------------------------------
@@ -96,7 +138,6 @@ you will need to select the *openwisp-monitoring* variant and *netjson-monitorin
96138
./scripts/feeds update -a
97139
./scripts/feeds install -a
98140
make menuconfig
99-
# go to Base system, then select rpcd
100141
# go to Administration > admin > openwisp and select the packages you need interactively
101142
make tools/install
102143
make toolchain/install
@@ -124,6 +165,7 @@ If you are in that doubt openwisp-monitoring is running at all or not, you can c
124165

125166
You should see something like::
126167

168+
2712 root 1224 S /bin/sh /usr/sbin/openwisp_monitoring --interval 300 --monitored_interfaces ...
127169
2713 root 1224 S /bin/sh /usr/sbin/openwisp_monitoring --url http://192.168.1.195:8000 ...
128170

129171
You can inspect the version of openwisp-monitoring currently installed with::

openwrt-openwisp-monitoring/files/monitoring.agent

Lines changed: 160 additions & 55 deletions
@@ -1,6 +1,6 @@
11
#!/bin/sh
22

3-
VERSION=0 #default value of version
3+
VERSION=0 # default value of version
44
while [ -n "$1" ]; do
55
case "$1" in
66
--version|-v) export VERSION=1; break;;
@@ -10,7 +10,10 @@ while [ -n "$1" ]; do
1010
--verify_ssl) export VERIFY_SSL="$2"; shift;;
1111
--interval) export INTERVAL="$2"; shift;;
1212
--monitored_interfaces) export MONITORED_INTERFACES="$2"; shift;;
13-
--verbose_mode) export verbose_mode=$2; shift;;
13+
--verbose_mode) export VERBOSE_MODE="$2"; shift;;
14+
--required_memory) export REQUIRED_PERCENT="$2"; shift;;
15+
--mode) export MODE="$2"; shift;;
16+
--max_retries) export MAX_RETRIES="$2"; shift;;
1417
-*)
1518
echo "Invalid option: $1"
1619
exit 1
@@ -26,76 +29,178 @@ if [ "$VERSION" -eq "1" ]; then
2629
exit 0
2730
fi
2831

29-
VERIFY_SSL=${VERIFY_SSL:-0}
3032
INTERVAL=${INTERVAL:-300}
31-
MONITORED_INTERFACES=${MONITORED_INTERFACES:-*}
32-
verbose_mode=${verbose_mode:-0}
33+
VERBOSE_MODE=${VERBOSE_MODE:-0}
34+
TMP_DIR="/tmp/openwisp/monitoring"
3335

34-
if [ -z "$BASE_URL" ]; then
35-
logger -s "missing required --url option" \
36-
-t openwisp_monitoring \
37-
-p daemon.err
38-
exit 1
39-
fi
36+
echoerr() { echo "$@" 1>&2; }
4037

41-
if [ -z "$UUID" ]; then
42-
logger -s "missing required --uuid option" \
43-
-t openwisp_monitoring \
44-
-p daemon.err
45-
exit 1
46-
fi
38+
check_available_memory(){
39+
total=$(ubus call system info | jsonfilter -e '@.memory.total')
40+
available=$(ubus call system info | jsonfilter -e '@.memory.available')
41+
required=$(echo - |awk -v percent="$REQUIRED_PERCENT" -v total="$total" '{printf("%.f",percent*total)}')
4742

48-
if [ -z "$KEY" ]; then
49-
logger -s "missing required --key option" \
50-
-t openwisp_monitoring \
51-
-p daemon.err
52-
exit 1
53-
fi
54-
55-
# Remove double quotes from interfaces
56-
MONITORED_INTERFACES=$(echo "$MONITORED_INTERFACES" | tr -d '"')
57-
58-
URL="$BASE_URL/api/v1/monitoring/device/$UUID/?key=$KEY"
59-
60-
CURL_COMMAND="curl -s -w "%{http_code}""
61-
[ "$VERIFY_SSL" = "0" ] && CURL_COMMAND="$CURL_COMMAND -k"
62-
[ "$verbose_mode" = "1" ] && CURL_COMMAND="$CURL_COMMAND -v"
43+
if [ "$available" -ge "$required" ]; then
44+
echo "0"
45+
else
46+
[ "$VERBOSE_MODE" -eq "1" ] && logger -s "Not enough memory available" \
47+
-p daemon.err
48+
echo "1"
49+
fi
50+
}
6351

6452
collect_data(){
6553
n=0
66-
[ "$verbose_mode" = "1" ] && logger -s "Collecting NetJSON Monitoring data" \
67-
-p daemon.info
54+
[ "$VERBOSE_MODE" -eq "1" ] && logger -s "Collecting NetJSON Monitoring data" \
55+
-p daemon.info
6856
until [ "$n" -ge 5 ]
6957
do
7058
echo "$(/usr/sbin/netjson_monitoring "$MONITORED_INTERFACES")" && break
7159

7260
if [ "$n" -eq 5 ]; then
73-
[ "$verbose_mode" = "1" ] && logger -s "Collecting data failed!" \
74-
-p daemon.err
61+
[ "$VERBOSE_MODE" -eq "1" ] && logger -s "Collecting data failed!" \
62+
-p daemon.err
7563
fi
7664
n=$((n+1))
7765
sleep 5
7866
done
7967
}
8068

81-
while true
82-
do
83-
data="$(collect_data)"
84-
#send data
85-
response_code=$($CURL_COMMAND -H "Content-Type: application/json" \
86-
-d "$data" \
87-
-v "$URL")
88-
if [ "$response_code" = "200" ]; then
89-
[ "$verbose_mode" = "1" ] && logger -s "Data sent successfully." \
90-
-p daemon.info
91-
else
92-
logger -s "Data not sent successfully. Response code is $response_code" \
93-
-t openwisp_monitoring \
94-
-p daemon.err
69+
set_url_and_curl(){
70+
if [ -z "$BASE_URL" ]; then
71+
echoerr "missing required --url option"
72+
exit 1
73+
fi
9574

96-
[ "$verbose_mode" = "0" ] && logger -s "Run with verbose mode to find more." \
97-
-t openwisp_monitoring \
98-
-p daemon.err
75+
if [ -z "$UUID" ]; then
76+
echoerr "missing required --uuid option"
77+
exit 1
9978
fi
100-
sleep "$INTERVAL" & wait $!
101-
done
79+
80+
if [ -z "$KEY" ]; then
81+
echoerr "missing required --key option"
82+
exit 1
83+
fi
84+
85+
URL="$BASE_URL/api/v1/monitoring/device/$UUID/?key=$KEY"
86+
87+
CURL_COMMAND="curl -s -w "%{http_code}""
88+
[ "$VERIFY_SSL" -eq "0" ] && CURL_COMMAND="$CURL_COMMAND -k"
89+
[ "$VERBOSE_MODE" -eq "1" ] && CURL_COMMAND="$CURL_COMMAND -v"
90+
MAX_RETRIES=${MAX_RETRIES:-5}
91+
FAILING=0
92+
return 0
93+
}
94+
95+
save_data() {
96+
while true
97+
do
98+
memory_available="$(check_available_memory)"
99+
if [ "$memory_available" -eq "0" ]; then
100+
data="$(collect_data)"
101+
file_name="$(date -u +'%d-%m-%Y_%H:%M:%S')"
102+
# make directory
103+
mkdir -p "$TMP_DIR"
104+
# save data with file_name
105+
echo "$data" > "$TMP_DIR/$file_name"
106+
[ "$VERBOSE_MODE" -eq "1" ] && logger -s "Data saved temporarily" \
107+
-p daemon.info
108+
fi
109+
# get process id of the process sending data
110+
pid=$(pgrep -f "openwisp_monitoring.*--mode send")
111+
kill -SIGUSR1 "$pid"
112+
sleep "$INTERVAL"
113+
done
114+
}
115+
116+
handle_sigusr1() {
117+
[ "$VERBOSE_MODE" -eq "1" ] && logger -s "SIGUSR1 received! Sending data" \
118+
-p daemon.info
119+
return 0
120+
}
121+
122+
send_data() {
123+
while true
124+
do
125+
for file in "$TMP_DIR"/*
126+
do
127+
if [ ! -f "$file" ]; then
128+
[ "$VERBOSE_MODE" -eq "1" ] && logger -s "No data file found to send." \
129+
-p daemon.info
130+
trap handle_sigusr1 USR1
131+
# SIGUSR1 signal received, interrupt sleep and continue sending data
132+
sleep "$INTERVAL" & wait $!
133+
continue
134+
fi
135+
trap "" USR1
136+
basefilename=${file##*/}
137+
filename=${basefilename%.*}
138+
# extra zeroes are added for nanoseconds precision
139+
url="$URL&time=$filename.000000"
140+
# retry sending data in case of failure
141+
failures=0
142+
timeout=1
143+
# check if the data is latest or old one
144+
[ "$(echo "$TMP_DIR"/* | awk '{print $2}')" ] || url="$url&current=true"
145+
while true
146+
do
147+
if [ "$failures" -eq "$MAX_RETRIES" ]; then
148+
if [ "$VERBOSE_MODE" -eq "1" ]; then
149+
logger -s "Data not sent successfully. Response received is $response_code" \
150+
-p daemon.err
151+
elif [ "$FAILING" -eq "0" ]; then
152+
FAILING=1
153+
logger -s "Data not sent successfully. Response received is $response_code" \
154+
"Run with verbose mode to find more." \
155+
-t openwisp_monitoring \
156+
-p daemon.err
157+
fi
158+
break
159+
fi
160+
# send data
161+
data=$(cat "$file")
162+
response_code=$($CURL_COMMAND -H "Content-Type: application/json" -d "$data" "$url")
163+
if [ "$response_code" = "200" ]; then
164+
if [ "$VERBOSE_MODE" -eq "1" ]; then
165+
logger -s "Data sent successfully." \
166+
-p daemon.info
167+
elif [ "$FAILING" -eq "1" ]; then
168+
logger -s "Data sent successfully" \
169+
-t openwisp_monitoring \
170+
-p daemon.info
171+
FAILING=0
172+
fi
173+
# remove saved data
174+
rm "$file"
175+
break
176+
else
177+
timeout=$((timeout*2))
178+
[ "$VERBOSE_MODE" -eq "1" ] && logger -s "Data not sent successfully. Retrying in $timeout seconds" \
179+
-p daemon.warn
180+
failures=$((failures+1))
181+
sleep "$timeout"
182+
fi
183+
done
184+
# retry sending same data again in next cycle
185+
[ "$failures" -eq "$MAX_RETRIES" ] && break
186+
done
187+
done
188+
}
189+
190+
if [ -z "$MODE" ]; then
191+
echoerr "missing required --mode option"
192+
exit 1
193+
fi
194+
195+
if [ "$MODE" = "collect" ]; then
196+
MONITORED_INTERFACES=${MONITORED_INTERFACES:-*}
197+
# remove double quotes from interfaces
198+
MONITORED_INTERFACES=$(echo "$MONITORED_INTERFACES" | tr -d '"')
199+
save_data
200+
elif [ "$MODE" = "send" ]; then
201+
VERIFY_SSL=${VERIFY_SSL:-0}
202+
set_url_and_curl && send_data
203+
else
204+
echoerr "The supplied mode is invalid. Only send and collect are allowed"
205+
exit 1
206+
fi
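
The send loop above pairs ``trap handle_sigusr1 USR1`` with ``sleep "$INTERVAL" & wait $!`` so that a signal from the collector cuts the sleep short. The following standalone sketch shows that pattern in isolation. The names ``on_usr1`` and ``woken`` are hypothetical, and a background subshell stands in for the collector process.

```shell
#!/bin/sh
# Sketch: a trapped SIGUSR1 interrupts `wait`, so the script wakes up early.
woken=0
on_usr1() { woken=1; }
trap on_usr1 USR1

# Stand-in for the collector (an assumption of this sketch):
# signal this shell after one second.
( sleep 1; kill -USR1 "$$" ) &

# A foreground `sleep 30` would delay the trap until sleep exits.
# Running sleep in the background and waiting on it lets the signal
# interrupt the wait instead.
sleep 30 &
sleep_pid=$!
# wait returns a status above 128 when a trapped signal interrupts it
wait "$sleep_pid" || true

# Clean up the sleep that is still running after the early wake-up.
kill "$sleep_pid" 2>/dev/null
wait "$sleep_pid" 2>/dev/null || true
echo "woken=$woken"
```

The sketch assumes it runs as its own script, so that ``$$`` is the PID of the shell that installed the trap.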

openwrt-openwisp-monitoring/files/monitoring.config

Lines changed: 2 additions & 0 deletions
@@ -2,3 +2,5 @@ config monitoring 'monitoring'
22
option monitored_interfaces '*'
33
option interval '300'
44
option verbose_mode '0'
5+
option required_memory '0.05'
6+
option max_retries '5'

openwrt-openwisp-monitoring/files/monitoring.init

Lines changed: 14 additions & 4 deletions
@@ -55,6 +55,8 @@ start_service() {
5555
config_get monitored_interfaces monitoring monitored_interfaces "*"
5656
config_get interval monitoring interval "300"
5757
config_get_bool verbose_mode monitoring verbose_mode "0"
58+
config_get required_memory monitoring required_memory "0.05"
59+
config_get max_retries monitoring max_retries "5"
5860

5961
interval="$(time_to_seconds "$interval")"
6062
if [ "$interval" -lt 1 ]; then
@@ -66,13 +68,21 @@ start_service() {
6668
interval="--interval $interval"
6769
monitored_interfaces="--monitored_interfaces \"$monitored_interfaces\""
6870
verbose="--verbose_mode $verbose_mode"
71+
required_memory="--required_memory $required_memory"
72+
max_retries="--max_retries $max_retries"
6973

70-
procd_open_instance "openwisp_monitoring_monitoring"
71-
procd_set_param command $PROG $base_url $uuid $key $verify_ssl $interval $monitored_interfaces $verbose
74+
procd_open_instance "openwisp_monitoring_collect_data"
75+
procd_set_param command $PROG $interval $monitored_interfaces $verbose $required_memory --mode collect
7276
procd_set_param respawn "${respawn_threshold:-3600}" "${respawn_timeout:-5}" "${respawn_retry:-5}"
73-
[ "$verbose_mode" = "1" ] && procd_set_param stdout 1
74-
[ "$verbose_mode" = "1" ] && procd_set_param stderr 1
77+
[ "$verbose_mode" -eq "1" ] && procd_set_param stdout 1 && procd_set_param stderr 1
7578
procd_close_instance
79+
80+
procd_open_instance "openwisp_monitoring_send_data"
81+
procd_set_param command $PROG $base_url $uuid $key $verify_ssl $interval $verbose $max_retries --mode send
82+
procd_set_param respawn "${respawn_threshold:-3600}" "${respawn_timeout:-5}" "${respawn_retry:-5}"
83+
[ "$verbose_mode" -eq "1" ] && procd_set_param stdout 1 && procd_set_param stderr 1
84+
procd_close_instance
85+
7686
logger -s "$PROG_NAME started" \
7787
-t openwisp_monitoring \
7888
-p daemon.info

runbuild

Lines changed: 2 additions & 1 deletion
@@ -11,6 +11,7 @@ BUILD_DIR=${BUILD_DIR-./build}
1111
DOWNLOADS_DIR=${DOWNLOADS_DIR-./downloads}
1212
START_TIME=${START_TIME-$(date +"%Y-%m-%d-%H%M%S")}
1313
VERSIONED_DIR="$DOWNLOADS_DIR/$START_TIME"
14+
COMPILE_TARGET=${COMPILE_TARGET-mips_24kc}
1415
LATEST_LINK="$DOWNLOADS_DIR/latest"
1516
CORES=${CORES:-1}
1617
CURRENT_DIR=$(pwd)
@@ -52,7 +53,7 @@ fi
5253

5354
make -j$CORES package/openwrt-openwisp-monitoring/compile || exit 1
5455

55-
mv $BUILD_DIR/openwrt/bin/packages/mips_24kc/openwisp $VERSIONED_DIR
56+
mv $BUILD_DIR/openwrt/bin/packages/$COMPILE_TARGET/openwisp $VERSIONED_DIR
5657

5758
rm $LATEST_LINK || true
5859
ln -s $VERSIONED_DIR $LATEST_LINK
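
The new ``COMPILE_TARGET`` variable follows the same default-if-unset pattern as ``BUILD_DIR``. A minimal sketch of how the package path is derived, assuming a hypothetical helper ``package_dir`` that is not part of ``runbuild``:

```shell
#!/bin/sh
# Hypothetical helper: print the package directory that runbuild would move,
# using the same defaults as the script.
package_dir() {
    # ${VAR-default} applies the default only when VAR is unset;
    # an empty but set value is kept as-is
    build_dir=${BUILD_DIR-./build}
    compile_target=${COMPILE_TARGET-mips_24kc}
    echo "$build_dir/openwrt/bin/packages/$compile_target/openwisp"
}

package_dir
```

Assuming the build is started as ``./runbuild``, another architecture could be selected with something like ``COMPILE_TARGET=x86_64 ./runbuild``; the target name here is only an example.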
