Event listeners with failure ratio recover after first successful run
Description
Activity
Filip Vaško, October 19, 2020 at 8:49 AM (edited)
Testing:
Once the failure ratio is met and the listener has run at least 3 times, the listener is marked as failing.
Created a failing File Event Listener (failing file check), let it fail a few times, then manually reset the listener state. The failures are still aggregated under the same old row in the task_log table, but the failure ratio is still calculated correctly.
Successful checks (e.g. file checks of a file event listener) do not on their own change the state to OK.
The listener state changes to OK once the failure ratio dips below the configured threshold value.
Tested on both standalone and cluster. Closing.
(server-CLO-19879 #12)
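The rules checked in the testing comment above can be sketched roughly as follows. This is only an illustration, not the server's actual code: the class and method names are hypothetical, the ratio is assumed to be computed over all recorded runs (the real calculation works from task_log and may use a time window), and "met" is assumed to mean a ratio at or above the threshold.

```java
// Hypothetical sketch; none of these names come from the CloverETL Server API.
class FailureRatioSketch {
    // The testing comment mentions a minimum of 3 runs before the ratio applies.
    static final int MIN_RUNS = 3;

    private final int thresholdPercent;
    private int runs;
    private int failures;
    private boolean failing;

    FailureRatioSketch(int thresholdPercent) {
        this.thresholdPercent = thresholdPercent;
    }

    /** Records one listener execution and re-evaluates the state. */
    void record(boolean success) {
        runs++;
        if (!success) {
            failures++;
        }
        if (runs >= MIN_RUNS) {
            // Integer form of: failures / runs >= thresholdPercent / 100.
            // Assumption: failing at or above the threshold, OK strictly below it.
            failing = failures * 100 >= thresholdPercent * runs;
        }
    }

    boolean isFailing() {
        return failing;
    }

    public static void main(String[] args) {
        FailureRatioSketch listener = new FailureRatioSketch(10);
        // Made-up history: 17 successful checks, then 3 failed ones (3/20 = 15%).
        for (int i = 0; i < 17; i++) listener.record(true);
        for (int i = 0; i < 3; i++) listener.record(false);
        System.out.println("after outage: failing=" + listener.isFailing());

        // A single success (3/21, about 14.3%) must not flip the state to OK.
        listener.record(true);
        System.out.println("after one success: failing=" + listener.isFailing());
    }
}
```

The point of the sketch is that the state is always derived from the ratio, so a single successful run cannot reset it on its own.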
Some types of listeners (File, JMS, Universal) configured to be marked as failing when the failure ratio exceeds some threshold are incorrectly marked as working again immediately after the first successful execution.
Example:
A File event listener periodically checks a remote directory for new files. It is configured with a 10% failure ratio.
Due to a network outage, the failure ratio exceeds 10% and the listener is marked as failing.
When the connection is restored:
Expected behavior: The listener should stay failing until subsequent successful executions bring the failure ratio back below 10%.
Actual behavior: The listener is incorrectly marked as working after the first successful run.
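To make the expected behavior concrete, the number of successful runs needed for recovery can be worked out from the ratio. The helper below is a hypothetical sketch, not part of the product; it assumes the ratio is failures divided by all recorded runs and that recovery requires the ratio to be strictly below the threshold.

```java
// Hypothetical helper for illustration only.
class RecoveryRuns {
    /**
     * Returns the smallest number k of additional successful runs such that
     * failures / (runs + k) is strictly below thresholdPercent / 100.
     */
    static int successesNeeded(int failures, int runs, int thresholdPercent) {
        if (thresholdPercent <= 0) {
            throw new IllegalArgumentException("thresholdPercent must be positive");
        }
        if (failures == 0) {
            return 0; // nothing to recover from
        }
        // failures * 100 < thresholdPercent * total
        //   <=> total > failures * 100 / thresholdPercent
        // The smallest integer strictly greater than x is floor(x) + 1.
        int minTotal = failures * 100 / thresholdPercent + 1;
        return Math.max(0, minTotal - runs);
    }

    public static void main(String[] args) {
        // Made-up numbers: 3 failures in 20 runs with a 10% threshold.
        System.out.println(successesNeeded(3, 20, 10));
    }
}
```

With these made-up numbers (3 failures in 20 runs, 10% threshold), the listener would need 11 further successful runs, reaching 3 failures in 31 runs, before the ratio falls below 10%. Under the reported bug it was marked as working after just one.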
Affected types of event listeners:
File listener
JMS message listener
Universal listener
Fix test com.cloveretl.server.test.events.file.*ListenerChangeStatusTest.