File event listener is recreated after deletion

Assignee

Reporter

Sprint

Description

Recently, some of our CTE test fail (on core, worker, cluster) because of errors logged in all.log. The errors come from a failing file event listener.
Example of the error:

2020-08-16 22:46:21,148[ (self-tuning)'] userAction INFO execute job; user=User#1:clover jobExecutionCommand=GraphExecutionCommand[node=null/null#graph=cloveretl.test.scenarios:jobflow/ExecuteGraph_stopOnFail_async_true.jbf:user=null] ipAddress=172.23.2.65 2020-08-16 22:46:21,165[duledExecutor-5] erLivecycleAware WARN File event listener check failed com.cloveretl.server.a.a.a.a: java.lang.IllegalStateException: java.io.IOException: Directory listing failed at com.cloveretl.server.events.b.b.checkFiles(Main:161) ~[cs.jar:?] at com.cloveretl.server.events.listeners.livecycle.FileEventListenerLivecycleAware.checkFilesForActiveListeners(Main:162) [cs.jar:?] at com.cloveretl.server.timing.FileEventsJob.run(Main:37) [cs.jar:?] at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54) [spring-context-5.1.14.RELEASE.jar:5.1.14.RELEASE] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_202] at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_202] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_202] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_202] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_202] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_202] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_202] Caused by: java.lang.IllegalStateException: java.io.IOException: Directory listing failed at com.cloveretl.server.events.b.b.obtainFilesList(Main:473) ~[cs.jar:?] at com.cloveretl.server.events.b.b.a(Main:201) ~[cs.jar:?] at com.cloveretl.server.events.b.b.checkFiles(Main:151) ~[cs.jar:?] ... 10 more Caused by: java.io.IOException: Directory listing failed at org.jetel.util.file.stream.ErrorMessageDirectoryStream.newException(ErrorMessageDirectoryStream.java:67) ~[cloveretl.engine-5.8.0.jar:?] at org.jetel.util.file.stream.ErrorMessageDirectoryStream.next(ErrorMessageDirectoryStream.java:60) ~[cloveretl.engine-5.8.0.jar:?] at org.jetel.util.file.stream.DelegateDirectoryStream.next(DelegateDirectoryStream.java:57) ~[cloveretl.engine-5.8.0.jar:?] at org.jetel.util.file.stream.BufferedDirectoryStream.loadNext(BufferedDirectoryStream.java:59) ~[cloveretl.engine-5.8.0.jar:?] at org.jetel.util.file.stream.BufferedDirectoryStream.hasNext(BufferedDirectoryStream.java:66) ~[cloveretl.engine-5.8.0.jar:?] at org.jetel.util.file.stream.SafeIterator.hasNext(SafeIterator.java:33) ~[cloveretl.engine-5.8.0.jar:?] at org.jetel.component.fileoperation.FileManager.list(FileManager.java:1165) ~[cloveretl.engine-5.8.0.jar:?] at org.jetel.component.fileoperation.FileManager.list(FileManager.java:1234) ~[cloveretl.engine-5.8.0.jar:?] at com.cloveretl.server.events.b.b.obtainFilesList(Main:468) ~[cs.jar:?] at com.cloveretl.server.events.b.b.a(Main:201) ~[cs.jar:?] at com.cloveretl.server.events.b.b.checkFiles(Main:151) ~[cs.jar:?] ... 10 more Caused by: java.io.FileNotFoundException: File not found: "ftp://test:2N7z%25z63e%40a&n%233x@koule/file-event-listeners/tc672157/" at org.jetel.component.fileoperation.PooledFTPOperationHandler.list(PooledFTPOperationHandler.java:640) ~[cloveretl.engine-5.8.0.jar:?] at org.jetel.component.fileoperation.PooledFTPOperationHandler.list(PooledFTPOperationHandler.java:669) ~[cloveretl.engine-5.8.0.jar:?] at org.jetel.component.fileoperation.IOperationHandler.directoryStream(IOperationHandler.java:93) ~[cloveretl.engine-5.8.0.jar:?] at org.jetel.component.fileoperation.FileManager.listSingleUri(FileManager.java:1141) ~[cloveretl.engine-5.8.0.jar:?] at org.jetel.component.fileoperation.result.StreamingListResult.getCurrentResult(StreamingListResult.java:118) ~[cloveretl.engine-5.8.0.jar:?] at org.jetel.util.file.stream.MapDirectoryStream.next(MapDirectoryStream.java:67) ~[cloveretl.engine-5.8.0.jar:?] at org.jetel.util.file.stream.DelegateDirectoryStream.next(DelegateDirectoryStream.java:57) ~[cloveretl.engine-5.8.0.jar:?] at org.jetel.util.file.stream.ErrorMessageDirectoryStream.next(ErrorMessageDirectoryStream.java:58) ~[cloveretl.engine-5.8.0.jar:?] at org.jetel.util.file.stream.DelegateDirectoryStream.next(DelegateDirectoryStream.java:57) ~[cloveretl.engine-5.8.0.jar:?] at org.jetel.util.file.stream.BufferedDirectoryStream.loadNext(BufferedDirectoryStream.java:59) ~[cloveretl.engine-5.8.0.jar:?] at org.jetel.util.file.stream.BufferedDirectoryStream.hasNext(BufferedDirectoryStream.java:66) ~[cloveretl.engine-5.8.0.jar:?] at org.jetel.util.file.stream.SafeIterator.hasNext(SafeIterator.java:33) ~[cloveretl.engine-5.8.0.jar:?] at org.jetel.component.fileoperation.FileManager.list(FileManager.java:1165) ~[cloveretl.engine-5.8.0.jar:?] at org.jetel.component.fileoperation.FileManager.list(FileManager.java:1234) ~[cloveretl.engine-5.8.0.jar:?] at com.cloveretl.server.events.b.b.obtainFilesList(Main:468) ~[cs.jar:?] at com.cloveretl.server.events.b.b.a(Main:201) ~[cs.jar:?] at com.cloveretl.server.events.b.b.checkFiles(Main:151) ~[cs.jar:?] ... 10 more

The listener is created during FileEventListenerTest (specifically /cloveretl.server/src/integrationTest/java/com/cloveretl/server/test/events/file/suite/A2_ImportFileEventListenersTest.java). I've seen S3-remote-file-event-listener-test or FTP-slow-file-upload-listener-test. All the listeners are deleted after the test, so this should not happen. It seems that some of the listeners are sometimes recreated after deletion.

Example: https://linda.javlin.eu/jenkins/job/cloveretl.server-vte-test-worker-master/790/CONTAINER=weblogic-12-2-1,DATABASE=db2,OBFUSCATION=obfuscated,jdk=java-1.8/ - listener FTP-slow-file-upload-listener-test is active (and failing) after the tests finished.

Creation of the listener (jenkins build log + all.log):

22:00:19 com.cloveretl.server.test.events.file.FileEventListenerTestSuite > com.cloveretl.server.test.events.file.suite.A2_ImportFileEventListenersTest.testConfigImport STANDARD_OUT 22:00:19 22:00:19,329[ Test worker] equestDispatcher INFO : Using Clover Server root url 'http://172.23.3.50:7021/clover/' at com.cloveretl.server.test.utils.RequestDispatcher.<init>(RequestDispatcher.java:182) 22:00:19 22:00:19,353[ Test worker] HttpMethodBase WARN : Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended. at org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682) 22:00:19 22:00:19,842[ Test worker] equestDispatcher INFO : response: 22:00:19 Import successful at com.cloveretl.server.test.utils.RequestDispatcher.execute(RequestDispatcher.java:438) 22:00:19 22:00:19 com.cloveretl.server.test.events.file.FileEventListenerTestSuite > com.cloveretl.server.test.events.file.suite.A2_ImportFileEventListenersTest.testConfigImport PASSED
2020-08-16 22:00:19,840[e#92 taskId=115] erLifecycleAware INFO NodeDependent listener, initializing ... file APPEARANCE ftp://koule/file-event-listeners/${test.jobIdent}/listener-test.bin 2020-08-16 22:00:19,841[e#92 taskId=115] erLifecycleAware INFO NodeDependent listener, initialized EventListener#file#92 taskId=115

So the listener has id 92 and taskId 115.

Then the listener is deleted (jenkins build log + server-audit.log):

22:01:26 22:01:26,011[ Test worker] oveListenersTest INFO : Trying to delete listener with name FTP-slow-file-upload-listener-test at com.cloveretl.server.test.events.file.suite.D1_RemoveListenersTest.testDeleteListeners(D1_RemoveListenersTest.java:105) 22:01:26 22:01:26 com.cloveretl.server.test.events.file.FileEventListenerTestSuite > com.cloveretl.server.test.events.file.suite.D1_RemoveListenersTest.testDeleteListeners PASSED
2020-08-16 22:01:26,071 172.23.2.65 node=node01, user=clover, operation=ServerFacade.deleteServerEventListener, outputParams={ com.cloveretl.server.worker.commons.facade.api.Response@3e5af480[bean=null,messages=[ Msg#info:MSG_EVENT_LISTENER_DELETED, Msg#info:MSG_LISTENER_RELEASED [FILE, APPEARANCE ftp://koule/file-event-listeners/${test.jobIdent}/listener-test.bin] ] ] }

So the listener should be deleted, but it appears again with different id (from all.log):

2020-08-16 22:01:26,437[e#93 taskId=116] erLifecycleAware INFO NodeDependent listener, initializing ... file APPEARANCE ftp://koule/file-event-listeners/${test.jobIdent}/listener-test.bin 2020-08-16 22:01:26,438[e#93 taskId=116] erLifecycleAware INFO NodeDependent listener, initialized EventListener#file#93 taskId=116

Steps to reproduce

None

Attachments

3
  • 17 Aug 2020, 07:47 AM
  • 17 Aug 2020, 07:47 AM
  • 17 Aug 2020, 07:47 AM

Activity

Show:

Filip Reichman August 18, 2020 at 11:38 AM

Added a check: an EventListener is not updated if it has already been deleted.

Filip Reichman August 18, 2020 at 6:42 AM
Edited

It is a race condition

  1. there is a collection of FileEventProducers (NodeDependentEventListenerLifecycleAware.activeEventProducers), each FileEventProducer is holding an instance of FileEventListener

  2. a FileEventListener is deleted from DB

  3. in the meantime FileEventProducer holding reference to the deleted FileEventListener is executed and it updates the FileEventListener in the DB

    1. the FileEventListener is not in the DB anymore, so a brand new FileEventListener is created

  4. the original FileEventProducer is removed from the collection

  5. a new FileEventProducer is added to the collection

The FileEventProducer should be removed from the collection before it is deleted from DB, but there is a job ClusterNodeCheckJob (executing EventListenerManager.checkActiveListeners()) checking if listeners (located in the Cache) are active, so the (not in DB) listener can be initialized again.

Fixed

Details

Priority

Fix versions

Branch

QA Testing

UNDECIDED

Components

Environment

VTE, master

Created August 17, 2020 at 7:45 AM
Updated September 12, 2023 at 8:44 AM
Resolved August 18, 2020 at 11:38 AM