  1. CloverDX
  2. CLO-976

Jobs on the server can get stuck and cannot be killed via the Server GUI

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: rel-3-4-0
    • Fix Version/s: rel-3-4-1
    • Component/s: Server
    • Security Level: Users (General product issues)
    • Labels:
    • Environment:

      Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.35-2 x86_64 GNU/Linux
      Tomcat 6.0.36
      Clover 3.4.0
      Oracle JDK 1.6.0_26

    • QA Testing:
      JUnit test
    • QA Test Identification:
      com.cloveretl.server.graph.AbortableJobflowTest

      Description

      Jobs (graphs or jobflows) sometimes get stuck on the server and never end. I have seen this happen in the following situations:

      • A job ran fine and finished (i.e. it produced the output I expected), but the server never registered that it had ended. The job was shown as running until I restarted the server.
      • A child job of a jobflow failed, but the jobflow never registered that the child had finished and got stuck waiting for it.

      These jobs cannot be killed from the Server GUI; the kill request appears to have no effect. The only way of "fixing" this is to restart the whole server, which might not be an option when it is running in production. The server also seems to be extremely slow when processing the kill request.

      The screenshots attached to the issue show the second case:

      • Execution history still shows the job as running.
      • Thread monitor shows Watchdog for job 3330 as blocked.
      • There are no useful messages in the log on the server.
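
For anyone trying to confirm the same symptom without the thread monitor, the blocked Watchdog thread could be spotted with the standard JMX thread API. This is only a minimal sketch, not server code: the class name `BlockedThreadReport` and the method `findBlockedThreads` are hypothetical, and it assumes the snippet runs inside the same JVM as the server (e.g. from a JSP or a debug hook). It is written to compile on JDK 1.6, matching the environment above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of CloverDX: lists threads in the BLOCKED state.
public class BlockedThreadReport {

    /** Returns one line per thread that is currently BLOCKED on a monitor. */
    public static List<String> findBlockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<String>();
        // (false, false): skip locked monitor/synchronizer details to keep the dump cheap
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            // entries can be null if a thread died while the dump was taken
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                report.add(info.getThreadName()
                        + " waits for " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return report;
    }

    public static void main(String[] args) {
        for (String line : findBlockedThreads()) {
            System.out.println(line);
        }
    }
}
```

Each reported line names the blocked thread, the monitor it is waiting for, and the thread that currently owns that monitor, which is the same information as in the attached thread_locks.txt style dumps.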

        Attachments

        1. catalina.zip (131 kB)
        2. execution-history.png (44 kB)
        3. graphs.zip (11 kB)
        4. stack-traces.txt (64 kB)
        5. stuck-graphs.png (13 kB)
        6. thread_locks.txt (46 kB)
        7. threads.png (86 kB)
        8. Threads.xlsx (11 kB)

              People

              • Assignee: Martin Zatopek (zatopekm)
              • Reporter: Branislav Repcek (repcekb)
              • Votes: 1
              • Watchers: 2

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  Estimated: 0m
                  Remaining: 0m
                  Logged: 6h