  1. CloverDX
  2. CLO-18331

GarbageManJob should not check remote job execution - performance issue

    Details

    • Story Points:
      5
    • Sprint:
      Blue Sprint 103, Magenta Sprint 22, Magenta Sprint 23, Magenta Sprint 24, Magenta Sprint 25

      Description

      GarbageManJob is a job that runs every 5 seconds (by default) and performs several data consistency checks. One of these checks, checkRunningJobs, tries to detect zombie jobs that should be cleaned up. The problem is that every job running on a remote cluster node is also checked, one by one, which can put a heavy load on inter-cluster communication. This activity is moreover redundant, because GarbageManJob runs on all cluster nodes.

      A few more details are available at https://wiki.cloverdx.com/display/DEVELOPMENT/Scaling+Cluster+Node+Count, which estimates the number of remote calls as j*n(n - 1), where n is the number of nodes and j is the number of running jobs.
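
      To give a feel for how the estimate grows with cluster size, the formula can be evaluated directly. This is only an illustrative sketch: the class and method names are hypothetical, and the formula is taken exactly as stated above, without further assumptions about how j is counted.

```java
public final class RemoteCallEstimate {

    private RemoteCallEstimate() {
    }

    /**
     * Evaluates the estimate j * n * (n - 1) quoted in the description.
     * runningJobs is j, nodes is n. Hypothetical helper, not CloverDX code.
     */
    public static long originalRemoteCalls(long runningJobs, long nodes) {
        if (runningJobs < 0 || nodes < 1) {
            throw new IllegalArgumentException("need j >= 0 and n >= 1");
        }
        // multiplyExact throws ArithmeticException on overflow instead of wrapping around
        return Math.multiplyExact(runningJobs, Math.multiplyExact(nodes, nodes - 1));
    }

    public static void main(String[] args) {
        // Print the estimate for a fixed j and a few cluster sizes
        for (long n : new long[] {2, 4, 8, 16}) {
            System.out.printf("j=10, n=%d: %d remote calls%n", n, originalRemoteCalls(10, n));
        }
    }
}
```

      With j fixed, the estimate grows quadratically in n: doubling the node count roughly quadruples the number of remote calls.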

      Each cluster node should check only its own jobs; remote jobs should be checked on their respective cluster nodes. GarbageManJob may still clean up remote jobs if their cluster node has somehow become disconnected from the cluster.
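
      The proposed rule can be sketched as a simple filter over the running jobs. This is a minimal illustration, not the actual GarbageManJob code: the RunningJob type, the jobsToCheck method, and the way connected nodes are represented (a set of node IDs) are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public final class ZombieCheckScope {

    private ZombieCheckScope() {
    }

    /** Hypothetical minimal view of a running job: its run ID and the ID of the node executing it. */
    public static final class RunningJob {
        public final long runId;
        public final String nodeId;

        public RunningJob(long runId, String nodeId) {
            this.runId = runId;
            this.nodeId = nodeId;
        }
    }

    /**
     * Selects the jobs the local node should examine: its own jobs, plus jobs
     * whose owning node is not among the currently connected nodes.
     * Jobs on connected remote nodes are skipped, so no remote call is needed for them.
     */
    public static List<RunningJob> jobsToCheck(List<RunningJob> running,
                                               String localNodeId,
                                               Set<String> connectedNodeIds) {
        List<RunningJob> result = new ArrayList<>();
        for (RunningJob job : running) {
            boolean isLocal = job.nodeId.equals(localNodeId);
            boolean ownerDisconnected = !connectedNodeIds.contains(job.nodeId);
            if (isLocal || ownerDisconnected) {
                result.add(job);
            }
        }
        return result;
    }
}
```

      In this sketch, jobs of a disconnected node are selected on every node that is still connected, so some redundancy remains for that exceptional case; how the real implementation handles it is not specified in this issue.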

        Attachments

        1. aws-cpu-new.png
          101 kB
        2. aws-cpu-orig.png
          105 kB
        3. aws-performance-new.zip
          3.60 MB
        4. aws-performance-orig.zip
          3.57 MB
        5. aws-threads-new.png
          108 kB
        6. aws-threads-orig.png
          103 kB
        7. cpu-new.png
          100 kB
        8. cpu-new.zip
          589 kB
        9. cpu-orig.png
          114 kB
        10. cpu-orig.zip
          516 kB
        11. GarbageManJob.zip
          2.00 MB
        12. jobsThreads-new.png
          66 kB
        13. jobsThreads-new.zip
          564 kB
        14. jobsThreads-orig.png
          82 kB
        15. jobsThreads-orig.zip
          497 kB
        16. performance.log.new
          1.67 MB
        17. performance.log.orig
          1.20 MB

              People

              Assignee:
              slamam Martin Slama
              Reporter:
              zatopekm Martin Zatopek
              Votes:
              0
              Watchers:
              7

                Dates

                Created:
                Updated:
                Resolved: