google compute engine - How can I automatically kill idle GCE instances based on CPU usage?


I'm running some unreliable software on instances in an instance group. The software is installed and run by a startup script, and most of the time it works without issue, but ~10% of new instances run out of memory and crash due to some sort of memory leak in the software. I can't get the leak fixed myself, so in the meantime I've been checking the instances every few hours and killing any that show idle CPU (the software normally consumes all available CPU power).

However, I'm using preemptible instances, and they can be killed off and restarted at any time, leaving dead instances running whenever I'm not actively monitoring them. After a day of leaving things unattended, I see ~80-85% CPU usage in the dashboard, with the rest wasted.

Is there an automated way I can kill off these dead instances? Restarting them is handled by the instance group.

It seems there are two parts to this question:

  1. Identifying dead instances.
  2. Killing off those instances.

In terms of identifying dead instances, one way is to have a separate management instance that does not run this software and just keeps tabs on the other instances, for example by periodically sending a health request to each instance and marking as unhealthy any instance that is non-responsive or reports abnormal CPU usage (here, near-idle CPU, since the software normally consumes all available CPU).
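The CPU-based half of that check could be sketched as follows. This is only an illustration: the function name `find_idle_instances`, the 5% threshold, and the three-sample window are assumptions, and the samples are assumed to have been collected already by the management instance (for example from the Cloud Monitoring metric `compute.googleapis.com/instance/cpu/utilization`, which reports a fraction between 0 and 1).

```python
from typing import Dict, List


def find_idle_instances(
    cpu_samples: Dict[str, List[float]],
    idle_threshold: float = 0.05,  # assumed cutoff: below 5% CPU counts as idle
    min_samples: int = 3,          # assumed window: consecutive samples required
) -> List[str]:
    """Return the names of instances whose most recent samples are all idle.

    cpu_samples maps an instance name to its CPU utilization readings
    (fractions in [0, 1]), ordered oldest first.
    """
    idle = []
    for name, samples in cpu_samples.items():
        recent = samples[-min_samples:]
        # Skip instances without a full window, so a freshly started
        # instance is not flagged before it has had time to ramp up.
        if len(recent) < min_samples:
            continue
        if all(value < idle_threshold for value in recent):
            idle.append(name)
    return sorted(idle)


if __name__ == "__main__":
    # Hypothetical readings for three instances.
    readings = {
        "worker-a": [0.97, 0.95, 0.99],
        "worker-b": [0.90, 0.02, 0.01, 0.01],
        "worker-c": [0.01],
    }
    print(find_idle_instances(readings))
```

Requiring several consecutive idle samples, rather than a single one, is a guard against killing an instance that is only briefly idle, such as one that has just been restarted after preemption.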

Once your management instance has identified the unhealthy instances that need to be reset, it should be able to reset them using the API (I'm guessing the reset command) or by executing the same operation with the gcloud command-line tool.
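A minimal sketch of the gcloud route, assuming the management instance has the gcloud CLI installed and is authorized to reset instances in the project. The helper names `build_reset_command` and `reset_instances`, the instance name, and the zone are hypothetical; `gcloud compute instances reset` is the CLI counterpart of the reset operation mentioned above.

```python
import subprocess
from typing import List


def build_reset_command(instance: str, zone: str) -> List[str]:
    """Build the argv for resetting one instance with the gcloud CLI."""
    return ["gcloud", "compute", "instances", "reset", instance, "--zone", zone]


def reset_instances(
    instances: List[str], zone: str, dry_run: bool = True
) -> List[List[str]]:
    """Reset each named instance; with dry_run=True only print the commands."""
    commands = [build_reset_command(name, zone) for name in instances]
    for command in commands:
        if dry_run:
            print("would run:", " ".join(command))
        else:
            # check=True raises CalledProcessError if gcloud exits non-zero.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical instance name and zone; nothing is executed in dry-run mode.
    reset_instances(["worker-b"], zone="us-central1-a")
```

The dry-run default is a deliberate choice here: it lets the detection logic be observed for a while before anything is actually reset.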

