You are browsing the archive for 2014 December.

SCOM: Configure a monitor recovery task for a healthy state

12:06 am in operations manager, SCOM, SCOM 2012 by Dieter Wijckmans

During a recent project a client had a small request to create a monitor and run a command when a device was not accessible anymore. Easy right! But (yep there’s always a but) they wanted to run a command when the monitor was returning back to a healthy state to restart a service when the device came back online… Hmmm and all in 1 monitor.

So the conditions were as follows:

Monitor:

  • Action: Run a PowerShell based monitor to test the connection with the device
  • BAD: Device is down => Run recovery task to remediate
  • GOOD: Device is up again => Run recovery task to restart service

(note: Always do this small matrix of a monitor design to exactly know what the customer wants)

I don’t have the device to simulate but came up with a small example in my lab to show you how to get this working with just 1 monitor. The situation in my lab is very simple. I want to turn on my desk lighting when my pc is on (and I’m working) and turn it off when my pc is not online.

My conditions:

Monitor:

  • Action: Run Powershell based monitor to test the connection and pass the result to SCOM
  • BAD: PC is offline: => turn off my desk lighting
  • GOOD: PC is online:=> turn on my desk lighting

So first things first we need to test the connection to see whether my pc is running. To check this I’m using this small script:


param ([string]$target)
$API = New-Object -ComObject "MOM.ScriptAPI"
$PropertyBag = $API.CreatePropertyBag()

$value = Test-connection $target -quiet

$PropertyBag.AddValue("status", $value)

$PropertyBag
$API.Return($propertybag)

So I’m testing the connection and sending the response to SCOM. The  PowerShell “Test-Connection $target –quiet” command will just return true or false as a result whether the target is accessible or not

Creating the Monitor with Silect MP Author

The creation of this monitor consists of 2 parts:

  • Defining the class where the monitor will be targeted to and therefore the machine which will test the connection to the desktop
  • Passing the status from the machine to SCOM and take action by using a monitor

Defining a class:

To properly target this monitor we need to create a class in SCOM which identifies the servers that need to test the connection. In this case I’ve added a reg key to all servers who need to ping the desktop so I’m starting a Registry Target to create my class:

printscreen-0254printscreen-0255

I fill in a server that has the key already in there to make it much easier to browse the registry instead of typing it in with an increased margin for errors.

printscreen-0256

Select the Registry key you want to look for

printscreen-0257

In my case I’ve added a key under HKEY_LOCAL_MACHINE\Software\pingtestwatchernode

printscreen-0258

Select the key and press add and ok

printscreen-0259

Identify your registry target:

printscreen-0260

Identify your discovery for the target

printscreen-0261

In my case I just check whether the key is there. No check on the content.

printscreen-0263

The discovery will run once a day.

printscreen-0264

Review everything and press finish

printscreen-0265

At this point our class is ready to be targeted with our script monitor.

Next up is to create the monitor:

Create a new script monitor:

printscreen-0266

Browse to the PowerShell script and fill in the parameters. In this case I have 1 parameter which is “target” and will hold the IP of the desktop.

printscreen-0267

Define the conditions:

Healthy condition is when the status is true and type boolean

printscreen-0268

Critical condition is when the status is False

printscreen-0269

Note: I’m using a “boolean” Type

Configure the script and select the target you have created earlier on and the availability parent monitor

printscreen-0270

Identify your script based monitor

printscreen-0271

Specify a periodic: run every 2 minutes

printscreen-0272

No alert generation necessary.

printscreen-0273

Review all the parameters and create the script based monitor.

printscreen-0274

Load the management pack in your environment and locate the monitor:

printscreen-0278

Check the properties => recovery tasks and create 2 recovery tasks for the Health state “critical”.

Note that the screenshot below already shows the correct healthy state after config of the mp.

printscreen-0279

Export the managment pack and open it in an editor and locate the “recoveries” section to find your recovery tasks we just created:

printscreen-0280

scroll to the right and locate the “ExecuteOnState” parameter and change the one you want to run when the monitor goes back to healthy from “Error” to “Success”

Save the management pack and reload it in your environment.

printscreen-0281

So all we need to do is test it…

My pc is on: IT-Rambo has his cool backlight:

20141130_230930098_iOS

My pc is off and the light is automatically turned off…

20141130_230904267_iOS

Final Note: If you use this method you need to make sure to NOT save the recovery tasks in the console anymore otherwise the different settings we just changed in our management pack will be again overwritten as SCOM can’t natively configure a recovery task for a healthy state.

You can use this basically for anything where you want to run 2 conditions on the same monitor or even 3 if you have a 3 state monitor.

SCOM: Monitor the monitor part 1: PowerShell

5:56 pm in operations manager, SCOM 2012, sysctr by Dieter Wijckmans

Recently I got a question of an engineer during a community event why SCOM didn’t notify him when SCOM was down.

My first response was very similar to the response of my favorite captain below: printscreen_surf-0018

But this got me thinking actually because the engineer made a good point. That to have a full monitoring you should have another mechanism in place to monitor the monitoring system. Most companies still have a legacy monitoring system in place that can be leveraged to monitor the servers of SCOM but let’s face it: keeping another monitoring system alive just to monitor the SCOM servers only adds complexity to your environment for a small benefit.

That’s why I started building a small independent check with PowerShell. In part 1 of this series I’ll go over how to monitor whether your management servers are still up and running.

To do this we need to make sure that we have a watcher node which is able to ping the management servers. This watcher node may be any machine capable of running PowerShell and does not need to have operationsmanager PowerShell module available. This to make sure we are operating completely independent from SCOM.

Process used

The graph below shows the process used:

monitorthemonitor_servers

In my environment I have 2 management servers which are reachable from the watcher node. The first step is to dynamically determine how many management servers are in my environment. To do this I’m creating the input file which is generated by PowerShell on a management server and updated once a day. This is an automated process because face it: if we need to think about changing the infile.txt when we add or delete another management server we will forget.

This file will be available on the watcher node to do the ping commands even when the management servers are down.

Configuration on the Management server

(this is action 1 in the graph above)

To generate the infile containing all the management servers which are currently in our environment we need to execute the following PowerShell command on the watcher node:

#=====================================================================================================
# AUTHOR:    Dieter Wijckmans
# DATE:        03/12/2014
# Name:        Readms.PS1
# Version:    1.0
# COMMENT:    This script will read out all the Management servers in a management group and saves it
#           into a txt file which is used to ping the servers from an external watcher node.
#           This script is scheduled on a management server via scheduled tasks.
#           Make sure to fill in your destination (which is your watcher node) in the variable
#
# Usage:    readms.PS1
# Example:
#=====================================================================================================
$destination: "fill in the destination on the watchernode here"
$ms = get-scommanagementserver

foreach ($mstemp in $ms){
$ms.DisplayName | Out-File $destination
}

Schedule this script on the management server via scheduled tasks and run it once a day.

The program to run is: powershell.exe c:\scripts\readms.ps1

This will generate the infile for the ping command to check the management servers and will place it on the watcher node.

Configuration on the Watcher node

(this is action 2 in the graph above)

Next up is to configure the watcher node to monitor our management servers and alert when they are unreachable. This is done by executing the following PowerShell on a regular basis through schedule tasks. I schedule this task every 5 minutes. This means that you get a mail every 5 min until it’s resolved. Better annoy a little bit more than just send 1 mail which just drowns in the mail volume.

#=====================================================================================================
# AUTHOR:    Dieter Wijckmans
# DATE:        03/12/2014
# Name:        Pingtest.PS1
# Version:    1.0
# COMMENT:    This script will ping all the Management servers in a management group according to the
#           input file and escalate when a server is not reachable.
#           Make sure to fill in all the parameters in the parameter section.
#           This script is scheduled on the watcher node via a scheduled tasks.
#           Make sure to fill in your destination (which is your watcher node) in the variable
#
# Usage:    pingtest.PS1
# Example:
#=====================================================================================================

#parameter section: Fill in all the parameters below
$infile = "Location of file with management servers listed"
$outfile = "Location of file which will keep historical data on the pings"
$smtp = "fill in your smtp config to send mail"
$to = "The destination email address"
$from = "The from email address"

#reading the date when the test is executed for logging in the historical file
$testexecuted = Get-Date
#reading in all the objects listed in the infile
$objects = get-content $infile

#running through all the objects and taking action accordingly
foreach ($object in $objects)
{
$pingresult = Test-Connection $object -quiet
if ($pingresult -eq $True)
{
$pingresult = "Online"
}
else
{
$pingresult = "Offline"
$subject = "SCOM: Management Server " + $object + " is down!"
$body = "<b><font color=red>ATTENTION SCOM support staff:</b></font> <br>"
$body += "Management Server: " + $object + " is down! Please check the server!"
send-MailMessage -SmtpServer $smtp -To $to -From $from -Subject $subject -Body $body -BodyAsHtml -Priority high
}
$result = $object + " :ping result: " + $pingresult + " :" + $testexecuted | Out-File $outfile -append

}

#read the length of the inputfile and validate the same amount of lines in the outfile to validate whether all management
#servers are down.
$filelength= Get-content $infile | measure-object -Line
$numberoflines = $filelength.Lines
$file = Get-Content $outfile -Tail $numberoflines
$wordToFind = "Online"
$containsWord = $file | %{$_ -match $wordToFind}
If($containsWord -notcontains $True)
{
$subject = "SCOM: ALL Management Servers are down!"
$body = "<b><font color=red>ATTENTION SCOM support staff:</b></font> <br>"
$body += "All Management servers are down. Please take immediate action"
send-MailMessage -SmtpServer $smtp -To $to -From $from -Subject $subject -Body $body -BodyAsHtml -Priority high
}

Note: Make sure that you change all the parameters in the parameter section.

This script will ping all the machines which are filled in in the infile we created earlier and writes this to the out-file. The outfile is than evaluated and a mail is automatically send when a management server is down. If ALL management servers are down a separate mail is sent to notify that SCOM is completely down.

You can change the mail appearance in the $body fields in the PowerShell.

The outfile will have the following entries:

printscreen_surf-0020

My servers were Offline last night at 21:13:38. So the mailing was triggered and  the mail will look like below when SCOMMS2 is down:

printscreen_surf-0019

When all management servers are down it will look like this:

printscreen_surf-0021

So now we get completely independent from SCOM mails telling us there’s an issue with the SCOM management servers.

  • So what if our watcher node is down? Well I’ve installed a SCOM agent on this machine with a special subscription to notify me when it’s down.
  • So what if our management servers are down AND my watcher node is down… Well then you probably have a far greater problem and your phone will probably be already red hot by now…

You can find the PowerShell scripts and the files here on Technet Gallery:

download-button-fertig11

In Part 2 I’ll go over the ability to monitor your SQL connection of the management servers.