SCOM data warehouse troubles #2: The missing objects

June 15, 2015 at 7:40 pm in Uncategorized by Jan Van Meirvenne

Last week I noticed that my customer’s reports were missing a lot of data for recently added servers and their underlying objects. It turned out that these objects did not exist in the data warehouse at all, even though they had been added several days earlier.

I investigated the issue and found a conflict between 2 tables in the data warehouse, which effectively blocked the entire SCOM synchronization process!

You can read my adventure including the happy ending here

Service Manager: hiding the default incident offering from the portal

May 14, 2015 at 6:05 pm in Uncategorized by Jan Van Meirvenne

This week I was asked how one can remove the default incident offering. This might be important if a company wants to make sure a certain set of information is entered with each incident.

Although this seemed simple to do, it wasn’t that easy.

You can find the full explanation here: http://jvm-net.azurewebsites.net/?p=1421

‘Web Management service is stopped’

March 16, 2015 at 12:29 pm in Uncategorized by Jan Van Meirvenne

There is a small bug in the IIS 7.5 Management Pack which might cause false alerts of the type ‘Web Management service is stopped’ to show up. I have written a short blog post on how to tackle this bug, including an example: link

“Report subscription list could not be loaded” when trying to view report schedules in SCOM

January 13, 2015 at 12:59 pm in Uncategorized by Jan Van Meirvenne

Almost a year ago, I performed a very troublesome upgrade from SCOM 2007 to SCOM 2012 on a large company site. One of the issues forced us to reinstall the SCOM reporting component. In an attempt to retain the reports, I backed up the report server databases and restored them after the reinstall.

We did not use scheduled reports for a long time, so the problem only surfaced when an application owner asked for a periodic performance report. When trying to open the ‘Scheduled Reports’ view in the reporting pane of the console, I got the following error (the screenshot is from SCOM 2007, but the problem can also occur in SCOM 2012):

After long trial and error and comparing settings with a fully functional reporting setup, I found the issue:

When opening the problematic view in the console, SCOM queries the ‘Subscriptions’ table in the reporting server database. Apparently, some entries were corrupted during the restore, as some fields that sounded important, like ‘Report Deliver Extension’, were blank. SCOM probably does not expect blanks to be returned, resulting in the aforementioned error.

I suspect that this might have been fixable, but because I had much on my todo-list and this was the first subscription needed on the report server, I deleted everything in the subscriptions table (present in the Report Server database):

delete  from Subscriptions
(note that this is probably unsupported and might be a showstopper when needing Microsoft support afterwards!)

After this action, the console could open the schedule-view without issues, and when I created a new schedule using the console, it appeared in the view.
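A less drastic alternative to emptying the whole table would be to first identify which rows are actually broken. The sketch below is only an illustration: the helper name Get-BrokenSubscription is hypothetical, and it assumes the blank field is exposed as a DeliveryExtension property on rows you have already read from the Subscriptions table. It works on plain objects, so the sample rows stand in for real query results.

```powershell
# Hypothetical helper: returns the subscription rows whose delivery extension is blank.
# Assumption: each row exposes SubscriptionID and DeliveryExtension properties.
function Get-BrokenSubscription {
    param(
        [Parameter(Mandatory)]
        [object[]]$Subscription
    )
    foreach ($row in $Subscription) {
        $value = $row.DeliveryExtension
        # Treat NULL, DBNull, empty and whitespace-only values as corrupted
        if ($null -eq $value -or $value -is [System.DBNull] -or
            [string]::IsNullOrWhiteSpace([string]$value)) {
            $row
        }
    }
}

# Sample rows standing in for the result of a SELECT on the Subscriptions table
$rows = @(
    [pscustomobject]@{ SubscriptionID = 'a1'; DeliveryExtension = 'Report Server Email' }
    [pscustomobject]@{ SubscriptionID = 'b2'; DeliveryExtension = '' }
    [pscustomobject]@{ SubscriptionID = 'c3'; DeliveryExtension = $null }
)
$broken = @(Get-BrokenSubscription -Subscription $rows)
$broken.SubscriptionID   # b2, c3
```

The same warning applies: touching the report server database directly is probably unsupported, so take a backup first.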

I don’t suspect this is an issue you will encounter on a normal operational day, but if you were having a rough upgrade as well I hope this helps you out!

Troubleshooting the Service Manager 2012 ETL processes

December 15, 2014 at 3:53 pm in #scsm, #sysctr by Jan Van Meirvenne

This post will aid in troubleshooting the following issues concerning the Service Manager Data Warehouse:
– Slow execution of ETL jobs
– ETL jobs failing to complete
– ETL jobs failing to start

1. Troubleshooting

– Open a remote desktop session to the Service Manager Management Server

– Open the service manager management shell

– Request the data-warehouse jobs

Get-SCDWJob -ComputerName <Your DW Server> | ft Name, Status, CategoryName, IsEnabled

– This will result in a list of data warehouse jobs and their state


– If there are jobs with a ‘Stopped’ status, then resume them:

Start-SCDWJob -JobName <The name of the job to start (eg ‘DWMaintenance’)> -ComputerName <Your DW Server>

– If there are jobs that are not enabled (IsEnabled column is ‘false’) AND the MPSyncJob or DWMaintenance jobs are not running (they disable some jobs at runtime), then re-enable them:

Enable-SCDWJob -JobName <The name of the job to enable (eg ‘DWMaintenance’)> -ComputerName <Your DW Server>

– Run the following script to reset the jobs (it will rerun all jobs in the correct order). This script exists thanks to Travis Wright.

 

$DWComputer = "<Your DW Server>"

# The extract jobs are typically named Extract_<management group name>; use the names that Get-SCDWJob returns
$SMExtractJobName = "<Operational Management Group Name>"
$DWExtractJobName = "<DW Management Group Name>"

Import-Module 'C:\Program Files\Microsoft System Center 2012\Service Manager\Microsoft.EnterpriseManagement.Warehouse.Cmdlets.psd1'

# Runs a job to completion. Note that this function shadows the built-in Start-Job cmdlet inside this script.
function Start-Job ($JobName, $Computer)
{
    $JobRunning = 1
    while($JobRunning -eq 1)
    {
        $JobRunning = Start-Job-Internal $JobName $Computer
    }
}

# Starts the job if it is idle and waits until it has finished.
# Returns 1 if the job was already running (the caller retries), 0 once it has finished.
function Start-Job-Internal($JobName, $Computer)
{
    $JobStatus = Get-JobStatus $JobName $Computer
    if($JobStatus -eq "Not Started")
    {
        Write-Host "Starting the $JobName Job..."
        Enable-SCDWJob -JobName $JobName -ComputerName $Computer
        Start-SCDWJob -JobName $JobName -ComputerName $Computer
        Start-Sleep -s 5
    }
    elseif($JobStatus -eq "Running")
    {
        Write-Host "$JobName Job is already running. Waiting 30 seconds and will call again."
        Start-Sleep -s 30
        return 1
    }
    else
    {
        Write-Host "Exiting since the job is in an unexpected status"
        exit
    }

    # The job has just been started: poll until it is no longer running
    $JobStatus = "Running"
    while($JobStatus -eq "Running")
    {
        Write-Host "Waiting 30 seconds"
        Start-Sleep -s 30
        $JobStatus = Get-JobStatus $JobName $Computer
        Write-Host "$JobName Job Status: $JobStatus"
        if($JobStatus -ne "Running" -and $JobStatus -ne "Not Started")
        {
            Write-Host "Exiting since the job is in an unexpected status"
            exit
        }
    }
    return 0
}

# Returns the current status of a data warehouse job
function Get-JobStatus($JobName, $Computer)
{
    $Job = Get-SCDWJob -JobName $JobName -ComputerName $Computer
    $JobStatus = $Job.Status
    return $JobStatus
}

#DWMaintenance
Start-Job "DWMaintenance" $DWComputer

#MPSyncJob
Start-Job "MPSyncJob" $DWComputer

#ETL
Start-Job $SMExtractJobName $DWComputer
Start-Job $DWExtractJobName $DWComputer
Start-Job "Transform.Common" $DWComputer
Start-Job "Load.Common" $DWComputer

#Cube processing
Start-Job "Process.SystemCenterConfigItemCube" $DWComputer
Start-Job "Process.SystemCenterWorkItemsCube" $DWComputer
Start-Job "Process.SystemCenterChangeAndActivityManagementCube" $DWComputer
Start-Job "Process.SystemCenterServiceCatalogCube" $DWComputer
Start-Job "Process.SystemCenterPowerManagementCube" $DWComputer
Start-Job "Process.SystemCenterSoftwareUpdateCube" $DWComputer

– If a particular job keeps stalling / failing during or after the script execution, check which job-module is having problems:

Get-SCDWJobModule -JobName <The name of the job experiencing issues> -ComputerName <Your DW Server>

– Check how long the job has been failing / stalling:

Get-SCDWJob -JobName <The name of the job experiencing issues> -NumberOfBatches 10 -ComputerName <Your DW Server>

– Check the ‘Operations Manager’ event log on the data warehouse server. Look for events with the source ‘Data Warehouse’. Error or Warning events might pinpoint the issue with the job.

– Check the CPU and memory usage of the data warehouse server, and check whether one or both are peaking a lot.
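The first triage steps above (resume stopped jobs, re-enable disabled jobs unless MPSyncJob or DWMaintenance is running) can be sketched as a small decision helper. This is only an illustration under assumptions: the function name Get-DWJobAction and the ‘Wait’ action are hypothetical, and the input is assumed to expose the Name, Status and IsEnabled properties shown in the job list. It only suggests an action; it does not start or enable anything.

```powershell
# Hypothetical helper: suggests an action per job, following the triage steps above.
function Get-DWJobAction {
    param(
        [Parameter(Mandatory)]
        [object[]]$Job
    )
    # MPSyncJob and DWMaintenance disable other jobs while they run,
    # so a disabled job is only re-enabled when neither of them is running.
    $blocking = @($Job | Where-Object {
        ($_.Name -in @('MPSyncJob', 'DWMaintenance')) -and ($_.Status -eq 'Running')
    })
    foreach ($j in $Job) {
        # Compare as text so both boolean and string values of IsEnabled work
        $enabled = ([string]$j.IsEnabled) -eq 'True'
        $action = 'None'
        if (-not $enabled) {
            if ($blocking.Count -eq 0) { $action = 'Enable' } else { $action = 'Wait' }
        }
        elseif ($j.Status -eq 'Stopped') {
            $action = 'Start'
        }
        [pscustomobject]@{ Name = $j.Name; Action = $action }
    }
}

# Sample objects standing in for the output of Get-SCDWJob
$jobs = @(
    [pscustomobject]@{ Name = 'DWMaintenance'; Status = 'Not Started'; IsEnabled = $true }
    [pscustomobject]@{ Name = 'MPSyncJob'; Status = 'Stopped'; IsEnabled = $true }
    [pscustomobject]@{ Name = 'Load.Common'; Status = 'Not Started'; IsEnabled = $false }
)
$plan = @(Get-DWJobAction -Job $jobs)
$plan | Format-Table Name, Action
```

On a real server the input would come from Get-SCDWJob -ComputerName <Your DW Server>, and the suggested actions map to Start-SCDWJob and Enable-SCDWJob.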

 

2. Common possible causes

 

2.1. Resource Pressure

The data warehouse server needs a lot of resources to process data. Job duration and reliability can be greatly improved by providing sufficient CPU and memory resources. Exact requirements depend on each individual setup, but these are some guidelines:

CPU: 4-core 2.66 GHz

Memory: 8-16 GB for the server component, 8-32 GB for the databases

Hard drive: 10 GB for the server component, 400 GB for the databases
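To make "peaking a lot" a bit more concrete, a series of CPU or memory percentage samples can be summarized as below. This is a sketch, not a sizing rule: the function name Test-ResourcePressure and both default values (90 percent as the peak threshold, half of the samples as the limit) are assumptions chosen for illustration.

```powershell
# Hypothetical helper: reports whether a series of percentage samples peaks often.
function Test-ResourcePressure {
    param(
        [Parameter(Mandatory)]
        [double[]]$Sample,
        [double]$Threshold = 90,   # assumed: a sample above this value counts as a peak
        [double]$PeakRatio = 0.5   # assumed: share of peaking samples that signals pressure
    )
    $peaks = @($Sample | Where-Object { $_ -gt $Threshold }).Count
    $ratio = $peaks / $Sample.Count
    [pscustomobject]@{
        Samples       = $Sample.Count
        Peaks         = $peaks
        UnderPressure = ($ratio -ge $PeakRatio)
    }
}

# Sample CPU percentages standing in for real measurements
$cpu = 95, 97, 40, 99, 92, 35
$result = Test-ResourcePressure -Sample $cpu
$result
```

On the data warehouse server the samples could be collected with something like (Get-Counter '\Processor(_Total)\% Processor Time' -SampleInterval 5 -MaxSamples 12).CounterSamples.CookedValue, preferably while the ETL jobs are running.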

2.2. Service Failure

The ETL process of the Data Warehouse depends on multiple services to function correctly:

– Microsoft Monitoring Agent

– System Center Data Access

– System Center Management Configuration

– SQL Server SCSMDW

– SQL Server Analysis Services

– SQL Server Agent

– SQL Server

Verify that these services are running correctly. The ‘Application’ and / or ‘Operations Manager’ event logs can hold clues as to why a service cannot run correctly.
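Checking the list by hand gets tedious, so the comparison can be scripted. The helper below is a sketch with a hypothetical name (Get-StoppedDependency). It matches on exact display names, and the names used in the sample are simply taken from the list above, so on a real server they may need adjusting (SQL Server services, for example, usually carry an instance name).

```powershell
# Hypothetical helper: lists required services that are missing or not running.
function Get-StoppedDependency {
    param(
        [Parameter(Mandatory)]
        [object[]]$Service,
        [Parameter(Mandatory)]
        [string[]]$Required
    )
    foreach ($name in $Required) {
        $match = @($Service | Where-Object { $_.DisplayName -eq $name })
        if ($match.Count -eq 0) {
            # No service with this display name was found at all
            [pscustomobject]@{ DisplayName = $name; Status = 'Missing' }
        }
        foreach ($svc in $match) {
            if ([string]$svc.Status -ne 'Running') {
                [pscustomobject]@{ DisplayName = $svc.DisplayName; Status = [string]$svc.Status }
            }
        }
    }
}

# Sample objects standing in for the output of Get-Service
$required = 'Microsoft Monitoring Agent', 'System Center Data Access', 'SQL Server Agent'
$services = @(
    [pscustomobject]@{ DisplayName = 'Microsoft Monitoring Agent'; Status = 'Running' }
    [pscustomobject]@{ DisplayName = 'System Center Data Access'; Status = 'Stopped' }
)
$problems = @(Get-StoppedDependency -Service $services -Required $required)
$problems | Format-Table DisplayName, Status
```

With real data, pass the output of Get-Service as the -Service argument.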

2.3. Authentication Failure

Various runas-accounts are used to execute the ETL jobs:

– A workflow account that executes program logic on the data warehouse server. This account must have local administrator privileges on the data warehouse server.

– An operational database account that has access to the SCSM databases for data extraction. This account must be owner of all databases.

– A runas-account that has administrator privileges on both the operational and the data warehouse management groups.

Most of these accounts are entered during setup and should not be changed afterwards. If these accounts do not have the required permissions then some or all functionalities related to the ETL process can be impacted.

Should error events indicate that a permission issue is the cause, then verify and repair the necessary permissions for these accounts.

SCOM Quick Query: Logical Disk Space For My Environment

October 23, 2014 at 10:31 am in #scom, #sysctr by Jan Van Meirvenne

 

Sometimes I get questions in the style of “What is the current state of my environment in terms of…”. If there is no report in SCOM I can point to, I usually create a quick query on the Data Warehouse and provide the data as an Excel sheet to the requestor. Afterwards, should the question be repeated over and over, I create a report for it and provide self-service information.

To avoid forgetting these kinds of ‘quick and dirty’ queries, and to share my work with you, I will occasionally throw in a post if I have a query worth mentioning.

Here we go for the first one!

If you are not interested in using the extended Logical Disk MP you can use this query on your DW to quickly get a free space overview of all logical disks in your environment :

select max(time) as time, server, disk, size, free, used from
(
select
    perf.DateTime as time,
    e.Path as server,
    e.DisplayName as disk,
    round(cast(ep.PropertyXml.value('(/Root/Property[@Guid="A90BE2DA-CEB3-7F1C-4C8A-6D09A6644650"]/text())[1]', 'nvarchar(max)') as int) / 1024, 0) as size,
    round(perf.SampleValue / 1024, 0) as free,
    round(cast(ep.PropertyXml.value('(/Root/Property[@Guid="A90BE2DA-CEB3-7F1C-4C8A-6D09A6644650"]/text())[1]', 'nvarchar(max)') as int) / 1024, 0) - round(perf.SampleValue / 1024, 0) as used
from perf.vPerfRaw perf
inner join vManagedEntity e on perf.ManagedEntityRowId = e.ManagedEntityRowId
inner join vPerformanceRuleInstance pri on pri.PerformanceRuleInstanceRowId = perf.PerformanceRuleInstanceRowId
inner join vPerformanceRule pr on pr.RuleRowId = pri.RuleRowId
inner join vManagedEntityProperty ep on ep.ManagedEntityRowId = e.ManagedEntityRowId
where
    pr.ObjectName = 'LogicalDisk'
    and pr.CounterName = 'Free Megabytes'
    and ep.ToDateTime is null
    and perf.DateTime > dateadd(HOUR, -1, GETUTCDATE())
) data
group by data.server, data.disk, data.size, data.free, data.used
order by server, disk

 

Available fields:

Time: the timestamp of the presented data
Server: the server the disk belongs to
Disk: The name of the logical disk
Size: the size of the disk in GB
Free: the free space on the disk in GB
Used: the used space on the disk in GB
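The arithmetic behind the Size, Free and Used fields is a megabyte-to-gigabyte conversion followed by a subtraction. The sketch below reproduces it outside SQL so the numbers can be checked by hand. The function name ConvertTo-DiskOverview is hypothetical. It assumes the size is truncated to whole GB (the query divides an int by 1024) while the free space is rounded to the nearest GB.

```powershell
# Hypothetical helper: mirrors the Size / Free / Used arithmetic of the query.
function ConvertTo-DiskOverview {
    param(
        [Parameter(Mandatory)][double]$SizeMB,
        [Parameter(Mandatory)][double]$FreeMB
    )
    # The query casts the size to int before dividing, which truncates to whole GB
    $sizeGB = [math]::Floor($SizeMB / 1024)
    # The free space is a float in the query, so it is rounded to the nearest GB
    $freeGB = [math]::Round($FreeMB / 1024, 0, [System.MidpointRounding]::AwayFromZero)
    [pscustomobject]@{
        Size = $sizeGB
        Free = $freeGB
        Used = $sizeGB - $freeGB
    }
}

# A 100 GB disk with 25 GB free
$disk = ConvertTo-DiskOverview -SizeMB 102400 -FreeMB 25600
$disk   # Size 100, Free 25, Used 75
```

Because the two values are rounded differently, Used can be off by one GB on small disks.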

 

Please note that I am not a SQL guru, so if you find a query containing war crimes against best practices, don’t hesitate to let me know!

 

See you in another knowledge dump!

SCOM authoring: the aftermath

October 7, 2014 at 6:52 am in #scom, #sysctr by Jan Van Meirvenne

 

For the people who attended my SCOM authoring session, thanks once again for your attention. While it was quite an advanced topic I hope it shed some light on how SCOM functions and how it can be optimized regarding creating and maintaining monitoring definitions.

My slide deck can be found here: http://www.slideshare.net/JanVanMeirvenne/scom-authoring

My demo project can be found here: http://1drv.ms/1t0OPIG

Please note that Microsoft announced the retirement of the Visio authoring method for SCOM. Although it is a useful tool, especially for designing management packs together with customers, I guess it was a bit too much of an odd bird with many limitations. The recommendation is to use MPAuthor or the Visual Studio add-in (links included in the wiki link below).

If you want to learn more on this topic and maybe try some things out for yourself, there are some excellent resources available:

– Microsoft Virtual Academy: http://channel9.msdn.com/Series/System-Center-2012-R2-Operations-Manager-Management-Packs

– Authoring section of the SCOM wiki: http://social.technet.microsoft.com/wiki/contents/articles/20796.the-system-center-2012-r2-operations-manager-survival-guide.aspx#Management_Packs_and_Management_Pack_Authoring

– MSDN Library (contains technical documentation of the modules used while authoring): http://msdn.microsoft.com/en-us/library/ee533840.aspx

If you have questions regarding these topics, don’t hesitate to drop me a comment, tweet @JanVanMeirvenne or mail to jan.vanmeirvenne@ferranti.be

See you in another blogpost!

SCOM DW not being updated with operational data?

July 1, 2014 at 7:38 am in #scom, #sysctr by Jan Van Meirvenne

With this blogpost I want to make a ‘catch-all’ knowledge-article containing problems and fixes I learned in the field regarding SCOM DW synchronization issues. I will update this post regularly if I encounter any new phenomenon on this subject.

Possible Cause 1: The synchronization objects and settings are missing from the management group

Diagnosis

Run the following powershell-commands in the SCOM powershell interface of the affected management group:

get-SCOMClass -name:Microsoft.SystemCenter.DataWarehouseSynchronizationService|Get-ScomClassInstance

If no objects are returned it means that the workflows responsible for synchronizing data are not running.

Add-pssnapin microsoft.enterprisemanagement.operationsmanager.client
Set-location OperationsManagerMonitoring::
New-managementgroupconnection <SCOM management server>
Get-DefaultSetting ManagementGroup\DataWarehouse\DataWarehouseDatabaseName

Get-DefaultSetting ManagementGroup\DataWarehouse\DataWarehouseServerName

If the default settings are not set this indicates that the DW registration has been broken.

Causes

The breakage and disappearance of the DW synchronization objects and settings can happen when a SCOM 2007->2012 upgrade fails and you have to recover the RMS. The issue is hard to detect (especially if you do not use reporting much) as no errors are generated.

Solution

The settings and objects need to be regenerated manually using the script below. This will add all necessary objects to SCOM with the correct server and database references. The DW properties will also be added to the default-settings section.

You will have to edit the script and enter the Operations Manager database server name, the data warehouse server name and the console path. Copy the script to a text file, enter the required information and save the file with a .ps1 extension so it can be run under PowerShell.

#Populate these fields with Operational Database and Data Warehouse Information
#Note: change these values appropriately
$OperationalDbSqlServerInstance = "<OpsMgrDB server instance. If it is the default instance, only the server name is required>"
$OperationalDbDatabaseName = "OperationsManager"
$DataWarehouseSqlServerInstance = "<OpsMgrDW server instance. If it is the default instance, only the server name is required>"
$DataWarehouseDatabaseName = "OperationsManagerDW"
$ConsoleDirectory = "<OpsMgr Console location, by default C:\Program Files\System Center 2012\Operations Manager\Console>"

$dataWarehouseClass = get-SCOMClass -name:Microsoft.SystemCenter.DataWarehouse
$seviewerClass = get-SCOMClass -name:Microsoft.SystemCenter.OpsMgrDB.AppMonitoring
$advisorClass = get-SCOMClass -name:Microsoft.SystemCenter.DataWarehouse.AppMonitoring

$dwInstance = $dataWarehouseClass | Get-SCOMClassInstance
$seviewerInstance = $seviewerClass | Get-SCOMClassInstance
$advisorInstance = $advisorClass | Get-SCOMClassInstance

#Update the singleton property values
$dwInstance.Item($dataWarehouseClass.Item("MainDatabaseServerName")).Value = $DataWarehouseSqlServerInstance
$dwInstance.Item($dataWarehouseClass.Item("MainDatabaseName")).Value = $DataWarehouseDatabaseName

$seviewerInstance.Item($seviewerClass.Item("MainDatabaseServerName")).Value = $OperationalDbSqlServerInstance
$seviewerInstance.Item($seviewerClass.Item("MainDatabaseName")).Value = $OperationalDbDatabaseName

$advisorInstance.Item($advisorClass.Item("MainDatabaseServerName")).Value = $DataWarehouseSqlServerInstance
$advisorInstance.Item($advisorClass.Item("MainDatabaseName")).Value = $DataWarehouseDatabaseName

$dataWarehouseSynchronizationServiceClass = get-SCOMClass -name:Microsoft.SystemCenter.DataWarehouseSynchronizationService
#$dataWarehouseSynchronizationServiceInstance = $dataWarehouseSynchronizationServiceClass | Get-SCOMClassInstance

$mg = New-Object Microsoft.EnterpriseManagement.ManagementGroup -ArgumentList localhost

$dataWarehouseSynchronizationServiceInstance = New-Object Microsoft.EnterpriseManagement.Common.CreatableEnterpriseManagementObject -ArgumentList $mg,$dataWarehouseSynchronizationServiceClass
$dataWarehouseSynchronizationServiceInstance.Item($dataWarehouseSynchronizationServiceClass.Item("Id")).Value = [guid]::NewGuid().ToString()

#Add the properties to discovery data
$discoveryData = New-Object Microsoft.EnterpriseManagement.ConnectorFramework.IncrementalDiscoveryData
$discoveryData.Add($dwInstance)
$discoveryData.Add($dataWarehouseSynchronizationServiceInstance)
$discoveryData.Add($seviewerInstance)
$discoveryData.Add($advisorInstance)

$momConnectorId = New-Object System.Guid("7431E155-3D9E-4724-895E-C03BA951A352")
$connector = $mg.ConnectorFramework.GetConnector($momConnectorId)
$discoveryData.Overwrite($connector)

#Update Global Settings. Needs to be done with PS V1 cmdlets
Add-pssnapin microsoft.enterprisemanagement.operationsmanager.client
cd $ConsoleDirectory
.\Microsoft.EnterpriseManagement.OperationsManager.ClientShell.NonInteractiveStartup.ps1

Set-DefaultSetting ManagementGroup\DataWarehouse\DataWarehouseDatabaseName $DataWarehouseDatabaseName
Set-DefaultSetting ManagementGroup\DataWarehouse\DataWarehouseServerName $DataWarehouseSqlServerInstance

If the script ran successfully and you run the commands specified in the diagnosis-section you should receive valid object- and settings information. The synchronization should start within a few moments.

Sources

http://support.microsoft.com/kb/2771934

Agents fail to connect to their management group with error message “The environment is incorrect”

May 15, 2014 at 4:20 pm in #scom, #sysctr by Jan Van Meirvenne

Symptoms

– Agents stop receiving new monitoring configuration
– When restarting the agent service, the following events are logged in the Operations Manager event log:

[Screenshots: the three events logged in the Operations Manager event log]

Cause

This indicates that the agent cannot find the SCOM connection information in AD. This is usually because it is not permitted to read it.

Resolution

All connection info is found in the Operations Manager container in the AD root. If you do not see it using “Active Directory Users and Computers” then click “View” and enable “Advanced Features”.

[Screenshot: the Operations Manager container in “Active Directory Users and Computers”]

(Screenshot taken from http://elgwhoppo.com/2012/07/25/scom-2012-ad-integration-not-populating-in-ad/)

The container will contain a subcontainer for each management group using AD integration. Each subcontainer holds a set of SCP-objects, containing the connection information for each management server, and 2 security groups per SCP: PrimarySG… and SecondarySG…. These groups will be populated with computer objects using the LDAP queries you provided in the AD integration wizard of the SCOM console. So, for example, if your LDAP query specifies only servers ending with a 1, only the objects matching that criterion will be put in the group.

These security groups should normally both have read access on their respective SCP-object (eg for management server “foobar” the groups “PrimarySG_Foobar…” and “SecondarySG_Foobar…” should have read access on the SCP-object for this management server).

If the security is correct the agent can only see the SCP-objects to which it should connect in a normal and failover situation.

If these permissions are not correct then you can safely adjust them manually (only provide read access). The agents will almost immediately pick up the SCP once they have permission. If this is not the case, restart the agent service.

Fixing missing stored procedures for dashboards in the datawarehouse

April 1, 2014 at 1:45 pm in #scom, #sysctr by Jan Van Meirvenne

When you are creating a dashboard in SCOM, you might receive strange errors stating that a certain stored procedure was not found in the database. This is an issue I often encounter, and it indicates that something went wrong during the import of the visualization management packs. These management pack bundles contain the SQL scripts that create these procedures. By extracting the MPBs and manually executing each SQL script, you can fix this issue rather easily. However, this is not a supported fix and you should make a backup of your DW in case something explodes!

Extract the following MPB files using this script:

– Microsoft.SystemCenter.Visualization.Internal.mpb
(found on the installation media under ‘ManagementPacks’)

– Microsoft.SystemCenter.Visualization.Library.mpb
(use the one from the patch folder if you applied one: %programfiles%\System Center 2012\Operations Manager\Server\Management Packs for Update Rollups)

First, execute the single SQL script that came from the Internal MPB on the data warehouse (scripts with ‘drop’ in the name can be ignored).

Second, execute each SQL script from the Library MPB on the DW (again, ignore the drop scripts). The order does not matter.
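The selection rule from the two steps above (Internal script first, then the Library scripts, skipping anything with ‘drop’ in the file name) can be sketched as follows. The function name Get-VisualizationScriptOrder and the file names in the sample are hypothetical. The real names depend on what the extraction produces.

```powershell
# Hypothetical helper: returns the SQL scripts to run, Internal MPB first, drop scripts skipped.
function Get-VisualizationScriptOrder {
    param(
        [string[]]$InternalScript = @(),
        [string[]]$LibraryScript = @()
    )
    # Only look at the file name, so a folder called 'drop' does not exclude everything.
    # -notmatch is case-insensitive, so 'Drop' and 'DROP' are skipped as well.
    $keep = { [System.IO.Path]::GetFileName($_) -notmatch 'drop' }
    $ordered = @()
    $ordered += @($InternalScript | Where-Object $keep)
    $ordered += @($LibraryScript | Where-Object $keep)
    $ordered
}

# Made-up file names standing in for the extracted scripts
$internal = 'Internal.Create.sql', 'Internal.Drop.sql'
$library = 'Library.Views.sql', 'Library.DropViews.sql', 'Library.Procs.sql'
$toRun = @(Get-VisualizationScriptOrder -InternalScript $internal -LibraryScript $library)
$toRun
```

Each resulting file could then be executed against the data warehouse, for example with Invoke-Sqlcmd -ServerInstance <DW SQL instance> -Database OperationsManagerDW -InputFile <script>, assuming the SQL Server PowerShell module is installed and the DW uses the default database name.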

The dashboards should now be fully operational. Enjoy SCOM at its finest!