Troubleshooting the Service Manager 2012 ETL processes

3:53 pm in #scsm, #sysctr by Jan Van Meirvenne

This post will aid in troubleshooting the following issues concerning the Service Manager Data Warehouse:
– Slow execution of ETL jobs
– ETL jobs failing to complete
– ETL jobs failing to start

1. Troubleshooting

– Open a remote desktop session to the Service Manager Management Server

– Open the service manager management shell

– Request the data-warehouse jobs

Get-SCDWJob –computername <Your DW Server>|ft Name, Status, CategoryName,IsEnabled

– This will result in a list of data warehouse jobs and their state

image

– If there are jobs with a ‘stopped’status, then resume them:

Start-SCDWJob –jobname <The name of the job to start (eg ‘DWMaintenance’) –computername <Your DW Server>

– If there are jobs that are not enabled (IsEnabled column is ‘false’) AND the MPSyncJob or DWMaintenance jobs are not running (they disable some jobs at runtime) then re-enable them:

Enable-SCDWJob –jobname <The name of the job to start (eg ‘DWMaintenance’) –computername <Your DW Server>

– Run the following script to reset the jobs (it will rerun all jobs in the correct order). This script exists thanks to Travis Wright.

 

$DWComputer = “<Your DW Server>

$SMExtractJobName = "<Operational Management Group Name> "

$DWExtractJobName = "<DW Management Group Name> "

Import-Module ‘C:\Program Files\Microsoft System Center 2012\Service Manager\Microsoft.EnterpriseManagement.Warehouse.Cmdlets.psd1′

function Start-Job ($JobName, $Computer)

{

$JobRunning = 1

while($JobRunning -eq 1)

{

$JobRunning = Start-Job-Internal $JobName $Computer

}

}

function Start-Job-Internal($JobName, $Computer)

{

$JobStatus = Get-JobStatus $JobName

if($JobStatus -eq "Not Started")

{

Write-Host "Starting the $JobName Job…"

Enable-SCDWJob -JobName $JobName -Computer $Computer

Start-SCDWJob -JobName $JobName -Computer $Computer

Start-Sleep -s 5

}

elseif($JobStatus -eq "Running")

{

Write-Host "$JobName Job is already running. Waiting 30 seconds and will call again."

Start-Sleep -s 30

return 1

}

else

{

Write-Host "Exiting since the job is in an unexpected status"

exit

}

$JobStatus = "Running"

while($JobStatus -eq "Running")

{

Write-Host "Waiting 30 seconds"

Start-Sleep -s 30

$JobStatus = Get-JobStatus $JobName

Write-Host "$JobName Job Status: $JobStatus"

if($JobStatus -ne "Running" -and $JobStatus -ne "Not Started")

{

Write-Host "Exiting since the job is in an unexpected status"

exit

}

}

return 0

}

function Get-JobStatus($JobName)

{

$Job = Get-SCDWJob -JobName $JobName -Computer $Computer

$JobStatus = $Job.Status

return $JobStatus

}

#DWMaintenance

Start-Job "DWMaintenance" $DWComputer

#MPsyncJob

Start-Job "MPSyncJob" $DWComputer

#ETL

Start-Job $SMExtractJobName $DWComputer

Start-Job $DWExtractJobName $DWComputer

Start-Job "Transform.Common" $DWComputer

Start-Job "Load.Common" $DWComputer

#Cube processing

Start-Job "Process.SystemCenterConfigItemCube" $DWComputer

Start-Job "Process.SystemCenterWorkItemsCube" $DWComputer

Start-Job "Process.SystemCenterChangeAndActivityManagementCube" $DWComputer

Start-Job "Process.SystemCenterServiceCatalogCube" $DWComputer

Start-Job "Process.SystemCenterPowerManagementCube" $DWComputer

Start-Job "Process.SystemCenterSoftwareUpdateCube" $DWComputer

– If a particular job keeps stalling / failing during or after the script execution, check which job-module is having problems:

Get-SCDWJobModule –jobname <The name of the job experiencing issues> –computername <Your DW Server>

– Check how long the jobs has been failing / stalling

Get-SCDWJob –jobname <The name of the job experiencing issues> -NumberOfBatches 10 –computername  <Your DW Server>

– Check the ‘Operations Manager’ eventlog on the data warehouse server. Look for events with as source ‘Data Warehouse’. Error or Warning events might pinpoint the issue with the job.

– Check the CPU and Memory of the data warehouse server, and check if one or both are peaking a lot.

 

2. Common possible causes

 

2.1. Resource Pressure

The data warehouse server takes up a lot of resources to process data. Job duration and reliability can be greatly increased by providing sufficient CPU and memory resources. Exact requirements depend on each individual setup, but these are some guidelines:

CPU

Memory

Hard Drive

4-core 2.66Ghz

Server Component: 8-16GB

Databases: 8-32Gb

Server Component: 10Gb

Databases: 400Gb

2.2. Service Failure

The ETL process of the Data Warehouse depends on multiple services to function correctly:

– Microsoft Monitoring Agent

– System Center Data Access

– System Center Management Configuration

– SQL Server SCSMDW

– SQL Serer Analysis Services

– SQL Server Agent

– SQL Server

Verify if these services are running correctly (the ‘Application’ and / or ‘Operations Manager’ event logs can hold clues as to why a service can not run correctly.

2.3. Authentication Failure

Various runas-accounts are used to execute the ETL jobs:

– A workflow account that executes program logic on the data warehouse server. This account must have local administrator privileges on the data warehouse server.

– An operational database account that has access to the SCSM databases for data extraction. This account must be owner of all databases.

– A runas-account that has administrator privileges on both the operational and the data warehouse management groups.

Most of these accounts are entered during setup and should not be changed afterwards. If these accounts do not have the required permissions then some or all functionalities related to the ETL process can be impacted.

Should error events indicate that a permission issue is the cause, then verify and repair the necessary permissions for these accounts.