How To Track Errors On ETL

Umair Ramzan
7 min readMar 19, 2019

Step by step guide to creating an auto-generating email to track SSIS & SQL Server Job Errors

As I started my job as a Trainee BI Engineer at Ayra Analytics, I was assigned to develop a package which will notify the developers about the errors which occurred on the previous day in SQL server jobs which already have been scheduled.

I just wanted to share how I developed a package in this my first ever blog.

SSIS

SQL Server Integration Services is a handy tool to perform the ETL process and automate them. SQL Server Data Tools is handy to design the ETL -Extract, Transform and Load- visually and it gives a very good performance in the data warehousing process.

Also, it facilitates us to develop SSIS packages as products and automate using SQL Server Agent.

SQL Server Job

When we automate an ETL or any other process using SQL Server Agent, we have to create a SQL Server Job using SQL Server Agent. SQL server agent’s Job scheduling will provide the ability to perform the defined task — let's say executing an SSIS package — repetitively on a required frequency.

Errors

Errors are common in every software-based solution, but the quality and the effectiveness depends on how fast we encounter the error and debug it. Also in SQL server jobs while executing SSIS package there are tremendous ways which may cause errors and make the ETL process fail.

As solutions, ultimately we have two options.

  1. Design packages with good error handling features
  2. Create notification systems with a brief introduction about the error which makes the debugging process easier and much faster.

A good product should contain these two features to ensure quality.

Here I’m going to discuss the second option, How can we create an error notification system for SSIS package and SQL Server Job based automated ETL processes.

Where to find error information?

Receiving notification about the error, without error information, makes no sense. But it’s not a hard task to find the relevant information when it comes to SQL Server. SQL server logs almost everything happening on the server. SSISDB and MSDB contain the information to high granularity. What you all have to do is understand which table means what and what information it contains.

By digging into MSDB database, you can just find the below tables in the system tables folder in SQL Server Mangement Studio that contains information about SQL Server Jobs

Tables contain logs of the server agent jobs.

You can play around with your SQL querying skills to extract the information as you need. Likewise, SSISDB contains a large number of tables which logs information about SSIS package executions.

For the notification email content — which I’m going to create later on — , I wrote the following query to get data from few tables to use as my content for the email.

This query may seem large and complex, but it’s nothing but retrieving some data combining MSDB and SSISDB according to my needs.

Importantly when I wrote this query I faced the strange situation where I couldn’t find any row with unique values that I could join the tables from these two different databases(MSDB and SSISDB). One is completely dependent on SQL Server Jobs and next is completely on SSIS projects and packages.

Hopefully, I found the message from server jobs containing the execution number.

Below SQL code I found on StackOverflow rescued me from that issue.

SUBSTRING(job_history.message, NULLIF(CHARINDEX(‘Execution ID: ‘, job_history.message),0)+14 ,PATINDEX(‘%[⁰-9]%’,SUBSTRING(job_history.message, NULLIF(CHARINDEX(‘Execution ID: ‘, job_history.message),0)+14 ,20))-1) AS execution_id

this code successfully extracts the execution number from the job message and opens the gate join two databases.

DECLARE @jobNames VARCHAR(100) = ‘ErrorLoggingEmail,ETL’;
DECLARE @backDate INT = 1;--one day back

Declared variables are just to filter the results according to job name and time period. eg: assigned values will filter the failed execution from SQL Server Jobs named “ErrorLoggingEmail” and “ETL” for one day back. if you need to get information back from two days set the backdate integer to 2.

Build an SSIS package with a script task

We are all set with data now, it’s time to build an SSIS package to get run the query running to generate an email to our support team.

I have designed a simple package with just an SQL task component and script task component.

SQL task component retrieves that data from the databases(MSDB and SSISSB) using the query I have mentioned above and stores the retrieved results in SSIS object variable.

note: make sure you map result set to an SSIS object variable

setting the result set to Full Result Set

I passed the variable into the script task so inside the script task I would write my code to generate a structured email.

This email generating task could be done more easily using the SSIS provided mail task, though mail tasks support only anonymous authentication and the windows authentication with default port id. it provides the only facility to state the SMTP server you are going to use. These limitations make me choose ‘script task’ over ‘mail task’ to generate the email. another interesting feature about script task is, it can be used for generating HTML emails, which enable you to structure your email body using HTML tags such as table, headings and other text formattings.

script task

Ultimately script task contains only the main method to write our code.

public void Main(){
// to write our code
}

Alternatively, I have defined another method to send mail on behalf of the main method, so I can just call that method by passing the required variables to generate and send the email with provided attributes.

This method sends the mail using Gmail’s SMPT server. In this example, I formatted the email body using the HTML tags.

My main method would look like this

The main method is completely focused on

  1. Confirm the existing parameter values
  2. Fill a Datatable with results set
  3. Iterate through the rows of Datatable object to design the email body dynamically.

Finally, call the send mail message to send the email using the parameters and message generated.

Now we are done with our package. Before testing whether it’s working we need to create some errors intentionally. You can do this in various ways.

eg: Create a SQL server agent job to execute some SSIS packages which will fail during the execution

Let’s have a look at the email message I would receive when my package runs.

Email output

As I formatted my email message as HTML table, my email will be generated like above. The error message which is generated, explains to me that no SSIS package execution happened within the time of job run.

Parameters I used

These are the parameters I would let the deployment environment to decide. So I could deploy the same package I created on multiple servers without developing a different package. Each of these parameters will be set when deploying the package to the server per specific requirements by the server.

Deploying the package to SSISDB

There are numerous ways to deploy SSIS packages on the SSIS Catalogs and using one of those methods we could deploy the package to the server.

I would deploy the above SSIS package on the server, I already have some packages to perform ETL. I may schedule this package to run on the specific time to check whether any errors happened on the previous day or number of days before.

Limitations

  1. Since the ELT and the package I developed run by SQL server agent, if the Server agent is down we will not get any email when the error occurs.
  2. This package does not generate emails as soon as an error occurs.

Future Developments

  1. Parameterize the SMTP server connection
  2. Rather than getting an email, we could design a dashboard to track the ETL processes.

Final thoughts

I have been reading articles on medium for months. I badly wanted to experience writing something here. I would like to thank Nisal Mihiranga -tech lead at Ayra Analytics for motivating and supporting me.

--

--

Umair Ramzan

Data enthusiastic, Full stack developer. Data science enthusiastic.