Editor's note: The following post was written by Data Platform MVP Sergio Govoni as part of our Technical Tuesday series. Mia Chang of the MVP Award Blog Technical Committee served as the technical reviewer for this piece.
In order to scale, it’s imperative that companies stay ahead of competition. So how can they identify, store, manage and deliver goods to customers efficiently? The answer is: Barcode! A barcode is the graphical representation of a sequence of numbers and other symbols. The representation is made by lines (bars) and spaces. A barcode typically consists of five parts  and one of these is the check character, also known as the check digit. Handwriting this is a hard work, and is also susceptible to legibility problems. Barcoding dramatically reduces human error, recognition errors and transcription errors.
This article explores the algorithm needed to calculate the checkdigit of an UCC/EAN barcode. It compares two possible implementations of the algorithm to calculate the check character for an EAN13 barcode using TSQL language for SQL Server.
UCC/EAN standard coding
The UCC/EAN standard coding requires that every (wellformed) code ends with a checkdigit that will be used by barcode readers to interpret the code properly. The checkdigit is a number between zero and nine and it is calculated according to the other digits in the code.
The calculation algorithm requires that every digit in the code is enumerated from right to left, is shown here.
After you have enumerated each digit of the given code, you need to do the following steps to determine the check character:

 Add up the digits in the even positions
 Multiply the result of the previous step by three
 Add up the digits in the odd positions
 Add up the results obtained in steps two and three
 Subtract the upper multiple of 10 from the result obtained in step four. For example, if the result of step four is 47, subtract 50 from 47, so the result is 3.
If the result of the four step is a multiple of ten, the checkdigit will be equal to zero. Otherwise the checkdigit will be the result of the fifth step.
Let’s implement the algorithm using TSQL language
Suppose that your boss asked you to implement a TSQL object, in an SQL Server database, that it is able to calculate the checkdigit for a given EAN13 code. How can you translate the standard algorithm to TSQL code?
One possible solution could be implemented by a userdefined scalarvalued function that receives an EAN13 code as input, and gives out the checkdigit calculated for the given code. In reading the algorithm stepbystep, the most natural and obvious solution is implemented by the following TSQL function.
CREATE FUNCTION dbo.udf_get_check_digit_ucc_ean13 ( @ACode AS VARCHAR(12) ) RETURNS SMALLINT AS BEGIN /* Author: Sergio Govoni Notes: This function is able to calculate the checkdigit of an EAN13 code Version: 1.0 */ DECLARE @tmpCode AS VARCHAR(12), @tmpMulSup AS VARCHAR(8000), @tmp AS VARCHAR(8000), @i AS INT, @j AS INT, @z AS INT, @SumDEven AS INT, @SumDOdd AS INT, @List AS VARCHAR(8000), @tmpList AS VARCHAR(8000), @CheckSum AS SMALLINT SET @SumDEven = 0 SET @SumDOdd = 0 SET @List = '' SET @tmpList = '' SET @tmp = '' SET @tmpCode = @ACode /* 0. List builder */ SET @j = LEN(@tmpCode) + 1 SET @i = 1 WHILE (@i <= LEN(@tmpCode)) BEGIN SET @List = @List + '' + LTRIM(RTRIM(STR(@j))) + ';' + SUBSTRING(@tmpCode, @i, 1) SET @j = (@j  1) SET @i = (@i + 1) END /* 1. Add up the digits in even position */ SET @i = 1 SET @tmpList = @List WHILE (CHARINDEX('', @tmpList) > 0) BEGIN SET @j = CHARINDEX('', @tmpList) SET @z = CHARINDEX(';', @tmpList) IF (CAST(SUBSTRING(@tmpList, (@j + 1), (@z  (@j + 1))) AS INTEGER) % 2) = 0 BEGIN SET @SumDEven = @SumDEven + CAST(SUBSTRING(@tmpList, (@z + 1), 1) AS INTEGER) END SET @tmpList = SUBSTRING(@tmpList, (@z + 2), LEN(@tmpList)) END /* 2. Multiply the result of the previous step (the first step) to 3 (three) */ SET @SumDEven = (@SumDEven * 3) /* 3. Add up the digits in the odd positions */ SET @i = 1 SET @tmpList = @List WHILE (CHARINDEX('', @tmpList) > 0) BEGIN SET @j = CHARINDEX('', @tmpList) SET @z = CHARINDEX(';', @tmpList) IF (CAST(SUBSTRING(@tmpList, (@j + 1), (@z  (@j + 1))) AS INTEGER) % 2) <> 0 BEGIN SET @SumDOdd = @SumDOdd + CAST(SUBSTRING(@tmpList, (@z + 1), 1) AS INTEGER) END SET @tmpList = SUBSTRING(@tmpList, (@z + 2), LEN(@tmpList)) END /* 4. Add up the results obtained in steps two and three */ SET @CheckSum = (@SumDEven + @SumDOdd) /* 5. Subtract the upper multiple of 10 from the result obtained in step four */ IF ((@CheckSum % 10) = 0) BEGIN /* If the result of the four step is a multiple of Ten (10), like Twenty, Thirty, Forty and so on, the checkdigit will be equal to zero, otherwise the checkdigit will be the result of the fifth step */ SET @CheckSum = 0 END ELSE BEGIN SET @tmpMulSup = LTRIM(RTRIM(STR(@CheckSum))) SET @i = 0 WHILE @i <= (LEN(@tmpMulSup)  1) BEGIN SET @tmp = @tmp + SUBSTRING(@tmpMulSup, @i, 1) IF (@i = LEN(@tmpMulSup)  1) BEGIN SET @tmp = LTRIM(RTRIM(STR(CAST(@tmp AS INTEGER) + 1))) SET @tmp = @tmp + '0' END SET @i = (@i + 1) END SET @CheckSum = CAST(@tmp AS INTEGER)  @CheckSum END RETURN @CheckSum END;
This implementation of the algorithm to calculate checkdigit works fine. If you try to calculate the check character for the sample barcode “801271210146” the function will return the number five, so, the check digit for the given barcode is five.
SELECT [checkdigit] = dbo.udf_get_check_digit_ucc_ean13('801271210146'); GO
If you compare the result of the function five  with the checkdigit calculated by the GS1 check digit calculator, we observe that they are equal. The result of GS1 check digit calculator is shown in the following picture.
The first version of the function dbo.udf_get_check_digit_ucc_ean13 has been implemented with a rowbyrow approach. But is it really the best solution? Let’s think outside the code. The code is just a tool. We have to try to find out the logical solution and then translate it into TSQL commands.
Usually, one immediately begins by realizing the first solution that comes to mind, and solving all the problems that arise  performance problems are the first type of issue that one usually takes on. But let’s give up this approach and try to find the best algorithm (languageindependent) that can solve the problem. This is called “setbased thinking”. In chatting with Joe Celko, I found a setbased solution to implement the checkdigit algorithm.
Setbased thinking
With the following, you can find the 2.0 version of the function dbo.udf_get_check_digit_ucc_ean13.
CREATE FUNCTION dbo.udf_get_check_digit_ucc_ean13 ( @ACode AS VARCHAR(12) ) RETURNS INTEGER AS BEGIN RETURN (10  (3* CAST(SUBSTRING('0' + @ACode, 1, 1) AS INTEGER) + CAST(SUBSTRING('0' + @ACode, 2, 1) AS INTEGER) + 3* CAST(SUBSTRING('0' + @ACode, 3, 1) AS INTEGER) + CAST(SUBSTRING('0' + @ACode, 4, 1) AS INTEGER) + 3* CAST(SUBSTRING('0' + @ACode, 5, 1) AS INTEGER) + CAST(SUBSTRING('0' + @ACode, 6, 1) AS INTEGER) + 3* CAST(SUBSTRING('0' + @ACode, 7, 1) AS INTEGER) + CAST(SUBSTRING('0' + @ACode, 8, 1) AS INTEGER) + 3* CAST(SUBSTRING('0' + @ACode, 9, 1) AS INTEGER) + CAST(SUBSTRING('0' + @ACode, 10, 1) AS INTEGER) + 3* CAST(SUBSTRING('0' + @ACode, 11, 1) AS INTEGER) + CAST(SUBSTRING('0' + @ACode, 12, 1) AS INTEGER) + 3* CAST(SUBSTRING('0' + @ACode, 13, 1) AS INTEGER) )%10 )%10 END;
Why is setbased thinking so difficult? Setbased thinking is difficult Because we usually think that data are stored in a specific order, fetch operations guarantee the orderly return of data portions, one portion at a time. Our mind sees data in these terms:
 In an orderly manner
 Manipulated in portions, one portion at a time
As the previous version, the 2.0 version of the function dbo.udf_get_check_digit_ucc_ean13 works properly, but it’s better because it uses a setbased approach instead of rowbyrow ones. And it also works fine on Azure SQL Database.
Conclusion
Barcodes offer companies automatic product identification, extremely fast warehouse movement and very fast data recognition and implementation. This means that barcodes improve productivity, and help companies save time and money. For each barcode that is generated (independently from the type: EAN8, EAN12, EAN13, EAN14, GTIN and SSCC) one has to calculate the checkdigit, in order to add it at the end of the barcode characters.
In this article, we learned how to implement the check digit algorithm with TSQL language using a userdefined function, saved in a user or system database in a SQL Server instance. We optimized the first version of the function using setbased theory. The function dbo.udf_get_check_digit_ucc_ean13 can be used directly in a TSQL statement to obtain the check character of a given barcode.
Enjoy working with the EAN13 checkdigit calculator!
Sergio Govoni has been a software developer for almost 20 years. He received a Computer Science degree from Italy State University in 2000. He’s since worked at Centro Software, which produces an international multicompany ERP on the Windows platform. At Centro Software, Sergio currently serves as Technical Lead and manages the architecture and missioncritical technical details on team projects. Based in Bologna, Italy, Sergio has been a Microsoft Data Platform MVP (SQL Server category) since 2010. Follow him on Twitter @segovoni.
Personally, I’m at a loss how anyone could consider the first version “natural” or “obvious”. (The only thing really obvious about that is clearly a database server is the wrong platform for calculating a checksum).
However, the important note I want to make is that I don’t see the second as being setbased at all. You are still essentially iterating through the string; you’ve just unrolled the loop (https://en.wikipedia.org/wiki/Loop_unrolling#Early_complexity).
Now, let’s imagine that we have a table like such : CREATE TABLE Barcode ( digit int; weight int), and it is populate properly for this calculation (not sure how to do that without iterating over the string, but we’ll handwave over that for now). Then we could calculate the sum with the query: Select Sum(digit * weight) as CheckSum from Barcode. That would be a proper setbased solution. For a true set based plan, each item should be independant of each other, and not care about their position. However, since the digits’ position is the number is vitally important to the calculation, this calculation would be a rather bad example for showing setbase processing.
Hi James,
Thanks for leaving a comment here. We used the SQL Server function dbo.udf_get_check_digit_ucc_ean13 to calculate the checkdigit of a given EAN13 barcode in every report that prints barcode on sticky labels for goods in the warehouse.
>> we could calculate the sum with the query:
>> Select Sum(digit * weight) as CheckSum from Barcode
Yes, this solution is setbased, but what about Performance? There are barcode that have more than 13 digit and, in my opinion, store a row for each digit in a table is not a good solution! Consider, that we can generate thousand or hundred of thousand barcode every day, tablebased solution will not scale properly in this case.
My first point was the TSQL was not designed to do things like the calculation, and I’d need to reevaluate the need to do it there. Does the checksumless barcode value originate on the data server, (or on the application side)? Does the checksum need to be saved, (or just the barcode value) ?? If the answer to either of those is “no”, then the checksum could be calculated in application code,
Given that, the rest of my commentary about the setbased calculation was just theortical, without concern for practicality or performance.
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=88376
Nice post, thank you!
If you wish to speed up your own code, an inline table valued function will help.
If you have a Tally\Numbers table available, the second bit of code may outperform that too.
ALTER FUNCTION [dbo].[udf_get_check_digit_ucc_ean13_TAB]
(
@ACode AS VARCHAR(12)
)
RETURNS TABLE
AS RETURN
(
SELECT ChkDigit=(10 – (3* CAST(SUBSTRING(‘0’ + @ACode, 1, 1) AS INTEGER)
+ CAST(SUBSTRING(‘0’ + @ACode, 2, 1) AS INTEGER)
+ 3* CAST(SUBSTRING(‘0’ + @ACode, 3, 1) AS INTEGER)
+ CAST(SUBSTRING(‘0’ + @ACode, 4, 1) AS INTEGER)
+ 3* CAST(SUBSTRING(‘0’ + @ACode, 5, 1) AS INTEGER)
+ CAST(SUBSTRING(‘0’ + @ACode, 6, 1) AS INTEGER)
+ 3* CAST(SUBSTRING(‘0’ + @ACode, 7, 1) AS INTEGER)
+ CAST(SUBSTRING(‘0’ + @ACode, 8, 1) AS INTEGER)
+ 3* CAST(SUBSTRING(‘0’ + @ACode, 9, 1) AS INTEGER)
+ CAST(SUBSTRING(‘0’ + @ACode, 10, 1) AS INTEGER)
+ 3* CAST(SUBSTRING(‘0’ + @ACode, 11, 1) AS INTEGER)
+ CAST(SUBSTRING(‘0’ + @ACode, 12, 1) AS INTEGER)
+ 3* CAST(SUBSTRING(‘0’ + @ACode, 13, 1) AS INTEGER)
) % 10)
)
*********************************************************************************
ALTER FUNCTION dbo.udf_get_check_digit_ucc_ean13_DEC_TAB
(
@ACode AS VARCHAR(12)
)
RETURNS TABLE
AS RETURN
(
SELECT ChkDigit = 10 – SUM(SUBSTRING(@ACode, N, 1) * IIF(N%2=0,1,3)) % 10
FROM dbo.Numbers
WHERE N <= LEN(@ACode)
)
Can’t the first line 3* CAST(SUBSTRING(‘0’ + @ACode, 1, 1) AS INTEGER) be removed? Since you are taking the substring of the first char and it is always ‘0’, it will always cast to 0. So, you can:
1 remove the first line
2 remove the ‘0’ from all of the other lines and decrement the start index to SUBSTRING by 1
3. I’d also consider grouping the substrings by even and odd so you could you multiple the (now) even indices by 3 as a group
I hope this makes sense….
When I’m teaching people to move from procedural code to setbased code, I use this example. If they started with C as their programming language, then left to right scanning of strings was used in many of the classic textbook algorithms. In particular, converting a string of digits into an integer inside a loop (or recursively), or parsing a CSV string into local variables. Because of that, your first approach of looping seems natural to them.
I tell them to look at each position in the tag number string and what happens to it. The first thing is that it’s sliced out, turned into an integer and multiplied by weight. Then all of the weights are summed. The next question is “how many of these operations can be done in parallel?” It quickly becomes obvious that multiplying by weights is parallelizable (is that a real word? Well, it should be), but summation is not. Summation only works on a completed set.
The other thing I really, really like about a declarative check digit computation is that I can move it into the DDL in SQL with a simple CHECK() clause. This means I don’t have to keep doing the same code over and over and over. I can do it one place one way one time and be safe.
Here’s my goto resource for this:
https://www.redgate.com/simpletalk/sql/tsqlprogramming/calculatingandverifyingcheckdigitsintsql/