Quantcast
Viewing latest article 6
Browse Latest Browse All 62

SQL Server BIT_COUNT and an Alternative

I was editing an article the other day that uses the BIT_COUNT function that was added to SQL Server 2022. The solution presented is excellent, and I will try to come back and link to it here when I remember. (It will be linked the other way.

Reading that did two things for me. First it cave me an idea of how the BIT_COUNT function might be actually be used in a useful way. The solution that was presented would only work in SQL Server 2022 (It will work in earlier compatibility levels, based on the tests I have done.)

Bitwise operations, really?

As a person who is BIG into relational design and getting a well normalized database, let’s start here. There is rarely a reason to use the bits of an integer value for any reason. I have argued with people plenty about this sort of thing, like when someone wanted to create a multi-valued domain table like the following example.

Note that this isn’t an optimal design, so I am not advocating this for any new build. But it does happen, including places like SQL Server’s system tables.

USE TempDB;
GO
CREATE SCHEMA DemoBits;
GO
CREATE TABLE DemoBits.Person
(
    PersonId int NOT NULL IDENTITY 
        CONSTRAINT PKPerson PRIMARY KEY,
    Name varchar(20) NOT NULL
        CONSTRAINT AKPerson UNIQUE,
    Attributes int NOT NULL
        CONSTRAINT DFLTPerson_Attributes DEFAULT (0)
)
GO
CREATE TABLE DemoBits.AttributeType
(
    Attribute   int NOT NULL 
        CONSTRAINT PKAttributeTypes PRIMARY KEY,
    Value varchar(20) NOT NULL
)
GO

Now you just load up the AttributeType table with values that have primary key of the values that use a single bit in an integer:

INSERT INTO DemoBits.AttributeType
VALUES (1, 'AttributeType 1');

INSERT INTO DemoBits.AttributeType
VALUES (2, 'AttributeType 2');

INSERT INTO DemoBits.AttributeType
VALUES (4, 'AttributeType 3');

INSERT INTO DemoBits.AttributeType
VALUES (8, 'AttributeType 4');

Then, you have a multi-valued relationship with one less table. You can’t join to it conveniently, but it does work. For example:

INSERT INTO DemoBits.Person(Name, Attributes)
VALUES ('John', 1+8);

Now, you have a Person row, and you can join to the attributes in this way:

SELECT *
FROM   DemoBits.Person
        JOIN DemoBits.AttributeType
            ON Person.Attributes & AttributeType.Attribute = AttributeType.Attribute

Which returns:

PersonId    Name                 Attributes  Attribute   Value
----------- -------------------- ----------- ----------- --------------------
1           John                 9           1           AttributeType 1
1           John                 9           8           AttributeType 4

And you can see that this person has two attributes with BIT_COUNT:

SELECT BIT_COUNT(Attributes)
FROM  DemoBits.Person 
WHERE Name = 'John'

This returns:

--------------------
2

Now, of course I am not advocating this solution at all. There have been a few articles here on Simple Talk that made nice use of bits, for example Aaron Bertrand’s Days of the Week article. But as a rule, this is not really a good design pattern, mostly for the reason that the join I created is very cryptic for 90% of people (and I didn’t exactly pull it out of active memory myself!).

But what I did want to do is fix is if you are using a pre-SQL Server 2022 version of SQL Server and want to count the bits in an integer value, you can still use the technique from the forthcoming article I mentioned at the start of this article.

The way this is done is basically shown in the join criteria. You and every single bit binary value, and when it is equal to the value you anded, you add one to the output. While the values you use are only the ones with a single bit, but it can also be used to find items with multiple bits. Like if you wanted to find the Person rows where they only have Attributes 1 and 4, you would just look for rows where Attributes = 9. But if you want to look for rows with at least those two:

SELECT Name
FROM  DemoBits.Person 
WHERE Attributes & 9 = 9;

And even if you updated it to 11:

UPDATE DemoBits.Person
SET    Attributes = 11
WHERE  Name = 'John'

The SELECT statement will still return ‘John’, and since you will still have the 1 and 8 bits set and now the SELECT with the JOIN from earlier in the chapter will return all three values:

PersonId    Name                 Attributes  Attribute   Value
----------- -------------------- ----------- ----------- --------------------
1           John                 11          1           AttributeType 1
1           John                 11          2           AttributeType 2
1           John                 11          8           AttributeType 4

And BIT_COUNT will return 3.

The real point here is

Before I go too far and make this an article that looks like it is advocating for multi-valued, poorly normalized, bit based relationships… let’s get to the point. Counting bits.

If BIT_COUNT isn’t available, you can use something like this:

SELECT CASE WHEN Attributes & 1 = 1 THEN 1 ELSE 0 END +
       CASE WHEN Attributes & 2 = 2 THEN 1 ELSE 0 END +
       CASE WHEN Attributes & 4 = 4 THEN 1 ELSE 0 END +
       CASE WHEN Attributes & 8 = 8 THEN 1 ELSE 0 END 
FROM DemoBits.Person

Which will return 3, just like BIT_COUNT will. And you can extend this pretty easily to cover all integers by doing all POWER(2) values that will fit in an integer. This is tedious to type, so I will do this with a bit of code.

Note: I am using the GENERATE_SERIES function that also was added to SQL Server 2022, but you can generate a numbers table using code like this one from Greg Larsen a few years ago if you are on an earlier version.

SELECT CONCAT('CASE WHEN @TestValue & ',
              POWER(2,Value), '=', 
              POWER(2,VALUE), ' THEN 1 ELSE 0 END + ')
--only 31 values since a signed integer has < a 32 bit integer
FROM   GENERATE_SERIES(0,30)

Now, copy that code and make the following function. The part after the --copied code is what i generated, including all the bits available.

I added an extra line to handle negative values. I would be wary of using that special bit using any method that is giving meaning to the bits, but it is available as a 32nd bit and counted by the BIT_COUNT function:

CREATE OR ALTER FUNCTION DemoBits.BitCount_Int
(
    @TestValue int
) 
RETURNS int AS
BEGIN
RETURN(
--copied code
CASE WHEN @TestValue & 1=1 THEN 1 ELSE 0 END + 
CASE WHEN @TestValue & 2=2 THEN 1 ELSE 0 END + 
CASE WHEN @TestValue & 4=4 THEN 1 ELSE 0 END + 
CASE WHEN @TestValue & 8=8 THEN 1 ELSE 0 END + 
CASE WHEN @TestValue & 16=16 THEN 1 ELSE 0 END + 
CASE WHEN @TestValue & 32=32 THEN 1 ELSE 0 END + 
CASE WHEN @TestValue & 64=64 THEN 1 ELSE 0 END + 
CASE WHEN @TestValue & 128=128 THEN 1 ELSE 0 END + 
CASE WHEN @TestValue & 256=256 THEN 1 ELSE 0 END + 
CASE WHEN @TestValue & 512=512 THEN 1 ELSE 0 END + 
CASE WHEN @TestValue & 1024=1024 THEN 1 ELSE 0 END + 
CASE WHEN @TestValue & 2048=2048 THEN 1 ELSE 0 END + 
CASE WHEN @TestValue & 4096=4096 THEN 1 ELSE 0 END + 
CASE WHEN @TestValue & 8192=8192 THEN 1 ELSE 0 END + 
CASE WHEN @TestValue & 16384=16384 THEN 1 ELSE 0 END + 
CASE WHEN @TestValue & 32768=32768 THEN 1 ELSE 0 END + 
CASE WHEN @TestValue & 65536=65536 THEN 1 ELSE 0 END + 
CASE WHEN @TestValue & 131072=131072 THEN 1 ELSE 0 END + 
CASE WHEN @TestValue & 262144=262144 THEN 1 ELSE 0 END + 
CASE WHEN @TestValue & 524288=524288 THEN 1 ELSE 0 END + 
CASE WHEN @TestValue & 1048576=1048576 THEN 1 ELSE 0 END + 
CASE WHEN @TestValue & 2097152=2097152 THEN 1 ELSE 0 END + 
CASE WHEN @TestValue & 4194304=4194304 THEN 1 ELSE 0 END + 
CASE WHEN @TestValue & 8388608=8388608 THEN 1 ELSE 0 END + 
CASE WHEN @TestValue & 16777216=16777216 THEN 1 ELSE 0 END + 
CASE WHEN @TestValue & 33554432=33554432 THEN 1 ELSE 0 END + 
CASE WHEN @TestValue & 67108864=67108864 THEN 1 ELSE 0 END + 
CASE WHEN @TestValue & 134217728=134217728 THEN 1 ELSE 0 END + 
CASE WHEN @TestValue & 268435456=268435456 THEN 1 ELSE 0 END + 
CASE WHEN @TestValue & 536870912=536870912 THEN 1 ELSE 0 END + 
CASE WHEN @TestValue & 1073741824=1073741824 THEN 1 ELSE 0 END +
CASE WHEN @TestValue < 0 THEN 1 ELSE 0 END) --there is a bit to say it is negative 
END
GO

Now, you should be able to use this for most of your bit counting uses just like BIT_COUNT.

You could extend it for any type of uses but I will submit to you that if you find yourself doing bitwise operations for any reason other than legacy data/designs/or increasing processing performance with several orders of magnitude performance gains (like eliminating necessary loops in code that are executed many times over), you should rethink your plans.

Testing your solution

Like any code experiment, one last thing to do, make sure you have tested the code. This code can be tested like this using a SQL Server 2022 instance, though I wouldn’t try to run this on your laptop if you are on battery.

It is doing over 4 billion iterations of those functions and on my 4 core laptop with 11th Gen Intel Core i7-1185G7 @ 3.00GHz CPU (that is also running Discord, Slack, and especially Word as I edit this to a postable state while I wait for it to finish), it took “going out to lunch” time to complete.

DECLARE @StartTime datetime2(7) = sysdatetime();

SELECT Value
FROM GENERATE_SERIES(-2147483648,2147483647) --testing all of the integers
WHERE DemoBits.BitCount_Int(value) <> BIT_COUNT(value);

SELECT Datediff(millisecond,@StartTime,sysdatetime()) AS TEST_THEM_BOTH;

The CPU graph was fun to watch while this was running. It is not doing any parallel operations. I will show you the plan for a more performance oriented test in a bit.

For me, this returned: 5286682 milliseconds which is and, as SSMS shows me, over 128 minutes and 7 seconds. I mean, I did ask it to do that function and the BIT_COUNT function over 4 billion times each, so I am pretty happy with that.

Unlike most code, because I have tested every single possible value that can be entered, this code is actually verified correct for any input that is in the allowable range. Most functions cannot be tested this well, and to be fair, you wouldn’t want to use my test as your unit test either, since it takes so long.

When this returns 0 rows, you know it works like it should.

Which version is faster?

One of the only reasons (other than being stuck in a legacy version) you typically say “use your own function instead of a system function” is performance. The FORMAT function is one of those cases that Aaron Bertrand covered a few years ago on this SQL Performance post.

So while it is established that this function works correctly, and 100% better than not having one built in in previous versions, in this section, I want to run a quick test to see which one performs better or worse.

To do this, I will run the functions a few times on a fairly large set of data. Whichever one was the fastest, I ran it before and after the slower one, to make sure there was nothing weird about the test based on ordering. (I also altered the order a few times before deciding to accept my results.

DROP TABLE IF EXISTS #holdResults1;
DROP TABLE IF EXISTS #holdResults2;
GO
CREATE TABLE #holdResults1 (value int);
CREATE TABLE #holdResults2 (value int);
GO
--capture the current time, and set the range of values to test with. 
--This doesn’t need to be all values, just a lot
DECLARE @StartTime datetime2(7) = sysdatetime();
DECLARE @StartValue int = 0;
DECLARE @EndValue int = 1000000;

--test system function
INSERT INTO #holdResults1
SELECT Value
FROM GENERATE_SERIES(@StartValue, @EndValue)
WHERE BIT_COUNT(value ) = 10; --I didn’t want to do anything the optimizer
                       --could optimize away. It needs to run the functions for each row

SELECT DATEDIFF(millisecond,@StartTime,sysdatetime()) AS BIT_COUNT;

--reset the times, do test with the user function
SET @StartTime = sysdatetime();
INSERT INTO #holdResults2
SELECT Value
FROM GENERATE_SERIES(@StartValue, @EndValue)
WHERE DemoBits.BitCount_Int(value) = 10;

SELECT datediff(millisecond,@StartTime,sysdatetime()) AS MY_FUNCTION;

--output counts of matches, for comparison
SELECT COUNT(*) BIT_COUNT_TOTALROWS 
FROM #HoldResults1;

SELECT COUNT(*) MY_FUNCTION_TOTALROWS from #HoldResults2;

--then run the system one again, and insert into the same table
SET @StartTime = sysdatetime();
INSERT INTO #holdResults1
SELECT Value
FROM GENERATE_SERIES(@StartValue, @EndValue)
WHERE BIT_COUNT(value) = 10;

SELECT datediff(millisecond,@StartTime,sysdatetime()) AS BIT_COUNT_AGAIN;
GO

On my machine, these queries executed in considerably different times, and the system function blew it away by an order of magnitude.

BIT_COUNT
-----------
258

MY_FUNCTION
-----------
1425

BIT_COUNT
-----------
400

Mind you, these are milliseconds for calculating the value a million times. And 1.4 seconds versus .4 seconds would be a win for the system function, even if it was the slower of the two! But it definitely is faster using the system function.

In Conclusion

I started this experiment because of the usage of BIT_COUNT in an article that I also felt would be useful in previous versions of SQL Server and to be honest, today is the first Saturday of the year with no football, so I had some time on my hands.

I think the function: BitCount_Int would be a good enough replacement for BIT_COUNT in your earlier systems that should work in reasonable version of SQL Server.

And remember, flipping and counting bits should be reserved for only the most time sensitive, performance tuning operations. It makes code hard to understand, hard to use, and so many more negatives. But if you find a solution that executes 100- or 1000-times shorter execution time than not using it… I can’t fight you on that, though I still will try.

The post SQL Server BIT_COUNT and an Alternative appeared first on Simple Talk.


Viewing latest article 6
Browse Latest Browse All 62

Trending Articles