Saturday, February 25, 2012

Perform aggregate functions on uniqueidentifiers

For some reason, [on sql2k] one cannot perform "Count(X)" where X is of type
uniqueidentifier. Will future versions of sql server suffer from this
limitation? 2003 or 2005?
We came across this problem when we had to execute a query with multiple
table joins.
|||Hasani,
The workaround that I use is to store them as BINARY(16).
"Hasani (remove nospam from address)" <hblackwell@.n0sp4m.popstick.com> wrote
in message news:%233sMM$flEHA.3564@.TK2MSFTNGP14.phx.gbl...
> For some reason, [on sql2k] one cannot perform "Count(X)" where X is of type
> uniqueidentifier. Will future versions of sql server suffer from this
> limitation? 2003 or 2005?
> We came across this problem when we had to execute a query with multiple
> table joins.
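A minimal sketch of the BINARY(16) workaround, assuming hypothetical #Orders and #OrdersBin tables (the table and column names are illustrative and do not come from the thread). It shows two variants: casting a uniqueidentifier column inside the aggregate, and storing the value as BINARY(16) in the first place. COUNT(*) does not look at the column at all, so it may be enough when NULLs are not a concern.

```sql
-- Hypothetical tables for illustration; all names are assumptions.
CREATE TABLE #Orders (
    OrderID  UNIQUEIDENTIFIER NOT NULL DEFAULT (NEWID()),
    Customer NVARCHAR(20)     NOT NULL
);

INSERT INTO #Orders (Customer) VALUES (N'Alice');
INSERT INTO #Orders (Customer) VALUES (N'Alice');
INSERT INTO #Orders (Customer) VALUES (N'Bob');

-- Variant 1: keep the uniqueidentifier column and cast it inside the aggregate.
SELECT Customer,
       COUNT(CAST(OrderID AS BINARY(16))) AS OrderCount
FROM #Orders
GROUP BY Customer;

-- Variant 2: store the GUID as BINARY(16) so no cast is needed later.
CREATE TABLE #OrdersBin (
    OrderID  BINARY(16)   NOT NULL DEFAULT (CAST(NEWID() AS BINARY(16))),
    Customer NVARCHAR(20) NOT NULL
);

INSERT INTO #OrdersBin (Customer) VALUES (N'Alice');

SELECT COUNT(OrderID) AS OrderCount
FROM #OrdersBin;

DROP TABLE #Orders;
DROP TABLE #OrdersBin;
```

Variant 1 leaves the schema alone, while variant 2 trades the readable GUID display format for a column that aggregates without a cast.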
>|||clever, i'll tell my supervisor tomorrow.
"Adam Machanic" <amachanic@.hotmail._removetoemail_.com> wrote in message
news:Oc23mQglEHA.592@.TK2MSFTNGP11.phx.gbl...
> Hasani,
> The workaround that I use is to store them as BINARY(16).
>|||"Hasani (remove nospam from address)" <hblackwell@.n0sp4m.popstick.com> wrote
in message news:%23gbFEiglEHA.2892@.tk2msftngp13.phx.gbl...
> clever, i'll tell my supervisor tomorrow.
If you want to get even trickier, you can experiment with doing something
like this when you store the GUID:
SELECT CONVERT(BINARY(6), GETDATE()) + CONVERT(BINARY(10), NEWID()) AS DateGUID
This reduces the uniqueness a bit (removes 6 of the 16 bytes), but not
too much because there are only so many rows you can insert every 3
milliseconds. The upside is that you can now cluster on your GUID column
without destroying INSERT performance.
|||Will SQL Server allow binary column types as primary keys?
"Adam Machanic" <amachanic@.hotmail._removetoemail_.com> wrote in message
news:eb$oqkglEHA.3712@.TK2MSFTNGP15.phx.gbl...
> If you want to get even trickier, you can experiment with doing something
> like this when you store the GUID:
> SELECT CONVERT(BINARY(6), GETDATE()) + CONVERT(BINARY(10), NEWID()) AS DateGUID
> This reduces the uniqueness a bit (removes 6 of the 16 bytes), but not
> too much because there are only so many rows you can insert every 3
> milliseconds. The upside is that you can now cluster on your GUID column
> without destroying INSERT performance.
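As a rough sketch of how the date-prefixed value could serve as a clustered key, the expression from the post can be used as a column default. The #Events table and its columns are hypothetical. Whether the resulting keys sort strictly in insertion order depends on which datetime bytes survive the conversion to BINARY(6), so that is worth checking on your own server before relying on it.

```sql
-- Hypothetical table; the default builds the 16-byte date-prefixed key.
CREATE TABLE #Events (
    EventID BINARY(16) NOT NULL
        DEFAULT (CONVERT(BINARY(6), GETDATE()) + CONVERT(BINARY(10), NEWID()))
        PRIMARY KEY CLUSTERED,
    Payload NVARCHAR(50) NOT NULL
);

INSERT INTO #Events (Payload) VALUES (N'first');
INSERT INTO #Events (Payload) VALUES (N'second');

-- Each key is 16 bytes: 6 taken from the datetime, 10 from the GUID.
SELECT EventID,
       DATALENGTH(EventID) AS KeyBytes,
       Payload
FROM #Events
ORDER BY EventID;

DROP TABLE #Events;
```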
>|||"Hasani (remove nospam from address)" <hblackwell@.n0sp4m.popstick.com> wrote
in message news:eP7D5uglEHA.712@.TK2MSFTNGP09.phx.gbl...
> Will SQL Server allow binary column types as primary keys?
Yes. When I have used GUIDs as primary keys (rarely, I don't think it's
a great idea most of the time), I have used the BINARY(16) technique. More
recently I've used the date concatenation technique in a project and it
worked out very well.
|||What are your reasons for not using a GUID as a primary key?
We currently use integers as a primary key, but we use a stored procedure to
generate a unique random non-sequential integer, and we store this value in
a table to stop duplicates. In that scenario, I'm arguing that we should
just use uniqueidentifier types because we seem to just be reinventing the
wheel, but then someone mentioned the aggregate function thing with
uniqueidentifier types. I'm not aware of any penalties associated with using
uniqueidentifier types though, other than that they require more bytes per
column than an int.
"Adam Machanic" <amachanic@.hotmail._removetoemail_.com> wrote in message
news:%23OvcnxglEHA.596@.tk2msftngp13.phx.gbl...
> Yes. When I have used GUIDs as primary keys (rarely, I don't think it's
> a great idea most of the time), I have used the BINARY(16) technique. More
> recently I've used the date concatenation technique in a project and it
> worked out very well.
>|||Hasani (remove nospam from address) wrote:
> What are your reasons for not using a GUID as a primary key?
> We currently use integers as a primary key, but we use a stored
> procedure to generate a unique random non-sequential integer, and we
> store this value in a table to stop duplicates. In that scenario, I'm
> arguing that we should just use uniqueidentifier types because we
> seem to just be reinventing the wheel, but then someone mentioned the
> aggregate function thing with uniqueidentifier types. I'm not aware
> of any penalties associated with using uniqueidentifier types though,
> other than that they require more bytes per column than an int.
You're right in that it's a lot more bytes per row using a UID as
opposed to an INT IDENTITY. Four times the storage (16 bytes versus 4),
which translates to a much larger index when using a uniqueidentifier.
And as Adam eloquently mentioned, using a UID as a clustered key does not
work well because you get a lot of page splitting and head movement on the
drives. Adding a date component as a prefix to the UID prevents much of the
page splitting, increasing insert performance. However, using a UID as a
clustered key means propagating that key to all non-clustered indexes,
making them much larger as well.
If you can, I would stick with an INT IDENTITY column for a PK.
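The size difference behind that recommendation can be checked directly with DATALENGTH. This is only a sketch of the per-value cost: for a hypothetical table of 10 million rows it works out to roughly 40 MB of raw INT key data versus 160 MB for uniqueidentifier, repeated in every index that carries the key.

```sql
-- Compare the storage size of a single key value of each type.
SELECT DATALENGTH(CAST(1 AS INT)) AS IntBytes,   -- 4 bytes
       DATALENGTH(NEWID())        AS GuidBytes;  -- 16 bytes
```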
David G.
|||"Hasani (remove nospam from address)" <hblackwell@.n0sp4m.popstick.com> wrote
in message news:OUyrc5glEHA.2892@.tk2msftngp13.phx.gbl...
> What are your reasons for not using a GUID as a primary key?
> We currently use integers as a primary key, but we use a stored procedure to
> generate a unique random non-sequential integer, and we store this value in
> a table to stop duplicates. In that scenario, I'm arguing that we should
> just use uniqueidentifier types because we seem to just be reinventing the
> wheel, but then someone mentioned the aggregate function thing with
> uniqueidentifier types. I'm not aware of any penalties associated with using
> uniqueidentifier types though, other than that they require more bytes per
> column than an int.
I think David G pointed out most of the issues in his post, so I'll
instead refer to the only times I have had to use a GUID, which is when the
application itself was responsible for creating the key. Applications
cannot reliably create unique integers, so GUIDs are pretty much the only
choice (or natural primary keys, if there's one available).
Also, why would you want to use a non-sequential random integer instead
of an IDENTITY?
|||Maybe I contradicted myself when I said non-sequential random...
We essentially need a random number generator to use as a primary key value.
I don't know if SQL Server supports it. All I've seen is a unique number
generator that increments by one on every insert. It's unique but not random.
The problem is, this value is going to be made public and we don't want to
make it obvious that it's just an incrementing value (think cookies and
web sessions).
What we currently do (sometimes) is have 2 columns: one that's an
autoincrementing int that's a primary key, and another that's a
uniqueidentifier column that isn't a primary key (but may have a constraint
to make sure there are no duplicates), and we would make the uniqueidentifier
value public so in a cookie, it would always look random.
I don't feel comfortable with this scenario because you have 2 columns that
are doing the same thing (preserving/ensuring uniqueness). So I'm trying to
look at all the tradeoffs of using a uniqueidentifier instead of an int, and
vice versa.
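For reference, the two-column layout described above might look something like the sketch below. The #Sessions table and its column names are hypothetical. The integer stays internal as the clustered primary key, and only the uniqueidentifier is handed out.

```sql
-- Hypothetical table: internal integer key plus a public, random-looking GUID.
CREATE TABLE #Sessions (
    SessionID INT IDENTITY(1, 1) NOT NULL PRIMARY KEY CLUSTERED,
    PublicID  UNIQUEIDENTIFIER   NOT NULL DEFAULT (NEWID()) UNIQUE NONCLUSTERED,
    UserName  NVARCHAR(50)       NOT NULL
);

INSERT INTO #Sessions (UserName) VALUES (N'hasani');

-- The value that would go into the cookie.
DECLARE @cookie UNIQUEIDENTIFIER;
SELECT @cookie = PublicID FROM #Sessions WHERE UserName = N'hasani';

-- Resolve the public value back to the internal integer key.
SELECT SessionID, UserName
FROM #Sessions
WHERE PublicID = @cookie;

DROP TABLE #Sessions;
```

The two columns are not quite redundant in this arrangement: joins and foreign keys can use the 4-byte integer, and the GUID is only touched when a public value has to be resolved.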
"Adam Machanic" <amachanic@.hotmail._removetoemail_.com> wrote in message
news:%23IUly9mlEHA.1652@.TK2MSFTNGP09.phx.gbl...
> I think David G pointed out most of the issues in his post, so I'll
> instead refer to the only times I have had to use a GUID, which is when the
> application itself was responsible for creating the key. Applications
> cannot reliably create unique integers, so GUIDs are pretty much the only
> choice (or natural primary keys, if there's one available).
> Also, why would you want to use a non-sequential random integer instead
> of an IDENTITY?
>|||"Hasani (remove nospam from address)" <hblackwell@.n0sp4m.popstick.com> wrote
in message news:%23Aar%23RnlEHA.1356@.TK2MSFTNGP09.phx.gbl...
> Maybe I contradicted myself when I said non-sequential random...
> We essentially need a random number generator to use as a primary key value.
> I don't know if SQL Server supports it. All I've seen is a unique number
> generator that increments by one on every insert. It's unique but not random.
> The problem is, this value is going to be made public and we don't want to
> make it obvious that it's just an incrementing value (think cookies and
> web sessions).
If you're only generating one at a time, why not just use RAND()?
|||Well, there's a stored procedure someone created that uses RAND to create a
unique integer, by storing all values created by the stored proc in a table,
to stop duplicates, but, unfortunately, when a record is deleted that has a
value generated by the stored procedure, it doesn't remove the generated
value from the lookup table used by the stored procedure. That's the only
reason why I'm wary of RAND, but I can modify the code to make sure deleted
records 'release' the generated RAND value. But I do like the uid because
it's alphanumeric, which is more secure in a cookie, well in cracking time,
than an all numeric cookie.
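One way to avoid the stale lookup table, sketched here under assumptions, is to check candidates against the data table itself, so a deleted row frees its value automatically. The #Tokens table and its columns are hypothetical, and this is not the stored procedure described above. Note also that RAND() is an ordinary pseudo-random generator and is not intended for values that must be hard to guess.

```sql
-- Hypothetical table keyed by a random integer.
CREATE TABLE #Tokens (
    TokenID INT          NOT NULL PRIMARY KEY,
    Owner   NVARCHAR(50) NOT NULL
);

DECLARE @id INT;
SET @id = CAST(RAND() * 2147483647 AS INT);

-- Retry while the candidate is already in use by an existing row.
WHILE EXISTS (SELECT 1 FROM #Tokens WHERE TokenID = @id)
    SET @id = CAST(RAND() * 2147483647 AS INT);

INSERT INTO #Tokens (TokenID, Owner) VALUES (@id, N'hasani');

SELECT TokenID, Owner FROM #Tokens;

DROP TABLE #Tokens;
```

Two sessions could still pick the same candidate between the check and the insert. The primary key rejects the second insert in that case, so a caller would need to retry on a duplicate-key error.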
"Adam Machanic" <amachanic@.hotmail._removetoemail_.com> wrote in message
news:OGss9inlEHA.748@.TK2MSFTNGP15.phx.gbl...
> If you're only generating one at a time, why not just use RAND()?
>|||Hasani (remove nospam from address) wrote:
> Well, there's a stored procedure someone created that uses RAND to
> create a unique integer, by storing all values created by the stored
> proc in a table, to stop duplicates, but, unfortunately, when a
> record is deleted that has a value generated by the stored procedure,
> it doesn't remove the generated value from the lookup table used by
> the stored procedure. That's the only reason why I'm wary of RAND, but I
> can modify the code to make sure deleted records 'release' the
> generated RAND value. But I do like the uid because it's
> alphanumeric, which is more secure in a cookie, well in cracking
> time, than an all numeric cookie.
>
You could add a computed column to the table to do the same thing (which
would eliminate the overhead of using a uniqueidentifier altogether).
And you can start the identity value higher (for example, IDENTITY(1000, 1))
if you don't want it to start at the default of 1.
Something like:
-- CookieID is derived from ID: a fixed prefix plus the zero-padded identity value
CREATE TABLE #test (
    ID INT IDENTITY NOT NULL,
    SomeText NVARCHAR(10),
    CookieID AS N'ALPHA-STUFF' + RIGHT(N'0000000000' + CAST(ID AS NVARCHAR(10)), 10)
)
-- No column list is needed: the identity and computed columns are filled in automatically
INSERT INTO #test VALUES ('ABC')
INSERT INTO #test VALUES ('ABC')
INSERT INTO #test VALUES ('ABC')
INSERT INTO #test VALUES ('ABC')
INSERT INTO #test VALUES ('ABC')
SELECT * FROM #test
David G.
