When “UTC everywhere” isn’t enough - storing time zones in PostgreSQL and SQL Server
When “UTC everywhere” isn’t enough
I’ve been dealing a lot with timestamps, timezones and database recently - especially on PostgreSQL (see this blog post), but also in general. Recently, on the Entity Framework Core community standup, we also hosted Jon Skeet and chatted about NodaTime, timestamps, time zones, UTC and how they all relate to databases - I highly recommend watching that!
Now, a lot has been said about “UTC everywhere”; according to this pattern, all date/time representations in your system should always be in UTC, and if you get a local timestamp externally (e.g. from a user), you convert it to UTC as early as possible. The idea is to quickly clear away all the icky timezone-related problems, and to have a UTC-only nirvana from that point on. While this works well for many cases - e.g. when you just want to record when something happened in the global timeline - it is not a silver bullet, and you should think carefully about it. Jon Skeet already explained this better than I could, so go read his blog post on this. As a very short tl;dr, time zone conversion rules may change after the moment you perform the conversion, so the user-provided local timestamp (and time zone) may start converting to a different UTC timestamp at some point! As a result, for events which take place on a specific time in a specific time zone, it’s better to store the local timestamp and the time zone (not offset!).
So let’s continue Jon’s blog post, and see how to actually perform that on two real databases - PostgreSQL and SQL Server. Following Jon’s preferred option, we want to store the following in the database:
- The user-provided local timestamp.
- The user-provided time zone ID. This is not an offset, but rather a daylight savings-aware time zone, represented as a string.
- A UTC timestamp that’s computed (or generated) from the above two values. This can be used to order the rows by their occurrence on the global timeline, and can even be indexed.
In Jon’s NodaTime library, the ZonedDateTime type precisely represents the first two values above. Unfortunately, databases typically don’t have such a type; SQL Server does have datetimeoffset
, but an offset is not a time zone (it isn’t daylight savings-aware). So we must use separate columns to represent the data above.
We’ll start with PostgreSQL, but we’ll later see how things work with SQL Server as well. The code samples below will show Entity Framework Core, but the same should be doable with any other data access layer as well.
PostgreSQL
PostgreSQL conveniently has a type called timestamp without time zone
for local timestamps in an unspecified time zone, and a badly-named type called timestamp with time zone
, for UTC timestamps (no time zone is actually persisted); those are perfect for our two timestamps. We also want the UTC timestamp to be generated from the two other values, so we’ll set up a PostgreSQL generated column (called computed column by EF Core) to do that. Here’s the minimal EF Core model and context, using the NodaTime plugin:
public class EventContext : DbContext
{
public DbSet<Event> Events { get; set; }
protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
=> optionsBuilder.UseNpgsql(@"Host=localhost;Username=test;Password=test", o => o.UseNodaTime());
protected override void OnModelCreating(ModelBuilder modelBuilder)
=> modelBuilder.Entity<Event>(b =>
{
b.Property(b => b.UtcTimestamp)
.HasComputedColumnSql(@"""LocalTimestamp"" AT TIME ZONE ""TimeZoneId""", stored: true);
b.HasIndex(b => b.UtcTimestamp);
});
}
public class Event
{
public int Id { get; set; }
public LocalDateTime LocalTimestamp { get; set; }
public Instant UtcTimestamp { get; set; }
public string TimeZoneId { get; set; }
}
This causes the following table to be created:
CREATE TABLE "Events" (
"Id" integer GENERATED BY DEFAULT AS IDENTITY,
"LocalTimestamp" timestamp without time zone NOT NULL,
"UtcTimestamp" timestamp with time zone GENERATED ALWAYS AS ("LocalTimestamp" AT TIME ZONE "TimeZoneId") STORED,
"TimeZoneId" text NULL,
CONSTRAINT "PK_Events" PRIMARY KEY ("Id")
);
A few notes on the above:
- The
AT TIME ZONE
operator in the generated column definition converts our local timestamp to a UTC timestamp, using the time zone recorded in the other column. - PostgreSQL uses IANA/Olson timezone IDs - this is what you need to store in
TimeZoneId
. These time zones look likeEurope/Berlin
, and are not the Windows time zones that .NET developers are usually used to. The good news is that .NET 6.0 contains time zone improvements which allow working with IANA/Olson time zones. UtcTimestamp
is a stored generated column, meaning that its value gets computed whenever the row is modified, and gets persisted in the table just like any other column. Databases usually also support non-stored generated columns, which get computed every time upon select, but PostgreSQL does not support these yet. This distinction will actually be important further down.- We create an index over our generated column, which allows us to efficiently perform queries on our events, e.g. get all of them sorted on the global timeline.
Perfect, job done… or is it?
The astute reader will have noticed that since our UTC timestamp is a stored generated column, it’s computed when we insert the row, and is not recomputed again unless the row changes. So what happens if the time zone database actually changes after that? That’s right - our UTC timestamp may not longer be correct, and that’s exactly the problem we wanted to fix by preserving the original, user-provided local time and time zone! To “resync” the UTC timestamp, we can recreate the column after a time zone database change (or just periodically):
ALTER TABLE "Events" DROP COLUMN "UtcTimestamp";
ALTER TABLE "Events" ADD COLUMN "UtcTimestamp" timestamp with time zone GENERATED ALWAYS AS ("LocalTimestamp" AT TIME ZONE "TimeZoneId") STORED;
Note that all this assumes you actually need the UTC timestamp as a database column; an alternative would be to omit it, and to perform the time zone conversion in your queries. For example, with the NodaTime plugin you can do the following:
var events = await ctx.Events
.OrderBy(e => e.LocalTimestamp.InZoneLeniently(DateTimeZoneProviders.Tzdb[e.TimeZoneId]).ToInstant())
.ToListAsync();
This will translate to the following query:
SELECT e."Id", e."LocalTimestamp", e."TimeZoneId"
FROM "Events" AS e
ORDER BY e."LocalTimestamp" AT TIME ZONE e."TimeZoneId"
This effectively does the same thing as the generated column above, but doing the time zone conversion at query time; this ensures the up-to-date time zone database is always used, and does not take up any disk space. The main disadvantage, of course, is that you can’t have an index over the UTC timestamp, so operations like sorting will be slow.
SQL Server
Let’s see how this whole thing works on another database - SQL Server. We’ll do pretty much the same thing, but to change things up, we’ll just use the native BCL DateTime
type instead of NodaTime (although a NodaTime plugin for the SQL Server provider does exist). As before, here’s the minimal EF Core model and context:
public class EventContext : DbContext
{
public DbSet<Event> Events { get; set; }
protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
=> optionsBuilder.UseSqlServer(@"<connection string>")
protected override void OnModelCreating(ModelBuilder modelBuilder)
=> modelBuilder.Entity<Event>(b =>
{
b.Property(b => b.UtcTimestamp)
.HasComputedColumnSql(@"[LocalTimestamp] AT TIME ZONE [TimeZoneId] AT TIME ZONE 'UTC'", stored: true);
b.HasIndex(b => b.UtcTimestamp);
});
}
public class Event
{
public int Id { get; set; }
public DateTime LocalTimestamp { get; set; }
public DateTimeOffset UtcTimestamp { get; set; }
public string TimeZoneId { get; set; }
}
A couple of notes, comparing this to PostgreSQL:
- On SQL Server,
AT TIME ZONE
returns adatetimeoffset
type - that’s whyUtcTimestamp
is aDateTimeOffset
. If you really want aUtcTimestamp
to be aDateTime
, you can add a conversion back fromdatetimeoffset
todatetime2
. - The computed column SQL is a bit more complicated: we first convert the local timestamp to a
datetimeoffset
in the user’s time zone, and then to a UTCdatetimeoffset
.
Looks great… except that trying to create the table gives us the following error: Computed column 'UtcTimestamp' in table 'Events' cannot be persisted because the column is non-deterministic
. SQL Server is stricter than PostgreSQL here: since the AT TIME ZONE
operator depends on an external time zone database - which can change at any time - it is non-deterministic, and therefore cannot be used in a computed column definition. In effect, SQL Server is alerting you to the danger discussed above - your UTC timestamp may become out of sync with its inputs.
If you’re willing to give up the index, then unlike PostgreSQL you can use a non-stored computed column instead:
modelBuilder.Entity<Event>()
.Property(e => e.UtcTimestamp)
.HasComputedColumnSql(@"[LocalTimestamp] AT TIME ZONE [TimeZoneId] AT TIME ZONE 'UTC'");
Note that we removed the stored: true
we had before (the default is non-stored). This column cannot be indexed, and effectively fulfils the same purpose as the PostgreSQL query we saw above. If you do want an indexed column, then you’ll have to set up a database trigger to keep UtcTimestamp
up to date:
CREATE OR ALTER TRIGGER Events_UPDATE ON Events
AFTER INSERT, UPDATE
AS
BEGIN
SET NOCOUNT ON;
DECLARE @Id INT
DECLARE @TimeZone NVARCHAR(MAX)
DECLARE @LocalTimestamp DATETIME2
SELECT @Id = INSERTED.Id FROM INSERTED
SELECT @LocalTimestamp = INSERTED.LocalTimestamp FROM INSERTED
SELECT @TimeZone = INSERTED.TimeZoneId FROM INSERTED
UPDATE [Events]
SET [UtcTimestamp] = @LocalTimestamp AT TIME ZONE @TimeZone AT TIME ZONE 'UTC'
WHERE Id = @Id
END;
If you’re using EF Core Migrations, you can use raw SQL to define this trigger. Note that it’s now your responsibility to redo the conversions when the time zone database changes, just like with PostgreSQL above.
Some closing words
It’s interesting to compare PostgreSQL and SQL Server on what is considered a non-deterministic function (and therefore, what can be used in a computed column). I sent a message about this to the PostgreSQL maintainers, and Tom Lane explained that if we’re absolutely strict, then even string comparison isn’t really deterministic, since it depends on collation rules which may also change. One could claim that if users need an auto-updating column that uses AT TIME ZONE
, they’ll end up doing it with a trigger in any case, like we’ve done above for SQL Server; so we may as well make it easier and not disallow it in generated columns. It’s the user’s responsibility to take care of resyncing in any case.
Finally, if you think that converting a local date to UTC is simple - even when we know the time zone - then I encourage you to read the “Ambiguous and skipped times” section in Jon Skeet’s post. Timestamps are just so much fun.
Comments