SQL: Soft Deleting and Unique Constraint
One issue with soft delete, it's not compatible with unique constraints.
Introduction
There are two ways to delete data in a database: soft delete and hard delete. Soft delete is a process where the data is not permanently deleted but rather marked as deleted by adding a column like a deleted_at
timestamp. This allows the data to be retrieved later if needed. Hard delete, on the other hand, is a process where the data is permanently deleted from the database and cannot be retrieved. Soft delete is often preferred over hard delete as it provides a safety net in case data is accidentally deleted. It also allows for auditing and tracking of deleted data.
The Problem
One issue with soft delete is that it is not compatible with unique constraints. This is because a unique constraint is enforced across all rows, including those that have been marked as deleted with a deleted_at
timestamp. As a result, if a row is soft deleted, the unique constraint will prevent the insertion of a new row with the same values, even though the original row is no longer active. This can lead to data integrity issues and should be considered when implementing soft delete in a database.
Consider the following table:
CREATE TABLE users(
id INT NOT NULL AUTO_INCREMENT,
name VARCHAR(255) NOT NULL,
email VARCHAR(255) NOT NULL,
deleted_at TIMESTAMP NULL,
PRIMARY KEY (id),
UNIQUE KEY (email)
);
-- We are using a soft-delete mechanism, we won't be deleting any row.
REVOKE DELETE ON users FROM 'halimsamy'@'localhost';
As you can see, The deleted_at
column is used to determine whether it's deleted or not, and when exactly it has been deleted. We are revoking/removing the delete permission on this table because we will not hard-deleting anything for sure.
Now let's play a bit with our table and see what happens:
-- Add a new user.
INSERT INTO users(name, email) VALUES ('Halim Samy', 'me@halimsamy.com');
-- Delete the added user
UPDATE users SET deleted_at = NOW() WHERE id = 1;
-- Ok, The user was deleted, Let's add him again.
INSERT INTO users(name, email) VALUES ('Halim Samy', 'me@halimsamy.com');
-- ERROR:
-- [23000][1062] Duplicate entry 'me@halimsamy.com' for key 'users.email'
Umm... It seems like I can't add the user again because it's already there theoretically but it's not actually there it's deleted. Well, That's a problem!
The Solution
Dropping the unique constraints it's even an option, so don't even think about it. It has to be there to maintain the consistency of the data. Let's get to the real world:
Partial Indexes (PostgreSQL, MSSQL, ...)
PostgreSQL and MSSQL approached this problem by implementing what is called "Partial Indexes", Which can be applied to a subset of entries according to a specific condition in the WHERE
clause.
CREATE UNIQUE INDEX users_email_unique ON users(email, deleted_at) WHERE deleted_at IS NULL;
Virtual Columns (MySQL)
Although the approach of using Partial Indexes is pretty convenient and easy, things aren't that easy when it comes to MySQL for a single reason:
- If the Unique Constraint contains any
NULL
values, MySQL will ignore the constraint completely. (e.g.UNIQUE KEY (email, deleted_at)
)
We will be using this behavior alongside Virtual Columns as a hack to get soft-delete working with Unique Constraint. Consider the table after modification:
CREATE TABLE users(
id INT NOT NULL AUTO_INCREMENT,
name VARCHAR(255) NOT NULL,
email VARCHAR(255) NOT NULL,
deleted_at TIMESTAMP NULL,
unarchived BOOLEAN GENERATED ALWAYS AS (IF(deleted_at IS NULL, 1, NULL)) VIRTUAL,
PRIMARY KEY (id),
UNIQUE KEY (email, unarchived)
);
What is the GENERATED VIRTUAL
column? It's an auto-generated column based on some expression that gets computed upon selecting a row from that table (not saved to the data storage). We are using that to automatically set unarchived
column to 1
if the row is not soft-deleted, otherwise we leave NULL
as well.
Setting the unarchived
column to 1
when not soft-deleted, gives it a value. The value doesn't matter, it can be anything but it should be constant, so it's not actually a uniqueness factor, the only factor is the email
column. On the other hand, when the unarchived
column is NULL
(row is soft-deleted), MySQL will ignore the Unique Constraint completely (as said above), and that's fine, we don't need to check the consistency of archived data because It was already checked before it became archived.
Conclusion
I think in the case of MySQL we have to hack the technology if the technology is missing something, I think that's called Anti-Pattern. Yet, I am totally OK with that because it's almost at zero cost with no notable downsides except for the new unarchived
.
That was a brief P/S Article (Problem and Solution Article) as I would like to call it. I hope that it was quick, enjoyable, and of course useful. See you later and Don't forget to hit me with a Like!