SQL: Soft Deleting and Unique Constraint

SQL: Soft Deleting and Unique Constraint

One issue with soft delete, it's not compatible with unique constraints.

Introduction

There are two ways to delete data in a database: soft delete and hard delete. Soft delete is a process where the data is not permanently deleted but rather marked as deleted by adding a column like a deleted_at timestamp. This allows the data to be retrieved later if needed. Hard delete, on the other hand, is a process where the data is permanently deleted from the database and cannot be retrieved. Soft delete is often preferred over hard delete as it provides a safety net in case data is accidentally deleted. It also allows for auditing and tracking of deleted data.

The Problem

One issue with soft delete is that it is not compatible with unique constraints. This is because a unique constraint is enforced across all rows, including those that have been marked as deleted with a deleted_at timestamp. As a result, if a row is soft deleted, the unique constraint will prevent the insertion of a new row with the same values, even though the original row is no longer active. This can lead to data integrity issues and should be considered when implementing soft delete in a database.

Consider the following table:

CREATE TABLE users(
    id INT NOT NULL AUTO_INCREMENT,
    name VARCHAR(255) NOT NULL,
    email VARCHAR(255) NOT NULL,
    deleted_at TIMESTAMP NULL,
    PRIMARY KEY (id),
    UNIQUE KEY (email)
);

-- We are using a soft-delete mechanism, we won't be deleting any row.
REVOKE DELETE ON users FROM 'halimsamy'@'localhost';

As you can see, The deleted_at column is used to determine whether it's deleted or not, and when exactly it has been deleted. We are revoking/removing the delete permission on this table because we will not hard-deleting anything for sure.

Now let's play a bit with our table and see what happens:

-- Add a new user.
INSERT INTO users(name, email) VALUES ('Halim Samy', 'me@halimsamy.com');

-- Delete the added user
UPDATE users SET deleted_at = NOW() WHERE id = 1;

-- Ok, The user was deleted, Let's add him again.
INSERT INTO users(name, email) VALUES ('Halim Samy', 'me@halimsamy.com');

-- ERROR:
-- [23000][1062] Duplicate entry 'me@halimsamy.com' for key 'users.email'

Umm... It seems like I can't add the user again because it's already there theoretically but it's not actually there it's deleted. Well, That's a problem!

The Solution

Dropping the unique constraints it's even an option, so don't even think about it. It has to be there to maintain the consistency of the data. Let's get to the real world:

Partial Indexes (PostgreSQL, MSSQL, ...)

PostgreSQL and MSSQL approached this problem by implementing what is called "Partial Indexes", Which can be applied to a subset of entries according to a specific condition in the WHERE clause.

CREATE UNIQUE INDEX users_email_unique ON users(email, deleted_at) WHERE deleted_at IS NULL;

Virtual Columns (MySQL)

Although the approach of using Partial Indexes is pretty convenient and easy, things aren't that easy when it comes to MySQL for a single reason:

  • If the Unique Constraint contains any NULL values, MySQL will ignore the constraint completely. (e.g. UNIQUE KEY (email, deleted_at))

We will be using this behavior alongside Virtual Columns as a hack to get soft-delete working with Unique Constraint. Consider the table after modification:

CREATE TABLE users(
    id INT NOT NULL AUTO_INCREMENT,
    name VARCHAR(255) NOT NULL,
    email VARCHAR(255) NOT NULL,
    deleted_at TIMESTAMP NULL,
    unarchived BOOLEAN GENERATED ALWAYS AS (IF(deleted_at IS NULL, 1, NULL)) VIRTUAL,
    PRIMARY KEY (id),
    UNIQUE KEY (email, unarchived)
);

What is the GENERATED VIRTUAL column? It's an auto-generated column based on some expression that gets computed upon selecting a row from that table (not saved to the data storage). We are using that to automatically set unarchived column to 1 if the row is not soft-deleted, otherwise we leave NULL as well.

Setting the unarchived column to 1 when not soft-deleted, gives it a value. The value doesn't matter, it can be anything but it should be constant, so it's not actually a uniqueness factor, the only factor is the email column. On the other hand, when the unarchived column is NULL (row is soft-deleted), MySQL will ignore the Unique Constraint completely (as said above), and that's fine, we don't need to check the consistency of archived data because It was already checked before it became archived.

Conclusion

I think in the case of MySQL we have to hack the technology if the technology is missing something, I think that's called Anti-Pattern. Yet, I am totally OK with that because it's almost at zero cost with no notable downsides except for the new unarchived.

That was a brief P/S Article (Problem and Solution Article) as I would like to call it. I hope that it was quick, enjoyable, and of course useful. See you later and Don't forget to hit me with a Like!