gerardo | Jan. 5, 2024, 8:51 p.m.
tldr: In Django and Django's ORM (Rails as well but my Rails knowledge is -5 and also others) watch out for the n+1 query problem. In Django, use the select_related/prefetch_related functions when querying models that have foreign keys and use the Django Debug Toolbar as well.
I am writing this as a mental note of sorts. I use Django and Django's ORM a lot for personal projects, and I'm aware I should watch out for the n+1 query problem. I know I should use the _related functions, but I always forget why and I end up searching for the answer every single time. So I'm hoping that by writing this I will break this cycle and be able to remember it and maybe also help someone.
So, from Sentry's documentation (which is actually really good go check it out):
N+1 queries are a performance problem in which the application makes database queries in a loop, instead of making a single query that returns or modifies all the information at once. Each database connection takes some amount of time, so querying the database in a loop can be many times slower than doing it just once. This problem often occurs when you use an object-relational mapping (ORM) tool in web frameworks like Django or Ruby on Rails.
It's probably easier to use a simple example and show first what the end goal is. Let's say you have two Django models (which will translate to two sql tables), Author and Book (again, based on the example from Sentry). In Django your models will look something like this:
from django.db import models class Author(models.Model): name = models.CharField(max_length=255) class Book(models.Model): title = models.CharField(max_length=255) author = models.ForeignKey(Author, on_delete=models.CASCADE)
books = Book.objects.all() for book in books: print(f"{book.title} written by {book.author.name}")
Before diving into the problem more, let's briefly look at it now from a SQL (this is SQLite) point of view. You will have two tables, Author and Book:
Table "public.Author" Column | Type | Collation | Nullable | Default --------+-----------------------+-----------+----------+------------------------------------ id | integer | | not null | name | character varying(255)| | | Indexes: "Author_pkey" PRIMARY KEY, btree (id)
Table "public.Book" Column | Type | Collation | Nullable | Default -----------+-----------------------+-----------+----------+------------------------------------ id | integer | | not null | title | character varying(255)| | | author_id | integer | | | Indexes: "Book_pkey" PRIMARY KEY, btree (id) Foreign-key constraints: "Book_author_id_fkey" FOREIGN KEY (author_id) REFERENCES Author(id)
SELECT b.book, a.author FROM author a JOIN book b ON b.author_id = a.id;
Back to Django, the way the ORM is actually working under the hood is that, instead of doing 1 query like in sql, it is doing multiple queries. It first queries the Book table to get all the books (1 query), but then proceeds to get all the authors for each book (n queries i we have n books). And you end up with... n+1 queries.
Depending on your system particulars, this might range from a meh problem to geez problem. Let's say you are using Django to host a small application with a sqlite database, everything in the same machine. Probably still not ideal and for sure a bad practice, but your problems won't be that big. But if you are running a more serious distributed application with a PostgreSQL database running on a different server and a high amount of database operations, you have a bigger problem. Here things like network latency (communication between the application server and database server), database load, connection creation delays (if you are not using a pool of connections), resource utilization, and caching (both at the database and application level) will begin to matter. And if n is big, well...
There's a very simple Django application here. Clone the repository and follow the instructions in the README. Make sure you populate the database according to the instructions.
Make a query using the .all() method for the first 10 books. You should see the following 11 queries in your logs. Note that the first query retrieves the 10 books, and the next 10 queries each retrieve the author of each book.
DEBUG 2024-01-07 14:52:59,643 utils (0.000) SELECT "nplusone_book"."id", "nplusone_book"."title", "nplusone_book"."author_id" FROM "nplusone_book" LIMIT 10; args=(); alias=default DEBUG 2024-01-07 14:52:59,647 utils (0.000) SELECT "nplusone_author"."id", "nplusone_author"."name" FROM "nplusone_author" WHERE "nplusone_author"."id" = 1001 LIMIT 21; args=(1001,); alias=default Book of Author 0 by Author 0 DEBUG 2024-01-07 14:52:59,648 utils (0.000) SELECT "nplusone_author"."id", "nplusone_author"."name" FROM "nplusone_author" WHERE "nplusone_author"."id" = 1002 LIMIT 21; args=(1002,); alias=default Book of Author 1 by Author 1 DEBUG 2024-01-07 14:52:59,651 utils (0.000) SELECT "nplusone_author"."id", "nplusone_author"."name" FROM "nplusone_author" WHERE "nplusone_author"."id" = 1003 LIMIT 21; args=(1003,); alias=default Book of Author 2 by Author 2 DEBUG 2024-01-07 14:52:59,652 utils (0.000) SELECT "nplusone_author"."id", "nplusone_author"."name" FROM "nplusone_author" WHERE "nplusone_author"."id" = 1004 LIMIT 21; args=(1004,); alias=default Book of Author 3 by Author 3 DEBUG 2024-01-07 14:52:59,653 utils (0.000) SELECT "nplusone_author"."id", "nplusone_author"."name" FROM "nplusone_author" WHERE "nplusone_author"."id" = 1005 LIMIT 21; args=(1005,); alias=default Book of Author 4 by Author 4 DEBUG 2024-01-07 14:52:59,655 utils (0.000) SELECT "nplusone_author"."id", "nplusone_author"."name" FROM "nplusone_author" WHERE "nplusone_author"."id" = 1006 LIMIT 21; args=(1006,); alias=default Book of Author 5 by Author 5 DEBUG 2024-01-07 14:52:59,656 utils (0.000) SELECT "nplusone_author"."id", "nplusone_author"."name" FROM "nplusone_author" WHERE "nplusone_author"."id" = 1007 LIMIT 21; args=(1007,); alias=default Book of Author 6 by Author 6 DEBUG 2024-01-07 14:52:59,657 utils (0.000) SELECT "nplusone_author"."id", "nplusone_author"."name" FROM "nplusone_author" WHERE "nplusone_author"."id" = 1008 LIMIT 21; args=(1008,); alias=default Book of Author 7 by Author 7 DEBUG 2024-01-07 14:52:59,658 utils (0.000) SELECT "nplusone_author"."id", "nplusone_author"."name" FROM "nplusone_author" WHERE "nplusone_author"."id" = 1009 LIMIT 21; args=(1009,); alias=default Book of Author 8 by Author 8 DEBUG 2024-01-07 14:52:59,658 utils (0.000) SELECT "nplusone_author"."id", "nplusone_author"."name" FROM "nplusone_author" WHERE "nplusone_author"."id" = 1010 LIMIT 21; args=(1010,); alias=default Book of Author 9 by Author 9
Now query again the first 10 books, but this time using the _related method. You will see only one query performing an inner join, and the same output for the first 10 books.
DEBUG 2024-01-07 14:53:27,927 utils (0.001) SELECT "nplusone_book"."id", "nplusone_book"."title", "nplusone_book"."author_id", "nplusone_author"."id", "nplusone_author"."name" FROM "nplusone_book" INNER JOIN "nplusone_author" ON ("nplusone_book"."author_id" = "nplusone_author"."id") LIMIT 10; args=(); alias=default Book of Author 0 by Author 0 Book of Author 1 by Author 1 Book of Author 2 by Author 2 Book of Author 3 by Author 3 Book of Author 4 by Author 4 Book of Author 5 by Author 5 Book of Author 6 by Author 6 Book of Author 7 by Author 7 Book of Author 8 by Author 8 Book of Author 9 by Author 9
So there you have it, now you know about the n+1 query problem, why it happens, and how to solve it. Go write some queries.