I have a small Django project I have been playing with for a while. There’s more about it in this series of posts.
The application just keeps a list of items (mostly web links) and lets me organize them. I’ve been experimenting with using embeddings to find similar items and suggest tags.
I am using OpenAI’s text-embedding-ada-002 to compute the embeddings for the objects. Because I was too lazy to set up a vector database, I just built a table of all the item-item distances and queried that for similar items. This was a great hack for figuring out how I wanted to compute and use the embeddings, but it’s obviously not going to scale well.
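A distance-table hack of this kind might look roughly like the sketch below. It is illustrative only, not the project’s actual code: the item names and the 3-dimensional vectors are made up (real ada-002 embeddings have 1536 dimensions), and cosine distance is an assumed choice of metric.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def build_distance_table(embeddings):
    """Precompute the distance between every pair of item ids."""
    table = {}
    for (id_a, vec_a), (id_b, vec_b) in combinations(embeddings.items(), 2):
        d = cosine_distance(vec_a, vec_b)
        table[(id_a, id_b)] = d
        table[(id_b, id_a)] = d  # store both directions for easy lookup
    return table


def most_similar(table, item_id, k=3):
    """Return up to k (other_id, distance) pairs closest to item_id."""
    neighbours = [(b, d) for (a, b), d in table.items() if a == item_id]
    return sorted(neighbours, key=lambda pair: pair[1])[:k]


# Hypothetical toy data standing in for real embeddings.
embeddings = {
    "django-docs": [0.9, 0.1, 0.0],
    "postgres-docs": [0.8, 0.3, 0.1],
    "bread-recipe": [0.0, 0.2, 0.9],
}
table = build_distance_table(embeddings)
print(most_similar(table, "django-docs", k=2))
```

The table holds n × (n − 1) entries for n items, and every new item means computing a distance to every existing one, which is where the scaling problem comes from.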
After looking around at various options like Pinecone, I decided to try pgvector, since I was already using Postgres and didn’t want to deal with another database. This turned out to work better than I expected, as pgvector has good Django support built in. Just run:
pip install pgvector
I really only had to do two things to make it work that were not obvious from the docs.
First, I was using a containerized Postgres from a standard distribution and I wanted to keep that workflow. pgvector is available as a Docker image called ankane/pgvector. My only issue was that I had been using an image based on Postgres 16, so my database was incompatible with the pgvector distribution. Fortunately they provide a Dockerfile, and it was easy to build my own.
Next, pgvector says you can enable the extension via a migration that looks like:
from django.db import migrations
from pgvector.django import VectorExtension


class Migration(migrations.Migration):
    operations = [
        # Creates the vector extension in the database
        VectorExtension(),
    ]
That’s correct, but you really want to start from a blank migration that is part of your migration chain. Django will make one for you if you run:
python manage.py makemigrations <app_name> --name <migration_name> --empty
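Tell it the app name, give the migration a name, then add the VectorExtension() operation to the generated file. The empty migration already has its dependencies filled in, so it slots into the chain in the right place.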
After that, everything worked fine.
Well, not exactly. Now I can’t use the standard postgres service in GitLab CI/CD to run my tests… hmm
Running a custom Postgres image in GitLab CI/CD
Fortunately it’s pretty easy to fix up the CI/CD config, once you have tried a lot of other things. Normally, setting up the postgres service in the .yml looks something like this:
services:
  - postgres:12.2-alpine
But you can use your own Docker image with this syntax. The alias is the hostname that the other parts of the CI/CD environment use to reach this container.
services:
  # use the pgvector image from docker hub for testing
  - name: ankane/pgvector:latest
    alias: postgres
And then everything was ok.
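Since the service is reachable under its alias, the Django test settings only need to point at the hostname postgres. A minimal sketch of how that could look is below, not the project’s real settings. POSTGRES_DB, POSTGRES_USER and POSTGRES_PASSWORD are the variables the official postgres image reads, and the assumption here is that the pgvector image honours them too; POSTGRES_HOST, POSTGRES_PORT and the build_databases helper are hypothetical names introduced for illustration.

```python
import os


def build_databases(env=os.environ):
    """Build a Django DATABASES setting from environment variables."""
    return {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            # Default host matches the service alias in the CI config.
            # POSTGRES_HOST / POSTGRES_PORT are hypothetical overrides.
            "HOST": env.get("POSTGRES_HOST", "postgres"),
            "PORT": env.get("POSTGRES_PORT", "5432"),
            # Assumed to mirror the variables given to the service container.
            "NAME": env.get("POSTGRES_DB", "postgres"),
            "USER": env.get("POSTGRES_USER", "postgres"),
            "PASSWORD": env.get("POSTGRES_PASSWORD", ""),
        }
    }


# In settings.py this would be the module-level DATABASES value.
DATABASES = build_databases()
```

Keeping the alias as postgres means settings written for the stock postgres service keep working unchanged when the image is swapped.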
I could also use my custom Dockerfile if I put it in a place where the CI/CD could find it.