I like Solid Queue and the direction things are heading, but it's hard to overlook the performance. A system that does tens to hundreds of thousands of jobs/s on Sidekiq + Redis will now get bottlenecked by transactional performance with Solid Queue / PG - https://github.com/sidekiq/sidekiq/wiki/Active-Job#performan...
My choice of design pattern here: use PG (PostgreSQL) for orchestration and decision making, and Sidekiq + Redis as the message bus. You just can't beat the time it takes for a job to get picked up once it has landed on a queue.
By shayonj a day ago
You can get pretty high job throughput while maintaining transactional integrity, but maybe not with Ruby and ActiveRecord :) https://riverqueue.com/docs/benchmarks
That River example has a MacBook Air doing about 2x the throughput of the Sidekiq benchmarks, while still using Postgres via Go.
Can’t find any indication of whether those Sidekiq benchmarks used Postgres or MySQL/Maria; that may be a difference.
By bgentry 21 hours ago
This is really cool and I'd love to use it, but it seems they only support workers written in Go, if I'm not mistaken? My workers can be remote, not written in Go, and behind a NAT too. I want them to periodically poll the queue; this way I don't need to worry about network topology. I guess I could simply interface with PG atomically via a simple API endpoint for the workers to connect to, but I'd love to have the UI of riverqueue.
IIUC - does this mean that if I am making a network call, or performing some expensive task from within the job, the transaction against the database is held open for the duration of the job?
By shayonj 21 hours ago
Definitely not! Jobs in River are enqueued and fetched by worker clients transactionally, but the jobs themselves execute outside a transaction. I’m guessing you’re aware of the risks of holding open long transactions in Postgres, and we definitely didn’t want to limit users to short-lived background jobs.
There is a super handy transactional completion API that lets you put some or all of a job in a transaction if you want to. Works great for making other database side effects atomic with the job’s completion. https://riverqueue.com/docs/transactional-job-completion
By bgentry 21 hours ago
very nice! cool project
By shayonj 21 hours ago
The majority of projects have their workers idle 98% of the time. When you do get to the point where you need the kind of optimizations you're talking about, it's relatively easy to make the change. Solid Queue is easy and works out of the box, as is.
By xutopia 16 hours ago
My experience has been that the durability of jobs is much more important than execution speed. I've worked on extremely large Rails apps that would not use Redis for a queue store and preferred data integrity over jobs per second. I've also seen large-scale financial job queues built on Redis.
Teams should know and be comfortable with what happens to a long-running job during a deployment, and what happens when a worker disappears (at all points in the job's lifecycle). Is there a non-zero chance the job will be run more than once? Do jobs run in the order they are queued? Can your jobs be re-run? Can your jobs run simultaneously? Do you know if you are dropping jobs? How are you catching them?
By bradly 18 hours ago
Likewise - that’s why I mentioned that orchestration backed by PG, with Sidekiq + Redis treated as a message bus, goes a long way. You get durability, and the ability to re-enqueue jobs in case of availability issues with Redis.
Sidekiq’s reliable fetch provides another kind of durability, especially when recovering from a process crash.
Agree on all points re: job lifecycle
By shayonj 18 hours ago
> A system that does tens to hundreds of thousands of jobs/s
I mean - yes - but how many systems really do tens of thousands of jobs a second? Some, obviously. But that's a buttload of jobs for the vast majority of verticals.
Like imagine a hypothetical dental SaaS that has 100.0% market penetration in the US and needs to handle the 500M dentist visits per year that happen in the US. That's 16/s; say they all run during working hours (x3) and there are 10 async jobs on average per dentist visit (x10), which gets us up to ~500 jobs/s.
You could even add 100% market penetration for this unicorn SaaS on all veterinary visits (+200M visits/yr), surgeries (+50M), emergency department visits (+130M), chiropractic visits (+35M), and hair salon visits (+1B/yr)... and only get to 2000 jobs/s.
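The back-of-envelope math above can be sanity-checked in a few lines of Ruby (all inputs are the comment's hypothetical assumptions, not real data):

```ruby
SECONDS_PER_YEAR = 365 * 24 * 60 * 60            # 31,536,000, ~31.5M seconds

visits_per_year = 500_000_000                    # hypothetical: every US dentist visit
base_rate       = visits_per_year.fdiv(SECONDS_PER_YEAR)  # ~15.9 visits/s

working_hours_factor = 3    # traffic compressed into roughly 1/3 of the day
jobs_per_visit       = 10   # assumed async jobs per visit

jobs_per_second = base_rate * working_hours_factor * jobs_per_visit
jobs_per_second.round       # => 476, roughly the ~500 jobs/s figure
```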
By arcticfox 20 hours ago
I have dozens of customers doing 1b+ jobs/day. The world is bigger than you think.
By mperham 20 hours ago
I'm not sure that's a particularly fair conclusion to draw, Mike.
1bn a day is on the order of 10k/s. If you, the creator of Sidekiq, know of "dozens" of people hitting those rates, if anything you're supporting arcticfox's point that these systems are quite rare.
No shade intended. I have been a very grateful Sidekiq Enterprise customer in past Ruby jobs.
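The order-of-magnitude conversion holds up:

```ruby
# 1B jobs/day expressed as a sustained per-second rate:
rate = 1_000_000_000 / 86_400.0   # seconds in a day
rate.round                        # => 11574, i.e. on the order of 10k/s
```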
By mattbessey 19 hours ago
It's a peace of mind thing. Everywhere I've worked did a lot of stuff in the background. The DB is already so loaded on a normal Rails application that adding more reads doesn't make sense to me when Sidekiq works so well. You never have to worry about Sidekiq throughput, or when you do, it's only to be careful not to overload the DB, because Sidekiq is so fast.
Rails is already built on a slow language; I don't understand why we want to overload it even more with slow things.
By zanellato19 19 hours ago
Having both available is a huge win for the Rails ecosystem.
If you’re just starting or have a small project Solid Queue is invaluable and reduces costs and friction.
If and when you need more performance or power, Sidekiq is still available. Both run in the context of your Rails app, so moving jobs from one to the other shouldn't be terribly hard.
By pythonaut_16 19 hours ago
That's easy to do in an IoT setting; less common if human action is involved. But if you need that kind of throughput, just use a more suitable technology like Scala, which can work with in-memory queues and background tasks and can handle that much without any trickiness.
By ndriscoll 19 hours ago
That is a good example, but don't forget your DB will also be doing many other things besides managing jobs.
In the Rails world of using the DB for everything, you have your primary DB, Solid Queue, Solid Cache, and Solid Cable.
If you lean into modern Rails features (using Turbo Streams, which involves jobs, async destroying of dependents, etc.), it doesn't seem that hard to rack up a lot of DB activity.
Combine this with pretty spotty SSD I/O performance on most VPS servers, and I'm unsure about hosting everything on one box on a $20-40/month server. I feel very comfortable doing that with Redis backing queue + cache + cable while PG focuses on being the primary DB. I haven't seen any success stories yet of anyone using the DB for everything on an entry-level VPS. It's mainly DHH mentioning it works great, with things split into different databases and running on massive dedicated hardware with the best SSDs you can buy.
I like the idea on paper, but I don't think I'll switch away from Redis until I see a bunch of social proof from people showing how they are using all of these DB-backed tools on a VPS to serve a reasonable amount of traffic. We have 10+ years of proof that Redis works great in these cases.
By nickjj 15 hours ago
Solid Queue (by default actually) uses a second, segregated database for the queue. You can scale it independently of your app database.
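As a sketch, I believe the Rails 8 defaults wire this up roughly like so (exact generated contents vary by version; the `queue` database name is the generator's default):

```ruby
# config/environments/production.rb
config.solid_queue.connects_to = { database: { writing: :queue } }

# config/database.yml then defines a separate `queue:` database alongside
# `primary:`; pointing that entry at its own server is what lets the queue
# scale independently of the app database.
```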
By jprosevear 15 hours ago
Yep, but using 2 DBs within the same PG instance won't improve performance, since they're using the same disk.
Managing multiple DB instances across multiple hosts requires a lot more effort and configuration, and doubles your hosting costs vs. running Redis on the same host with Sidekiq, IMO.
By nickjj 13 hours ago
Building for scale is something that should usually be done incrementally. For systems where the speed at which workers pick up jobs isn't super important, let's be real, almost anything durable can work. Simple queues are easy enough to swap out in my experience that making the sub-optimal choice at first is probably fine, so long as you remain flexible to the idea of switching to something else later. Peace of mind comes from observability. The one thing I would caution is that it is a lot more difficult to evolve complex systems into simpler ones (e.g., going from a tool meant for event streaming to one just focused on queuing).
By evantbyrne 17 hours ago
For background jobs, I personally value data correctness more than raw throughput. I like that my jobs are in a solid database, not Redis.
In Elixir I use Oban a lot, and now with Rails I have Solid Queue. This is quite exciting! :D
By sergiotapia 18 hours ago
For years I've followed a philosophy of reducing infra complexity until you really need it.
Infra-wise, having to manage just a database is far easier than a complex system, for small Rails operations.
Scaling up, there will always be strong needs for complexity, but that doesn't mean you can't get really far without it.
By Justsignedup 21 hours ago
Modern Postgres in particular can take you really far with this mindset. There are tons of use cases where you can use it for pretty much everything, including as a fairly high throughput transactional job queue, and may not outgrow that setup for years if ever. Meanwhile, features are easier to develop, ops are simpler, and you’re not going to risk wasting lots of time debugging and fixing common distributed systems edge cases from having multiple primary datastores.
If you really do outgrow it, only then do you have to pay the cost to move parts of your system to something more specialized. Hopefully by then you’ve achieved enough success & traction to justify doing so.
Should be the default mindset for any new project if you don’t have very demanding performance & throughput needs, IMO.
By bgentry 20 hours ago
PostgreSQL is quite terrible at OLAP, though. We got a few orders of magnitude of performance improvement in some aggregation queries by rewriting them on ClickHouse. It's incredible at it.
My rule of thumb is: PG for transactional data consistency, ClickHouse for OLAP. Maybe Elasticsearch if full-text search is really needed.
By zernie 17 hours ago
This. Don't lose your time and sanity trying to optimize complex queries for PG's non-deterministic query planner; you have no guarantee your indexes will be used (even running the same query again with different arguments). Push your data to ClickHouse and enjoy good performance without even attempting to optimize. If even more performance is needed, denormalize here and there.
Keep postgres as the source of truth.
By tacone 15 hours ago
I find that the Postgres query planner is quite satisfactory for very difficult use cases. I was able to get 5 years into a startup (one that wasn't basically trying to be the next Twitter) on a $300 Postgres tier with Heroku. The reduced complexity was so huge we didn't need a team of 10. The cost savings were huge, and I got really good at debugging slow queries, to the point that I could tell when Postgres would choke on one.
My point isn't that this will scale. It's that you can get really, really far without complexity and then tack it on as needed. This is just another bit of complexity removal for early tech. I'd use this in a heartbeat.
By Justsignedup 9 hours ago
Redis, PG, ClickHouse sounds like a good combo for handling both OLTP and OLAP workloads, at scale (?).
By shayonj 16 hours ago
That is a smart approach.
That said, putting the queue, cache, etc in the database is actually relatively new in the Rails world.
As a result, the process of moving an existing app with a presumably stable infrastructure to a db-only one (often a separate db from your primary app) actually adds complexity/uncertainty, at least until you restabilize things.
By bdcravens 20 hours ago
> As a result, the process of moving an existing app with a presumably stable infrastructure to a db-only one (often a separate db from your primary app) actually adds complexity/uncertainty, at least until you restabilize things.
This has been my experience too, unfortunately. Before Que hit v1.0, things were definitely dicey trying to find an ACID-compliant queue library for Rails–everyone was so into Redis and Sidekiq. But Que is still the only ACID queue lib that's been around for longer than four years, if that is meaningful to you.
By bradly 18 hours ago
I like the idea of a db-backed background processor, but I still feel like Good Job is a better option. It has much more parity with Sidekiq in terms of features, UI, etc. than Solid Queue does.
If this was part of reducing operational overhead, why not implement something functionally like GCP Cloud Tasks [0]?
Since this is part of Rails, all you would need to do is implement regular HTTP endpoints; no need for workers/listeners. Submit a "job" to the queue (which itself is just a POST) with the message details: the endpoint and some data to POST to said endpoint.
The queue "server" processes the list of jobs, hits the specified endpoint, and when it gets a 200 response, deletes the job. Otherwise, it just keeps retrying.
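A minimal sketch of that delivery step (the class name is hypothetical; the key design choice is that only a 200 marks the job done, so the row survives for a retry on any failure):

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical dispatcher: POSTs a job payload to its endpoint and tells the
# caller whether the job row can be deleted.
class HttpJobDispatcher
  def deliver(endpoint, payload)
    res = Net::HTTP.post(URI(endpoint), JSON.generate(payload),
                         "Content-Type" => "application/json")
    delivered?(res.code.to_i)
  end

  # Delete on 200; any other status leaves the job in place for a retry.
  def delivered?(status)
    status == 200
  end
end
```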
We used to do this using the cloudtasker gem.
However, we've recently migrated to Sidekiq. We regularly found that the time to enqueue a job with Cloud Tasks was anywhere from 200-800ms. We often have workflows where users need to do something that involves a batch of, say, 100+ jobs. The time to process each job was minuscule, but actually enqueuing them took a long time.
Time to enqueue with Sidekiq/Redis was brought down to under 20ms. We can also make use of bulk enqueuing and enqueue all 100+ jobs in one Redis call.
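The bulk call looks roughly like this (`MyJob` is a hypothetical worker class; `Sidekiq::Client.push_bulk` pushes the whole batch in a single Redis round trip):

```ruby
# Build one args array per job: 100 jobs, one Redis call.
args = (1..100).map { |record_id| [record_id] }

# Guarded so the sketch can be read/run even where Sidekiq isn't loaded:
if defined?(Sidekiq::Client)
  Sidekiq::Client.push_bulk("class" => "MyJob", "args" => args)
end

args.length  # => 100
```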
However, when we were first starting out, it was a godsend. It made our infrastructure significantly easier to manage and worry about.
By ARama 14 hours ago
Cloudtasker is a wrapper around the actual GCP Cloud Tasks service; that's not what I'm talking about here. What I'm talking about is the implementation detail of Solid Queue itself.
By latchkey 13 hours ago
How does this reduce operational overhead? You still need a queue and a worker to dispatch the tasks to your api.
By ryeguy 18 hours ago
The end user doesn't need to implement the queue or the worker. They just need to implement the HTTP API for receiving POST messages. The queue "server" is effectively just an HTTP client that reads messages from the database and sends them off.
By latchkey 18 hours ago
That makes sense as a general pattern, but probably less so in the context of Rails, which is typically a self-contained monolith. It would add another hop, more indirection, and more complexity. It would introduce new problems, like the need to segment your "real" API from your worker API for the purpose of load isolation.
By ryeguy 10 hours ago
If it is a self-contained monolith, that's perfect for this. You have a long-running process that scans a database for new work and then posts to localhost.
By latchkey 10 hours ago
Why would GCP Cloud Tasks be part of the default libraries? If it’s that simple surely a community maintained gem would be fine?
By pythonaut_16 19 hours ago
To clarify: I'm talking about implementing the core functionality of GCP Cloud Tasks, not using that specific product.
By latchkey 18 hours ago
> Job is an ActiveRecord model
Hmm, is it? Was that a typo for "MyJob is", or for "is an ActiveJob model"?
By thinkingemote 5 hours ago
I have been a big fan of delayed_job for a while. For a time I went with Sidekiq + Redis but found the juice not worth the squeeze. The biggest issue was complex logic that got run before the current SQL transaction finalized: weird timing bugs, and wacky solutions with after_commit hooks and random delays. Not an issue if the database is the sole source of state.
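The timing bug described above can be modeled in plain Ruby (a toy stand-in for a DB transaction, not the ActiveRecord API): a job enqueued mid-transaction may run before the commit and see stale state, while an after_commit hook only fires once the data is visible.

```ruby
class ToyTransaction
  def initialize(store)
    @store     = store  # committed, visible state
    @pending   = {}     # uncommitted writes
    @callbacks = []     # after_commit hooks
  end

  def write(key, value)
    @pending[key] = value
  end

  def after_commit(&blk)
    @callbacks << blk
  end

  def commit
    @store.merge!(@pending)   # writes become visible...
    @callbacks.each(&:call)   # ...and only then do hooks run
  end
end

store = {}
seen  = []
tx = ToyTransaction.new(store)
tx.write(:user, "alice")

# A "job" fired mid-transaction sees nothing committed yet:
seen << store[:user]                      # nil -- the classic timing bug

# Deferring via after_commit guarantees the data is visible first:
tx.after_commit { seen << store[:user] }
tx.commit

seen  # => [nil, "alice"]
```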
This is cool, but I will just continue to use Sidekiq. I know the API well, it's crazy fast and scalable, and it's easy to set up. A Redis dependency is dead simple these days too.