I like Solid Queue and the direction things are heading, but it's hard to overlook the performance. A system that does tens to hundreds of thousands of jobs/s on Sidekiq + Redis will now get bottlenecked by transactional performance with Solid Queue / PG - https://github.com/sidekiq/sidekiq/wiki/Active-Job#performan...
My choice of design pattern here: use PG (PostgreSQL) for orchestration and decision making, and Sidekiq + Redis as the message bus. You just can't beat the time it takes for a job to get picked up once it has landed on a queue.
By shayonj a day ago
You can get pretty high job throughput while maintaining transactional integrity, but maybe not with Ruby and ActiveRecord :) https://riverqueue.com/docs/benchmarks
That River example has a MacBook Air doing about 2x the throughput of the Sidekiq benchmarks, while still using Postgres via Go.
Can’t find any indication of whether those Sidekiq benchmarks used Postgres or MySQL/Maria; that may be a difference.
By bgentry 21 hours ago
This is really cool and I'd love to use it, but it seems they only support workers written in Go, if I'm not mistaken? My workers can be remote, not written in Go, and behind a NAT too. I want them to periodically poll the queue; this way I don't need to worry about network topology. I guess I could simply interface with PG atomically via a simple API endpoint for the workers to connect to, but I'd love to have the UI of riverqueue.
IIUC - does this mean that if I am making a network call, or performing some expensive task from within the job, the transaction against the database is held open for the duration of the job?
By shayonj 21 hours ago
Definitely not! Jobs in River are enqueued and fetched by worker clients transactionally, but the jobs themselves execute outside a transaction. I’m guessing you’re aware of the risks of holding open long transactions in Postgres, and we definitely didn’t want to limit users to short-lived background jobs.
There is a super handy transactional completion API that lets you put some or all of a job in a transaction if you want to. Works great for making other database side effects atomic with the job’s completion. https://riverqueue.com/docs/transactional-job-completion
By bgentry 21 hours ago
very nice! cool project
By shayonj 21 hours ago
The majority of projects have their workers idle 98% of the time. When you do get to the point where you need the kind of optimizations you're talking about, it's relatively easy to make the change. Solid Queue is easy and works out of the box, as is.
By xutopia 16 hours ago
My experience has been that the durability of jobs is much more important than execution speed. I've worked on extremely large Rails apps that would not use Redis for a queue store and preferred data integrity over jobs per second. I've also seen large-scale financial job queues built on Redis.
Teams should know and be comfortable with what happens to a long-running job during a deployment, and what happens when a worker disappears (at all points in the job's lifecycle). Is there a non-zero chance the job will be run more than once? Do jobs run in the order they are queued? Can your jobs be re-run? Can your jobs run simultaneously? Do you know if you are dropping jobs? How are you catching them?
By bradly 18 hours ago
Likewise - that’s why I mentioned that orchestration backed by PG, with Sidekiq + Redis treated as a message bus, goes a long way. You get durability, and the ability to re-enqueue jobs in case of availability issues with Redis.
Sidekiq’s reliable fetch provides another kind of durability, especially when recovering from a process crash.
Agree on all points re: job lifecycle
By shayonj 18 hours ago
> A system that does tens to hundreds of thousands of jobs/s
I mean - yes - but how many systems really do tens of thousands of jobs a second? Some, obviously. But that's a buttload of jobs for the vast majority of verticals.
Like imagine a hypothetical dental SaaS that has 100.0% market penetration in the US and needs to handle the 500M dentist visits per year that happen in the US. That's 16/s; say they all run during working hours (x3) and there are 10 async jobs on average per dentist visit (x10), which gets us up to ~500 jobs/s.
You could even add 100% market penetration for this unicorn SaaS on all veterinary visits (+200M visits/yr), surgeries (+50M), emergency department visits (+130M), chiropractic visits (+35M), and hair salon visits (+1B/yr)... and only get to 2000 jobs/s.
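The back-of-envelope math above can be sanity-checked in a few lines of Ruby (all inputs are the comment's hypothetical assumptions, not real data):

```ruby
SECONDS_PER_YEAR = 365 * 24 * 60 * 60            # 31,536,000, ~31.5M seconds

visits_per_year = 500_000_000                    # hypothetical: every US dentist visit
base_rate       = visits_per_year.fdiv(SECONDS_PER_YEAR)  # ~15.9 visits/s

working_hours_factor = 3    # traffic compressed into roughly 1/3 of the day
jobs_per_visit       = 10   # assumed async jobs per visit

jobs_per_second = base_rate * working_hours_factor * jobs_per_visit
jobs_per_second.round       # => 476, roughly the ~500 jobs/s figure
```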
By arcticfox 20 hours ago
I have dozens of customers doing 1b+ jobs/day. The world is bigger than you think.
By mperham 20 hours ago
I'm not sure that's a particularly fair conclusion to draw, Mike.
1bn a day is on the order of 10k/s. If you, the creator of Sidekiq, know of "dozens" of people hitting those rates, if anything you're supporting arcticfox's point that these systems are quite rare.
No shade intended. I have been a very grateful Sidekiq Enterprise customer in past Ruby jobs.
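The order-of-magnitude conversion holds up:

```ruby
# 1B jobs/day expressed as a sustained per-second rate:
rate = 1_000_000_000 / 86_400.0   # seconds in a day
rate.round                        # => 11574, i.e. on the order of 10k/s
```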
By mattbessey 19 hours ago
It's a peace of mind thing. Everywhere I've worked did a lot of stuff in the background. The DB is already so loaded on a normal Rails application that adding more reads doesn't make sense to me when Sidekiq works so well. You never have to worry about Sidekiq throughput, or when you do, it's only to be careful not to overload the DB, because Sidekiq is so fast.
Rails is already built on a slow language; I don't understand why we want to overload it even more with slow things.
By zanellato19 19 hours ago
Having both available is a huge win for the Rails ecosystem.
If you’re just starting or have a small project Solid Queue is invaluable and reduces costs and friction.
If and when you need more performance or power, Sidekiq is still available. Both run in the context of your Rails app, so moving jobs from one to the other shouldn't be terribly hard.
By pythonaut_16 19 hours ago
That's easy to do in an IoT setting; less common if human action is involved. But if you need that kind of throughput, just use a more suitable technology like Scala, which can work with in-memory queues and background tasks and can handle that much without any trickiness.
By ndriscoll 19 hours ago
That is a good example, but don't forget your DB will also be doing many other things besides managing jobs.
In the Rails world of using the DB for everything, you have your primary DB, Solid Queue, Solid Cache, and Solid Cable.
If you lean into modern Rails features (using Turbo Streams, which involves jobs, async destroying of dependents, etc.), it doesn't seem that hard to rack up a lot of DB activity.
Combine this with pretty spotty SSD I/O performance on most VPS servers, and I'm unsure about hosting everything on one box on a $20-40/month server. I feel very comfortable doing that with Redis backing queue + cache + cable while PG focuses on being the primary DB. I haven't seen any success stories yet of anyone using the DB for everything on an entry-level VPS. It's mainly DHH mentioning it works great, with things split into different databases and running on massive dedicated hardware with the best SSDs you can buy.
I like the idea on paper, but I don't think I'll switch away from Redis until I see a bunch of social proof from people showing how they are using all of these DB-backed tools on a VPS to serve a reasonable amount of traffic. We have 10+ years of proof that Redis works great in these cases.
By nickjj 15 hours ago
Solid Queue (by default actually) uses a second, segregated database for the queue. You can scale it independently of your app database.
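As a sketch, I believe the Rails 8 defaults wire this up roughly like so (exact generated contents vary by version; the `queue` database name is the generator's default):

```ruby
# config/environments/production.rb
config.solid_queue.connects_to = { database: { writing: :queue } }

# config/database.yml then defines a separate `queue:` database alongside
# `primary:`; pointing that entry at its own server is what lets the queue
# scale independently of the app database.
```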
By jprosevear 15 hours ago
Yep, but using 2 DBs within the same PG instance won't improve performance, since they're using the same disk.
Managing multiple DB instances across multiple hosts requires a lot more effort and configuration, and doubles your hosting costs vs. running Redis on the same host with Sidekiq, IMO.
By nickjj 13 hours ago
Building for scale is something that should usually be done incrementally. For systems where the speed at which workers pick up jobs isn't super important, let's be real, almost anything durable can work. Simple queues are easy enough to swap out in my experience that making the sub-optimal choice at first is probably fine, so long as you remain flexible to the idea of switching to something else later. Peace of mind comes from observability. The one thing I would caution is that it is a lot more difficult to evolve complex systems into simpler ones (e.g., going from a tool meant for event streaming to one just focused on queuing).
By evantbyrne 17 hours ago
For background jobs, I personally value data correctness more than raw throughput. I like that my jobs are in a solid database, not Redis.
In Elixir I use Oban a lot, and now with Rails I have Solid Queue. This is quite exciting! :D
By sergiotapia 18 hours ago
For years I've followed a philosophy of reducing infra complexity until you really need it.
Infra-wise, having to manage just a database is far easier than a complex system, for small Rails operations.
Scaling up, there will always be strong needs for complexity, but that doesn't mean you can't get really far without it.
By Justsignedup 21 hours ago
Modern Postgres in particular can take you really far with this mindset. There are tons of use cases where you can use it for pretty much everything, including as a fairly high throughput transactional job queue, and may not outgrow that setup for years if ever. Meanwhile, features are easier to develop, ops are simpler, and you’re not going to risk wasting lots of time debugging and fixing common distributed systems edge cases from having multiple primary datastores.
If you really do outgrow it, only then do you have to pay the cost to move parts of your system to something more specialized. Hopefully by then you’ve achieved enough success & traction to justify doing so.
Should be the default mindset for any new project if you don’t have very demanding performance & throughput needs, IMO.
By bgentry 20 hours ago
PostgreSQL is quite terrible at OLAP, though. We got a few orders of magnitude of performance improvement in some aggregation queries by rewriting them on ClickHouse. It's incredible at it.
My rule of thumb is: PG for transactional data consistency, ClickHouse for OLAP. Maybe Elasticsearch if full-text search is really needed.
By zernie 17 hours ago
This. Don't lose your time and sanity trying to optimize complex queries for PG's non-deterministic query planner; you have no guarantee your indexes will be used (even running the same query again with different arguments). Push your data to ClickHouse and enjoy good performance without even attempting to optimize. If even more performance is needed, denormalize here and there.
Keep postgres as the source of truth.
By tacone 15 hours ago
I find that the Postgres query planner is quite satisfactory for very difficult use cases. I was able to get 5 years into a startup (one that wasn't basically trying to be the next Twitter) on a $300 Postgres tier with Heroku. The reduced complexity was so huge we didn't need a team of 10. The cost savings were huge, and I got really good at debugging slow queries, to the point that I could tell when Postgres would choke on one.
My point isn't that this will scale. It's that you can get really, really far without complexity and then tack it on as needed. This is just another bit of complexity removal for early tech. I'd use this in a heartbeat.
By Justsignedup 9 hours ago
Redis, PG, ClickHouse sounds like a good combo for handling both OLTP and OLAP workloads, at scale (?).
By shayonj 16 hours ago
That is a smart approach.
That said, putting the queue, cache, etc in the database is actually relatively new in the Rails world.
As a result, the process of moving an existing app with a presumably stable infrastructure to a db-only one (often a separate db from your primary app) actually adds complexity/uncertainty, at least until you restabilize things.
By bdcravens 20 hours ago
> As a result, the process of moving an existing app with a presumably stable infrastructure to a db-only one (often a separate db from your primary app) actually adds complexity/uncertainty, at least until you restabilize things.
This has been my experience too, unfortunately. Before Que hit v1.0, things were definitely dicey trying to find an ACID-compliant queue library for Rails–everyone was so into Redis and Sidekiq. But Que is still the only ACID queue lib that's been around for longer than four years, if that is meaningful to you.
By bradly 18 hours ago
I like the idea of a db-backed background processor, but I still feel like Good Job is a better option. It has much more parity with Sidekiq in terms of features, UI, etc. than Solid Queue does.
If this was part of reducing operational overhead, why not implement something functionally like GCP Cloud Tasks [0]?
Since this is part of Rails, all you would need to do is implement regular HTTP endpoints; no need for workers/listeners. Submit a "job" to the queue (which itself is just a POST) with the message details: the endpoint and some data to POST to said endpoint.
The queue "server" processes the list of jobs, hits the specified endpoint, and when it gets a 200 response, deletes the job. Otherwise, it just keeps retrying.
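A minimal sketch of that delivery step (the class name is hypothetical; the key design choice is that only a 200 marks the job done, so the row survives for a retry on any failure):

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical dispatcher: POSTs a job payload to its endpoint and tells the
# caller whether the job row can be deleted.
class HttpJobDispatcher
  def deliver(endpoint, payload)
    res = Net::HTTP.post(URI(endpoint), JSON.generate(payload),
                         "Content-Type" => "application/json")
    delivered?(res.code.to_i)
  end

  # Delete on 200; any other status leaves the job in place for a retry.
  def delivered?(status)
    status == 200
  end
end
```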
We used to do this using the cloudtasker gem.
However, we've recently migrated to Sidekiq. We regularly found that the time to enqueue a job with Cloud Tasks was anywhere from 200-800ms. We often have workflows where users need to do something that involves a batch of, say, 100+ jobs. The time to process each job was minuscule, but actually enqueuing them took a long time.
Time to enqueue with Sidekiq/Redis was brought down to under 20ms. We can also make use of bulk enqueuing and enqueue all 100+ jobs in one Redis call.
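The bulk call looks roughly like this (`MyJob` is a hypothetical worker class; `Sidekiq::Client.push_bulk` pushes the whole batch in a single Redis round trip):

```ruby
# Build one args array per job: 100 jobs, one Redis call.
args = (1..100).map { |record_id| [record_id] }

# Guarded so the sketch can be read/run even where Sidekiq isn't loaded:
if defined?(Sidekiq::Client)
  Sidekiq::Client.push_bulk("class" => "MyJob", "args" => args)
end

args.length  # => 100
```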
However, when we were first starting out, it was a godsend. It made our infrastructure significantly easier to manage and worry about.
By ARama 14 hours ago
Cloudtasker is a wrapper around the actual GCP Cloud Tasks service; that's not what I'm talking about here. What I'm talking about is the implementation detail of Solid Queue itself.
By latchkey 13 hours ago
How does this reduce operational overhead? You still need a queue and a worker to dispatch the tasks to your api.
By ryeguy 18 hours ago
The end user doesn't need to implement the queue or the worker. They just need to implement the HTTP API for receiving POST messages. The queue "server" is effectively just an HTTP client that reads messages from the database and sends them off.
By latchkey 18 hours ago
That makes sense as a general pattern, but probably less so in the context of Rails, which is typically a self-contained monolith. It would add another hop, more indirection, and more complexity. It would introduce new problems, like the need to segment your "real" API from your worker API for the purpose of load isolation.
By ryeguy 10 hours ago
If it is a self-contained monolith, that's perfect for this. You have a long-running process that scans a database for new work and then posts to localhost.
By latchkey 10 hours ago
Why would GCP Cloud Tasks be part of the default libraries? If it’s that simple surely a community maintained gem would be fine?
By pythonaut_16 19 hours ago
To clarify: I'm talking about implementing the core functionality of GCP Cloud Tasks, not using that specific product.
By latchkey 18 hours ago
> Job is an ActiveRecord model
Hmm, is it? Was that a typo for "MyJob is", or for "is an ActiveJob model"?
By thinkingemote 5 hours ago
I have been a big fan of delayed_job for a while. For a time I went with Sidekiq + Redis but found the juice not worth the squeeze. The biggest issue was complex logic that got run before the current SQL transaction finalized: weird timing bugs, and wacky solutions with after_commit hooks and random delays. Not an issue if the database is the sole source of state.
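The timing bug described above can be modeled in plain Ruby (a toy stand-in for a DB transaction, not the ActiveRecord API): a job enqueued mid-transaction may run before the commit and see stale state, while an after_commit hook only fires once the data is visible.

```ruby
class ToyTransaction
  def initialize(store)
    @store     = store  # committed, visible state
    @pending   = {}     # uncommitted writes
    @callbacks = []     # after_commit hooks
  end

  def write(key, value)
    @pending[key] = value
  end

  def after_commit(&blk)
    @callbacks << blk
  end

  def commit
    @store.merge!(@pending)   # writes become visible...
    @callbacks.each(&:call)   # ...and only then do hooks run
  end
end

store = {}
seen  = []
tx = ToyTransaction.new(store)
tx.write(:user, "alice")

# A "job" fired mid-transaction sees nothing committed yet:
seen << store[:user]                      # nil -- the classic timing bug

# Deferring via after_commit guarantees the data is visible first:
tx.after_commit { seen << store[:user] }
tx.commit

seen  # => [nil, "alice"]
```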
This is cool, but I will just continue to use Sidekiq. I know the API well, it's crazy fast and scalable, and it's easy to set up. A Redis dependency is dead simple these days too.