This is in my opinion a shallow and poor comparison.
First of all, Redis is not a database. Although it can be used as one in some specific use cases, it was never designed to be a database. Redis and MongoDB have been created for completely different purposes and have profoundly different principles and architectures. Comparing them as "databases" makes no sense. The only value of such a comparison is to understand how different they are.
The author clearly did not make much effort to study these tools beyond some basic knowledge, and I would bet he never used either of them in a real, production use case. The arguments are very general, shallow, lazy and unconvincing (e.g. "Redis does not scale as well as MongoDB").
Proposing KEYS * as an equivalent of collection.find() is a joke - simply reading the documentation of that command disqualifies it as such.
The paragraph about replication/clustering also completely misses the mark - it mentions nothing about Redis Cluster (completely free, no "Redis Enterprise" required) or MongoDB sharding.
This article smells to me like a cheap excuse to gain views and advertise a product.
> Could you elaborate on why the masses should adopt this? I'd imagine that those queues offset computation heavy jobs (like cache building) to task-runners instead of endusers
Anything that users need to wait for, really. It's such a smoother experience when you don't have to stick around and wait looking at a spinner, but can merrily carry on. Plus it helps with server load if all your UI pages are light - leave the email sending / image processing / cache building / backups zipping and uploading to S3 / whatever to the background jobs. It also lends itself nicely to testing - if your system is set up to defer heavy tasks to the background, supplying a mock Queue manager when doing full page tests will let your suite progress much faster, and will let you inspect the built queue separately, without actually executing it.
> but that does not really fit to my understanding of redis which seems to be a key-value storage in memory
Redis just collects the tasks to be executed and releases them when it's time. Is it the in-memory aspect that worries you? Redis has persistence which you can turn on. It makes it a little slower, but safer - and there are two types, RDB and AOF, each of which has its pros and cons as described in the link. Basically, it flushes to disk periodically (or frequently) so even if your server goes down and loses RAM, your queues are safe to a reasonable degree, depending on persistence settings.
Redis can be set to do the following when you run out of memory and try to make a write that requires more:
There are additional options atop those basic ones that give you a little more control. This page has a lot more detail.
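For reference, the basic eviction behaviours mentioned above are selected with the maxmemory-policy directive; here's a redis.conf sketch (policy names as documented for Redis 3.x, with 100mb as an example cap):

```conf
# Cap memory at 100 MB, then choose what happens on writes that need more
maxmemory 100mb

# noeviction      - reply with an error (the default)
# allkeys-lru     - evict least-recently-used keys, considering all keys
# volatile-lru    - evict LRU keys among those with an expire set
# allkeys-random  - evict random keys
# volatile-random - evict random keys among those with an expire set
# volatile-ttl    - evict keys with an expire set, shortest TTL first
maxmemory-policy allkeys-lru
```

(Redis 4.0 also adds allkeys-lfu and volatile-lfu on top of these.)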
We use redis extensively as our primary data store for our dotnet application, so I've become a sort of de facto expert.
Caching is storing your data locally in order to provide quick access to it. Caches usually provide only simple access to data, but can work very quickly.
For example, let's say your application fetches current weather data for a postal code from an API. The first time a user asks for weather for a postal code, you send a request to the API for the data. Then you store the data locally in whatever data store (DB) you're using, with the postal code as the key and the weather data as the value.
Later, when people want weather data for that postal code you try to use the local copy instead of calling the API. You first look in the cache, and if the data isn't in the cache, then you call the API.
However weather changes, so you have to set your cached data to expire after a certain amount of time, called the time to live. You can usually tell a cache-friendly DB to automatically delete data after it's a certain age. Sometimes you will need to delete the data in the cache for other reasons as well.
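To make the pattern concrete, here is a minimal cache-aside sketch in Python. A plain dict stands in for the cache (with Redis you'd use GET and SET key value EX ttl instead), and fetch_weather_from_api is a hypothetical stand-in for the real API call:

```python
import time

CACHE = {}          # postal_code -> (value, expires_at)
TTL_SECONDS = 600   # how long cached weather stays fresh

def fetch_weather_from_api(postal_code):
    # Stand-in for the real HTTP call to the weather API.
    return {"postal_code": postal_code, "temp_c": 21}

def get_weather(postal_code, now=None):
    now = time.time() if now is None else now
    hit = CACHE.get(postal_code)
    if hit is not None:
        value, expires_at = hit
        if now < expires_at:          # fresh -> serve from the cache
            return value
        del CACHE[postal_code]        # stale -> drop it and refetch
    value = fetch_weather_from_api(postal_code)
    CACHE[postal_code] = (value, now + TTL_SECONDS)
    return value
```

With a cache-friendly DB like Redis you get the expiry for free: the server deletes the key itself once the TTL passes, so the "is it stale?" check disappears from your code.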
As /u/MildlySerious says, Redis is a good tool for caching.
> Just look at any page and refresh it. Caching indicates static, not dynamic updates. If the page is updated with every refresh, it is dynamically updated.
I don't think you understand what caching means in this context. Caching simply means storing data, with the goal of serving it faster than you would by instead reading the data from a persistent source (e.g. database) or by recomputing it.
Sites like YouTube, reddit, etc often use caches for low-write, high-read data - such as subscriber counts. Take a look at things like redis and memcached.
It's more common in distributed contexts (e.g. redis or elasticsearch clusters)
I find it kind of funny that some people will defend to the death the word "slave" as necessary because it fits the context perfectly, yet most of these distributed systems have a protocol for "electing" one "slave" to "master" in case the existing "master" dies (somehow I missed that in my history books)
They aren't easily comparable because they work in very different ways - also note that Kafka number is for a cluster running on three machines: https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
https://redis.io/topics/benchmarks should be useful here though. They get 770,000 LPUSH/LPOP per second on a single node Intel(R) Xeon(R) CPU E5520 @ 2.27GHz using Redis pipelining.
Maybe not 9999999 but their benchmarks show around 70k requests per second. If you have anywhere near 70k votes per second from users around the world, I think you'll be able to pay someone to design a better system.
There is no client that works in multiple languages...
Use this, pick one with a gold star.
While the clients may differ for each language, the Redis commands are the same for all of them; all clients implement slightly different ways to use the same commands (set, get, del, expire, etc.)
P.S. I work for Redis 😁
Every command has a documented Big O value. Redis is used regularly in the development of quite mundane web apps. I think it's quite valuable to at least understand the basic performance aspects of the data structures you're going to rely upon.
> Redis is not a database….?
.... What?
https://en.wikipedia.org/wiki/Database: > In computing, a database is an organized collection of data stored and accessed electronically from a computer system.
https://redis.io/: > Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache, and message broker.
Sure sounds like a database to me. Even redis's own website notes database as a use case.
Your matching may work just like in any other environment or language; still, I think it would be best to operate on the database.
Just a thought on where to implement this: you might want to have a look at Celery (which allows you to run code asynchronously on another server) and Django management commands. You can write a management command and run it, e.g. via cron, for recurring jobs.
It's also possible to store this data in-memory in a Redis server if the database is not fast enough. But then you have to roll your own persistence and integrity.
Depending on the settings, redis can periodically dump the whole in-memory dataset in the RDB format, or it can log every command modifying the dataset into an append-only file (AOF), which is then replayed against the server to reconstruct the state. You can also use both RDB and AOF persistence at the same time.
If you need more info, check the persistence docs.
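As a concrete illustration, a redis.conf sketch enabling both modes (directive names from the persistence docs; the snapshot thresholds are the stock example values, not recommendations):

```conf
# RDB: snapshot to dump.rdb if at least 1 key changed in 900s,
# 10 keys in 300s, or 10000 keys in 60s
save 900 1
save 300 10
save 60 10000

# AOF: log every write command, fsync once per second
appendonly yes
appendfsync everysec
```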
Documentation is pretty explicit:
For the del command: Integer reply: the number of keys that were removed.
Source: https://redis.io/commands/del
For the set command: Simple string reply: OK if SET was executed correctly. Null reply: a Null Bulk Reply is returned if the SET operation was not performed because the user specified the NX or XX option but the condition was not met.
Source: https://redis.io/commands/set
You can use SCAN but scanning the entire keyspace is sub-optimal unless the database is dedicated to Rack-Attack:
https://redis.io/commands/scan
Imagine Redis with 1,000,000 cache keys and 10 Rack-Attack keys. That's a lot of scanning to find 10 keys. Better to put 10 entries in a SET and use SSCAN on it.
Just assign a TTL to the key in redis. On each API call, try to retrieve the value from redis. If redis sends nothing back, hit the external api, cache the response, assign a TTL and repeat.
Super simple pattern, basically what redis was designed to do. No need to overcomplicate. ioredis is a great implementation of the redis client for node.
Looks like it uses .json files containing installer info, rather than NuGet packages. According to the docs, "Scoop installs programs to your home directory by default", so I guess it is basically a fancy .zip extractor?
{
  "homepage": "https://redis.io",
  "version": "3.2.100",
  "url": "https://github.com/MicrosoftArchive/redis/releases/download/win-3.2.100/Redis-x64-3.2.100.zip",
  "hash": "73775183186ebd1917353a8ae62303a328cedfff58164c9bf46e2b46977a9475",
  "bin": [
    "redis-benchmark.exe",
    "redis-check-aof.exe",
    "redis-cli.exe",
    "redis-server.exe"
  ],
  "checkver": {
    "url": "https://github.com/MicrosoftArchive/redis/releases/",
    "re": ">win-(\d+\.\d+\.\d+)<"
  }
}
Hello and please forgive my educational tone.
TL;DR There are no free lunches, so any work is "bad" for the CPU; on the other hand, an idle CPU is just a waste of resources. That said, the CPU will probably be ok.
First, you should read more about Redis' expiration - the truth is out there.
If you've read the above carefully, you see that expiry's CPU usage is managed - keys expire passively on access, or actively 10 times every second (the hz configuration directive). This ensures that even if the entire keyspace expires at the same microsecond (a gnab gib of sorts), the server will still be responsive.
> they all will be ticking every second right?
I hope that by now you understand that there are no ticking keys - that would be an extremely inefficient way to manage expiration.
Note: actually, the real price (CPU-wise) of expiration is the deletion of the value. Bigger values (think a List w/ 10K elements) require more work to free (i.e. deallocate). A major improvement in v4 is "lazy deletion" (see the UNLINK command for details), which can be used in expiration by setting the lazyfree-lazy-expire configuration directive to "yes".
Last week I put a lot of time into Sucredb. Now it has basic support for Redis hash/set types using CRDTs.
This week I plan to write more docs and start work on support for MULTI/EXEC operations on the same shard key (made possible by key hash tags like user:{123}, friends_set:{123}, ...). After that, Sucredb should be usable in real projects.
So, you have the right idea that these should probably be their own applications. The reason you can't easily do both in the same program is that both of those frameworks "block." Essentially they hit the "run" portion and go into a while loop waiting for things to happen. And, as you've found out, you can't enter both their loops at the same time. But it's ok!
Instead of polling a text file for changes, you could instead write your PySide application such that it does requests.get calls to get the data from the endpoints. Or, you could use a database like SQLite to read and write changes from each. Another kind of database used for sharing state between processes is an in-memory key-value store like redis; it's really fast and perfect for this kind of thing.
You're on the right track! Just need to upgrade your text file :)
Sorry no advice on exactly what you're asking, but more recently i've been seeing a trend of talks about using Redis as the persistent storage (yep it's possible, Redis can persist data to disk). I'm not sure if that helps in anyway but more info here: https://redis.io/topics/persistence
If you're building this for production at a company, I'd probably advise against using redis persistence (I can't give advice on whether it would fit your business needs, and I've never used the persistence feature in prod). But if it's a side project, I would just look into that and remember KISS (keep it simple, stupid).
> That would result in the query being triggered everytime to get the count of all the orders.
Are you using redis? Use INCR? Something like:
Rails.cache.pool.with { |redis| redis.incr("order-count") }
Though I guess that could get out of sync depending on your concurrency.
Really though this seems like premature optimization to me. Unless you're talking real Web Scale™ stuff here, I wouldn't stress too much about trying to save yourself the occasional COUNT() query.
Redis is unauthenticated by default and anyone who can access it can probably easily escalate to root access from there. Your firewall is pretty much the entirety of Redis's security model so you likely just made your VM vulnerable.
Yeah, there is no default expiration for ElastiCache. If you aren't setting it explicitly, then maybe a library you're using is setting the expiration or flushing the DB behind the scenes.
You can easily check this on your keys by using the TTL command on redis cli https://redis.io/commands/ttl
You could use an in-memory cache like Redis to persist data in memory. The information probably belongs in the database though. What you consider ephemeral information today might be useful at a later date (e.g. users want to view their conferencing history).
Thanks for asking!
First, you don't have to choose one or the other. A lot of people who use Redis do so with other databases.
But why do people use Redis, in a nutshell? Redis is fast. You typically use it for the speed.
A lot of people also use it in part because they like how it works. SQLite and PostgreSQL are both amazing relational database systems, and their primary interface is SQL. Redis doesn't use SQL.
Instead of SQL, Redis uses commands. You can see them all here: https://redis.io/commands
One of the big differences between e.g. SQLite and Redis is how you think about "what the database is" when you build your apps. People usually think about relational databases as tables of data with rows. You write SQL queries to get the data.
Redis is more like a data structures server. You manipulate Lists, Hashes, Sets, Sorted Sets, and more -- all for whatever nefarious purposes your app has in mind. That in itself is a big draw for some folks.
Finally, I'll end with this. If you're new at working with databases, I highly recommend you learn two systems: Postgres and Redis. Postgres is a great relational database system. Meanwhile, Redis is truly the Swiss-army knife of databases.
If you already have a redis service dependency, you can use its INCR with EXPIRE to implement this easily. Just have a key like "ratelimit-#{current_user.id}" that tracks the number of requests. Then right after `INCR ratelimit-12345`, if the reply is 1 (the key was just created), you do `EXPIRE ratelimit-12345 100`; setting the expiry on every request would keep extending the window. The result of the INCR command should be checked by your ruby code, and if it's above your limit per time window, you simply return a 429. This approach is actually described roughly in the redis docs.
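A minimal sketch of that fixed-window pattern in Python. A dict stands in for Redis, the limit and window length are made-up values, and, per the rate-limiter variant in the Redis INCR docs, the expiry is set only when the counter is first created, so later requests don't keep extending the window:

```python
import time

WINDOW_SECONDS = 100   # illustrative window length
LIMIT = 10             # illustrative requests-per-window cap
_counters = {}         # key -> [count, window_expires_at]

def allow_request(user_id, now=None):
    now = time.time() if now is None else now
    key = f"ratelimit-{user_id}"
    entry = _counters.get(key)
    if entry is None or now >= entry[1]:
        # First hit of a fresh window: INCR creates the key, then EXPIRE
        _counters[key] = [1, now + WINDOW_SECONDS]
        return True
    entry[0] += 1       # subsequent hits: INCR only, TTL left alone
    return entry[0] <= LIMIT
```

In your app, allow_request returning False is where you'd send back the 429.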
I strongly recommend looking through the Ruby Enumerable methods. They're powerful and encapsulate tons of useful patterns for dealing with classes that include Enumerable, e.g., Array, Hash, Set, etc.
Some useful ones here:
map
index_by
max_by
sort_by
Right now the code you're writing is issuing one Redis query per key. For each query it issues, it creates an Array in Ruby and then calculates the size of that Array.
That works, but it should feel fishy to you. You're going to instantiate a Ruby object for every member in Redis? That's a ton of time and memory wasted.
Redis should have all the information you need to answer this question. That is, internally, Redis almost certainly knows how many members there are for a particular key. It'd be much nicer if you could ask Redis, "How many members does the set with key foo have?" Or even better if you could ask it what you want directly.
I'm no Redis expert, but with that thought in mind I googled "redis set member count" and the first result is the SCARD operation.
With this in mind, you want the two keys whose sets have the most members. This is exactly what Enumerable#max_by is for. You can even pass an integer argument to max_by to tell it whether you want the largest, the two largest, the three largest, etc.
This should give you the two most populated keys, issuing one query per key but only instantiating one integer per query:
top_two_keys = redis.keys("*").max_by(2) { |key| redis.scard(key) }
You don't really use it that way; it's a queue.
https://www.elastic.co/guide/en/logstash/current/plugins-inputs-redis.html
https://redis.io/commands/lpush to push it in, and then it uses BLPOP, if I recall correctly, to get the data.
Usually I give it a few threads and it's been nothing but great. Had Logstash crash because some custom data made it use up its heap; I just had to fix it and let Logstash replay the data to get back in sync.
Sure, Filebeat should resend the data; I'm just nervous about that kind of thing - expecting things to work... I'm not good at that.
Hi, for example redis implements a fast in-memory queue that may lose writes during failover. On the other hand, CockroachDB is an ACIDly replicated implementation of SQL.
Set an expiration for the redis key (check out the EXPIRE and EXPIREAT commands).
You then will need to add heartbeat functionality to the client (ping/pong). When the client responds, reset the expiration timer.
While not “real-time,” this solution would allow you to clean up dead data in redis even if your node server crashed.
Run three copies of redis on different ports, using three configs, three rdb files, etc. Redis is inherently a single-port service using a single-threaded event loop. You're probably thinking about redis cluster, which can shard data across multiple redis instances, but you'd still be running multiple copies of redis. Follow these instructions to try that: https://redis.io/topics/cluster-tutorial
I didn't see the existing TLS PR, and I'm not finding it now. Do you have a link?
As for why fork and not PR, Salvatore already closed the Transactions PR and said he didn't want Redis to go in that direction. And when searching about SSL/TLS in Redis itself, I found this: https://redis.io/topics/encryption , read the implementation of spiped (it uses fixed 1k block sizes), then realized that SSL/TLS is the right answer in this situation.
Could transactions be a module? I was about halfway through the cluster transaction bits as a module when I hit a collection of "oh wait, I can't even call this entire class of things unless I create new module wrappers for both directions" problems. Then I just added a new .c file, new .h, did the right includes, a make clean && make, and my life was 10x better.
Edit: Also, this just includes redis-benchmark, redis with SSL/TLS, etc., is still a couple weeks out. I need to get redis-cli and redis-sentinel speaking SSL/TLS.
The block of commands between MULTI and EXEC are executed by the Redis server in one atomic action. Redis does not execute commands from other clients while the MULTI / EXEC block is executing. You can put a read command and a write command in the block, but unfortunately you can't make a decision in between the commands. Your client code receives the results of the commands in the block after the whole block has been executed.
You can achieve what you're looking for in a Lua script. The script constitutes a single "command" from the Redis server perspective, so no other clients can alter the data while the Lua script is executing. Since it's a script, it can read keys, make decisions, and write keys as it executes.
https://redis.io/commands/eval is the starting point for investigating the use of Lua scripts with Redis server. There are also some good tutorials published on the Web.
One thing to be aware of is that a complex Lua script will make the other clients wait longer to have Redis process their commands. I.e., a Lua script that does too much will make Redis slower. Keep your Lua scripts as small and fast as you can.
Pretty much a perfect use case for redis.
Use ZINCRBY, and then ZCARD in combination with ZREMRANGEBYRANK to trim to whatever maximum length you're looking for.
The process would look something like this (pseudocode):
ZINCRBY top:domains 1 $domain
$count = ZCARD top:domains
$trim = $count - MAX_LEN - 1
if ($trim >= 0) { ZREMRANGEBYRANK top:domains 0 $trim }
That should be pretty performant but keep in mind that the trimming of the low scoring elements can be expensive if you allow the set to grow way beyond the intended maximum and then attempt to trim an enormous number of entries in one go.
First learn redis, then learn to use the hiredis client. Look for books or tutorials that introduce redis. Or just read the docs if you're brave: https://redis.io/topics/data-types-intro
There's a redis-cli you can use to interactively query redis: https://redis.io/topics/rediscli
Don't use KEYS *, especially not on a production instance with lots of keys. Redis is single-threaded and one long-running command (like KEYS *) can bring a lot of other connections to a grinding halt.
Instead try to use something like redis-cli --scan ... | sort | gzip > keys.gz to produce a sorted, compressed list of all keys on the server. Most likely you'll see a pattern emerging for all the keys, and then you can use the TYPE key:name command (inside redis-cli) to find out what some of the keys represent (sets, lists, zsets, etc.)
If the developers followed redis recommendations and used something like ":" as the separator for key-name components (think "/" for file paths), then you can use the following to get a general idea about the distribution of top-level key components (which is useful if you want to find out where most of the data is stored): zcat keys.gz | awk -F\: '{print $1}' | uniq -c.
Happy hunting and, please, don't use KEYS * (see the warning on this page: https://redis.io/commands/keys).
> Redis would be great since it has a ttl data. But also a disadvantage in case redis crashes since im gonna lose the count. Logging every api request would be necessary since from the business standpoint is highly essential for it to be tracked.
You should take a look at Redis persistence.
What part of the config file do you mean?
AFAIK Redis still is completely in-memory and there are no plans to change that, see the FAQ: https://redis.io/topics/faq. If the configured memory limit is reached keys are being evicted.
Maybe some sort of pub/sub system can send a message from one app to the other.
I would normally use something like rabbitmq for that, but if you are already using sidekiq, you are already probably using redis.
https://redis.io/topics/pubsub
https://stackoverflow.com/questions/22644761/redis-pub-sub-on-rails
Just use incr on a key each page load... If you wanted to track hits by day just name the key the date or something and then have a process to collect the stats later. You can even expire the stats automatically by setting an expiration on the key.
I have a lot of process performance counters that I do this with, to track stuff like the number of users connected to a stream or the number of lazy-write processes running/waiting (I cache db writes so my clients do not have to wait for persistence, and then have a script continuously checking for new inserts to run on the database). hyperloglog is another option, and you could even use lists if you wanted to, but incr is the way to go when tracking exactly one metric.
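A sketch of the dated-key idea in Python, with a dict standing in for Redis (with real Redis you'd INCR the key and put an EXPIRE on it so old days clean themselves up; the key format here is just an example):

```python
import datetime

_store = {}  # stand-in for Redis string keys

def record_hit(page, day=None):
    # Key the counter by date so each day gets its own total,
    # e.g. "hits:/home:2017-06-01"
    day = day or datetime.date.today().isoformat()
    key = f"hits:{page}:{day}"
    _store[key] = _store.get(key, 0) + 1   # INCR <key>
    return _store[key]
```

A later collection process can then sweep up yesterday's keys at its leisure.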
Hey, that's a great point. I'm fairly sure the client library would escape \r and \n. The RESP page at the bottom suggests just finding \r and forgetting about the rest.
I should probably verify the next character is \n though.
From my understanding there are a few reasons you might choose to use Redis:
I've liked using Redis so far, so I'd suggest giving that a look. It's fast, persists to disk if desired, simple (key-value look-up) but supports more advanced stuff, noSQL, etc.
Redis: https://redis.io/
Go-lib: https://github.com/mediocregopher/radix --> Other libs exist of course, this is just one I've used before.
No, there is no implicit way to do that.
What you need is a locking system: store somewhere the information that it has already run, or that only one minion is running it; you could use execution modules to store that data.
One way to do it is with a redis database - you will need to add setnx support to the current redis module, but then you could set up a lock: https://redis.io/commands/setnx
Other than doing something like that in jinja, nothing native in salt exists to do what you want.
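A toy sketch of that SETNX-style lock in Python, with a dict standing in for Redis (with real Redis you'd prefer SET key token NX EX ttl, so a crashed holder can't wedge the lock forever; all names here are made up):

```python
_locks = {}  # stand-in for Redis keys

def try_lock(name, owner):
    if name in _locks:
        return False        # key exists: SETNX would reply 0
    _locks[name] = owner    # SETNX replies 1: we now own the lock
    return True

def unlock(name, owner):
    # Only the owner may release, so one minion can't drop another's lock
    if _locks.get(name) == owner:
        del _locks[name]
        return True
    return False
```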
You can do this incredibly easy with Redis lists and blocking pops. Put the jobs into a list , have the clients pull the task with BLPOP when the client is finished it puts the task into a "complete" list and the "task manager" listens on that list with BLPOP
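The flow described above, sketched in Python with deques standing in for the two Redis lists (with real Redis the worker and the task manager would block on BLPOP instead of popping directly):

```python
from collections import deque

pending = deque()   # stand-in for the pending-jobs list
complete = deque()  # stand-in for the "complete" list

def enqueue(job):
    pending.appendleft(job)        # LPUSH pending <job>

def work_one():
    job = pending.pop()            # BLPOP pending (non-blocking stand-in)
    result = f"done:{job}"         # ... actual processing goes here ...
    complete.appendleft(result)    # LPUSH complete <result>
    return result
```

Because a list pop hands each job to exactly one client, you get single-consumer delivery for free, which pub/sub wouldn't give you.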
Turns out what I said is not correct.
https://redis.io/topics/cluster-spec
The reason for requiring a majority of masters appears to be to do with netsplits. When a master goes down, the remaining masters have to decide what replica to promote, however if there is a netsplit then it's possible that actually all the masters are functioning and connected to clients but unable to talk to each other. In that case, the minority side would be expected to stop accepting writes until they can reconnect to the other side of the split.
Replicas participating in majority consensus could result in two masters operating on the same key partition, leading to merge conflicts when they reconnect. Instead, if a majority of nodes are down, the cluster assumes it is on the small side of a split and shuts down to maintain data integrity for when they reattach to the potentially still running cluster.
First idea out of the top of my head is that you could use a series of redis key-value pairs, with an appropriate expiration time. In this way they get automatically deleted after a set time. See https://redis.io/commands/set
Wait! Redis does not evict keys, it has the possibility to, but by default it does not.
Redis, by default, works in memory, so 256 GB of data is too much for vanilla Redis; however, RediSQL has the possibility to store data on the hard disk.
When you create your database you just specify the file where you want to save it, like so:
REDISQL.CREATE_DB DB /home/ZoMoBloNo87654321/project/database.sqlite
Of course, it will go a little bit slower, but it will work without any issue.
If the file already exists, RediSQL will try to use it, so you can load external SQLite databases.
If you don't mind I will leave you with the documentation to redis lru: https://redis.io/topics/lru-cache Please note that this can be switched off.
Also have a look at the RediSQL doc: http://redbeardlab.tech/rediSQL/references/ where you can find more details on how redisql works.
If you have any other question, please let me know :)
GEOHASH has some interesting properties. For example, you can remove characters from the right hand side to get a hash that points to the same area, but less precisely. Maybe you could draw inspiration from that.
However, keep in mind that clever methods exist for unmasking users based on "anonymized" data. See the Netflix Prize incident for an example.
I don't know what NCache is, so I can't say whether it is better or not.
For Redis you can store data on disk with a small delay; more here: https://redis.io/topics/persistence. So you can use it in a "traditional database" fashion, while still getting the advantages of Redis.
I've heard about cases where it was used as a primary data source, but I've never tried that myself. There are many ways to store such session data; you could probably do an architecture review to consider better options, since there are a lot of ideas:
Agree, the documentation could be more friendly.
Publish, Subscribe and PSubscribe are general commands. You can create your own channels, publish messages and subscribe to channels. The documentation for psubscribe addresses the general case where you can define any pattern you like.
Keyspace notifications are system generated notifications, and hence the naming convention is designed to not have conflicts with user defined channels. The specific patterns are described in this article - https://redis.io/topics/notifications.
16 is the default number of separate databases redis uses: https://github.com/antirez/redis/blob/unstable/redis.conf#L183
For more details check the SELECT command: https://redis.io/commands/select
I only use DB 0 for my applications. In some cases I use a DB other than 0 for unit-testing.
Bee-queue creator here - glad to hear it has been useful for you :)
Point of clarification - we do utilize pub/sub for some things like notifying the producer of a job when their job has been completed, but it's not the main mechanism by which we queue/deliver jobs. If we used pub/sub to deliver jobs, every worker would get a copy of every job and it would be hard to make sure just one worker processes each job - plus, if there were no active workers online when a job is enqueued, it would be lost forever.
We actually use a Redis List as our queue, which allows us to survive failure scenarios like:
If you run your Redis without persistence, then yes, Redis restarting would mean you lose all waiting/in-flight jobs, but Redis is actually better at persistence than most people think and some folks even use Redis as their primary data store.
While other options like SQS or RabbitMQ are more purpose-built message brokers, and there are situations where you would prefer them to Redis, Redis is much easier to get started with and is also a much more versatile tool - you can eventually use it for much more than just your job queue. Redis is also pretty easy to run/maintain, and if you don't want to set it up yourself you can even start in a few clicks with a RedisLabs free-tier instance, which is more than enough to play around with a small Bee-queue setup. Their paid tiers can easily take care of persistence for you, too.
Hi,
There is actually a lot of good documentation on redis.io. Here is the quickstart guide that will walk you through installing Redis and using redis-cli:
https://redis.io/topics/quickstart
Also, there are packages for Redis on pretty much all Linux distros and it's available via brew on OSX.
Once you've familiarized yourself with using Redis in redis-cli you can start playing with hiredis. The hiredis API is quite good and should be easy to learn if you know C.
It takes a huge amount of load to use up any significant amount of cpu on a redis instance.
The high latency is likely from the inherent slowness of the underlying virtual machine.
I suggest running some of the diagnostics listed in the docs on this page: https://redis.io/topics/latency to test for the intrinsic latency of the setup you have. If that is the case, you're somewhat out of luck. Newer vms with better virtualization at the hardware level often have a lot better latency as do bare metal machines, so upgrading your hosting might be the only thing you can do.
I have found a reference on incompatibility: "Note however that Redis Cluster 4.0 is not compatible with Redis Cluster 3.2 at cluster bus protocol level, so a mass restart is needed in this case."
Fundamentally, Resque uses Redis' BRPOP function to perform a blocking pop off of a list. Redis is single-threaded, and clients are served in a first-come-first-served fashion, so if you have Ruby clients listening on a given list, you aren't guaranteed to be able to receive jobs to your non-Ruby worker (since they'll get spooled off and handed to the first waiting client).
Fundamentally, all you need to be able to do is something like:
# Produce jobs in your client
redis_client.lpush("list_name", some_job.to_json)

# Consume jobs in your worker (Ruby code, but you get the idea)
while true
  # brpop returns a [list_name, value] pair; parse the value
  _list, payload = redis_client.brpop("list_name")
  job = JSON.parse(payload)
  # Process your job
end
Redis deals only in primitives and strings, so the job is likely a string in some format. I'm not sure how Resque encodes its jobs - if it's encoded as JSON, then that's easy enough - just parse it in your receiving worker and decide how to handle it. If it's a marshaled Ruby object, then you're out of luck - it's nontrivial to unmarshal from non-Ruby.
Personally, I wouldn't use Resque here since all you really want is an interface to LPUSH/BRPOP, and you plan to write the workers yourself, which is where the bulk of Resque's value will be. IMO, just use Redis directly.
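To show the shape of that pattern outside Ruby, here is a minimal Python sketch. It uses an in-process `queue.Queue` as a stand-in for the Redis list so it runs without a server; with a real Redis client you would swap the stand-in's `put`/`get` for `lpush`/`brpop` calls. The job payload is a made-up example.

```python
import json
import queue

# Stand-in for the Redis list; replace with a client's LPUSH/BRPOP for real use.
job_list = queue.Queue()

def produce(job: dict) -> None:
    # LPUSH equivalent: jobs are serialized to JSON so any language can consume them.
    job_list.put(json.dumps(job))

def consume_one() -> dict:
    # BRPOP equivalent: blocks until a job is available, then decodes it.
    payload = job_list.get()
    return json.loads(payload)

produce({"task": "send_email", "to": "user@example.com"})
job = consume_one()
print(job["task"])  # -> send_email
```

The important part is the JSON boundary: as long as producer and consumer agree on the encoding, they don't need to share a language.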
Looking over the documentation of phpredis it looks like that version of set uses setex under the hood. https://redis.io/commands/setex
SETEX uses seconds, so you're actually setting that key to expire in 2500 seconds rather than 2500 milliseconds.
Going by the docs here, to use the higher precision milliseconds api you'll want something in this form instead.
```php
// Will set the key only if it already exists ('xx'), with a TTL of 1000 milliseconds ('px')
$redis->set('key', 'value', ['xx', 'px' => 1000]);
```
A late response, hoping it is still useful.
If you are using sentinels and your master/slave set is not fronted by a proxy that will auto-failover your connections, you should use a sentinel-aware client (i.e. Redis driver) to connect. Check your client documentation to find out about it. For example, node_redis doesn't support sentinels yet, but ioredis does. List of all Node.js Redis clients: https://redis.io/clients#nodejs
If you have already gone the polling the sentinel route as discussed in the other response, ensure that you read this doc (https://redis.io/topics/sentinel-clients). It has guidelines for Redis client developers on how to interact with sentinels and would be useful since you will be attempting something similar.
Redis fairly easily clusters. -- https://redis.io/topics/cluster-tutorial
I'd split postgres off onto its own node and have the two web front ends talk to it that way, but if you want redundancy you'd need to have two postgres nodes replicating in master/master -- https://wiki.postgresql.org/wiki/Replication,_Clustering,_and_Connection_Pooling
Nginx is fairly easy, you don't really need to do anything special to make it cluster aware, unless as a previous poster mentioned, you're allowing uploads and need to sync those, to which I'd probably say the easiest solution would be to have your upload target be a network share mounted on both machines or run a frequent rsync as a scheduled task.
You'd need to do something like haproxy in front of the nginx nodes though, or go a little ghetto and do round-robin DNS.
It's not just a simple key-value store. It supports hashes, or single-level arrays if you will. So you could store at key `user:35000` the data of:

```
username: foo
email:
something: else
```
Or in your case, each key can hold a single MySQL row. I would highly recommend reading this: https://redis.io/topics/data-types-intro
There's so many different value types that you can use.
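As a small illustration of the "one key per row" idea, here is a hedged Python sketch. The `row_to_hash` helper and the sample row are made up for the example; the one conversion worth noting is that Redis hash fields and values are always strings (an HSET call would do this implicitly).

```python
def row_to_hash(row: dict) -> dict:
    # Redis hashes store strings only, so every value gets stringified.
    return {field: str(value) for field, value in row.items()}

# A hypothetical MySQL row for user id 35000, mapped to the fields of key "user:35000"
row = {"id": 35000, "username": "foo", "email": "foo@example.com"}
key = f"user:{row['id']}"
print(key, row_to_hash(row))
```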
I'll disagree with the other comments: `struct` is not just for legacy software. Any time you need to handle binary data, it's extremely useful. Most protocols you'll run into on the web (HTTP, etc.) are text-based, but not all.

As an example, here's the Redis protocol - it's mostly text-based but has some sections where treating it as binary would be very useful. Now, in most cases, if the protocol's popular enough, someone's already written the code to do all this for you (Redis definitely has at least one Python library for it), but if you ever end up wanting to be the person who writes those libraries in the first place, you're going to want `struct` :)
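As a tiny, hypothetical illustration of what `struct` is for (not Redis-specific): packing and unpacking a length-prefixed binary frame, a shape many binary protocols use.

```python
import struct

# Pack a 4-byte big-endian length prefix followed by the payload bytes.
payload = b"hello"
frame = struct.pack(">I", len(payload)) + payload

# Unpack on the receiving side: read the prefix, then slice out the body.
(length,) = struct.unpack(">I", frame[:4])
body = frame[4:4 + length]
print(length, body)  # -> 5 b'hello'
```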
Thanks. I always thought of Redis just as a cache and was not aware of Redis RDB and AOF. For those who are interested here is a link
I think that sums it up pretty well. The description of Redis at https://redis.io/ is "Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache, and message broker." Without knowing anything about the application that you want to build, I personally wouldn't feel good about using Redis - defined as an "in-memory data structure store" - as the primary database, compared to a NoSQL database like MongoDB or a relational database like MySQL or PostgreSQL. I've been using Redis in projects as a cache (similar to memcached) alongside the main database.
Good summary, few remarks:
>redis: It has the least querying capability.
This part is debatable: yes, you have to look everything up by a single key (unlike DDB), but it has a very rich set of data structures and, as a result, operations you can implement on top of it. https://redis.io/topics/data-types-intro You can have things like queues, topics, and shared locks which are very performant.
Also worth mentioning MemoryDB, which is Redis with persistence.
Sleeping for five seconds can certainly solve the problem you've presented, but I'm concerned about two things:
If you are interested in queueing the action up, one of these will work for you:
For us, it's Redis - specifically, the combination of MULTI/EXEC for the atomic behaviour, https://redis.io/topics/streams-intro for the request queue, and the various data structures for the state changes (SET, INCR, PFADD etc.).
That's only one option, though - it works extremely well for us, but requirements do vary, and it may not always be the best option. In other scenarios, this might be handled by database transactions - or distributed transactions as you scale up (e.g. in the past, this might have been something you'd build with https://docs.microsoft.com/en-us/previous-versions/windows/desktop/ms681291(v=vs.85) MSDTC).
The concept can also be applied via journalling or other strategies, and it's quite possible to build this on top of HTTP or websockets or similar protocols as a transport mechanism... it just gets much harder to reason about things if a microservice request can end up in a partial state.
If you're looking to save data like a hashmap, look into redis, although this is only an in-memory DB, if I remember correctly. You can try using it in combination with H2, which someone else already mentioned.
What kind of Redis Data type are you trying to work with? A Hash, List, Set, Sorted Set, string?
You mention a Set, but that only takes a single value, so naturally it will only return a single value.
I find it's easier for me to work with go-redis (which is great!) when I initially input my data into Redis manually with the standard Redis CLI.
If you want to make multiple inputs into a data structure quickly, use the go-redis `TxPipeline` method.
Auto-scaling Redis for RAM usage is pretty complicated. I believe you're going to need Redis in cluster mode in order for it to split records amongst shards. Then you'll need to set up auto-scaling policies based on RAM usage or number of keys. If you want to walk this road, I'd recommend something like AWS ElastiCache as your managed Redis implementation.
https://aws.amazon.com/about-aws/whats-new/2021/08/amazon-elasticache-redis/
However, it's probably overkill for you. I bet if you just set a reasonable maxmemory limit and set maxmemory-policy to volatile-lru, your problems will go away.
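For reference, those two settings would look like this in redis.conf (the 256mb value is just a placeholder; pick a limit that fits your instance):

```
maxmemory 256mb
# Evict least-recently-used keys, but only among keys that have a TTL set
maxmemory-policy volatile-lru
```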
When you load an image on sites like this, it is stored in the browser's local cache, and only a certain number of these are kept. If you were to click on an image on Twitter, dismiss that modal (get rid of the picture), and then click on it again, it would not make an additional HTTP request - that would be slow! What if your internet went out for just a second? With this implemented, you wouldn't even notice (and all sorts of other stuff)!
I'm no expert, but... This may be what you are referring to above.
Firstly, Redis provides out-of-the-box hit and miss ratios aggregated for the entire keyspace (via the `INFO` command). Currently, there's no built-in way to easily get these metrics per key or pattern.
You could achieve something similar by turning on Keyspace Notifications for key-events and key-misses, and then processing the generated notifications to produce aggregates.
That being said, it makes sense to extend Redis so it can provide per-key hit/miss ratios (as well as perhaps hot keys) during runtime as an opt-in feature (at the cost of memory and CPU). I'm not quite sure what the OP means by "similar stats" and "these kind of stats", so perhaps there are more stats that I'm missing. In any case, one way to help make this happen would be to open a feature request in the Redis repository (https://github.com/redis/redis) and have a proper discussion about it.
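For the keyspace-wide numbers, the ratio is easy to derive from the `keyspace_hits` and `keyspace_misses` counters in the INFO stats section. A minimal Python sketch, assuming the standard `field:value` lines that INFO returns (the sample string is made up):

```python
def hit_ratio(info_text: str) -> float:
    # INFO's stats section reports lines like "keyspace_hits:123".
    stats = {}
    for line in info_text.splitlines():
        if ":" in line:
            field, _, value = line.partition(":")
            stats[field] = value.strip()
    hits = int(stats.get("keyspace_hits", 0))
    misses = int(stats.get("keyspace_misses", 0))
    total = hits + misses
    return hits / total if total else 0.0

sample = "keyspace_hits:90\r\nkeyspace_misses:10"
print(hit_ratio(sample))  # -> 0.9
```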
Redis is written in C and doesn't use Java - in fact, it's open-source, and any journalist would have been able to figure this out with a single search in their repository at https://github.com/redis/redis.
I think it's people just regurgitating what they see from others, doing no fact checking, and then that false information spreads like wildfire - especially when everyone is simultaneously panicking about the vulnerability.
At best, there may be clients created by the community or other organisations that use Java and leverage Log4j - but this has nothing to do with Redis itself. https://redis.io/clients#java
I think you are running redis outside of a docker container. See the instructions here
https://redis.io/topics/quickstart
"Installing Redis more properly"
It looks like you did the

```shell
sudo mkdir /etc/redis
sudo mkdir /var/redis
```
But are not running redis as root. You created the /etc/redis folder as root, but when running redis not as root (which is a good thing) it can't read files in that directory. You need to open up permission to this redis folder so that the user running redis (likely yourself) can read and write files in both of these directories.
What you are looking for is likely replication. That allows you to store data on all instances and not have to worry about which instance has the data you need. You may also want to look into sentinel or cluster
As far as how common it is, I would think it would be fairly common for large data consumption. The last thing you would want is to have a single source of failure.
A reverse proxy is different and likely wouldn't help in this case.
Your info command displays:
uptime_in_seconds: 8346
Which means that Redis was started ~2 hours ago. When the issue happens, if you run the INFO command, does uptime_in_seconds reset?

If it does, something is restarting the server. Inspect the container with `docker inspect` and check for possible issues with the `docker logs` command.
If it wasn't, there could be a script running flushdb or similar commands in Redis. In this case, you can use https://redis.io/commands/MONITOR to watch all commands that are run in Redis in order to debug the problem.
There is a "DEBUG" command that is largely undocumented. You can use it with redis-cli (e.g. `DEBUG POPULATE <count>`) to fill the cluster with fake data.
You might also try mass insert? https://redis.io/topics/mass-insert
Lua simply returns nil for any out-of-range array index, since the non-zero-based indexing is just a convention and not mandated by the interpreter. So the index example is not considered a failure. But to answer your question: we could probably use the debugger for more complex scripts if the error message returned by the EVAL call is not good enough. I am not aware of any other diagnosis mechanism, I'm afraid. https://redis.io/topics/ldb
I recently replaced my custom Redis queue code (using the RPOPLPUSH pattern[1]) with Broadway and the Redis adapter[2] (which uses the same pattern, so no code change to push into the queue), and it was mostly quick and painless.
Assuming you have root access to the server try running mysqltuner to get an idea of what performance optimization might be possible server-side. Make sure you know what you're doing in terms of configuring the MySQL server though, you could create more problems than you solve, depending. You may consider consulting a sysadmin or DBA for this.
Often MySQL performance bottlenecks are long running SQL queries that suffer from poorly optimized code. You may consider enabling slow query logging and reviewing the slow query logs. Optimizing your code to perform more efficient SQL queries may help address some performance issues on this front.
You can also get an idea of what SQL queries are running in real time with `mysqladmin pr`. You can get a status report for the MySQL service at a specific time with `mysqladmin stat`.
You may consider using a tool like `htop` or `top` to review server load and processes in real time while hitting your page with HTTP requests. This may help you identify bottlenecks that are causing MySQL CPU load average or memory utilization to spike.
If you're not able to address MySQL performance issues by configuring the server, optimizing your code, or by other means, you may consider implementing some type of database caching. One example would be Redis.
If your web-server is overloaded by serving your site plus serving database queries, you may consider either scaling up by upgrading your server to provide additional resources to those processes or scaling out with some type of load-balancing solution or perhaps setting up an additional server just as a remote database thereby separating the web-server and database server.
The results are expected because Redis is, mostly, single-threaded. As described on the Redis benchmark page:
> Redis is, mostly, a single-threaded server from the POV of commands execution (actually modern versions of Redis use threads for different things). It is not designed to benefit from multiple CPU cores. People are supposed to launch several Redis instances to scale out on several cores if needed. It is not really fair to compare one single Redis instance to a multi-threaded data store.
One thing about Redis is that it's far from an "install and run it" system. You need to spend a lot of time tuning it to your operating system and hardware. Even running a VM on a different hypervisor can make a huge difference. For example, the forking Redis does doesn't play well with Xen VMs, because Xen is slow at duplicating the process's memory pages on fork. Redis actually has a great article about latency, including how to run the built-in latency tests and monitor things.
https://redis.io/topics/latency
Having said that, Redis can handle a large amount of commands very fast. Running redis-benchmark on a 4gb Linode using the default docker image (no custom config), I can easily get 65,000+ gets and sets per second.
On the Laravel/app side, something to keep in mind with caching in high-traffic scenarios is contention. Say you have an uncached resource that is used on a bunch of pages (on content systems, authors are a great example of this). You get a handful of requests to pages that need this uncached resource. Each request doesn't know about the others, so each one attempts to load the resource and then cache it. You now issue (#commands to cache × #requests) commands to cache the same resource. There are a couple of options to counter this. One is to use cache warming, then refresh the cache in a background process, like a queue. The other is to use atomic locks when caching items. Laravel actually has atomic lock support built in.
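Here is a sketch of the atomic-lock idea in Python, using a process-local `threading.Lock` as a stand-in for a distributed Redis lock (e.g. SET NX with a TTL). All names are made up for the example; the double-check after acquiring the lock is what prevents the duplicate cache fills described above.

```python
import threading

cache = {}
fill_lock = threading.Lock()   # stand-in for a Redis SET NX lock
compute_calls = 0

def expensive_load(key):
    global compute_calls
    compute_calls += 1         # count how many times we actually compute
    return f"value-for-{key}"

def get_cached(key):
    if key in cache:           # fast path: already cached, no lock needed
        return cache[key]
    with fill_lock:
        if key not in cache:   # double-check: another request may have filled it
            cache[key] = expensive_load(key)
    return cache[key]

# Eight concurrent requests for the same uncached resource...
threads = [threading.Thread(target=get_cached, args=("author:42",)) for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()
print(compute_calls)  # -> 1  (only one request did the expensive work)
```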
Good to hear! What’s there looks like a good start. I suggest taking the time to read through An introduction to Redis data types and abstractions on their site. It’s a pretty long read but has realllyyyy good info and a comprehensive overview of the majority of Redis. In fact all of their documentation is pretty good.
Reddit may archive posts like that to reduce server load. They could relocate content that is old to another server that doesn't get modified. There's a fair chance that a lot of the data we see is stored in an in-memory database like Redis (just similar in name by coincidence), but there's a limit to how much data can fit in it, so by moving old content to a static non-memory server it is able to keep ridiculous amounts of data while keeping things running smoothly.
Just one another idea for storing information with automated age based deletion.
Use redis: https://redis.io/commands/expire
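If pulling in Redis just for this feels heavy, the expiry model itself is simple enough to sketch in plain Python: a value plus a deadline, with lazy deletion on read, which is roughly how EXPIRE behaves from the caller's point of view. The `now` parameter exists only to make the sketch testable.

```python
import time

store = {}  # key -> (value, expire_at)

def set_with_ttl(key, value, ttl_seconds, now=None):
    now = time.time() if now is None else now
    store[key] = (value, now + ttl_seconds)

def get(key, now=None):
    now = time.time() if now is None else now
    entry = store.get(key)
    if entry is None:
        return None
    value, expire_at = entry
    if now >= expire_at:       # expired: delete lazily, like Redis does on access
        del store[key]
        return None
    return value

set_with_ttl("session:1", "alice", ttl_seconds=60, now=1000.0)
print(get("session:1", now=1030.0))  # -> alice
print(get("session:1", now=1070.0))  # -> None (past the 60s TTL)
```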
This of course only makes sense in case I can utilize tedious for something else.
Also you need to consider if the data needs to be persisted or it’s ok to be lost and only kept in memory.
ActionCable uses Redis pub/sub. Redis pub/sub automatically forwards messages to all nodes of the cluster, so it doesn't matter which node you connect to. Redis will handle scaling of nodes for you. You don't have to worry about that.
https://redis.io/commands/PUBSUB
>Cluster note: in a Redis Cluster clients can subscribe to every node, and can also publish to every other node. The cluster will make sure that published messages are forwarded as needed. That said, PUBSUB's replies in a cluster only report information from the node's Pub/Sub context, rather than the entire cluster.
Thx for details! Was interesting to read.
Regarding Redis: we experienced problems with it because, on replication, Redis forks a copy of itself. If memory overcommit is not enabled in the Linux kernel and there is not enough RAM (if Redis ate 500 MB, total RAM should be at least 1 GB), Redis just gets killed by the OOM killer.
You have to configure something like:

```shell
echo vm.overcommit_memory=1 | sudo tee /etc/sysctl.d/overcommit.conf
```

plus `maxmemory 200mb` and `maxmemory-policy allkeys-lru` in redis.conf.
Other processes in the system (like uncontrolled Apache forks or PHP) could cause Redis to crash legitimately, and setting limits (memory/forks) per app can be very complicated.
I see no problem if you go with containers, because you can guarantee RAM per app. Likewise, there is no problem with having everything on a huge single server if you can slice resources. It doesn't matter if it is LXC/LXD, systemd-nspawn, or Docker - they solve the problem of resource contention.
re: inter-service communication, if you're already using redis and you're not extremely latency sensitive, check out redis streams.
re: "accurate view of what 'cash' is available" - your worry sounds like a dirty-read error. In your case, your portfolio service shouldn't rely on raw balances from an exchange but instead on some view of "certified balances." If, say, you're transferring assets between exchanges, the transfer process would wait to update the certified balances until the transfer has completed on both ends.
Perhaps something like Redis if latency is a real concern. Might need to think about how to integrate that with a more traditional DB as well, as someone else mentioned Postgres is probably the right answer.
It is because if you want to use a non-default database in Redis, Laravel has to send a command to select the configured database. This means an extra network request for each of your Redis commands. Reducing I/O is key to scalability.
To be honest, the very simplest method is to issue a SAVE operation which will export the dataset to disk, then copy and load it into the new cluster. SAVE is synchronous and blocking, so depending on your environment that might not be possible. There is also BGSAVE which is async.
https://redis.io/commands/save https://redis.io/commands/bgsave
Installing Horizon will provide a nice little UI to manage Redis jobs. As somebody else suggested, you should check if your job is being dispatched to the queue multiple times on accident.
Without a queue worker running, SSH into Homestead or your dev environment and run `redis-cli`. Then `KEYS *`. You should see one key for the default queue. Laravel queues are stored as Redis lists, so inspect one with `LRANGE queuenamehere 0 -1` (a plain `GET` on a list returns a WRONGTYPE error). More info: https://redis.io/commands
Most databases don't have an automatic expiration; periodic jobs are pretty much exactly how to deal with this kind of thing.
However, there are some that do support expiration. These are not traditional relational databases, though. Redis is one such example. It's not a traditional database at all; it doesn't have relations, and it's primarily used as a cache rather than for durable persistence. You can set a key to expire at a specific time, or after some amount of time has passed.
Also look into Redis. It has sorted sets that are perfect for keeping global scoreboards. It is far less complex to run than Postgres, and could be far faster (because it does so much less).
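To make the sorted-set idea concrete, here is a rough Python stand-in for what ZINCRBY and ZREVRANGE give you - a plain dict plus sorting. (This is an analogy, not Redis client code; Redis keeps the set ordered internally, so top-N reads stay cheap even at scale.)

```python
scores = {}  # member -> score, stand-in for a Redis sorted set

def zincrby(member, amount):
    # ZINCRBY equivalent: bump a member's score, creating it if absent.
    scores[member] = scores.get(member, 0) + amount

def top_n(n):
    # ZREVRANGE equivalent: highest scores first.
    return sorted(scores.items(), key=lambda kv: -kv[1])[:n]

zincrby("alice", 50)
zincrby("bob", 30)
zincrby("alice", 10)
print(top_n(2))  # -> [('alice', 60), ('bob', 30)]
```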
There was a big hoopla about Redis's use of the terms master and slave, and the decision was made to change the command "slaveof" to "replicaof": https://redis.io/commands/slaveof
The "slaveof" command also took the parameter "no one" (`SLAVEOF NO ONE`) to stop a replica from replicating its master - wording chosen specifically to denounce slavery.
If you’re talking about looking up a single identifier, that’s not necessarily true. Redis HGET has a time complexity of O(1), for example, so it shouldn’t get slower as your list grows.
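The point is the same as any hash-table lookup - in Python terms, fetching one field costs the same regardless of how many fields the mapping holds. (A loose analogy, not Redis client code.)

```python
# One entry vs. one hundred thousand entries: both lookups are O(1) hash probes.
small = {"field0": 0}
large = {f"field{i}": i for i in range(100_000)}

print(small["field0"], large["field99999"])  # -> 0 99999
```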
> Also Memcached is limited by the amount of memory available in it’s machine whilst Redis may swap the least used data to disk, to free up space in memory. This is done via a child thread.
Is this referring to virtual memory? That was removed a long time ago.
Redis doesn't explicitly swap data to disk. It either relies on the OS to swap (which you probably don't want), or it evicts keys according to your configured eviction policy when the `maxmemory` setting is exceeded.