Set the "topics_pattern" setting to a regex like " .*" https://www.elastic.co/guide/en/logstash/current/plugins-inputs-kafka.html#plugins-inputs-kafka-topics_pattern
If you define "topics_pattern", the "topics" setting is ignored.
Hey u/lclarkenz, the AMA is at 08:00 BST (other timezones).
I'm also doing a talk on Kafka Connect, which will be at 07:00 - 07:45 BST :)
Might be just me, but that's an odd requirement. I wouldn't be surprised if nobody does that - at least not for backups. It doesn't sound like a wise thing to do.
So many questions and remarks pop up when I read your inquiry:

- To be able to send a document to a queue, every field would need to be stored, not just indexed. Do you really store every field of your documents?

  > By default, field values are indexed to make them searchable, but they are not stored. This means that the field can be queried, but the original field value cannot be retrieved.

- For backups, you could use the Snapshot feature (https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-snapshots.html); see the sketch after this list.
- If you didn't mean a real backup, but rather a hot standby, you could run a cluster with multiple instances.
- Most people don't bother much with Elasticsearch backups unless they need a low "mean time to recovery" and have a big index. For most use cases, simply rebuilding the index from the source data is enough.
- I don't know of any mechanism that notifies you when new documents have been added to the index, and you'd need one to be able to push new documents to Kafka.
- It seems like too much effort: you need a producer service plus a consumer service to actually do all the work. There might be an out-of-the-box solution (Apache NiFi might theoretically handle this nicely enough), but you'd still need to manage extra instances.
- Last but not least, you might just have an easier time letting both Elasticsearch instances do the same work. Push the same data to be indexed in both, or if you're pulling the data, pull from the same source.
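On the snapshot point, here's a rough sketch of taking one through the REST API from Python (the repository name, filesystem path, and localhost address are assumptions; the path has to be listed in `path.repo` on every node):

```python
import requests

ES = "http://localhost:9200"  # assumed local cluster

# Register a shared-filesystem snapshot repository (one-time setup).
requests.put(f"{ES}/_snapshot/my_backup", json={
    "type": "fs",
    "settings": {"location": "/mnt/es_backups"},  # must appear in path.repo
}).raise_for_status()

# Snapshot all indices and block until the snapshot completes.
requests.put(
    f"{ES}/_snapshot/my_backup/snapshot_1",
    params={"wait_for_completion": "true"},
).raise_for_status()
```

Snapshots are incremental per repository, so running this on a schedule is much cheaper than it looks.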
Hiya!
So I've got a lot of evidence that the technique is solid, even if it isn't the exact solution listed above. We've got customers who do this every day, for everything from banking to chat: https://ably.com/blog/dependable-realtime-banking-with-kafka-and-ably
We've gone back and forth on this; if you'd visited us a month earlier, you would have seen prices. The pricing on our G2 page (https://www.g2.com/products/lenses/reviews) is our standard pricing, but it can be more complicated than "x clusters + y users = $z". Sometimes the price comes out lower, in other cases higher. We don't want to mislead our potential customers.
I understand your point of view on the subject; we are redoing our pricing page, and I will bring your feedback to the table. In the meantime, I hope you'll try our product and support community.
Best, Evan
There are great courses on https://udemy.com (e.g., by Stephane Maarek) for about the price of a good coffee (watch for deals; there's one on now). Totally worth it, as they're full of hands-on learning. Minimize the time spent on theory and try things out.
I'd use MQTT (or HTTP) for communication between the weather stations and the data center, and Kafka producers only from your front-end servers into Kafka. Generally, I'd use Kafka producers where you need high volume and you're in a relatively stable environment; an MQTT-to-Kafka bridge can sit in between, as in the sketch below.
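As a rough sketch of that split, assuming the paho-mqtt (1.x) and confluent-kafka packages and hypothetical host and topic names, the data-center bridge could look something like this:

```python
import json

import paho.mqtt.client as mqtt
from confluent_kafka import Producer

# Hypothetical broker addresses for the data center.
producer = Producer({"bootstrap.servers": "kafka-dc:9092"})

def on_message(client, userdata, msg):
    # Forward each MQTT reading into Kafka, keyed by station id
    # so readings from one station stay in order on one partition.
    reading = json.loads(msg.payload)
    producer.produce("weather-readings",
                     key=reading["station_id"],
                     value=msg.payload)
    producer.poll(0)  # serve delivery callbacks

bridge = mqtt.Client()
bridge.on_message = on_message
bridge.connect("mqtt-dc", 1883)
bridge.subscribe("stations/+/readings")
bridge.loop_forever()
```

The stations themselves then only need a tiny MQTT (or HTTP) client, which keeps cheap, intermittently connected edge devices away from the heavier Kafka protocol.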
Here are some slides you might find useful:
https://www.slideshare.net/MatthewHowlett1/processing-iot-data-with-apache-kafka