That's the point. Contributors didn't assign their copyright.
"The CLA is a license agreement, which enables Elastic to distribute your code without restriction. It doesn't require you to assign to us any copyright you have, the ownership of which remains in full with you."
With the CLA you can make all contributors sign an agreement that gives Elastic the rights to do anything they want with the code you contributed, even changing its license.
https://www.elastic.co/contributor-agreement
"The CLA is a license agreement, which enables Elastic to distribute your code without restriction. "
DISCLAIMER: I am just one dude with some opinions on stuff.
> "How much does specializing in one specific area help or benefit your CS Career?"
It depends on how much mobility you want. Some people are totally cool grinding away at the same stack for 20+ years, some people want to do new things and work with new technologies at new companies all the time. I imagine both approaches have their long-term pros and cons.
> "Does this affect / prohibit you from trying new things, since you won't use them in the workplace?"
Not necessarily. It really depends on the company though. Some companies are perfectly fine with minor maintenance of their old COBOL applications.
My job primarily involves a servlet whose core code is around 12 years old, but that doesn't mean we aren't constantly looking at how new technologies can benefit the product. Part of why I was hired was the recent inclusion of Elastic Stack software to vastly improve our product's searches. So all these decades-old Java/T-SQL devs are now learning things (not everything) about Elasticsearch.
> "To what extent should you focus to become decently employable with one area?"
To the extent that it benefits your compensation or advancement within a given organization/domain. If something seems in-demand based on job postings, it's probably worth learning.
I used to do LAMP stack development professionally, but haven't touched the technology in quite some time. Similarly, the projects I work on in my free time have drifted away from that stack in favor of technologies more closely tied to my role in the company. Sure this means getting a LAMP job in the future may be more difficult, but it also means the knowledge of my company's current stack is stronger which makes me a more valuable employee (in theory, anyway). Trade-offs and all that.
You can use http://www.solarwinds.com/fr/free-tools/event-log-forwarder-for-windows to send logs to a syslog server.
Another option is to use ELK (Elasticsearch, Logstash and Kibana) with Winlogbeat.
I wouldn't be surprised. They were pretty angry at Amazon for what they viewed as infringement of their trademarks in the blog post explaining their reasoning for the license change.
> The code that contributors gave in the past is still Apache licensed and always will be.
Take a look here
You can see that as of 7.11 the code is not Apache 2.0. This is the whole point. This is a move against open source.
[I work at Elastic]
We are dual licensing these products, so if the SSPL doesn't work for you, you can use them under the Elastic License. Check out the image under the "The change" section of the blog post - https://www.elastic.co/blog/licensing-change
My company ran an unusually large Elasticsearch cluster on EC2. (We had indexes ranging from 5TB to over 7TB at any given time.) While our use case is not common, we pushed Elasticsearch to several limits that show what kind of issues you could run into managing your own cluster:
I/O Wait with EBS Volumes: Elasticsearch talks to disk a lot. We tried every class of EBS (sometimes, magnetic is the way to go: https://logz.io/blog/benchmarking-elasticsearch-magnetic-ebs/). We consistently hit the EBS bandwidth caps.
Elasticsearch assumes it has unfettered access to the disk, so when you are out of EBS burst balance your cluster grinds to a halt. Instances will fail to respond to other instances, other instances will start promoting replicas -- leading to more bandwidth demands and, usually, a cascading failure of the whole cluster.
We ended up with SSD ephemeral storage -- which cancelled out any savings we got from rolling our own cluster.
Garbage Collection Pauses: We continued to see long pauses during indexing, which turned out to be garbage collection. Garbage collection in ES is a "stop the world" event. We were running large instances and giving half the memory to the heap. It turns out this is a bad strategy if your total memory is 60GB. (https://www.elastic.co/blog/a-heap-of-trouble)
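The 60GB problem above is, per the linked "heap of trouble" post, about compressed object pointers: heaps much above ~32GB lose compressed oops and GC pauses grow, so "half of RAM" stops being a good rule on big boxes. A rough sketch of the sizing rule (the 31GB cutoff is an approximation, not an official constant):

```python
def recommended_heap_gb(total_ram_gb):
    """Pick an Elasticsearch heap size: at most half of RAM, and below
    the ~32 GB compressed-oops cutoff (~31 GB to stay safely under it)."""
    COMPRESSED_OOPS_CUTOFF_GB = 31
    return min(total_ram_gb // 2, COMPRESSED_OOPS_CUTOFF_GB)

# On a 60 GB box, half the RAM already overshoots nothing but is risky
# per the blog post; on a 128 GB box it would clearly blow past the cutoff.
print(recommended_heap_gb(60))   # 30
print(recommended_heap_gb(128))  # 31
```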
These considerations may or may not apply to you. We killed as many Elasticsearch Service clusters as self-hosted clusters as we grew. In the end, our desire to tweak and optimize won out, and we ran our own instances and handled our own fault tolerance and backups.
Unless you are planning on massive scale, the Service is worth the extra few cents an hour.
But on what OS? What service?
On Linux you can try simple "service X stop && backup && service X start" (or systemctl stop/start X if you're on systemd. Just check docs) in your script. On Windows you've got Stop-Service cmdlet.
Also, look at the documentation: maybe your service can release a file on receiving a specific code/API call? Or maybe it can back up files itself (as in Elasticsearch)?
We have kibana pointed at three clusters with tribe nodes. 200 billion documents or so. 210 data nodes (across the three clusters).
When querying data, the complexity of the query will limit the number of panels on a dashboard, as will attention paid to cache/query sizes, file system cache etc. Lots of aggregations will tank query performance.
We also outright ban leading wildcard queries due to cost/complexity.
Edit: added links to conf talks about doing this which have some of our scaling notes.
https://www.elastic.co/elasticon/conf/2018/sf/scaling-log-aggregation-at-fitbit
A devops talk is still high on my list of things to do sometime hopefully in 2018.
In the near term, I'll be speaking at Elastic{ON} 2018 with Chris Burkhart, a Principal Technical Lead on Battle.net for the data platform team. That talk will cover a high level overview of our telemetry platform at Blizzard, and some specific scenarios where it's been invaluable for Overwatch live ops.
Cheers.
Since Elasticsearch 2, the default translog durability (index.translog.durability) has been request instead of async. Change it back to async and you should see performance similar to ES 1.7. https://www.elastic.co/guide/en/elasticsearch/guide/current/translog.html
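If that durability trade-off is acceptable (an async translog can lose the last few seconds of acknowledged writes on a crash), the settings change might look like this; the index name is made up:

```python
import json

# Body for PUT /my-index/_settings (index name is hypothetical).
# "async" fsyncs the translog in the background every sync_interval
# instead of on every request, trading a small durability window for speed.
translog_settings = {
    "index": {
        "translog": {
            "durability": "async",
            "sync_interval": "5s",
        }
    }
}

print(json.dumps(translog_settings, indent=2))
```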
The wildcard field type is part of Elastic's X-Pack feature set: https://www.elastic.co/subscriptions
OpenDistro, which AWS develops and which is the basis of the AWS ES service, does not include any of the X-Pack features, as they're license-incompatible.
Set node.availability_zone in each host's Elasticsearch config and use allocation awareness to prevent replicas from being routed to other nodes in the same AZ:

node.availability_zone: us-east-1b
cluster.routing.allocation.awareness.attributes: availability_zone
If you'd prefer not to fight with setting the AZ in each config, you can also use the elasticsearch-cloud-aws plugin to add the attribute for you.
It's pretty much wrong to call these things out as "mistakes" IMO. All the Zen discovery settings were deprecated over two years ago; they don't do anything any more except emit warnings that you're using a deprecated setting. Similarly, the official recommendation is not to use bootstrap.memory_lock: you should prefer simply to disable swap altogether.
It doesn't currently do automated correlation; it's a platform for easily building this sort of thing and then scaling effectively.
dunno about specific stats, but there are some pretty awesome use cases out there - https://www.elastic.co/use-cases
Not sure if this is what you're asking for but... Elasticsearch is used fairly extensively in my field in conjunction with Logstash and Kibana to form a free alternative to Splunk.
I know of at least one university that has a cluster of these types of systems in place to log and index data points from their campus network including Internet traffic and end point logs. You can imagine how much data this would generate per hour.
[I work at Elastic]
in 7.10 there's searchable snapshots - https://www.elastic.co/blog/whats-new-elastic-7-10-0-searchable-snapshots-lens-user-experience-monitoring
Expensive in a personal context, trivial in an enterprise one. I mean, SSDs are pretty cheap these days, and you can buy 'enterprise' grade storage at $1000-$8000/terabyte that runs seriously fast because it's doing parallel IO to multiple NVMe SSDs.
... but chances are that doesn't matter, because the PoGo dataset isn't reliant on a global view - each player's inventory etc. is effectively isolated from the others, so it supports a sharded database topology very merrily, and things like Elasticsearch are very easy indeed to scale wide. https://www.elastic.co/guide/en/elasticsearch/reference/current/_basic_concepts.html
I'm not entirely sure what your end goal is, but have you considered using logstash? It's pretty much made for storing logs and uses elasticsearch as its datastore, so easy to query entries and such.
https://www.elastic.co/products/logstash
There is a guide to setting it up with pfsense here (and also an interface for visualising the logs):
https://elijahpaul.co.uk/monitoring-pfsense-2-1-logs-using-elk-logstash-kibana-elasticsearch/
I've used both of these libraries while working with Elasticsearch for ~1 year and I settled on using github.com/elastic/go-elasticsearch/v8
I went with the official client because it seemed safer from a security perspective. The company is going to have their own security guarantees which you won't get from the olivere client.
To learn how the client worked, I spent a lot of time reading Elasticsearch's documentation and figuring out what I wanted to do based on that documentation. After you know what you want, I then translated it into the Go code.
I found both libraries do not do a great job of making Elasticsearch functionality "discoverable". It also wasn't easy to try new things once you knew they existed with the libraries. Eventually I discovered Kibana on Elasticsearch's cloud offering, which has an interactive online IDE I highly recommend. I ended up designing most of my queries in that IDE with my test data in Elasticsearch's cloud and then translating that query into Go when I was happy with it. That process had the shortest dev loop time and I loved how relatively easy the cloud offering was for setting everything up. But it's definitely not free - I think I spent ~30 bucks a month.
Would using the Apache 2.0 license with a trademark clause have helped in this case? One of the complaints from Elastic was the use of the trademark: https://www.elastic.co/blog/why-license-change-AWS
Their main product page lists ElasticSearch as “free and open”. I said they shouldn’t do that. Is there a problem with my argument?
Whether they’ve clarified in a FAQ or a statement they made what they actually meant is irrelevant when the heading is clearly there to attract people who would care that it is FSF/OSI open vs some arbitrary definition they came up with.
Synchronously logging to an HTTP endpoint sounds like a horrible idea. For this kind of thing best to use a long-lived connection. You definitely don't want your app to do this.
Best to use something like logstash:
https://www.elastic.co/guide/en/logstash/current/plugins-outputs-http.html
Is a SOAP service your only option? Is your system living in the dark ages?
> > Getting upset about it is also not a valid business strategy.
>
> Who is this aimed at? I am not running a business.
It's aimed at Elastic, not you.
https://www.elastic.co/blog/why-license-change-AWS
Although AWS did misuse their trademark, I don't think Elastic's actions make much sense. To me it seems like they wanted more control over Elasticsearch's revenue and they're blaming AWS to justify their actions.
ELK: Elasticsearch, Logstash, Kibana
Logstash: operates like a syslog server, taking in logs from various machines or applications.
Elasticsearch: Operates like a query-able database to easily dig down into the logs stores. You can make custom queries very easily to aggregate data, even between different logs.
Kibana: A graphical web frontend for Elasticsearch that allows easy creation of charts, graphs, or outputs from custom queries into a table in order to visualize trends in your logged data.
Consider Kibana. Free and open source, allows you to build dashboards and visualizations without writing code.
I work with people all the time who brand and repackage.
You can do this with Elasticsearch pretty easily:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters-completion.html
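For illustration, a completion-suggester setup along the lines of the linked docs; the index, field, and suggestion names here are made up:

```python
# Index mapping: declare a "suggest" field of type "completion"
# (body for PUT /products; index and field names are hypothetical).
mapping = {
    "mappings": {
        "properties": {
            "suggest": {"type": "completion"}
        }
    }
}

# Search-as-you-type query against that field
# (body for POST /products/_search).
suggest_query = {
    "suggest": {
        "product-suggest": {
            "prefix": "elas",
            "completion": {"field": "suggest"},
        }
    }
}
```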
AWS has managed Elasticsearch; it's a piece of cake to set up and maintain.
If anyone wants to be a little more sophisticated, you can do essentially the same thing with Twitter. There are multiple ways to do this, but I'm just using the Twitter streaming api to stream tweets into an Elasticsearch index and then I use Kibana as a front end to query those tweets.
Unfortunately, it only sends about 1% of all tweets to your machine, but if more people do the same, then there's a higher chance of catching ISIS related tweets. I think you can do more with the streaming api itself to narrow down tweets based on search terms, so that might be helpful as well.
Twitter Streaming API Overview
Frozen Tier is applicable for on-prem clusters.
It supports the following repository types, some of which can be hosted on-prem:
Frozen Tier is all about leveraging "Object Storage", to cost optimize generally older data that still needs to be queryable, but isn't often queried. As data ages out, query interest in it (typically) drops off. The goal is to move that data to a cheaper storage medium, and leverage a cache to keep frequently queried parts of it, fast.
These assertions are correct:
There are stark differences between how AWS handles Free and Open Source Software and how Google does. Say what you will about Google and their data vacuuming, but when you look at how Google treats open source projects, it really makes you want to choose them over AWS.
AWS clones an incomplete version of MongoDB and releases it for production; same thing with Elasticsearch and many, many other services.
On the other hand... Google takes the time to work with the Vendors of open source projects and gets the vendors to integrate them directly into GCP so you can actually get a legit version of your open source project.
https://cloud.google.com/blog/products/open-source/bringing-the-best-of-open-source-to-google-cloud-customers https://www.elastic.co/gcp
For context: I am one of those crazy people that has Amazon Alexas and then goes on /r/privacytoolsIO and tells my friends and family not to let too much of their lives get taken over by Google.
If you want something that can actually be useful, learn elasticsearch. It opens up many opportunities to learn about your data because it can perform powerful aggregations that are much harder to do in plain SQL.
> I know that building a mechanism to guarantee that something is only processed once is not trivial
You're severely underestimating how difficult it is.
> In my particular use case I'm dumping messages into elastic search and so I don't ever want to double up on documents and determining if a "document" already exists in this case would be a difficult and expensive proposition because there is no unique is aside from a timestamp but that could legitimately doubled up on this situation as many events can happen within a second.
If you don't want to insert duplicates into ElasticSearch, then you should use the ID field to identify the document. You could simply take a hash of the document and use that as the ID.
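A minimal sketch of that hash-as-ID idea, assuming documents are JSON-serializable dicts: serialize canonically (sorted keys) so the same content always hashes to the same _id, turning duplicate inserts into overwrites:

```python
import hashlib
import json

def doc_id(doc):
    """Deterministic _id: SHA-256 of the canonically serialized document."""
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

event = {"timestamp": "2017-06-01T12:00:00Z", "message": "user login"}

# Same content always yields the same id, regardless of key order,
# so indexing with PUT /events/_doc/<doc_id(event)> (index name hypothetical)
# makes a duplicate delivery overwrite rather than duplicate.
assert doc_id(event) == doc_id({"message": "user login",
                                "timestamp": "2017-06-01T12:00:00Z"})
```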
Building exactly-once delivery is something most people consider to be impossible. I'm not entirely sure that's true (Kafka recently announced that they have achieved it), but for most mere mortals I think we should just pretend it is, because it's so difficult to achieve that it might as well be.
Yes. It's really quite clever - it's all about automated anomaly detection, and seeing 'aberrant' patterns. And then deciding if those aberrations represent people cheating, and classifying stuff that matches that sort of pattern as 'probably good' or 'probably bad'.
It can work in near realtime, but there's no real need - and in many ways it's not useful to do that - it's far better not to give feedback on the 'triggers' - and just gather information on cheating patterns for use next time, and then ban all at once in the 'wave'.
I've been doing this on a relatively smaller scale using Elasticsearch Machine Learning
I've been doing analysis on logging from servers - it's a similar sort of problem, you've got an awful lot of 'noise' (e.g. stuff that's not a problem) to sift, so you need to pick out the signal from that. I would assume a similar technique will work for spoofer detection.
So, I have probably had this conversation with 100 different people.
My honest answer is that by far the most preferred solution is to denormalize your data. The second best solution would be to do application side joins with multiple queries.
To be clear, you can do queries to find parents of children, so it is possible to do what you mentioned. However, I see many database-minded people try and do things with parent-child relationships and nested objects and put a lot of effort into something that is ultimately abandoned (almost always due to scalability or performance issues). You may be the one that figures it out, but my experience would tell me it is unlikely.
I started off in the SQL world and know that the Elasticsearch paradigm is different, and I empathize. ES is designed to return results back in milliseconds, so things like an application-side join are quite practical. Denormalization makes queries lightning fast, so denormalize if at all possible.
Try to use Elasticsearch like a search engine, not an RDBMS, and you will be much happier.
Good luck 😀
Have you looked at using Elasticsearch's Docker images? They've also got Logstash and Kibana images which pretty much fits all of your requirements.
[I work for Elastic]
As others have pointed out, the best practice for a production cluster starts at 3 nodes. Here's what the docs say:
> High availability (HA) clusters require at least three master-eligible nodes, at least two of which are not voting-only nodes. Such a cluster will be able to elect a master node even if one of the nodes fails.
If you're running in Elastic Cloud, you can provision smaller nodes and get a 3rd Master node (which only needs 1GB RAM) for free. There's a nice configurator here. You'll also be able to use Machine Learning there, too.
We run elastic clusters as a service out of Kubernetes with ECK. It's great. With the correct operator running a database on k8s makes a ton of sense.
https://www.elastic.co/guide/en/cloud-on-k8s/current/index.html
I also work at Elastic.
Hopefully, this should clear this up: https://www.elastic.co/pricing/faq/licensing (updated to include more information)
The TL;DR (Please read the whole FAQ, anyhow) is that if you are already using the default distribution under the Elastic license, it's the same as it has been, per this paragraph:
> If you download and use our default distribution of Elasticsearch and Kibana, nothing changes for you. Our default distribution continues to be free and open under the Elastic License, as it has been for nearly the last three years. If you build applications on top of Elasticsearch, nothing changes for you. Our client libraries continue to be licensed under Apache 2.0. If you use plugins on top of Elasticsearch or Kibana, nothing changes for you.
If you are building from source/modifying the source and compiling it yourself, to host a service, you can reach out to:
Hiking the PCT with the Elastic Stack
A nerdy article about using the Spot's API to track someone hiking the PCT.
This will work, but it's pretty limited. Most searches now use at least full-text search, to match words in a different order, words that are close together, etc.
The best approach is to use something like Elasticsearch.
That would be a question for Elastic as they'd have the numbers.
My scope is within my company, so I don't know how big corps deploy ES. ES does offer a hosted solution which I believe runs on either Google or Amazon which is almost definitely Linux based. You might be able to cross reference companies on the Use Cases page with other projects to see if they lean towards Linux or Windows...
Linux is definitely the target environment for elasticsearch, Windows support came along later. So I would think big corps would stick with Linux for "better support".
Sadly, the way search engine software works is that it has to load all of the documents up to a specific page and then throw away all of the results for the earlier pages, leaving just the results for that page.
You can read a slightly more indepth description from Elasticsearch. https://www.elastic.co/guide/en/elasticsearch/guide/current/pagination.html
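Concretely, with from/size pagination each shard has to collect and sort every hit up to the requested page before throwing most of them away; this toy function just shows the arithmetic (page numbering and sizes are illustrative):

```python
def hits_collected_per_shard(page, page_size):
    """With from/size pagination, each shard must collect and sort every hit
    up to from + size before all but the requested page is discarded."""
    return page * page_size + page_size

assert hits_collected_per_shard(0, 10) == 10        # first page: cheap
assert hits_collected_per_shard(1000, 10) == 10010  # deep page: expensive
```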
The error message you are getting shows that Kibana is having trouble parsing its config file (kibana.yml).
If you are using Kibana < 4.2, server.port is not yet supported; it is just port instead: https://www.elastic.co/guide/en/kibana/current/kibana-server-properties.html
If you are on 4.2+ you still would want to look at your config file around that line, as that is where the error is happening.
Getting an ELK stack up and running acceptably in production is non-trivial, and requires a lot of research. If you're looking to determine whether ELK can replace Splunk quickly, it's probably worth the money (hey, if you can afford Splunk...) to get some support directly from elastic.co about sizing.
Or you could try their 'as a service' offering where they manage the hosting of it for you, https://www.elastic.co/found. You can develop on a small cluster quickly, see if it fits, and if it does, $$$ to scale it up, or less $$$, more time and hassle to build it yourself.
Have you reviewed the documentation?
For example https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-search-speed.html has some initial tips (including noting that more shards doesn't necessarily mean better query perf), and that page will open the correct section of the index to link to a few other basic tuning pages.
Elastic was angry that
so they switched its license from Apache 2.0 to the Server Side Public License with that as the justification. For the record, the SSPL is a non-free license claimed to be open source. It's a copyleft license based on the AGPL practically designed to cripple cloud service providers by forcing the release of source code of anything used to make the program available as a service.
Amazon later forked Elasticsearch and created OpenSearch, still under the Apache 2.0 license.
That second one is beautiful. It probably wouldn't take much work to get everything restarted: if you swapped your syslog for Elastic (though I don't know if this is available on the free tier), you could set up a metrics threshold alert that fires when it doesn't see packets on port {whatever} for {some period of time} and fires a webhook back to HA to turn all your stuff back on.
I can't speak for the EU, but the market definitely exists in the US. I've met quite a few different people who consult full-time as subject matter experts on things ranging from Java runtime optimization (she worked for AMD designing processors, very cool lady), various PHP/C#/Java frameworks, and different DevOps tools. They basically say "here's how you should be doing things" and rarely write any lines of code as part of a contract gig.
However, you're also competing with established vendors in many cases. My org is primarily a Microsoft shop and gets practically all the resources we could ever need from Microsoft. We have no need for a SQL Server consultant, an Azure consultant, etc. We call our regional rep, take a trip to our local tech center, and sit with some engineers they've flown in for us. On a smaller scale, my old org (infra monitoring) had a handful of contractors they'd refer people to who needed a bit more "hand holding" or had a particularly complex setup. Orgs like Elastic have similar offerings that they provide. That may or may not be a problem depending on what technologies you specialize in.
The question you should ask yourself is how capable you are of selling your talents. You could be the most brilliant engineer there ever was. As a freelancer/contractor, that doesn't matter if you can't sell those talents to businesses.
Yes and no.
It would be better than what exists now but it wouldn't be great.
Google search is relatively slow compared to Elasticsearch. Using Google search also means that Reddit can't control when new content is crawled and added to the search index. It may be very quick or it may take a day or more. Larger, more active subreddits may get updated faster than smaller subs, too.
It also greatly reduces the site's ability to add features to search such as different ways to filter the results. If Reddit had their own search it would be possible for a user to search only their own post history to find something they posted 2 years ago that has info they want to share again now, for example. That's not possible with Google search.
High speed full text search is a very interesting topic (at least to me) and I've spent a fair amount of time researching it and working with Elasticsearch in particular. ES is used by some very large sites including the only sites larger than Reddit that aren't search engines or owned by search engines: Wikipedia and Facebook. Other large scale users include Netflix, Tinder, eBay, Dell, Github, etc.
All things considered ES is probably the best solution for Reddit and I would be surprised to find they aren't already working on it.
I run an ELK stack fed by softflowd on pfsense (netflow input to logstash) to track per-device bandwidth usage. Logstash has InfluxDB output. You should be able to do something with that.
Firstly, let's clear one thing up: queries do not happen "instantaneously", they happen in either a discernible or indiscernible amount of time. Google queries may seem instantaneous, however that X milliseconds is just indiscernible to a human in some cases.
Here are some options:
1) Pre-compute query results (effectively creating an inverted index) -- better if the data doesn't change frequently
2) Use ElasticSearch, which is designed for high-scale field matching queries
Going back to query time, you need to set expectations. Is it 100 concurrent queries or 500? That's a very wide range. You should have a defined target:
"With X concurrent queries, we want the average query time to be Y milliseconds."
Take a look at this documentation.
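Option 1 above (pre-computing an inverted index) can be sketched in a few lines: map each term to the set of record IDs that contain it, so a field-match query becomes a dictionary lookup plus set intersection. A toy version, with made-up documents:

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercased whitespace token to the set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def search(index, query):
    """AND-match: intersect the posting sets of every query token."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

docs = {1: "red shoes", 2: "blue shoes", 3: "red hat"}
idx = build_inverted_index(docs)
assert search(idx, "red") == {1, 3}
assert search(idx, "red shoes") == {1}
```

This is essentially what Elasticsearch (option 2) does for you at scale, with tokenization, scoring, and distribution handled properly.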
You're right in that (by default) adding a node will not free up space on the first node, because shards are replicated over instead of distributed. You do gain performance, though, in that once shards are fully replicated and rebalanced (so that each node has a more or less even number of shards), the workload when querying Elasticsearch is distributed. You also gain resiliency through replication, because you can turn off one node and the other one will become the replica set primary.
Now if you have 400gb worth of data and want to have two nodes handle half of it each then you could set number_of_replicas to zero. That would in effect turn off replication. Or you can compromise halfway, and have three nodes in your cluster with one replica, giving you 33% disk space savings.
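The disk arithmetic behind that choice is simple: total storage is the primary data multiplied by (1 + number_of_replicas). A sketch, with the settings body included (index name made up):

```python
def total_disk_gb(primary_gb, number_of_replicas):
    """Each replica is a full extra copy of the primary shards' data."""
    return primary_gb * (1 + number_of_replicas)

assert total_disk_gb(400, 1) == 800  # default single replica doubles disk use
assert total_disk_gb(400, 0) == 400  # no replicas: half the disk, no resiliency

# Turning replication off is a one-line settings change
# (body for PUT /my-index/_settings; index name is hypothetical):
replica_settings = {"index": {"number_of_replicas": 0}}
```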
Another thing you should do is have the data nodes in elasticsearch by themselves. Kibana and logstash would read and write from/to their own client nodes.
https://www.elastic.co/webinars/introduction-elk-stack
There's talk of integrating it with Sguil; ignore Splunk/OSSIM unless you are huge/want to burn money.
Watch a few videos of people dumping all their logs/pcaps/etc. into ELSA, normalizing with Logstash, and building custom views with Kibana.
This is not really the right approach. Something like Elasticsearch, which has a couple of different PHP clients, is the best way to approach searching. More complicated, yes, but it will yield much better results and be much more flexible.
It depends on the value of the word safe (As all software contains bugs, but some you may never ever encounter, and others can be bad for you), but as I understand it, it means there will be no more releases of that particular version.
7.11.x will not have a higher value for x. That doesn't mean that it is not safe, but if you need a fix or feature, you should go to a newer release that has it, since this will not be backported to a 7.11 version, if that makes sense.
EOL means a bit more to subscription customers (https://www.elastic.co/support_policy has a bit more on that), but that's the general gist of it.
[I work for Elastic]
Elastic recently launched a feature called Searchable Snapshots[1] that allows you to "search your backups". The new Frozen tier[2] utilizes this feature, making it easier to store data in cheaper storage mediums (e.g., S3). It caches common requests to that tier to improve speed[3].
[1] https://www.elastic.co/guide/en/elasticsearch/reference/current/searchable-snapshots.html
[2] https://www.elastic.co/blog/whats-new-elasticsearch-7-13-0
[3] https://www.elastic.co/guide/en/elasticsearch/reference/current/searchable-snapshots.html
Probably a bit of an anti-pattern to run two intensive applications on the same host without properly segregating the resource usage (i.e. cgroups). If one Java process is trying to allocate direct memory buffers while the OS is under memory pressure, then I'd say increased GC time and more frequent GC events are likely.
You could consider changing the index store type if nothing else, though changing it to something like niofs will likely incur a larger context-switching penalty and overall CPU overhead for disk I/O.
https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-store.html
(I work for Elastic)
It may also be helpful to read this other blog post that we published today; it clarifies the license change. To be clear, it is dual licensed under the Elastic License & SSPL (not only SSPL). We also talked about the future of the Elastic License (and are seeking feedback).
We didn't end up using it, but we have looked into it quite a bit.
It mainly seems to focus on important features that Elasticsearch keeps locked behind the X-Pack paywall. The main things we were looking at:
That said, we ended up not using it. Main reasons:
I can't speak for versions as old as this one, but in modern Elasticsearch 21GB is a pretty small index that's probably best suited to a single-shard configuration. A common recommendation is to aim to have each shard in the 20-40GB range: https://www.elastic.co/blog/how-many-shards-should-i-have-in-my-elasticsearch-cluster
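Following that 20-40GB guideline, a back-of-the-envelope shard count might be computed like this (the 30GB target is just the midpoint of the recommended range, not an official constant):

```python
import math

def suggested_shard_count(index_size_gb, target_shard_gb=30):
    """Aim for shards in the ~20-40 GB range; 30 GB is the midpoint."""
    return max(1, math.ceil(index_size_gb / target_shard_gb))

assert suggested_shard_count(21) == 1    # the 21 GB index above: one shard
assert suggested_shard_count(300) == 10
```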
> how to present my past experience in a way that would be attractive to employers even though I may not have experience in their specific language of choice
"Seasoned engineer looking for new opportunities with <technologies that are in demand>".
> how to make it known that I'd be willing to be flexible on salary given that there'd be a learning curve
Don't do that. Just ask what the salary range is for a position, and either be OK or not-OK with landing somewhere in that range :) If they ask you to place yourself on that range (they probably will), you're certainly welcome to place yourself on the lower end. Don't go throwing out numbers or vague "i'm cheap" sentiments. Get the range they've budgeted for first, then work within those bounds.
> advice on what to brush up on without knowing where exactly I'd be applying in the future
I'd do some analysis to see what sort of technologies are in-demand in the Orlando area. I use the Indeed API and build some word clouds with Kibana every now and then, just for the sake of seeing what sort of tech experience companies in my area are looking for.
Then, brush up on those things well enough to be able to do some koans and/or trivial exercises.
This is pretty much the textbook use case for Elastic Stack Alerting. I'm not sure what limitations you have, but you can self-host the community edition for free.
Also, there are several great SaaS solutions I have used in the past, but they only offer a very low free tier (500MB/day) and after that you have to pay.
ELK/Elastic Stack - My go-to for a free self-hosted option.
Sumo Logic - Paid SaaS solution, pretty much the same thing.
DataDog - Also very popular SaaS solution.
Refer to the documentation here:
"The fact that some data is numeric does not mean it should always be mapped as a numeric field. The way that Elasticsearch indexes numbers optimizes for range queries while keyword fields are better at term queries. Typically, fields storing identifiers such as an ISBN or any number identifying a record from another database are rarely used in range queries or aggregations. This is why they might benefit from being mapped as keyword rather than as integer or long."
It's not a database; it doesn't have ACID compliance.
Make sure you read https://www.elastic.co/guide/en/elasticsearch/resiliency/current/index.html
If you install a new ES 6 cluster next to the ES 2 cluster, you can use the reindex-from-remote API; it will reindex the old ES 2 data into ES 6: https://www.elastic.co/guide/en/elasticsearch/reference/current/reindex-upgrade-remote.html
Developer here (both web and desktop). The combination of InfluxDB (or any other supported database, such as Graphite or MySQL) + Grafana/Kibana produces exceptional dashboards like this one (Kibana). You can build (or have someone build) options so that with one click you change the dashboard. For example, you want to see only the data for that e-shop in Italy? You select from a dropdown: COUNTRY: ITALY.
I often use this solution for business & infrastructure monitoring; basically, "checking the website's stats".
Joking aside, if you want something serious, go to a good, competent web developer and ask them to build something ad hoc for you. Are you entering the data by hand?
Maybe you could also create a matching function.
For example:
I have Tauros and I am looking for Kangaskhan
Someone has Kangaskhan and is looking for Tauros
We live nearby
So the app could show all (or some) matching results.
Once you also have each person's location, I could set a radius of up to 100km, for example.
Just for your information, tools like Elasticsearch can make these types of queries really fast.
This is a great approach. You could also consider the official cloud service offered by Elastic. It's also run on AWS, but managed by the Elastic team. I haven't heard good things about the AWS Elasticsearch service, but I haven't looked into it recently.
Very tiny images like Alpine might fit in some cases, but — and, sure, I say this with my bias showing — I think in most cases you're better off with a "real" base OS. And it's not just my word; see for example Elastic's switch to a CentOS 7 base OS. If you have a mix of containers with the same base, the advantage of the bottommost layer being minimal dissipates.
I think a pretty good model is a reasonably-minimal Fedora base, with a batteries-included common shared layer, and then applications on top of that. For that, we need things like... well, not perl anymore, but in general... packages with less-kitchen-sink minimal Requires.
We use the ELK/EFK stack for this; I maintain an Ansible playbook that sets this up for you, if that helps. You can use the Windows Filebeat client to send logs to it, but I don't know much past that since we don't have any Windows servers.
Graylog with Active Directory Auditing (NXLOG) https://marketplace.graylog.org/addons/750b88ea-67f7-47b1-9a6c-cbbc828d9e25 Might be enough.
Or for a more advanced setup ELK stack with Winlogbeat:
http://logz.io/blog/windows-event-log-analysis/
https://www.elastic.co/blog/monitoring-windows-logons-with-winlogbeat
Yes, that would be a great use case for Watcher.
Check out the documentation here. I think you would want to take the first example and change the query to look for your condition instead of the word 'error'. It should be a simple substitution.
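As a rough sketch of what such a watch body looks like, per the Watcher structure (trigger/input/condition/actions) — the index pattern, query term, and email address here are hypothetical placeholders:

```python
# Sketch of a Watcher watch body; "logs-*", the "error" match, and the
# email address are hypothetical — substitute your own condition.
watch = {
    # run the check every 10 minutes
    "trigger": {"schedule": {"interval": "10m"}},
    # the search whose results the condition inspects
    "input": {
        "search": {
            "request": {
                "indices": ["logs-*"],
                "body": {"query": {"match": {"message": "error"}}},
            }
        }
    },
    # fire when the search returns at least one hit
    "condition": {"compare": {"ctx.payload.hits.total": {"gt": 0}}},
    # what to do when the condition is met
    "actions": {
        "notify_email": {
            "email": {"to": "ops@example.com", "subject": "Watch triggered"}
        }
    },
}

print(sorted(watch))  # the four top-level sections of a watch
```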
One more thing: X-Pack takes all the individual plugins (Watcher, Marvel, Shield, Graph) plus some new features and rolls them into a single integrated installer that is much more unified and GUI-friendly. X-Pack has all the pieces to help you build an enterprise-ready production system. It is for 5.0 and above, but if you are using 2.x you can install the plugins (like Watcher) individually.
Full Disclosure: I work for Elastic. Feel free to PM me.
Never used ELK before, so take this with a grain of salt.
https://www.elastic.co/guide/en/logstash/current/plugins-filters-mutate.html
Use this Logstash setup to change the name of a field; there are examples listed.
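From the linked docs, a rename with the mutate filter looks something like this (the field names here are just placeholders):

```
filter {
  mutate {
    rename => { "old_field" => "new_field" }
  }
}
```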
Not sure on the port issue; I think that might be the port used for reporting from Windows to Logstash.
Edit: You might want to scrub that log before posting it.
LSF has been replaced by Filebeat for a little while now.
Filebeat can push logs directly into elasticsearch if you don't need logstash to do parsing.
Mainly playing around with Rumprun unikernels + rust. Over the weekend I managed to get a mio demo running inside a unikernel, which required a slew of tiny PRs to add netbsd support to net2, nix-rust and mio. Side effect is that I learned a lot about porting header files into Rust :)
Sending a PR of netbsd tweaks to context-rs soon so I can try mioco inside a unikernel next.
At work, I managed to sneak a Rust program into a published article on time series analysis. I used Rust for a quick'n'dirty "cluster simulator" which generates data-points for analysis. So that was kinda fun, since normally this type of demo code would be python or something similar. (Article and simulator if anyone is curious)
Lucene's just a library; if you want an actual search service built on it, you probably want to look at Solr or Elasticsearch (my preference).
You would need a server-side scripting language and web framework. Anything commonly used in web development would do, so choose one of Ruby (on Rails), Python (+ Django), PHP, or whatever else you like.
For the browser part it depends how smart or complex you want the user interface. You may get away with some very basic Javascript and find plugins for most tasks (like a smart search box). You don't need to become a JS wizard but should be able to do at least some basic tasks. Also some Ajax understanding would help to make your site act better.
Possibly an RDBMS like Postgres or MySQL, though in this case it's not mandatory (at least for the things you list). That's because you will need a smart full-text search engine anyway, and you can use it as the storage as well. Elasticsearch would do perfectly what you want (e.g. make suggestions). (You could still store the data in the RDBMS and only copy the fields you need to search into Elasticsearch, a common approach too.)
My personal choice for this would be Ruby on Rails plus Elasticsearch as the core for the heavy lifting. Beyond that you would maybe need browser plugins, some Javascript and a few more smaller tools depending on your needs for design, usability etc.
Hi.
I work for Elastic.
(By the way, you may get better results searching for "Elasticsearch" rather than "Elastic Search")
There's also this: https://www.elastic.co/training/free which has some courses that may help you get started. There's also a trick I've found when learning a specific product set: search on Google for "Getting started with XYZ", where XYZ is the product you are trying to use.
We also have paid courses if you'd like.
As an assumption, I figured, if you are adding it as a search bar to your website, this may be a good start: https://www.elastic.co/guide/en/app-search/current/getting-started.html
Configure the existing ES cluster so that it can snapshot (snapshot is the ES term for backup) to S3 or Google Cloud Storage.
Create a new cluster in AWS, then restore from S3. You can also do this incrementally to minimize downtime: i.e. one first large sync, then a final update to catch up with the latest changes since the first sync.
Documentation for current ES version: https://www.elastic.co/guide/en/elasticsearch/reference/current/snapshot-restore.html
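The two steps above boil down to two API calls. A sketch of the request bodies, assuming the S3 repository plugin is available — the repo name, bucket, and snapshot name are hypothetical:

```python
# PUT _snapshot/my_s3_repo  — register the S3 bucket as a snapshot repository.
# "my-es-backups" and "snapshots" are hypothetical placeholders.
register_repo = {
    "type": "s3",
    "settings": {"bucket": "my-es-backups", "base_path": "snapshots"},
}

# POST _snapshot/my_s3_repo/snap_1/_restore  — restore on the new cluster.
restore_request = {
    "indices": "*",                 # restore everything
    "include_global_state": False,  # skip cluster-wide state
}

print(register_repo["type"])
```

Note that snapshots are incremental by default: repeated snapshots into the same repository only copy segments that changed since the last one, which is what makes the "big first sync, small final catch-up" approach work.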
Elasticsearch deprecated their High Level Java Client in 7.15, but the replacement API from 7.16 is still in beta...
Working with arrays of data inside ES isn't particularly intuitive.
If you've got specific groupings you're looking for, the filters aggregation is what you need. Would look something like this:
"aggs" : {
  "data" : {
    "filters" : {
      "filters" : {
        "a|b" : {
          "bool" : {
            "filter" : [
              { "term" : { "data" : "a" } },
              { "term" : { "data" : "b" } }
            ]
          }
        }
      }
    }
  }
}
If you're looking to query against terms based on the entire array, you might be better off merging it as part of indexing or otherwise finding a way to restructure the data to suit this need.
Also consider using a filter rather than a query - https://www.elastic.co/guide/en/elasticsearch/reference/7.15/query-filter-context.html
These are usually more efficient to run, since filter context skips relevance scoring and results can be cached.
It depends on your mapping. Ideally, accountId has a "keyword" mapping. Then you can use a "term" query for an exact match.
More info here: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html
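Putting those two pieces together, a minimal sketch (the field name and value are hypothetical):

```python
# Hypothetical mapping: accountId stored as keyword so it is matched
# exactly, not analyzed into tokens.
mapping = {
    "mappings": {"properties": {"accountId": {"type": "keyword"}}}
}

# Exact-match term query against that field.
query = {
    "query": {"term": {"accountId": {"value": "abc-123"}}}
}

print(mapping["mappings"]["properties"]["accountId"]["type"])
```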
I would encourage you to look into Transforms[1].
You can use transforms to create summary indices that you can use for reporting purposes and delete them once you are done with them.
[1] https://www.elastic.co/guide/en/elasticsearch/reference/current/transforms.html
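As a rough sketch of a pivot transform body under that approach — the index names, group-by field, and aggregation here are all hypothetical:

```python
# Sketch of a pivot transform body (PUT _transform/<name>); the indices
# and field names are hypothetical placeholders.
transform = {
    "source": {"index": "orders-*"},          # raw data to summarize
    "dest": {"index": "orders-summary"},      # the summary index to build
    "pivot": {
        # one summary document per customer_id
        "group_by": {"customer": {"terms": {"field": "customer_id"}}},
        # roll up total spend per group
        "aggregations": {"total_spent": {"sum": {"field": "price"}}},
    },
}

print(transform["dest"]["index"])
```

Once the reporting run is done, you can delete the summary index without touching the source data.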
Try source filtering when you are doing your request - https://www.elastic.co/guide/en/elasticsearch/reference/7.14/search-fields.html#source-filtering
That said, this sort of extraction is typically done as you index the data into Elasticsearch, not after.
The ELK stack is what I've been using for a while. You can ingest just about any logs and tailor them to your liking. Not for the faint of heart though, as the technical setup has a steep learning curve.
We use Elastic APM. It's not as good as, e.g., New Relic, but it does have good language support, distributed tracing, and is pretty easy to set up. Some of the features are paid (as is the elastic.co way) but none that you need for basic APM/exception tracking, imo.
Use the mapping API and define your mappings prior to writing data to the index. You can use dynamic mappings and index templates to apply mappings when an index is created. https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html
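A sketch of an index template carrying such mappings, using the composable template format (PUT _index_template/<name>); the pattern and field names are hypothetical:

```python
# Hypothetical composable index template: any index whose name matches
# "logs-*" is created with these mappings already applied.
template = {
    "index_patterns": ["logs-*"],
    "template": {
        "mappings": {
            "properties": {
                "timestamp": {"type": "date"},
                "level": {"type": "keyword"},
            }
        }
    },
}

print(template["index_patterns"])
```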
(I work at Elastic)
Have you seen the new frozen data tier and searchable snapshots? You can make rotating data into S3/GCS/Azure Object Store a part of your index lifecycle. And when needed you can search that data too.
Frozen Tier: https://www.elastic.co/blog/introducing-elasticsearch-frozen-tier-searchbox-on-s3
Searchable Snapshots: https://www.elastic.co/blog/introducing-elasticsearch-searchable-snapshots
It shows shards as:
index-name shard-index shard-type status doc-count size node
So you're taking the first field - the name of the index - and piping it to the delete index API.
I see there's an article rather dubiously suggesting this :/
https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
Read the whole thing. Every page.
Repeat every example, with variants on your own cluster.
Work very hard to not compare Elasticsearch to a relational database. Elasticsearch is not a database.
https://www.elastic.co/blog/why-license-change-AWS
Obviously this is going to be a one-sided blog post, but it's hard not to empathise with Elastic (yes, I'm empathising with a multi-billion dollar company lol). I think suddenly dropping the Apache license was always going to scare a lot of people, though, and probably wasn't the best decision.
[I work at Elastic]
Since we published our initial blog, we have added two posts with additional details: License change clarification and Why we had to change the license.
Ok, so speaking as someone that's introduced centralised logging into several businesses generating GBs of logs per day, I feel like I can help you.
PHP has the error_log function. How it behaves depends on how PHP is running: in the CLI it just writes to STDERR, and during a mod-php/FPM request it uses the error_log ini setting. Bear in mind that you should avoid relying on the internal PHP error logger, as it has some limitations; in FPM there is a maximum log size, and logs longer than this are truncated. Use Monolog or another PSR logger instead.
Definitely have a look at the Visual Builder in Kibana. I've built funnels using it. It can do multiple percentages on the same visualization (filter ratio), and if you have multiple Visual Builder visualizations on the same dashboard they share a cursor. Also, you can save the time period as part of the dashboard. I use the latest version of Kibana, so I hope what I described is available in what you're using.
https://www.elastic.co/guide/en/kibana/current/time-series-visual-builder.html
> LME is for you if:
Guess what is coming in ELK... :)
https://www.elastic.co/guide/en/siem/guide/7.x/index.html