I've seen benchmarks where PostgreSQL beat Mongo at being Mongo. One example: https://www.arangodb.com/2018/02/nosql-performance-benchmark-2018-mongodb-postgresql-orientdb-neo4j-arangodb/
I've also seen a shim that allows using Postgres as a drop-in replacement for MongoDB without changing the clients, though I can't find it right now.
I believe that on many platforms rwlocks have significantly higher overhead than mutexes, so you have to really benefit from the increased parallelism, with a relatively long-running critical section, before they're a win. See, for instance, this table, where rows #1 and #3 are a mutex vs. a rwlock used "properly": even for the read-heavy workload there, the rwlock is a little slower than the mutex on Linux but much faster on the other two platforms. Additionally, at least on Windows, using a rwlock as a mutex (i.e. only acquiring write critical sections), as in row #2, is faster than using it as a rwlock.
Additionally, focusing on the cost of an individual operation can be misleading: especially for atomics, one often needs to string together multiple to actually achieve something useful.
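To make the trade-off concrete, here's a minimal C++ sketch (the type names are mine, not the benchmark's code) of the same counter guarded by a plain mutex vs. a rwlock:

```cpp
#include <mutex>
#include <shared_mutex>

// The same counter guarded two ways. std::shared_mutex lets readers run
// concurrently, but each lock/unlock typically costs more than a plain
// std::mutex, so short critical sections may not come out ahead.
struct MutexCounter {
    std::mutex m;
    long value = 0;
    void add(long d) { std::lock_guard<std::mutex> g(m); value += d; }
    long read() { std::lock_guard<std::mutex> g(m); return value; }
};

struct RwCounter {
    std::shared_mutex m;
    long value = 0;
    // Writers take an exclusive lock...
    void add(long d) { std::unique_lock<std::shared_mutex> g(m); value += d; }
    // ...while readers take a shared lock and don't block each other.
    long read() { std::shared_lock<std::shared_mutex> g(m); return value; }
};
```

Whether the rwlock version actually wins depends on how long the read section runs and how many readers contend, which is exactly what the benchmark's rows compare.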
Did you ever look at NoSQL databases with strong consistency guarantees (in the ACID sense) like ArangoDB or FoundationDB or RethinkDB? I think that the question is no longer "NoSQL" xor "ACID" but that there is now much more choice out there.
>So basically every string access is protected by a mutex....
No.
Atomic updates of a reference counter are not like a mutex; they are 100 to 8000 times faster: https://www.arangodb.com/2015/02/comparing-atomic-mutex-rwlocks/
And it's not every string access: it's a reference count, so it is only updated when a new reference is created or destroyed, e.g. when a string is assigned to another string (newstringvariable := oldstringvariable). When you then access the string, it is a direct pointer access and the reference counter is not touched at all.
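For illustration, here's a hypothetical C++ sketch of the scheme described above (my own toy version, not any runtime's actual string implementation): the counter is only touched when a reference is created or destroyed, while reading the characters is a plain pointer access with no atomic operation at all.

```cpp
#include <atomic>
#include <string>

struct SharedString {
    struct Payload {
        std::atomic<int> refs;   // updated atomically, no mutex needed
        std::string chars;
        explicit Payload(std::string s) : refs(1), chars(std::move(s)) {}
    };
    Payload* p;

    explicit SharedString(std::string s) : p(new Payload(std::move(s))) {}
    // Creating a new reference: one atomic increment.
    SharedString(const SharedString& other) : p(other.p) {
        p->refs.fetch_add(1, std::memory_order_relaxed);
    }
    SharedString& operator=(const SharedString&) = delete; // omitted for brevity
    // Destroying a reference: one atomic decrement; last one frees.
    ~SharedString() {
        if (p->refs.fetch_sub(1, std::memory_order_acq_rel) == 1)
            delete p;
    }
    // Reading the string: direct pointer access, counter untouched.
    const std::string& get() const { return p->chars; }
    int use_count() const { return p->refs.load(std::memory_order_relaxed); }
};
```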
Hi this is Alan from ArangoDB.
I agree that Foxx is a lot more complicated than it needs to be. This is why we've spent the past months completely rewriting the API for ArangoDB 3.0.
The upcoming major version will do away with models and repositories (just use the collections directly) and replace the controllers with routers, which are nestable and behave more like what JS developers might be used to from Node frameworks like Express.
Additionally we aim to put a stronger focus on backwards compatibility starting with 2.8. It will be possible to run 2.8 Foxx services on ArangoDB 3.0 and the 3.0 Foxx APIs will follow semver (i.e. remain backwards compatible until 4.0 rather than the previous deprecation policy).
We feel that the biggest advantage of "full-stack JavaScript" comes from allowing developers to move throughout the entire stack and making it easier to share knowledge in the team.
Additionally Foxx can allow some applications to drastically reduce the size of their existing application server by shifting most of the business logic closer to the database (avoiding unnecessary roundtrips or leaking implementation details of the database engine into the server's frontend). It also allows some novel approaches like handling GraphQL directly inside the database.
I encourage you to give Foxx another try when ArangoDB 3.0 comes out and would love to hear your feedback.
Hi this is Jan from ArangoDB.
Thanks for your kind words... just a little update on upgrades with v3.0 (release April 2016).
We'll implement persistent indexes, automatic failover and VelocyPack (our own format for serialization and storage)... if you like, check out our dev roadmap here: https://www.arangodb.com/roadmap/
If you want to give them something challenging, I recently had to refer to the ArangoSearch 3.5 documentation and found it to be particularly bad.
This would be too challenging to give to students as a graded assignment, IMO, but it could be fun to walk through and give a few examples of simplifying sentences for concision.
https://www.arangodb.com/2019/09/when-exceptions-collide/
Max does a pretty good job explaining it in C++, but the concept is generally applicable. Exceptions should be exceptional, and if you’re running a multithreaded program, exception processing can cause a catastrophic pause of all threads while it figures its shit out.
That’s why the Scylla people straight up intercept exception throws.
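For illustration, here's a rough sketch of the errors-as-values style that avoids throwing on a hot path (the function name and logic are mine, not Scylla's actual code):

```cpp
#include <cstdlib>
#include <optional>
#include <string>

// Instead of throwing on bad input, which triggers stack unwinding and,
// in some runtimes, contention on shared unwinder state, return the
// error as a value the caller checks.
std::optional<int> parse_port(const std::string& s) {
    if (s.empty()) return std::nullopt;
    char* end = nullptr;
    long v = std::strtol(s.c_str(), &end, 10);
    if (*end != '\0' || v < 1 || v > 65535) return std::nullopt;
    return static_cast<int>(v);
}
```

The caller handles the failure case with an ordinary branch, so the failure path costs about the same as the success path.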
Wow... this is starting to span so much interesting material and so many connections that it might become worth it to do some more detailed collaborative mapping.
Perhaps some of the IT apes with spare time might be interested in setting up some kind of graph database to map connections and relationships between people, companies, places, etc.
Some kind of rudimentary Apeware case management like:
https://www.arangodb.com/solutions/case-studies/fbi-grade-case-mgmt-investigative-community/
https://neo4j.com/blog/icij-neo4j-unravel-panama-papers/
* forehead wrinkles appearing *
Relational databases are in general faster than document stores because they store data in strongly typed sets of tuples, not in dynamically typed maps like MongoDB does. Additionally, when you benchmark the jsonb type from Postgres, it also has greater performance than MongoDB in most real cases.
With all that in mind, there is really no reason to use document stores instead of relational databases.
But if you really need maximum access performance, you should probably take a look at, for example, Redis.
The general answer for anything that goes beyond what the low-level authentication and REST API in ArangoDB offers is "use Foxx". As Foxx lets you extend the REST API with your own logic you can use it to write your own domain-specific endpoints that can do mostly whatever you want: fine-grained permissions, transactions, batch queries, or simply any kind of data-intensive operations you may want to keep outside your existing backend server.
If you don't want data to leave the database, just lock everything down and expose your Foxx services and have all data access and manipulation go through them.
See this writeup on what Foxx can do for you: https://www.arangodb.com/why-arangodb/foxx/
We also have a community slack chat if you want to take this off reddit: https://slack.arangodb.com/
ArangoDB, Cologne, Germany seeks an experienced C/C++ developer with a solid algorithmic background and a deep understanding of computational complexity theory. You will work on our multi-model NoSQL database ArangoDB. (We use C++11.) https://www.arangodb.com/jobs/senior-developer-cpp/ (on-site)
Because it's not a JSONSchema implementation?
Actually, that's a good thing. JSONSchema is equal parts too complex and too limited. We moved away from JSONSchema to Joi for validation in ArangoDB for those reasons.
The benefit of JSONSchema, of course, is that it can be expressed in JSON. Sadly that makes it a bit unwieldy.
For those who are interested in the technology:
The core will be a Laravel app, to power the admin panel as well as the API.
Frontend will be built in Vue.js. This is because I have a lot of experience with it, and it makes this app easily embeddable into websites.
For the database, I'm on the fence between Elasticsearch and ArangoDB (https://www.arangodb.com/). I considered PostgreSQL and MongoDB, but this system feels like it's in a limbo between needing a schema and not. This I'm still not 100% settled on, though.
API/backbone will be Node.js with the SailsJS framework.
Server will be a high-powered dedicated server, within the EU (so that it's easy to comply with GDPR).
Super excited! You mentioned that it would work with any DB API over HTTP? Can you please confirm whether it would work with ArangoDB? And could we apply for the beta accounts? :) Thank you so much, really excited about this.
> NoSQL databases lack the neatness of relations but aren't they more performant?
Performance is hard to generalize and really depends on the way the data is modelled and query patterns.
Based on this benchmark, it looks like MongoDB isn't necessarily more performant: https://www.arangodb.com/2018/02/nosql-performance-benchmark-2018-mongodb-postgresql-orientdb-neo4j-arangodb/
The one thing that is easier with MongoDB is setting up replication. However, with PostgreSQL there are many hosted solutions (AWS RDS, Google Cloud SQL, Heroku, and many more) that provide you with a replica and automated backups.
MySQL is a great choice for either relational or document storage, as it does both. I would suggest trying both within MySQL and seeing which you like better.
Or, if you really want to mix it up, try using a graph database. ArangoDB is a great up-and-coming graph database platform that lets you build a microservices layer in JavaScript that runs inside the database engine itself, among numerous other great features.
You could check out the ArangoDB Graph Course... takes you from 0 knowledge about graph concepts to first advanced queries, including queries, query options and stuff https://www.arangodb.com/arangodb-graph-course/
Warning: I've really only started using this a few days ago, all my knowledge is based on reading documentation for a few days and developing a super tiny microservice with it for an application that's so far pre-alpha that I don't even dream of the release day.
So recently I started looking into storage backends that would scale better than Postgres and MySQL. After a while I stumbled upon ArangoDB, and so far I'm super impressed. They have a very interesting design for their clustering: despite being distributed, their design supports ACID transactions. They have a custom query language that allows you to do tons of things that you wouldn't be able to do on regular NoSQL databases; they even have JOINs. They also have native support for creating, maintaining and querying graphs, and you can expose a GraphQL API natively. And they have a feature called Foxx that lets you create microservices that run on your coordinator nodes and do calculations and other operations in the background.
I fully realise this may not be helpful to you, but it might be worth it to look into this!
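For a flavor of the query language, here's an illustrative AQL join (the collection and attribute names are made up for this example):

```aql
FOR order IN orders
  FOR customer IN customers
    FILTER order.customerId == customer._key
    RETURN { customer: customer.name, total: order.total }
```

The nested FOR/FILTER pattern behaves like an inner join, which is the kind of thing most NoSQL query interfaces don't offer.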
Except it's not always that straightforward. In practice, MongoDB out of the box can suffer far more scalability issues with high-volume queries. In the devops industry you'll see that many teams that start with MongoDB love it for its simplicity when bootstrapping a product and have initial success with larger datasets (like millions of user records), but when it comes to higher load numbers Mongo can fall apart unless the cluster is specifically tuned for large query numbers (the way it did for Epic until they tuned it properly). PostgreSQL clusters, on the other hand, rarely hit the same bottlenecks with their default configurations.
Even with a simple JSON schema MongoDB loses many of these benchmarks (see tests like https://www.arangodb.com/2018/02/nosql-performance-benchmark-2018-mongodb-postgresql-orientdb-neo4j-arangodb/ ). Don't get me wrong, as a software engineer I love starting my projects with MongoDB for its user friendliness and quickstarts, but eventually as they reach certain thresholds more serious databases need to be employed.
Depending on the scale (amount of site traffic) and what capabilities you want in the store you may need multiple databases. I'd suggest:
ArangoDB: persistent storage (document DB) + an inbuilt graph engine so you can run analytics. However, take note that you must pay particular attention to how you structure your data; financial transactions, for example, must be contained within a single document:
> Using a single instance of ArangoDB, multi-document / multi-collection queries are guaranteed to be fully ACID. In cluster mode, single-document operations are also fully ACID. Multi-document / multi-collection queries in a cluster are not ACID, which is equally the case with competing database systems. Note that for non-sharded collections in a cluster, the transactional properties of a single server apply (fully ACID).
https://www.arangodb.com/features-may-2017/
For everything else you'll need, i.e. session-related things (tokens, shopping carts, etc.), I'd go with Redis.
If you can live without analytics, a typical RDB such as MySQL/MariaDB or even a standalone docstore such as Mongo can be used instead of Arango, and you can add in a graph database (Neo4j) at some later date.
I'd also recommend using a RAD framework (note: different from a CMS). But this will depend entirely on what language you choose to develop in.
Jan from ArangoDB here
As others already commented, ArangoDB also supports graphs, but as you surely already saw, it is a native multi-model DB. The graph is formed by JSON documents pointing to each other via _from and _to attributes. At its very heart, ArangoDB is a transactional document store, so you can store arbitrary data on your edges and vertices.
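For illustration, an edge document might look like this (_from and _to are the built-in edge attributes; the other attribute names are made-up examples):

```json
{
  "_from": "persons/alice",
  "_to": "persons/bob",
  "type": "knows",
  "since": 2015
}
```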
The simplest example of a good usage is an ecommerce app where the cart uses the key/value model, product data is stored in documents, and the recommendation engine leverages the graph model. Check out this talk by my teammate Michael: https://www.youtube.com/watch?v=on1l2pEEWnw
More advanced usages can be machine learning, network monitoring or fraud detection which are more graph heavy but also need to store data as simple documents.
A cool use case could also be IoT. E.g. think of a smart electricity network with sensors. You have the network forming a graph (production, transport, usage, consumption counters), you have sensors across the network sending some data (key/value), maybe every 5 min, and every part of this network also has some static data about itself (name, geo location, address, etc. = JSON doc). For your storage needs, you can do all of this with ArangoDB as the database, and also use some fancy stuff like the Pregel integration or the geo-index to do some cool analytics of your network. You can find a Pregel tutorial here: https://www.arangodb.com/pregel-community-detection/ As you can do all of this nicely and efficiently in ArangoDB, it could be an interesting use case for your project.
And even the performance advantages of NoSQL databases vs. relational databases are very dubious. See this article where PostgreSQL beats MongoDB in almost every scenario:
https://www.arangodb.com/2015/10/benchmark-postgresql-mongodb-arangodb/
Yes (at least in userspace!), unless you only have readers (but then why bother locking at all).
The problem is that "to allow them to read concurrently" is a complex algorithm in itself.
There was also an interesting paper last year showing that, in some cases, having a single mutex for an entire program would be faster than having many mutexes being locked concurrently, one for each "different" piece of data one would like to protect.
(Ingo from ArangoDB)
Just saw your comment. The test cases are not that sophisticated and are easy to understand. We might be the first DB vendor to provide the whole benchmark setup on GitHub, including raw data and dumps of the used databases, so that everyone can validate the results and run their own tests as well.
In the meantime we've published several updates and new performance tests, and added OrientDB to the comparison (as suggested in a comment above). Furthermore, the vendors OrientDB and Neo4j improved their databases after the initial performance blog post. A benefit for all customers.
> db vendor rates their db 100% over everyone else
Unfortunately, MongoDB is better in 4 out of 7 tests. But we made our point: we might not be better, but we are not far away and can compete. So you might take this database into account if you need more flexibility when modeling your data.
See the new results and updates to the blog post: https://www.arangodb.com/performance/
I'll reply to it even though it's been over a month.
PostgreSQL is pretty much like MySQL, but it supports different features, one of them being recursive queries, which allow for querying a comment with all its respective children, and their respective children, and so on. MySQL doesn't support such a query.
There is a nice tutorial over here on using recursive queries with PostgreSQL.
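To sketch the idea (assuming a hypothetical comments(id, parent_id, body) table; the root comment id is just an example):

```sql
-- Fetch a comment and all of its descendants, tracking depth.
WITH RECURSIVE thread AS (
    SELECT id, parent_id, body, 0 AS depth
    FROM comments
    WHERE id = 42                      -- the root comment
  UNION ALL
    SELECT c.id, c.parent_id, c.body, t.depth + 1
    FROM comments c
    JOIN thread t ON c.parent_id = t.id
)
SELECT * FROM thread ORDER BY depth;
```

The recursive part keeps joining children onto whatever the previous step found, so the whole tree comes back in a single query instead of one query per comment.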
Like someone else said below, the usual NoSQL model is probably not a good choice in this scenario, as it has no relationships or JOIN clauses (you would have to build the graph structure in your app and would probably need to dispatch many, many queries). Maybe a graph database (such as ArangoDB) could suit your needs. I can't really say what's best for you long-term, but it's something to consider.
You might want to have a look at https://www.arangodb.com/. It's a multi-purpose database, but it supports graphs quite nicely. Besides the standard HTTP API there is a Go driver, but I never used it, so I can't comment on its quality.
I made a list of projects that I think should be on there and what they are using at the moment
piwik – madmimi
sourcefabric – mailchimp
mongo db – eloqua.com
nagios – constant contact
humhub – mailchimp
docker.com mailchimp
https://www.arangodb.com/newsletter mailchimp
Strikebase mailchimp
Puppetlabs? marketo.com
http://www.openia.it/ - mailchimp?
ArangoDB has geo indexes. It uses a Hilbert curve and polyhedrons to achieve this. The presentation about it shows the details of the implementation and the reasons for the design decisions.
Having said that, I can't say how well it works and whether it satisfies the requirements of the blog post author.
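For the curious, here's the textbook 2-D Hilbert curve mapping (a generic sketch of the idea, not ArangoDB's actual geo-index code): points that are close on the grid tend to get close index values, which is what makes the curve useful for indexing geo data.

```cpp
// Map cell (x, y) of an n-by-n grid (n a power of two) to its position
// along the Hilbert curve. Classic iterative construction: at each
// scale s, pick the quadrant, add its offset, then rotate/flip the
// coordinates so the sub-curve lines up with the next scale.
long hilbert_xy_to_d(long n, long x, long y) {
    long d = 0;
    for (long s = n / 2; s > 0; s /= 2) {
        long rx = (x & s) ? 1 : 0;
        long ry = (y & s) ? 1 : 0;
        d += s * s * ((3 * rx) ^ ry);
        if (ry == 0) {
            if (rx == 1) {          // flip the quadrant
                x = n - 1 - x;
                y = n - 1 - y;
            }
            long t = x; x = y; y = t;  // swap x and y
        }
    }
    return d;
}
```

A geo index built this way can sort points by their Hilbert position, so a range scan over index values roughly corresponds to a spatial neighborhood.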