Most databases store XML as text, however, and that's not terribly useful if you need to do any queries within it. Postgres, on the other hand, supports storing JSON in a decomposed binary format so that it can be searched and indexed like regular columns. So if you're looking for a relational DB with JSON support, it's hands down the best choice right now.
Some languages (locales) treat V and W as the same. This "feature/bug" is actually quite common.
Because of the hype, basically. The times when Node.js and MongoDB became buzzwords overlapped enough to make the combination an "obvious" choice. Especially with MongoDB using JSON for everything (even as a query language).
Basically what made MongoDB so successful is that it was one of the first widely-recognized NoSQL solutions in a time when everybody thought they had "Big Data" and needed a database that was built for scalability first (and integrity last).
Additionally, "MEAN" provided a nice buzzword, and now the idea that Node.js and MongoDB go together perfectly is stuck in everyone's head, and the media (i.e. bloggers and tech websites) perpetuates it because they're too lazy to do the research to come up with something more practical.
I would strongly recommend against MongoDB, especially as a "default" when you don't know your actual requirements. There are plenty of viable databases (non-relational and relational alike) and in all likelihood MongoDB is not the best fit for your project's requirements.
Personally, I prefer ArangoDB these days (disclosure: I've become a maintainer because I was already using it and interested in working on it). But I'd say for most projects PostgreSQL is a good default. Of course nothing beats actually doing your homework and seeing what database fits your project best before you pick it.
Configuring PostgreSQL's replication isn't that difficult after 9.0. Docs
This is master/slave though. So if you need sharding or a master/master setup you would have to look into other solutions.
I believe this is the intended ideal for the team. They want to focus on the database, not the implementation. After a long time of not supporting replication they gave in with this statement:
The team recognized the need for basic replication services but also realized that by packing in too much they may hinder more specialized solutions from being developed. I can agree with this statement since most people will take the path of least resistance, ultimately ending with a bunch of poorly scaled apps or with the team having to focus on implementation scenarios and not the core DB.
MVCC is not that difficult to reason about, the PostgreSQL documentation page linked in the article does a pretty good job of explaining the possible levels and concurrency gotchas: http://www.postgresql.org/docs/9.4/static/transaction-iso.html
But the most important distinction between relational databases with MVCC and various "eventually consistent" databases is that you only have to understand MVCC, not implement it. You read the docs, choose an isolation level that fits your concurrency and performance goals, wrap database calls in transactions, and you're done.
If you wanted to get similar concurrency guarantees from many other eventually consistent databases, you would still come across the same gotchas, but now you also need to implement the solutions in your application code.
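For instance, picking a stricter isolation level is just part of opening the transaction. A minimal sketch (table name is made up):

BEGIN ISOLATION LEVEL REPEATABLE READ;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
-- under REPEATABLE READ, a conflicting concurrent update makes this fail with a
-- serialization error instead of silently interleaving, so the caller just retries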
PostgreSQL has had pretty decent JSON support since 9.1, and it's been getting better since then: http://www.postgresql.org/docs/9.4/static/functions-json.html
Adding a JSON field to a relational table can be a pretty good approach to a number of tricky scenarios, especially avoiding pivots for settings and the like. Array support in PostgreSQL is also very nice.
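A minimal sketch of what that can look like (jsonb assumed; table and key names made up):

CREATE TABLE users (
    id       serial PRIMARY KEY,
    name     text NOT NULL,
    settings jsonb NOT NULL DEFAULT '{}',  -- per-user settings without a pivot/EAV table
    tags     text[] NOT NULL DEFAULT '{}'  -- array column instead of a join table
);

SELECT name FROM users WHERE settings ->> 'theme' = 'dark';
SELECT name FROM users WHERE tags @> ARRAY['admin'];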
PostgreSQL used to have an option to work that way before postgres version 6.2.
>> Postgres supports the notion of time travel. This feature allows a user to run historical queries.
>> ....
>> For example, to find the current population of Mariposa city, one would query:
>>
>> SELECT * FROM cities WHERE name = 'Mariposa';
>>
>> ...
>> Postgres will automatically find the version of Mariposa's record valid at the current time. One can also give a time range. For example, to see the past and present populations of Mariposa, one would query:
>>
>> SELECT name, population
>> FROM cities['epoch', 'now']
>> WHERE name = 'Mariposa';
>>
>> where "epoch" indicates the beginning of the system clock.
>> ...
>> As of Postgres v6.2, time travel is no longer supported. There are several reasons for this: performance impact, storage size, and ..... Time travel is deprecated: The remaining text in this section is retained only until it can be rewritten in the context of new techniques to accomplish the same purpose.
Yes you can count the number of recipes with chicken as one of the ingredients:
# create table recipes (name text not null, ingredients text[] not null);
# insert into recipes(name, ingredients) values ('yum', '{chicken,butter}'), ('sandwich', '{chicken,bread}'), ('breakfast', '{cereal,milk}');
INSERT 0 3
# select name from recipes where 'chicken' = any(ingredients);
name
----------
yum
sandwich
(2 rows)
# select count(*) from recipes where 'chicken' = any(ingredients);
count
-------
2
(1 row)
Using an array as part of a join clause:
# select a.name, b.name from recipes a join recipes b on (b.ingredients[1] = a.ingredients[1]);
name | name
-----------+-----------
yum | sandwich
yum | yum
sandwich | sandwich
sandwich | yum
breakfast | breakfast
(5 rows)
I don't know if you can make an array an index, but you can use a partial index on an array expression:
# create index on recipes(ingredients) where 'chicken' = any(ingredients);
CREATE INDEX
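For what it's worth, you can also put a GIN index on the array column itself and query it with the containment operator - a sketch reusing the recipes table above:

# create index on recipes using gin (ingredients);
CREATE INDEX
# select name from recipes where ingredients @> array['chicken'];
name
----------
yum
sandwich
(2 rows)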
That said, if you're doing a lot of array fiddling, then that is generally a sign that you should probably be using a separate table.
Why would you use MongoDB? PostgreSQL does the same thing, but faster, using jsonb and GIN indexes: http://www.postgresql.org/docs/9.5/static/datatype-json.html - Why limit yourself to a document store that is only slightly better than writing files to a tmpfs partition?
> If you know your data access pattern is going to be complex or you plan to use a lot of database-specific features (like PostgreSQL window functions), you may not want to use an ORM.
i hadn't ever heard of those before. damn, i could have solved a lot of problems with those. need to start paying more attention to databases.
also, thanks for telling people not to bitch about ORM by bitching about ORM. ;)
Might be because the official pronunciations for MySQL and PostgreSQL use S-Q-L instead of SEQUEL, and more linux devs are going to learn to use databases using one of those open source servers.
> The official way to pronounce “MySQL” is “My Ess Que Ell” (not “my sequel”), but we do not mind if you pronounce it as “my sequel” or in some other localized way.
https://dev.mysql.com/doc/refman/5.7/en/what-is-mysql.html
> Q: How do you pronounce PostgreSQL
> A: post-GRES-que-ell, per this audio file. Many people, however, just say "post-GREZ".
http://www.postgresql.org/about/press/faq/
If you're curious, here is the mp3 for the PostgreSQL pronunciation. http://www.postgresql.org/files/postgresql.mp3
The varchar instead of text recommendation would appear to be wrong:
> Tip: There is no performance difference among these three types, apart from increased storage space when using the blank-padded type, and a few extra CPU cycles to check the length when storing into a length-constrained column. While character(n) has performance advantages in some other database systems, there is no such advantage in PostgreSQL; in fact character(n) is usually the slowest of the three because of its additional storage costs. In most situations text or character varying should be used instead.
http://www.postgresql.org/docs/9.1/static/datatype-character.html
Hard to say, but generally the more components a system has the more likely is that it's going in the "stupid" direction. Keep it as simple as possible.
Postgres is a very powerful database, so maybe you can just keep your unstructured data in there. Check out the hstore datatype: it's a key-value store, but you can access the data in it together with your structured data in a single SQL statement. Has index support, special query operators etc... pretty neat.
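A quick sketch of hstore in practice (names made up):

CREATE EXTENSION hstore;

CREATE TABLE products (
    id    serial PRIMARY KEY,
    name  text NOT NULL,
    attrs hstore  -- arbitrary key/value pairs next to the structured columns
);

INSERT INTO products (name, attrs)
VALUES ('shirt', 'color => red, size => M');

SELECT name, attrs -> 'color' AS color
FROM products
WHERE attrs ? 'size';  -- ? tests whether a key exists

CREATE INDEX ON products USING gin (attrs);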
Hmm. After thinking about it, I don't agree. Don't store phone numbers as integers, even if it is only a supporting column for indexing. If you need to index on it for reverse lookup of customers, you are far better off stripping the special characters, storing the result as a string, and indexing that. This way the LIKE operator and other fuzzy logic in the DB still work and you can implement a much more flexible solution for reverse lookup. Your data isn't so big that you need the performance boost of using integers for indices.
If you aren't planning on doing mathematical operations on a field then don't make it numeric. ID's aren't special cases either; we increment those, thus they are arithmetic in nature.
Great things:
*) convert any query output, including nested structures into JSON
*) Incredibly rich type system
*) Incredibly rich planner
*) With SQL language extensions postgres can behave as a functional language
*) data-modifying WITH (chain output of updates/inserts/deletes to other queries, etc.):

WITH moved_rows AS (
    DELETE FROM products
    WHERE "date" >= '2010-10-01' AND "date" < '2010-11-01'
    RETURNING *
)
INSERT INTO products_log
SELECT * FROM moved_rows;
*) so many other things. for example, string processing via regexp_matches is so powerful it's completely replaced perl for me
edit: formatting
PostgreSQL fulltext search is very capable, and since you are on Heroku, you are probably running postgres already.
Have a look at the excellent postgres docs: http://www.postgresql.org/docs/current/static/textsearch-intro.html
The PostgreSQL "text" and "varchar" column types require their data to be valid in their configured character set; encrypted data is by design indistinguishable from a random sequence of bytes, and so is unlikely to be valid in any given character set.
Use a column with the bytea or blob types instead.
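Something like this (hypothetical names):

CREATE TABLE secrets (
    id         serial PRIMARY KEY,
    ciphertext bytea NOT NULL  -- raw bytes, no character-set validation applied
);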
It's in progress, there's a long pgsql-hackers discussion going on where they're finalizing the details: http://www.postgresql.org/message-id/CAM3SWZQ7PdxFhg-B6EN80N3X0=raeyOFXxeweU2pF09hOQUb1Q@mail.gmail.com
He talks about how cheap PostgreSQL is. It's true that it's fine to use PostgreSQL and not give something back, but if you're a big Postgres user you should consider donating money. http://www.postgresql.org/about/donate/ Donate as much as you think it's worth if you can afford it. The project has to be financed somehow. :)
> well, then give the people the means to send the computation to the database in an extensible fashion, duh.
That's actually exactly what PostgreSQL does. And, guess what? Someone used that feature for geographic data. So if you're interested in raging at someone, maybe direct it at the people who are actually (not) doing what you're complaining about.
Also, maybe divert a little bit of your energy for swearing and claiming the world is stupid into forming your thoughts into text that more successfully conveys your original meaning.
> SQL Server. It is free on the low end and has some pretty cool features that kick in at 10 million or more rows.
Its restrictions are terrible.
> Maximum memory utilized (per instance of SQL Server Database Engine): 1 GB
> Maximum relational Database size: 10 GB
These limits are a joke compared to PostgreSQL.
Start with the name :)
While the manual isn't a tutorial, it's one of the most valuable resources to learn and understand Postgres
> The ivory tower types figured out a lot of the fundamental algorithms, then later Postgres implemented them.
Especially in PostgreSQL's case, with its pluggable types and indices support. For example, pgSphere uses GiST indices to make queries using spherical operators perform well.
Check out PostgreSQL and PostGIS: http://www.postgresql.org/download/ You can install PostGIS after installing Postgres. You should also get pgAdmin so you can check out and play with the database easily. http://www.bostongis.com/?content_name=postgis_tut01
Sounds like GIN fast update.
Turning off autovacuum for the table makes it likely that the maintenance on the GIN index - writing the temporary updates to the index as a bulk update - will be triggered by a query that pushes the space used for it over the limit, rather than during background maintenance.
> I need the flexibility provided by mongoose documents.
Look into Postgres — it has JSON support with efficient storage, lookups and full-blown indexing. I believe it even outperforms mongo v2, both in storage and performance, not sure how mongo v3 compares though.
LOL at FUD "To do the same in PostgreSQL, Microsoft claims that you need to use a combination of then json_each, json_object_keys, json_populate_record, and json_populate_recordset functions"
Looks like they didn't read http://www.postgresql.org/docs/9.4/static/functions-json.html, especially json_to_recordset.
Run postgresql over ssl and use ssl certificate authentication.
http://www.postgresql.org/docs/9.1/static/ssl-tcp.html
(though i would probably hit haproxy on the localhost and use haproxy to run the SSL certificate auth to the DB servers. )
I have heard such a data structure being called a sum tree. PostgreSQL uses a similar concept of a max tree to find free space for new rows.
Do you have an idea how your data structure should cope with MVCC? More specifically the MVCC implementation of PostgreSQL means that concurrent versions of rows have separate index tuples and you have to check the row header to figure out which one is visible to your snapshot.
I guess for your use case of randomly picking a row a retry loop that discards invisible rows would work. Then you have to deal with the next issue - how do you deal with new entries being inserted while you are traversing the tree. You could get a SHARE lock on the table, blocking out concurrent modifications, but that will make concurrency suck.
The good news is that you can probably use the GIST index extensibility, or even better SP-GIST, to have a properly WAL logged implementation of your access method.
Do you use PostgreSQL? It's got a CIDR data type built-in. Also, have a look here, specifically for is contained within.
You can thank me in scotch and kittens.
Pretty sure the storage requirements for Postgres are the same for all three character types: http://www.postgresql.org/docs/9.1/static/datatype-character.html.
Please use more constructive words to form your argument next time, the words you chose are hurtful and don't advance the discussion.
That's a really good question.
It can make sense in a situation where you'll be JOINing on any of a number of possible attributes. But even then, you can still use WHERE when needed, and SELECT specific values as data.
But the main key factor to consider is, will a non-developer need to update data specific to the charity? If so, then I think it makes a lot of sense to put it in the database. If not, then you probably don't need to.
This is a case where a Ruby object makes sense - no ActiveRecord required. Just make an object with the methods you need, and use it wherever you need it.
class Charity
  def name
    "Awesome Charity"
  end

  def address
    "123 Donation Lane"
  end
end
That object is cheap to instantiate so you don't have to do anything tricky. But there are other things you could do if you wanted, like use class methods instead of instance methods.
You could even just read a value from a config file, and use a global variable or hash.
Personally, I like to use Ruby objects in this case. It provides a natural place to put behavior as you continue to develop your app.
So without knowing more (like whether you anticipate an admin regularly changing data about the charity), I would start with a Ruby object like the one I showed, and then move to a database table if we reach a point where a non-developer needs to be able to update the data.
And if you do use a database, now or in the future, you might consider using a database constraint to ensure there's only ever one record in the charities table.
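One common trick for that is a unique index on a constant expression, which caps the table at a single row. A sketch (assuming a charities table):

CREATE TABLE charities (
    name    text NOT NULL,
    address text NOT NULL
);

-- every row has the same value for (true), so a second INSERT violates the index
CREATE UNIQUE INDEX charities_single_row ON charities ((true));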
Just a quick nitpick:
> Danger 1: Not knowing database data types. This usually involves using too-general data types like ‘text’ type in PostgreSQL for most of the columns. This type is introduced to store strings of unlimited length what is identified with CLOB type. Multiple columns with this type cause an enormous performance challenge.
From the PostgreSQL docs:
> There is no performance difference among [text, char(n) and varchar(n)], apart from increased storage space when using the blank-padded type, and a few extra CPU cycles to check the length when storing into a length-constrained column. While character(n) has performance advantages in some other database systems, there is no such advantage in PostgreSQL; in fact character(n) is usually the slowest of the three because of its additional storage costs. In most situations text or character varying should be used instead.
The world isn't just about Text & Binary. It's about so many things ... XML / (performant) JSON / Ranges / GeoSpatial data-types / Custom-data-types (where you can re-define add / multiply etc. operators) ... Heck, you can even create your own operators! .. tons more just around data-types.
Something Tom Lane said might be appropriate here.
> CHECK is meant to handle constraints on a row's value in isolation. If you try to use it to enforce cross-row conditions, the project will certainly end badly.
My advice: store and recall your timestamps in UTC. Run PostgreSQL in UTC. Make time zones a problem that the edges of your application have to deal with, not the internals.
That said, if you want your database to screw around with time zones, PostgreSQL has your back.
$ psql template1
template1=# select now();
now
-------------------------------
2014-07-24 08:53:26.174943-05
(1 row)
template1=# set timezone='Europe/Berlin';
SET
template1=# select now();
now
-------------------------------
2014-07-24 15:53:41.343164+02
(1 row)
So:
SELECT * FROM pg_timezone_names;
ALTER ROLE german_user SET timezone='Europe/Berlin';
ALTER ROLE uk_user SET timezone='Europe/London';
For more, see the PostgreSQL timezone docs.
From PostgreSQL Docs. 8.14. JSON Types
>There are two JSON data types: json and jsonb. They accept almost identical sets of values as input. The major practical difference is one of efficiency. The json data type stores an exact copy of the input text, which processing functions must reparse on each execution; while jsonb data is stored in a decomposed binary format that makes it slightly slower to input due to added conversion overhead, but significantly faster to process, since no reparsing is needed. jsonb also supports indexing, which can be a significant advantage.
Here's what postgresql's documentation has to say:
> decimal | variable | user-specified precision, exact | up to 131072 digits before the decimal point; up to 16383 digits after the decimal point
http://www.postgresql.org/docs/current/static/datatype-numeric.html
The real danger with using floating point numbers when dealing with money is that they can't accurately convey every number that might be used, so there's bound to be rounding errors. People who trust you with their money aren't cool with rounding errors. Never use floating point numbers to do calculations with money.
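You can see the difference right in SQL:

SELECT 0.1::float8 + 0.2::float8 = 0.3::float8;    -- false: the binary floats don't add up exactly
SELECT 0.1::numeric + 0.2::numeric = 0.3::numeric; -- true: numeric arithmetic is exact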
> How to write unit tests against Postgres?
Seriously just use ROLLBACK and SAVEPOINT(s). It will even roll back DDL statements. Am I missing some compelling reason to not wrap unit/functional tests in a transaction block?
http://www.postgresql.org/docs/9.4/static/sql-savepoint.html
http://www.postgresql.org/docs/9.4/static/sql-rollback-to.html
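The pattern is roughly this (a sketch, not tied to any particular test framework):

BEGIN;                               -- one transaction around the whole test run

SAVEPOINT test_case_1;
-- run this test's INSERTs/UPDATEs/DDL and assertions here
ROLLBACK TO SAVEPOINT test_case_1;   -- undo the test's changes, keep the transaction open

SAVEPOINT test_case_2;
-- next test...
ROLLBACK TO SAVEPOINT test_case_2;

ROLLBACK;                            -- nothing ever hits the database permanently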
Actually postgres can compete with Mongo in the non relational space for a lot of use cases. It's a fair question.
I've implemented features similar to this one using a JSONField in Postgresql.
This way you can have your users define something like a key:value schema and store that in a normal database table.
One can maintain multiple versions concurrently (backporting fixes etc.), with a sane end-of-life plan for the old one. example: all the "latest releases" of PostgreSQL (top of the righthand sidebar).
http://www.postgresql.org/docs/8.4/static/queries-with.html
The section starting with...
>When working with recursive queries it is important to be sure that the recursive part of the query will eventually return no tuples, or else the query will loop indefinitely. Sometimes, using UNION instead of UNION ALL can accomplish this by discarding rows that duplicate previous output rows. However, often a cycle does not involve output rows that are completely duplicate: it may be necessary to check just one or a few fields to see if the same point has been reached before. The standard method for handling such situations is to compute an array of the already-visited values. For example, consider the following query that searches a table graph using a link field:
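To make that concrete, this is roughly the shape of such a query, assuming a graph(id, link) table like the docs use; the path array is what lets it stop when it revisits a node:

WITH RECURSIVE search_graph(id, link, path, cycle) AS (
    SELECT g.id, g.link, ARRAY[g.id], false
    FROM graph g
  UNION ALL
    SELECT g.id, g.link, sg.path || g.id, g.id = ANY(sg.path)
    FROM graph g
    JOIN search_graph sg ON g.id = sg.link
    WHERE NOT sg.cycle
)
SELECT * FROM search_graph;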
Almost exactly what I wanted to say: there is no technical limitation (other than bad automation tools), it's a use case problem. But I would like to add a couple points.
0) The SQL "standard" has actually had a solution to the hierarchical problem, and I don't mean nested sets. WITH RECURSIVE lets you query tree or graph like structures with a single statement. So I don't really think that the mismatch is a problem of storing/retrieving hierarchical data in a RDBMS. This sort of leads into the next point.
1) The mismatch, I believe, is that the relational model allows you to tease out new projections of your data (especially as you start to hit 5NF and 6NF structures), such that your models no longer map to a single table or a predefined set of joins. So I think the problem is that you have a combinatorial explosion of models if you are trying to map a model to every possible projection.
If you stick with only the software engineering purity you end up creating a model for every useful projection of your data (lots of work). Or if you stick to the purity of the relational model you have generic models that are basically key->value pairs that aren't very satisfying architecturally.
So I think the decision boils down to what you need. Do you need to serialize objects, or do you need to derive new relations between relvars on a regular basis.
--Cheers
See comments upthread for standalone mode. For LATERAL, it's a way of feeding table data to other JOIN'ed table expressions, especially functions.
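A small sketch of the function case (made-up table), where each row of the outer table drives a set-returning function:

SELECT o.id, s.item_no
FROM orders o,
     LATERAL generate_series(1, o.item_count) AS s(item_no);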
I had this idea years ago, I even thought a tldr wiki would be good to "humanize" terms and let the public contribute. But it probably wouldn't be as big as wikipedia and so most likely not as peer-reviewed/accurate. Your method is better.
I like your style. I hope to see feature matrices for each service "group", as in, status.net/identi.ca type services like twitter, so you can pick one that way based on what you like. Facebook v Google+ v Diaspora, etc.
Your website will consist of:
So...
Pick a programming language; get a book for absolute beginners new to programming (not a textbook, but a beginner's step-by-step introductory book). Learn to program. Pick Python, PHP, or even C... just get down the basics of how to program by learning all the concepts.
Learn how to make basic html with tables. You can do all you need with this; learn CSS when you want to make something better.
Pick a programming language to make the site in: PHP or Python for example. For Python, learn a full framework like Django or put the parts together yourself.
The choice of programming language and web framework will affect how you interact with a database. Some will include a special system for interacting with the database that works differently (and will hide many SQL queries from you). So, if you picked Django, or you rolled your own with SQLAlchemy, you will need to study their tutorials/docs. Alternatively, you can use raw SQL queries. In any case, you will probably want to learn the basics of SQL and databases. The free PostgreSQL manual is one way to learn about databases and SQL: http://www.postgresql.org/docs/
PostgreSQL does have replication built in with recent releases.
http://www.postgresql.org/docs/9.5/static/high-availability.html
I haven't used it in production but I did set it up in development to see how it works. MySQL does have a more mature replication system but it is possible in PostgreSQL without third party tools.
Not really, for a few reasons. The first is that any query that is complex enough to significantly benefit from plan caching is a) more likely to benefit from a better plan instead, and b) very likely shouldn't be executed hundreds of times, instead being rewritten to do all the work in one query (i.e. the n+1 queries problem).
But more directly, if a query is executed multiple times and Postgres doesn't see major advantages in its custom plans over the generic plan, then it will revert to the old behaviour and just start reusing the cached generic plan. See the Notes on PREPARE for more details.
So it's aiming to be the best of both worlds.
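The statement side of it looks like this (hypothetical table); the Notes section of the PREPARE docs describes when the generic plan takes over:

PREPARE orders_by_user (int) AS
    SELECT * FROM orders WHERE user_id = $1;

EXECUTE orders_by_user(42);
-- after several executions, if the custom plans don't beat the generic plan,
-- Postgres starts reusing the cached generic plan for this prepared statement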
You can tweak autovacuum storage parameters on a per table basis.
On your hot tables you'll want to set the fillfactor to a lower value, and then drop the autovacuum_cost_delay for the table in question to a lower value.
Depending on the structure of your database, you may want to up autovacuum_max_workers and take a look at the autovacuum cost parameters ( global parameters set in postgresql.conf ).
Your ultimate aim should be to have autovacuum run frequent small jobs rather than occasional large ones.
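Roughly what the per-table tweaks look like - a sketch with made-up values, tune them for your workload:

ALTER TABLE hot_table SET (
    fillfactor = 70,                       -- leave room on each page for HOT updates
    autovacuum_vacuum_cost_delay = 2,      -- let autovacuum work faster on this table
    autovacuum_vacuum_scale_factor = 0.05  -- vacuum after ~5% dead rows instead of the default 20%
);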
If you'd like to talk about consulting PM me.
Well I'm going to randomly answer some of your questions... =)
And now some general advice/questions for thought on performance.
Hope this helps.
Of course. And if you want to increase the chances of many transactions being committed in a single write operation, you can set the "commit_delay" parameter.
http://www.postgresql.org/docs/9.1/static/runtime-config-wal.html
>> commit_delay (integer)
>>
>> When the commit data for a transaction is flushed to disk, any additional commits ready at that time are also flushed out. commit_delay adds a time delay, set in microseconds, before a transaction attempts to flush the WAL buffer out to disk. A nonzero delay can allow more transactions to be committed with only one flush operation, if system load is high enough that additional transactions become ready to commit within the given interval.
in PHP it's about as easy as dealing with MySQL.. The good part here is you don't have to worry about doing shit for v6 or v4 differently, or worry about storing subnets/hosts in integers.. Just throw subnets in CIDR in the DB, and you get:
http://www.postgresql.org/docs/9.3/static/functions-net.html
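For example (a sketch with a made-up table), the containment operators work the same for v4 and v6:

CREATE TABLE networks (id serial PRIMARY KEY, subnet cidr NOT NULL);

INSERT INTO networks (subnet) VALUES ('10.0.0.0/8'), ('2001:db8::/32');

-- which stored subnets contain this host? (>> means "contains")
SELECT subnet FROM networks WHERE subnet >> inet '10.1.2.3';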
> Streaming-replication will always replicate whole database clusters
Logical Log Streaming Replication isn't quite a thing yet but the critical pieces of infrastructure will be in the next release. So… watch this space. 9.4 will make it reasonable to use logs for purposes other than replicating an entire cluster, which I find exciting.
>> WAN Replication to our DR Site
You don't need any fancy not-quite-released stuff for this; you want the streaming replication feature mentioned above. This is supported out-of-the-box. Check the docs for more.
What database are you using?
Enable external access to your database (postgresql example) http://www.postgresql.org/docs/9.1/static/auth-pg-hba-conf.html
Then just connect to your server instead of localhost with your programs... (sqlalchemy examples) http://docs.sqlalchemy.org/en/latest/core/engines.html#database-urls
It depends on which database/engine you're using, but that's the basic idea of how to do it.
Whether it's of practical benefit or not depends on the type of function. In postgresql you can (optionally) define functions as "immutable", "stable", or (default) "volatile". This lets the query optimizer know how they can most efficiently be treated. Basically "immutable" functions are good targets for indexing.
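A sketch of why it matters (names made up): only a function declared IMMUTABLE can back an expression index.

CREATE FUNCTION normalize_email(text) RETURNS text AS $$
    SELECT lower(trim($1));
$$ LANGUAGE sql IMMUTABLE;

-- allowed because the function is immutable
CREATE INDEX users_email_norm_idx ON users (normalize_email(email));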
If you want to learn SQL, I'd suggest you use PostgreSQL. It's free, open source and very close to ANSI SQL. I can't give you any resources though, besides the docs.
The solution is to implement time zones in the correct way: Write one's code so that it queries the olson database. PostgreSQL even has it built in [1].
I've worked at multiple companies that deal with timezone-sensitive logic, including a company that does scheduling services. Letting the IANA handle timezones is the only way to write bug-free date/time code that will remain reliable into the future.
http://www.postgresql.org/docs/current/static/datatype-datetime.html
Isolation levels are what you need to read up on. Default is READ COMMITTED so that would mean in your scenario you would only be seeing stuff that was committed prior to your query execution.
I would totally favor a single core table in these circumstances. That conceptual tidiness you mention really pays off in the long run and will be easier for others to understand (normalized data is expected; tables-per-year is definitely not). Subsetting by year, grouping by year, aggregation in general - sqlite and dplyr were designed to make that easy to code and quick to run. Further performance tweaks (like indexing) will probably depend on seeing all of the records at once.
Conversely, having split tables would be a pain if you ever needed to query, say, a single patient's records across all years.
The day may come when a single machine running sqlite can't handle all your data - but then you'd probably be better off looking into databases that support this kind of partitioning.
The cost of an explained query is not in milliseconds. It is an arbitrary unit, but more closely resembles disk page fetches. Can you please correct this?
Source: http://www.postgresql.org/docs/9.5/static/sql-explain.html
SELECT FOR UPDATE will block/fail if another SELECT FOR UPDATE/UPDATE/DELETE is pending rollback/commit for the same row[s].
Since other instances of the client routine use the same lock mode and therefore don't step over each other's feet, how is this unsatisfactory?
EDIT: /u/doublehyphen made me realize that my explanation was unclear/incomplete. The NOWAIT option for SELECT/LOCK allows for parallel workflow in the face of conflicting lock attempts (see http://www.postgresql.org/docs/9.4/static/sql-select.html ).
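For reference, the NOWAIT variant looks like this (hypothetical table):

BEGIN;
-- errors out immediately instead of blocking if another transaction holds the row lock
SELECT * FROM tasks WHERE id = 42 FOR UPDATE NOWAIT;
-- ... do the work ...
COMMIT;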
Well, you can store JSON data directly into the database and have postgres search through it with various operators. If you've used something like mongodb, storing json directly into the database is really nice, especially if you have all the flexibility of being able to search through it with all the usual database queries.
What do you mean?
Tables containing 2 dimensional arrays in fields or tables representing 2 dimensional arrays? The first one is no problem with modern RDBMS. The second is using the wrong tool for the job.
http://www.postgresql.org/support/professional_support/europe/
So you would also be funding employment in the UK. I know MS has branches in the UK, but most ~~MSSQL~~ Oracle development is not done here.
EDIT: Actually the MySQL option is probably a bad one, because they would only go for the Oracle support anyway...
You can easily do that using pg_dump and pg_restore. A couple of script lines away. We have prod-staging sync on some accounts and db backups set up using these.
Postgres supports clustering using pg-pool, here's an overview and instructions on creating a cluster.
DBD::Pg groks array types. https://metacpan.org/pod/DBD::Pg#Array-support
Use the standard ? placeholder and pass in your arrayref when you call execute(). DBD::Pg will flatten it to the PostgreSQL array syntax (just make sure the arrayref you pass in matches the dimensions of the column as defined in PostgreSQL). When retrieving data, DBD::Pg will automatically explode array type columns in the resultset to Perl arrayrefs, though as the docs linked above point out you can turn that off if you have a specific need for doing so.
If you're asking how to use placeholders for individual elements in a PG array type in your query strings - I'd be inclined to say there is probably a better, more PostgreSQL-y way by sticking with the built-in PG array operators and functions (http://www.postgresql.org/docs/9.4/static/functions-array.html) rather than getting yourself into a situation where you're mangling SQL strings with some unholy mess of sprintfs, joins, concats, and ->quote() calls to build the raw array strings yourself. Let DBD::Pg handle translation between Perl arrayrefs and PostgreSQL arrays, since it will do it consistently and correctly.
A few others worth mentioning:
Sorry buddy but you're wrong. Postgresql has a json/jsonb column type. Meaning it can store whatever you want in there. And then you can use expression indexes to index whatever field inside the json. You can even use a gin index that will index EVERY field in the json. More info:
http://www.postgresql.org/docs/9.4/static/datatype-json.html
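A sketch of both kinds of index (made-up table and keys):

CREATE TABLE events (id bigserial PRIMARY KEY, payload jsonb NOT NULL);

-- expression index on one field inside the document
CREATE INDEX events_user_idx ON events ((payload ->> 'user_id'));

-- GIN index covering every key/value in the document
CREATE INDEX events_payload_gin ON events USING gin (payload);

-- containment queries like this can use the GIN index
SELECT * FROM events WHERE payload @> '{"type": "signup"}';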
tldr: i was talking about a different thing
You have to give us a bit more info; what's your table definition?
Basically you want to rename that table, create the new table with partitions, and then insert from the old (renamed) table into the new table.
http://www.postgresql.org/docs/9.4/static/ddl-partitioning.html
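A rough sketch of the rename-and-refill approach, using declarative partitioning syntax (Postgres 10+; the 9.4 docs linked above achieve the same thing with inheritance and triggers). Table and column names are made up:

ALTER TABLE measurements RENAME TO measurements_old;

CREATE TABLE measurements (
    ts    timestamptz NOT NULL,
    value numeric
) PARTITION BY RANGE (ts);

CREATE TABLE measurements_2016_01 PARTITION OF measurements
    FOR VALUES FROM ('2016-01-01') TO ('2016-02-01');
-- ...one partition per month...

INSERT INTO measurements SELECT * FROM measurements_old;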
There have been really great results from JITting SQL for example: http://db.disi.unitn.eu/pages/VLDBProgram/pdf/tutorial/tut5.pdf http://www.postgresql.org/message-id/[email protected] etc
I'll take a crack at answering some of these; but most of the answers are going to fall into either the "it depends" or "it's in the documentation" buckets.
2GB of memory should be fine for a small to medium database server if that's all it's doing; if you're also running the web server and mail server and so on, then maybe not. It depends on your workload.
the documentation for pg_dump has the answers you seek. But yes, you can grab individual databases, even just the schema for one database.
( replicating one database not the whole cluster ) Yes, you can. But replicating the whole cluster is usually safer, saner, quicker and cheaper. Postgres' built-in streaming replication operates at the level of an entire database cluster, but there are several trigger-based replication tools (Slony and Bucardo) that allow you to get quite creative with what you're copying and under what conditions. That class of tools has some known limitations, and in my experience they tend to break any time someone looks at the schema and thinks about adding a table.
autovacuum is turned on by default in recent postgres versions.
First things first, go through postgresql.conf and set all the variables that depend on the amount of memory. After that, set up pg_hba.conf for hostssl from outside IPs and local (unix socket) connections for local access. Most of the default configuration in PostgreSQL is pretty reasonable for a developer's workstation in 2005; if you don't raise work_mem to a large enough value you will be introducing a bottleneck once the machine gets busy.
It depends on what's needed and what the workload is. If you have hot tables with lots of reads and writes you'll need to reindex more often; if dead rows aren't being reaped by autovacuum quick enough; you might want to do a manual vacuum and then set the storage options for that table.
You don't have to do this with PostgreSQL - you can have it recognise you based on OS user. Or Kerberos, GSSAPI, RADIUS, LDAP, PAM, SSPI, certificate... http://www.postgresql.org/docs/9.3/static/auth-methods.html
You can't modify the structure of an SQL statement in that way, only the values of parameters. If you want to change the structure of the query in the way that you are doing, you need to build the whole query as a string and use execute
to parse and run it (see http://www.postgresql.org/docs/9.3/static/plpgsql-statements.html#PLPGSQL-STATEMENTS-EXECUTING-DYN)
The following change should pretty much work:
RETURN QUERY EXECUTE 'SELECT hrtf.prov_id, hrtf.name FROM hrtf WHERE '||x1||' AND '||x2||' AND '||x3||' limit 11';
The official docs are here. Postgres is pretty standard in terms of what sql syntax it recognizes, so a lot of the information in that documentation will apply if you later move on to some other database.
While technically correct, that link is off. You're speaking of GIN indexing which was recently committed on the 9.4 branch. It's not even out yet. That said, you can find the docs on the devel page.
/r/markmywords though: everything will die to Postgresql. NoSQL is dead. I'd still suggest you build a custom 9.4. By the time you deploy, it will be in production and you'll be somewhat sane and future safe.
Currently, the fastest off-the-shelf approach is to factor your words into an n-gram (usually trigram) vector space and then query that space, ranking results with some form of distance measure, such as cosine similarity. Postgres 9.1's pg_trgm module [http://www.postgresql.org/docs/9.1/static/pgtrgm.html] provides an easy way of building an index using GIN or GiST. I don't remember the algorithmic big-Os off the top of my head, but they should be in your desired range.
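Roughly (a sketch, table name made up):

CREATE EXTENSION pg_trgm;

CREATE INDEX docs_body_trgm_idx ON docs USING gin (body gin_trgm_ops);

-- % is the trigram similarity operator; rank with similarity()
SELECT body, similarity(body, 'postgres full text') AS score
FROM docs
WHERE body % 'postgres full text'
ORDER BY score DESC
LIMIT 10;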
sqlite3 is built into Python ("batteries included") and is therefore easy to get started with. It's very lightweight and easy to use for even small programs. Here's a simple introduction.
http://docs.python.org/library/sqlite3.html
For larger projects, I would recommend PostgreSQL and the psycopg2 database adapter.
http://www.postgresql.org/ http://initd.org/psycopg/
If you're using a web framework, you should try an ORM that works with the framework. After trying many, my company has settled on Django and its terrific built-in ORM -- by far the most expressive we've found; it's a joy to use. We've tried SQLAlchemy several times and each time ended up scratching our heads when trying to deal with complex table joins or very large record sets. There's much flame-war bait in this paragraph so understand that this is only my opinion. We've used Zope, Plone, Pylons, web2py, and Django commercially with PostgreSQL for various projects over the past eight years.
I started by learning SQL but it's probably more common now for people to start by learning with an ORM.
> And what features does version 10 include?
10 is a bit outdated. You shouldn't use it for new projects. If you start a new Postgres project, use 13 or 14.
You can find a list of all features here
That is a huge stretch to use as a basis for blaming others for the inevitable problems this is going to cause people.
psql --version, postgres --version.
I found the benchmarks:
name old ns/iter new ns/iter diff ns/iter diff %
pg::connection::tests::benchmarks::prepared_statement_lookup_query_builder 1,048 191 -857 -81.77%
pg::connection::tests::benchmarks::prepared_statement_lookup_raw_sql 319 318 -1 -0.31%
sqlite::connection::tests::benchmarks::prepared_statement_lookup_query_builder 743 123 -620 -83.45%
sqlite::connection::tests::benchmarks::prepared_statement_lookup_raw_sql 183 180 -3 -1.64%
And here is the benchmark script.
So I guess the performance gain is because it's quicker for Diesel to construct a query string than it is to do nothing? If the benchmark isn't hitting the database, what is the prepare statement doing?
Okay, I had the exact same problem a few weeks ago, and here is how I solved it. Basically, I used sql-cli node package, and the COPY FROM function of postgres.
For the data, I wrote a bash script, that executed a series of queries and saved them to csv files.
Here is the script : https://gist.github.com/GaryHost/f150f7ddda6ee367b26e
Then I upload the csv.tar.bz2 archive to the postgres db server, and with the COPY FROM function I create my db data. (see here)
COPY table(column1, column2...) FROM '/var/lib/postgresql/data/pgdata/mssql/file.csv' WITH CSV DELIMITER ',' NULL '<null>';
As you can see, I was forced to do something a bit hacky to force the null items to be copied as such, and not as "null" strings.
As for the stored procedures, I was forced to rewrite them, but that was not much of a problem because the data architecture was kinda the same.
Those aren't the docs on GIN. They are the docs on Postgres full text search (as you can see from the URI).
These are the docs on GIN: http://www.postgresql.org/docs/9.4/static/gin.html
I'd add that from a performance perspective, Postgres full-text search is more than good enough (it may be tricky to write a good ranking function, though).
You actually have two problems, /u/darenw:
The result is that both the CPU and the disks must work much harder to service your front-end app until the caches are full, because every request, regardless of simplicity, must be rebuilt from scratch. Depending on request overlap, this can take hours. The OS disk cache is also PostgreSQL's cache, so until a request comes in, the data isn't in memory.
When you get time, you need to audit your application for data you can cache in some intermediate layer like memcached. If your application can not survive intermediate database outages, it's not ready for customer use.
You should also install the pg_stat_statements module and mine the view for information regarding queries since it has stats for execution counts and cumulative duration among others. You may find that 50% or more of your database traffic is caused by one or two very frequent queries that are great candidates for caching. These kinds of requests are generally fast, so aren't likely to be caught by checking pg_stat_activity since that's mostly a spot-check.
Have you considered using Postgres? It allows storing JSON documents using the jsonb type, which allows for efficient querying within the documents, and it provides JSON query functions. I find that this covers most of the cases I would use a NoSQL database for.
Maybe try Postgres Text Search or index the data with Elastic Search. Also Cloud Search is provided by AWS. Not sure how large their free tier is for that service.
There are several ways you can approach this. If you are okay with using a LIKE query and it is fast enough for your needs, then by all means go for it. If you can switch to Postgres http://www.postgresql.org/, they have full text search available out of the box; I have used it and it is good enough for most cases. Another thing you can do is add a full blown full text search engine like Elastic Search https://www.elastic.co/. The problem here is that you would need to adjust your infrastructure to use it (maintaining another service on your server). A third option is to write a tokeniser yourself like Ryan did in Railscasts a while ago http://railscasts.com/episodes/399-autocomplete-search-terms I have to mention that Elastic has been one of the least painful to set up compared to other full text search engines.
You can't specify the default owner for newly created tables, but you can set default privileges on a schema. See this: http://www.postgresql.org/docs/9.1/static/sql-alterdefaultprivileges.html
In particular, check out the examples.
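For instance (a sketch with made-up role and schema names):

-- tables the current role creates in schema app from now on are usable by app_rw
ALTER DEFAULT PRIVILEGES IN SCHEMA app
    GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO app_rw;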
> How frequently should a vacuum and an analyze be performed on a table?
On a production database, at least once a night. Docs, which you should read today if you have a PostgreSQL database in production.
> Would it be different for a table thats frequently written to vs one that is not?
Yes. Vacuuming recovers space from deleted rows, updates statistics for the query planner, and updates the visibility map. Tables that don't change don't benefit as much as tables that do change.
> In which situations is frequent use absolutely necessary?
Absolutely necessary is a fuzzy term here. For most users, the autovacuum daemon will do a better job than you will.
> I want to set up a sequence which always starts from the lowest available id value but also has default 'system' value that it should avoid.
In the first place, that's not a sequence. In the second place, you really don't want to do that. There's no real advantage to closing up all the gaps in a surrogate key column, and you open yourself up to the possibility of data corruption should you ever have to recover a table.
Just use the standard PostgreSQL sequences (SERIAL and BIGSERIAL).
this was posted two years ago and pretty much got the same response then
This isn't an anti-pattern. It's just a pattern. The four reasons given for it being an anti-pattern are not true: you don't have to poll (use SQL Server's query notifications or Postgres's LISTEN/NOTIFY), it's not necessarily inefficient to insert, update, and delete on the same table (use an index, especially a filtered/partial index), clearing the records is trivial (do it whenever you want - all the time, at night, whatever), and finally sharing a database between applications is not a bad thing; it's a good thing, and the whole point of using a database in the first place.
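To make the Postgres side of that concrete, a sketch assuming a jobs table: a partial ("filtered") index keeps the pending-work lookups cheap, and LISTEN/NOTIFY replaces polling.

CREATE TABLE jobs (
    id      bigserial PRIMARY KEY,
    payload jsonb NOT NULL,
    done    boolean NOT NULL DEFAULT false
);

-- partial index: only the unfinished rows, so it stays small and hot
CREATE INDEX jobs_pending_idx ON jobs (id) WHERE NOT done;

-- producer, right after inserting a job:
NOTIFY new_job;

-- consumer: wakes up on the notification instead of polling
LISTEN new_job;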
PostgreSQL uses the Unix-domain socket location to decide where to put the lock files, so -k should work.
See the documentation for unix_socket_directories. It is called unix_socket_directory pre-9.3.
pg_stat_statements should give you everything you need to know about queries running in your system, including the slow ones. It is much improved in 9.2 since it now normalizes the query text.
http://www.postgresql.org/docs/9.2/static/pgstatstatements.html
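Once the module is loaded via shared_preload_libraries it's just a view to query, e.g. (column names as of the 9.2-era docs):

CREATE EXTENSION pg_stat_statements;

SELECT query, calls, total_time, rows
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;  -- the ten statements eating the most cumulative time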
>he did use two public benchmarks
Ok, I didn't really look at that bit (because I tend to just fly over articles making silly claims).
However, benchmarks, you know.
On one side it's kind of nice that there are benchmarks, yet we all know they are utterly worthless (especially in super specific cases like your database setup).
So, benchmarks (and for those who aren't convinced: do you know what your Windows 7 rating is for your computer? does it have anything to do with your perception of it in your usage?)... ok, they actually do have some kind of use. For example there are some standardised benchmarks of battery lifespan for laptops. Those can pretty much be used.
As for databases ?
I've seen as many different database uses as I've seen shops. And that was from a distance, since I poke at small ones for my clients (or my toys) with MySQL or PostgreSQL.
> 1) Postgres is not ACID compliant. Its sole mode of isolation is "Read-commited", not "Repeatable-Read"
The manual says otherwise: it has two isolation levels, read committed and serializable. PostgreSQL 8.1 or later is ACID-compliant.
If your table data is [almost] always in a specific order, you can also use the CLUSTER command (or your RDBMS equivalent, as CLUSTER is simply what PostgreSQL uses), which will physically sort the table on disk, based on index order, so the table is automatically in the right order by default. Use with caution, though. I've only ever used it on Materialized Views, when it was absolutely needed.
PostgreSQL Manual Page for CLUSTER.
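Usage is a one-liner (hypothetical table and index names):

CLUSTER orders USING orders_created_at_idx;  -- rewrite the table in index order
ANALYZE orders;                              -- refresh planner stats after the rewrite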
> GCC supports a bunch of archaic platforms, so requiring a C99 compiler does not mean dropping support for droves of users
Here are the supported architectures, OSs, and compilers:
"You need an ISO/ANSI C compiler (at least C89-compliant). Recent versions of GCC are recommendable, but PostgreSQL is known to build using a wide variety of compilers from different vendors." -- http://www.postgresql.org/docs/9.1/static/install-requirements.html
"In general, PostgreSQL can be expected to work on these CPU architectures: x86, x86_64, IA64, PowerPC, PowerPC 64, S/390, S/390x, Sparc, Sparc 64, Alpha, ARM, MIPS, MIPSEL, M68K, and PA-RISC. ... PostgreSQL can be expected to work on these operating systems: Linux (all recent distributions), Windows (Win2000 SP4 and later), FreeBSD, OpenBSD, NetBSD, Mac OS X, AIX, HP/UX, IRIX, Solaris, Tru64 Unix, and UnixWare." -- http://www.postgresql.org/docs/9.1/static/supported-platforms.html
You seem to be arguing that PostgreSQL should support a narrower set of platforms. I'm pretty skeptical of that, even though I've never really used postgres outside of a "mainstream" environment. Aside from the fact that there are users who might not prefer GCC for whatever reason, there are benefits to all users when testing happens regularly on a variety of platforms.
5. Finding the Length of a Series
The presented solution doesn't work for PostgreSQL because PostgreSQL doesn't implement IGNORE NULLS for its window functions. So the best PostgreSQL solution I can think of is to calculate a rolling sum based on lo to yield a common number for each series, and then partition by this value when looking for the upper and lower bounds of each series.
Code:
with data(id, amount) as (
    values (9997, 99.17), (9981, 71.44), (9979, -94.60), (9977, -6.96), (9971, -65.95),
           (9964, 15.13), (9962, 17.47), (9960, -3.55), (9959, 32.00)
), signed_data as (
    select id, amount,
           sign(amount) as sign,
           row_number() over (order by id desc) as rn
    from data
), data_with_lower_bound as (
    select *,
           case when coalesce(lag(sign) over (order by id desc), 0) <> sign then rn end as lo
    from signed_data
), data_with_rolling_sum as (
    select *,
           sum(lo) over (order by id desc) as rolling_sum
    from data_with_lower_bound
)
select id, amount,
       first_value(rn) over (partition by rolling_sum order by id asc)
       - first_value(rn) over (partition by rolling_sum order by id desc)
       + 1 as length
from data_with_rolling_sum;
Result:
id | amount | length |
---|---|---|
9997 | 99.17 | 2 |
9981 | 71.44 | 2 |
9979 | -94.60 | 3 |
9977 | -6.96 | 3 |
9971 | -65.95 | 3 |
9964 | 15.13 | 2 |
9962 | 17.47 | 2 |
9960 | -3.55 | 1 |
9959 | 32.00 | 1 |
> connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
See that part of the error message? That means PostgreSQL expects you to connect through a Unix domain socket. You're not doing that. You're trying to connect through a TCP/IP socket. That's what "host: localhost" means.
Look at your pg_hba.conf file, and find the section for "IPv4 local connections". What's under that section?
> post: 5432
That part of your Rails configuration is wrong; it's port, not post.
You don't need to repeat "host: localhost".