Start with the name :)
While the manual isn't a tutorial, it's one of the most valuable resources for learning and understanding Postgres.
Then you need to consult the Python driver's documentation about prepared statements.
A quick internet search turns up this: https://www.psycopg.org/docs/sql.html
I have heard such a data structure being called a sum tree. PostgreSQL uses the similar concept of a max tree to find free space for new rows.
Do you have an idea how your data structure should cope with MVCC? More specifically the MVCC implementation of PostgreSQL means that concurrent versions of rows have separate index tuples and you have to check the row header to figure out which one is visible to your snapshot.
I guess for your use case of randomly picking a row a retry loop that discards invisible rows would work. Then you have to deal with the next issue - how do you deal with new entries being inserted while you are traversing the tree. You could get a SHARE lock on the table, blocking out concurrent modifications, but that will make concurrency suck.
The good news is that you can probably use the GIST index extensibility, or even better SP-GIST, to have a properly WAL logged implementation of your access method.
Something Tom Lane said might be appropriate here.
> CHECK is meant to handle constraints on a row's value in isolation. If you try to use it to enforce cross-row conditions, the project will certainly end badly.
My advice: store and recall your timestamps in UTC. Run PostgreSQL in UTC. Make time zones a problem that the edges of your application have to deal with, not the internals.
That said, if you want your database to screw around with time zones, PostgreSQL has your back.
$ psql template1
template1=# select now();
now
-------------------------------
2014-07-24 08:53:26.174943-05
(1 row)
template1=# set timezone='Europe/Berlin';
SET
template1=# select now();
now
-------------------------------
2014-07-24 15:53:41.343164+02
(1 row)
So:
SELECT * FROM pg_timezone_names;
ALTER ROLE german_user SET timezone='Europe/Berlin';
ALTER ROLE uk_user SET timezone='Europe/London';
For more, see the PostgreSQL timezone docs.
PostgreSQL does have replication built in with recent releases.
http://www.postgresql.org/docs/9.5/static/high-availability.html
I haven't used it in production but I did set it up in development to see how it works. MySQL does have a more mature replication system but it is possible in PostgreSQL without third party tools.
You can tweak autovacuum storage parameters on a per table basis.
On your hot tables you'll want to set the fillfactor to a lower value, and then drop autovacuum_vacuum_cost_delay (the per-table storage parameter) for the table in question to a lower value.
Depending on the structure of your database, you may want to up autovacuum_max_workers and take a look at the autovacuum cost parameters (global parameters set in postgresql.conf).
Your ultimate aim should be to have autovacuum run frequent small jobs rather than occasional large ones.
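A minimal sketch of the per-table tuning described above, with hypothetical values on a hypothetical hot_table:

ALTER TABLE hot_table SET (
    fillfactor = 70,                       -- leave room on each page for HOT updates
    autovacuum_vacuum_cost_delay = 2,      -- vacuum this table more aggressively
    autovacuum_vacuum_scale_factor = 0.05  -- trigger autovacuum sooner
);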
If you'd like to talk about consulting PM me.
I strongly recommend taking a look at Postgres.app, developed by Heroku. It's a 1-step installation (drag the application package into the /Applications directory), includes PostGIS and hstore extensions, comes with the usual command line utilities, etc. I've done a couple dozen Postgres installs on Mac OS and I've found this to be the most painless way to install, run, and manage PostgreSQL.
The cost of an explained query is not a unit of milliseconds. It is an arbitrary unit, but more closely resembles disk page fetches. Can you please correct this?
Source: http://www.postgresql.org/docs/9.5/static/sql-explain.html
SELECT FOR UPDATE will block/fail if another SELECT FOR UPDATE/UPDATE/DELETE is pending rollback/commit for the same row[s].
Since other instances of the client routine use the same lock mode and therefore don't step over each other's feet, how is this unsatisfactory?
EDIT: /u/doublehyphen made me realize that my explanation was unclear/incomplete. The NOWAIT option for SELECT/LOCK allows for parallel workflow in the face of conflicting lock attempts (see http://www.postgresql.org/docs/9.4/static/sql-select.html ).
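A quick sketch of the NOWAIT behaviour, using a hypothetical jobs table:

BEGIN;
-- Errors out immediately if another transaction already holds a
-- conflicting row lock, instead of blocking:
SELECT * FROM jobs WHERE id = 42 FOR UPDATE NOWAIT;
-- ... do the work ...
COMMIT;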
you have to give us a bit more info, whats your table definition?
Basically you want to rename that table, create the new table with partitions, and then insert from the old (renamed) table into the new table.
http://www.postgresql.org/docs/9.4/static/ddl-partitioning.html
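A minimal sketch of that swap, assuming a hypothetical table measurements partitioned by a logdate column (9.4-era inheritance partitioning, as in the linked docs):

ALTER TABLE measurements RENAME TO measurements_old;

CREATE TABLE measurements (LIKE measurements_old INCLUDING DEFAULTS);

CREATE TABLE measurements_2016_01 (
    CHECK (logdate >= DATE '2016-01-01' AND logdate < DATE '2016-02-01')
) INHERITS (measurements);

-- Route the old rows directly into the matching partition:
INSERT INTO measurements_2016_01
SELECT * FROM measurements_old
WHERE logdate >= DATE '2016-01-01' AND logdate < DATE '2016-02-01';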
You can't modify the structure of an SQL statement in that way, only the values of parameters. If you want to change the structure of the query in the way that you are doing, you need to build the whole query as a string and use execute
to parse and run it (see http://www.postgresql.org/docs/9.3/static/plpgsql-statements.html#PLPGSQL-STATEMENTS-EXECUTING-DYN)
The following change should pretty much work:
RETURN QUERY EXECUTE 'SELECT hrtf.prov_id, hrtf.name FROM hrtf WHERE '||x1||' AND '||x2||' AND '||x3||' limit 11';
> And what features does version 10 include?
10 is a bit outdated. You shouldn't use it for new projects. If you start a new Postgres project, use 13 or 14.
You can find a list of all features here
It's a huge stretch to use that to blame others for the inevitable problems this is going to cause people.
psql --version, postgres --version

Okay, I had the exact same problem a few weeks ago, and here is how I solved it. Basically, I used the sql-cli node package and the COPY FROM function of Postgres.
For the data, I wrote a bash script, that executed a series of queries and saved them to csv files.
Here is the script : https://gist.github.com/GaryHost/f150f7ddda6ee367b26e
Then I uploaded the csv.tar.bz2 archive to the Postgres DB server, and with the COPY FROM function I created my DB data. (see here)
COPY table(column1, column2...) FROM '/var/lib/postgresql/data/pgdata/mssql/file.csv' WITH CSV DELIMITER ',' NULL '<null>';
As you can see, I was forced to do something a bit hacky to force the null items to be copied as such, and not as "null" strings.
As for the stored procedures, I was forced to rewrite them, but that was not much of a problem because the data architecture was kinda the same.
You actually have two problems, /u/darenw:
The result is that both the CPU and the disks must work much harder to service your front-end app until the caches are full. Depending on request overlap, this can take hours, because every request, regardless of simplicity, must be rebuilt from scratch. The OS disk cache is also PostgreSQL's cache, so until a request comes in, data isn't in memory.
When you get time, you need to audit your application for data you can cache in some intermediate layer like memcached. If your application can not survive intermediate database outages, it's not ready for customer use.
You should also install the pg_stat_statements module and mine the view for information regarding queries since it has stats for execution counts and cumulative duration among others. You may find that 50% or more of your database traffic is caused by one or two very frequent queries that are great candidates for caching. These kinds of requests are generally fast, so aren't likely to be caught by checking pg_stat_activity since that's mostly a spot-check.
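A sketch of the kind of mining that's useful here (column names vary by version; total_time was renamed total_exec_time in PostgreSQL 13):

SELECT query, calls, total_time, rows
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;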
You can't specify the default owner for newly created tables, but you can set default privileges on a schema. See this: http://www.postgresql.org/docs/9.1/static/sql-alterdefaultprivileges.html
In particular, check out the examples.
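A minimal sketch with hypothetical role and schema names:

ALTER DEFAULT PRIVILEGES FOR ROLE app_owner IN SCHEMA app
    GRANT SELECT ON TABLES TO readonly;
-- Future tables created by app_owner in schema app are now
-- automatically readable by readonly.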
> How frequently should a vacuum and an analyze be performed on a table?
On a production database, at least once a night. Docs, which you should read today if you have a PostgreSQL database in production.
> Would it be different for a table thats frequently written to vs one that is not?
Yes. Vacuuming recovers space from deleted rows, updates statistics for the query planner, and updates the visibility map. Tables that don't change don't benefit as much as tables that do change.
> In which situations is frequent use absolutely necessary?
Absolutely necessary is a fuzzy term here. For most users, the autovacuum daemon will do a better job than you will.
> I want to set up a sequence which always starts from the lowest available id value but also has default 'system' value that it should avoid.
In the first place, that's not a sequence. In the second place, you really don't want to do that. There's no real advantage to closing up all the gaps in a surrogate key column, and you open yourself up to the possibility of data corruption should you ever have to recover a table.
Just use the standard PostgreSQL sequences (SERIAL and BIGSERIAL).
pg_stat_statements should give you everything you need to know about queries running in your system, including the slow ones. It is much improved in 9.2 since it now normalizes the query text.
http://www.postgresql.org/docs/9.2/static/pgstatstatements.html
I miss pgAdmin 3 a lot - it was a really great tool (despite some bugs that didn't get fixed in years...). I've heard of its terrible codebase and understand that a team can get to the point where a complete rewrite simply is the better/easier option - but it feels like every new release of pgAdmin 4 is broken for the first week because of weird bugs. As much as I want to use it, I just couldn't stand it anymore - this might also be the case for you.
The tool I'm using at the moment is DBeaver. The UI is, in my opinion, pretty ugly (as all Eclipse-based programs seem to be...) but it's very clean, easy to understand, and has a lot of great features.
Takes some days to get used to, but that's to be expected after ~8 years of working with the same great tool (and over 2 years trying to give its successor a chance only to be disappointed again and again...).
I also tried DataGrip and liked it quite a lot - feature-rich AND good-looking - but since I was looking for something free, just like you, I switched back to DBeaver.
If anyone involved in pgAdmin 4 reads this: I don't want to sound mean. I hope pgAdmin 4 will be great someday - but at the moment, almost three years after its initial release, it's still a buggy mess with, in parts, a really clunky UI - much more than pgAdmin 3 ever was. And even though it brought us a bunch of new features (the dashboard... and autocompletion that's barely ever working correctly...), it's still missing quite some features that pgAdmin 3 had (a f***ing query history that's not gone after a restart...).
^(sorry for the rant; i had to get that off my chest; pgAdmin 4 is haunting my dreams.)
This.
Get yourself a cheap Linode/droplet or other VPS. $5 per month buys you 1 GB RAM and 20 GB SSD storage. You can put a lot more than 30k rows on such a VPS. You also only pay while the machine is running.
Admin might be a bit of a pain. But there are docs for that:
My understanding is that using serializable carries a substantially bigger overhead in other products (mainly SQL Server).
And apparently Oracle's implementation is not truly serializable.
The blog also includes a link to a presentation from Markus Winand about different serializable implementations, which is also interesting.
https://news.ycombinator.com/item?id=9510015
> MERGE is actually a separate feature. We're still considering implementing it. But its syntax is too cumbersome for many usages. There's also some unrelated complexity - so implementing merge support does get easier by the infrastructure in here, but a fair amount of work remains.
Just the data files and WALs. With an atomic snapshot, you do not need to wrap with pg_start_backup/pg_stop_backup.
For reference - http://serverfault.com/questions/568120/any-issues-with-filesystem-only-backups-on-postgresql
I've never used it in anger, but Liquibase can diff DBs: http://www.liquibase.org/documentation/diff.html
Looks like there's a few things it can't cope with according to the docs, but maybe worth evaluating.
A couple of Postgres providers spring to mind. I'm assuming that changes to your schema will be handled:
I'm sure there's more but it depends on how easy you want it to be and how much you're willing to spend.
You can use Heroku PostgreSQL outside of Heroku iirc – https://www.heroku.com/postgres $50 will get you the smallest standard plan, so no hot failover, but if you're expecting that for such a small amount of money without DIY effort, then I'd say it's likely unobtainable.
> connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
See that part of the error message? That means PostgreSQL expects you to connect through a Unix domain socket. You're not doing that. You're trying to connect through a TCP/IP socket. That's what "host: localhost" means.
Look at your pg_hba.conf file, and find the section for "IPv4 local connections". What's under that section?
> post: 5432
That part of your Rails configuration is wrong; it's port, not post.
You don't need to repeat "host: localhost".
Actually the docs do mention it, but possibly not as clearly as they could. SSI guarantees that if the transaction succeeds then it will be consistent with a serial execution order. It doesn't guarantee that a transaction will only ever fail if it was not consistent.
In fact the very bottom of the docs states:
Which is exactly what you're seeing here. Every update requires a sequential scan, so it will conflict with every other update. If you add a PK constraint on your key column (and possibly disable sequential scans to force it to use it) then you should find that the serialization errors disappear.
But really you need to be willing to accept that you will sometimes have to retry transactions that had nothing wrong with them in the first place. Which shouldn't be a problem, as long as it doesn't happen often enough to impact performance. If it does (as I have experienced in some limited cases) then you may have to do some sort of external serialization (e.g. advisory locks). But if you don't have a real performance problem then there's a huge advantage in simplicity to just stick with retrying on failure.
http://www.postgresql.org/docs/9.5/static/ddl-schemas.html
Setting this saves you time when typing schema names. For example, if you have a multi-schema database but use one schema more than the others, put that one on your search path. This saves you from typing the schema name every time you want to perform a command.
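A small sketch with hypothetical schema and role names:

-- Prefer schema "reporting" when resolving unqualified names:
SET search_path TO reporting, public;

-- Or persist it per role:
ALTER ROLE analyst SET search_path TO reporting, public;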
Yeah there are a few things to be done here:
Most other databases can do these things (even SQLite).
[1] Some experimental prototyping I did on this simple case: http://www.postgresql.org/message-id/CADLWmXWALK8NPZqdnRQiPnrzAnic7NxYKynrkzO_vxYr8enWww@mail.gmail.com
This is false. PostgreSQL changes the first digit whenever they feel like it, just like the Linux kernel. The major version of the current release is 9.4.
So UC are two separate permissions, Usage and Create.
I got them from here: http://www.postgresql.org/docs/9.4/interactive/sql-grant.html There is a table with all the permissions listed out towards the bottom of that page.
It looks like that for your schema servers_schema, Usage and Create were granted to the public role; this would mean that any role that you created could go ahead and create tables in that schema.
So you need to do two things to get this working the way you want. You need to grant usage only to the role on your schema like so:
GRANT USAGE ON SCHEMA servers_schema TO test1;
Then you need to revoke access from public, like so:
REVOKE ALL ON SCHEMA servers_schema FROM PUBLIC;
That should get you where you want to be.
Another way to observe possibly confusing inconsistencies between serial id and 'inserted_at'-ish now()-based timestamp ordering is through the use of sequences with CACHE set to > 1. Sequences can be set to have backends pre-check-out N values every N nextval() calls, reducing locking contention on the sequence. So backend A could check out values 1-10, advancing the sequence to 11, but initially only use value 1. Then if the next insert were done through backend B, it'd claim 11-20, inserting its first row with value 11. But then if A saw the next activity, we'd then see the third timestamp in sequence get related to id 2.
So with CACHE set to > 1 for sequences (the default is 1), you're probably better off erring on the side of the timestamp column for determining order. Although then with long TXs and the default clause using now(), it could get real ambiguous. Or better yet, switch your default value clause for the timestamp from now() to statement_timestamp(), which always makes a gettimeofday() call instead of being fixed / locked to TX start time.
So, YMMV in a concurrent world!
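A small sketch of the scenario described above, with a hypothetical sequence:

CREATE SEQUENCE order_id_seq CACHE 10;
-- Backend A's first nextval() checks out values 1-10 and returns 1;
-- backend B's first nextval() then checks out 11-20 and returns 11,
-- even though backend A has so far only used value 1.
SELECT nextval('order_id_seq');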
There are a couple options here. Assuming the sorting formula can be calculated on data from each row individually, you can just index the expression. If the sorting expression is more complex though, using, for example, window functions or JOINs, you may need to use a materialized view with the sort key as a column, which you can then index. The drawback here though is that PG doesn't (yet) have incrementally updated mat. views, so you'll need to periodically REFRESH it.
Either way, depending on your particular application, you may benefit from CLUSTERing on the index, but you'll have to recluster periodically to maintain it.
Edit: I should add that if you have to go the materialized view route, it likely can just have the two columns (row_id, sort_key).
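A sketch of both approaches, assuming a hypothetical items table with hypothetical scoring columns:

-- Simple case: index the expression itself
CREATE INDEX items_score_idx ON items ((popularity * 0.7 + freshness * 0.3));

-- Complex case: materialized view with an indexed sort key
CREATE MATERIALIZED VIEW item_ranks AS
SELECT id AS row_id,
       rank() OVER (ORDER BY popularity DESC) AS sort_key
FROM items;

CREATE INDEX item_ranks_key_idx ON item_ranks (sort_key);

-- Must be refreshed periodically, since it is not updated incrementally:
REFRESH MATERIALIZED VIEW item_ranks;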
http://www.postgresql.org/docs/9.4/static/datatype-numeric.html section 8.1.4 is the doc you should be looking at, under "is the equivalent to specifying".
Once you have the sequence, calling setval will set its value.
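For example (hypothetical sequence name):

SELECT setval('my_table_id_seq', 999);
-- the next nextval('my_table_id_seq') will return 1000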
Unless you're running an ancient version of Postgres (8.4 or earlier), the first question you should ask in relation to Slony (or Bucardo) is:
What does this do for us that streaming replication and wal-log shipping don't?
Trigger-based replication is prone to error, prone to malfunction, less durable in the face of network hiccups, and more complex to reason about. You should not use it if you can do the job with Postgres's built-in replication tools.
And if you're using it to extract data to another postgres database in an altered form ( for instance automatically turning transactional data into entries in a fact table in a data warehouse ) you're probably better off using postgres_fdw than mucking around with Slony.
PostgreSQL's high quality documentation is among the best of any open source project available. Perhaps what you are looking for is in the PostgreSQL Tutorial. That would be an excellent complement to the reddit SQL wiki links you listed for someone learning SQL using PostgreSQL.
The exact quote from the docs is:

> Note: The maximum allowed precision when explicitly specified in the type declaration is 1000; NUMERIC without a specified precision is subject to the limits described in Table 8-2.

So precision is limited to 1000 if you explicitly set the precision. If you just define a type as numeric with no parameters, it gets the far bigger precision of 131072 + 16383 digits (before and after the decimal point, from the referenced table).
I'm not sure about redshift. But here's the docs for postgresql, which redshift is based on. http://www.postgresql.org/docs/9.4/static/datatype-numeric.html#DATATYPE-NUMERIC-DECIMAL
There's a few different kinds of types with their own storage requirements, you can pick what you want to meet your needs.
PG replication has been perfect for me for the last several years. I replicate tens of thousands of WALs a day across my clusters (either via direct WAL reading or streaming) and have not had a single problem.
One clarification: standby PG instances will attempt to read WALs at startup and will switch to streaming once they have read all available WALs, assuming the master has the next xact available. Streaming is a faster method for getting each xact to the slave; it is not faster for (or even possible to use for) reading in large numbers of WAL files, as one would need in a PITR scenario.
http://www.postgresql.org/docs/9.1/static/high-availability.html
Comments can be useful. SQL comments in any complex functions are useful too.
Of course, a sensible, logical design is more important than any documentation.
Anonymous blocks do exist, but the trigger structure (see the internal table pg_catalog.pg_trigger) requires a reference (pg_catalog.pg_trigger.tgfoid) to a function.
The documentation (see http://www.postgresql.org/docs/current/static/sql-createtrigger.html) says as much; it indicates that you need to specify a function for the trigger to execute.
You'll want to avail yourself of the trigger docs... <http://www.postgresql.org/docs/9.3/static/trigger-definition.html>
Note that the function needs to return something sufficiently carefully chosen, either NULL or NEW, depending on situation. See the trigger definition page in the docs.
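A minimal sketch of the required shape, with hypothetical names:

CREATE FUNCTION touch_updated_at() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    NEW.updated_at := now();
    RETURN NEW;  -- a BEFORE trigger must return NEW (or NULL to skip the row)
END;
$$;

CREATE TRIGGER items_touch
BEFORE UPDATE ON items
FOR EACH ROW EXECUTE PROCEDURE touch_updated_at();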
Is it possible for you to make a backup first to another drive where you have the space? If so, do that and then carefully delete the OLDER files first. Be careful not to delete the newest ones (last hour maybe) since they might be needed in case a backup is scheduled.
http://www.postgresql.org/docs/9.3/static/backup.html
Normally it shouldn't be a problem to delete the files, BUT you can end up in a situation where you can't recover the database to a point in time; that's why I recommend you make a backup first. If you can't make a backup, then delete the files, but make a backup as soon as possible.
Also, check whether backups are scheduled to run and whether they delete the files; if not, modify the job to do so. Also check whether replication is off and whether you have archiving set up.
This is a fascinating article. I have been following the 60-core Linux (Ubuntu) testing that has been talked about on the mailing list this month as well. It is worth a read for anybody deploying to more than 24 cores. Here is the article: http://www.postgresql.org/message-id/[email protected]
"time with time zone" and "timestamp with time zone" are two different things in PostgreSQL. Don't use time with time zone.
> We do not recommend using the type time with time zone (though it is supported by PostgreSQL for legacy applications and for compliance with the SQL standard).
I do not know of such a write-up, but I assume you mean range types (start + stop time) rather than intervals (a duration, e.g. '4 hours'). A reservation should normally be represented as a tstzrange, e.g. '[2010-01-01 14:45, 2010-01-01 15:45)'.
Official documentation: http://www.postgresql.org/docs/current/static/rangetypes.html
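A sketch of the classic pattern, assuming a hypothetical reservation table (the exclusion constraint on a plain int column needs the btree_gist extension):

CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE reservation (
    room   int       NOT NULL,
    during tstzrange NOT NULL,
    -- no two reservations for the same room may overlap:
    EXCLUDE USING gist (room WITH =, during WITH &&)
);

INSERT INTO reservation VALUES (101, '[2010-01-01 14:45, 2010-01-01 15:45)');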
There are a lot of them. I recommend looking at the Jobs mailing list, or if you want more piecemeal assistance, the Performance mailing list can be pretty handy.
Unfortunately you're not likely to get a lot of hands-on from a subreddit. :)
Not exactly sure what you are asking for, but the answer is probably yes. I would look into using CASE expressions.
Manual page: http://www.postgresql.org/docs/current/static/functions-conditional.html
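A small sketch with a hypothetical results table:

SELECT name,
       CASE WHEN score >= 90 THEN 'A'
            WHEN score >= 80 THEN 'B'
            ELSE 'C'
       END AS grade
FROM results;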
Read the explanation of serial in the official documentation to understand why integer is the correct type for the foreign key. The short answer is that serial is not a true type but instead a convenient way to define auto incrementing columns.
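The docs' point, sketched with a hypothetical table: the column's actual type is integer either way.

-- Writing this:
CREATE TABLE tablename (id serial);

-- is equivalent to writing:
CREATE SEQUENCE tablename_id_seq;
CREATE TABLE tablename (id integer NOT NULL DEFAULT nextval('tablename_id_seq'));
ALTER SEQUENCE tablename_id_seq OWNED BY tablename.id;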
i've done most of my learning the hard way just through trial and error. i don't see a whole lot in terms of tutorials, etc. for postgres...a lot of times i just see referrals back to the postgres manual. however, the manual is excellent.
I actually hate all the IDEs for PostgreSQL right now. pgAdmin 3 was great: it was simple, out of the way, could save SQL as files on the file system, view DDL, permissions, etc... just simple, clean, and very useful.
Now everything is Web based and I hate it, and all the IDEs that you can install have 5000 features, with 4950 you don't need or will only need once.
Why can't someone make something simple, I've tried all of these on the list that could be installed on a Mac https://alternativeto.net/software/pgadmin/
Well, you can use bucardo to replicate out, but that's got its own challenges. There's also this now but I haven't had a chance to use it. From my cursory reading it won't work for migrating data to non-AWS servers (although it does handle getting data from non-AWS servers into AWS servers) but you could use that to migrate to an EC2 server and then replicate from that out. No idea how it fares with stuff like DDL and whatnot since it's almost certainly built around logical replication facilities.
QueryPie is by far my favorite free option (I love the dark mode).
We're now using RubyMine at work, so I've just been using the built in DB management, and that works pretty well (but not free).
Example:
revoke all privileges (salary) on table employees from exhuma;
set role exhuma;
select * from employees; --boom
Example:
set role exhuma;
create function audit_table() returns trigger language plpgsql as $$
begin
    insert into history_table (...) values (..., current_user); -- that's you
    return new;
end;
$$;
Web server connects to db as www or something. User logs in to web site. Web server calls set role [email protected], so that current_user is [email protected].
It would be more secure if PostgreSQL supported a "magic cookie" that prevented current_user from calling set role [email protected]. SQL Server does this. This is only a problem if you let users run their own SQL though.
Decent answers here: https://stackoverflow.com/questions/2998597/switch-role-after-connecting-to-database/19602050
In Linux, pgAdmin3 was fixed some time ago; now it works fine with Postgres 10. A good pgAdmin3 version is 1.22.2.
P.S. Also, some alternatives exist, like OmniDB
https://www.slant.co/options/208/alternatives/~pgadmin-4-alternatives
While this is surely overkill, Talend Open Studio can do this. It is a free ETL solution. It's definitely a sledgehammer for this task; Foreign Data Wrappers in PostgreSQL are definitely better in terms of performance. But for someone who is newish to DBs, having a PointyClicky GUI interface might help.
As an ETL, you can define data sources (MySQL in your case), transformations (which might come in handy for data type conversions) and data "sinks" (PostgreSQL in your case). You then just connect the visual components together and hit the big green Play button.
Edit: They don't have many screen-shots on their home-page, but a Google images search should give you a good idea.
Hey! I work at Cockroach and I think we would fit exactly what you're looking for.
> * to have a distributed database solution in place which is horizontally scalable
That's exactly what we do.
> * Easy to add nodes, remove nodes without any downtime (sure I can accomodate some write-locks for setup)
No write locks needed. Nodes can be added and removed easily. We even ensure that all upgrades can be rolling upgrades. More than that, we even have online schema changes.
> * Have the ability to tweak the replication factor. Ideally I would love to have replication = number of nodes, i.e. each node has the complete database. So that when there are simple queries, it doesn't have to do distributed queries (which make the system slow), and when the query is a bit complex, it does distributed queries since data is available on each node.
We do both local and distributed queries, but it does depend on how you set up your tables. We offer interleaved tables to ensure that child tables are in the same range and don't require a distributed query. And you can set the replication factor (and replica locations) per database, per table, and even per row.
> * Best case: Some GIS based plugin / extension on the solution would be icing on the cake.
Sadly not yet. But what exactly are you looking for? We've had a few requests for geospatial support.
> * SQL compatible, so that least of the application rewrite is required.
Cockroach speaks the Postgres wire protocol, so the SQL you already know should mostly just work.
If you have any questions, I'd be happy to answer them.
If you'd entertain a server-side option, Redash is pretty rad. It'll do autocompletion from schema and syntax, and it's great for workgroup-oriented stuff (reports, scheduling, sharing snippets, etc.)
to say a touch more on what /u/doublehyphen said, in general PG will just magically do the right thing for you when it comes to timezones. When you connect to PG you say my timezone is X, and then it will just handle all the conversions for you, you can just speak in timezone X by default, and it just does the right thing. You can also speak in other timezones if you want, but you don't have to. If you know you always just want to speak UTC, you can just do something like: set timezone='UTC'; (or set it as a default). The docs help a lot here: http://www.postgresql.org/docs/current/static/datatype-datetime.html
No, for reasons mentioned elsewhere. However, it is possible to mark tables as unlogged, which prevents them from being replicated (but does mean they'll be truncated in the case of a crash).
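A sketch with a hypothetical table (UNLOGGED skips WAL, which is why such tables are neither crash-safe nor replicated):

CREATE UNLOGGED TABLE scratch_data (id int, payload text);

-- Since 9.5 an existing table can also be switched back and forth:
ALTER TABLE scratch_data SET LOGGED;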
there is: advisory locks. They are exactly what you want assuming you can organize all locking activity into a 64 bit space.
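A quick sketch of the built-in functions (the 64-bit key is application-defined; 42 here is arbitrary):

SELECT pg_advisory_lock(42);       -- blocks until the lock is acquired
SELECT pg_try_advisory_lock(42);   -- returns false instead of waiting
SELECT pg_advisory_unlock(42);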
We'll probably need a little more info, like an example or two. It's hard to guess without something a little more concrete. It'd also be helpful to know what version of PG you are using.
You are using alter default privileges..., right?
UPSERT BABY!
Real talk though, I will probably wait for 9.5.1 before rolling out the upgrades.
A nice amount of "free" performance gains, it seems, as per the release notes.
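A minimal sketch of the headline UPSERT syntax new in 9.5, using a hypothetical counters table:

-- counters.key must have a unique constraint or index
INSERT INTO counters (key, value) VALUES ('hits', 1)
ON CONFLICT (key) DO UPDATE
SET value = counters.value + EXCLUDED.value;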
Good list, but since PG 9.4 there is a much nicer way for Tip 8 - Pivot Tables (see 4.2.7. Aggregate Expressions) using an aggregate filter:
select date,
       count(*) filter (where type = 'OSX') as osx,
       count(*) filter (where type = 'Windows') as windows,
       count(*) filter (where type = 'Linux') as linux
from daily_visits_per_os
group by date
order by date
limit 4;
> For example, being a member of a role with CREATEDB privilege does not immediately grant the ability to create databases, even if INHERIT is set; it would be necessary to become that role via SET ROLE before creating a database. (http://www.postgresql.org/docs/current/static/sql-createrole.html)
'SET ROLE admin_template' and then do the create database.
WHERE is also a thing. More to the point though, there are only five possible values for the state column (disabled won't occur if your server is tracking activity), and you're trying to pull out three of them, so saying those need to be highlighted is kind of odd. Not only that, but those are the three most common states, so just sorting by COUNT(*) as in /u/Vardy's example will bring them to the top (not, of course, that those three are going to get lost among the other two possible states).
But, wait; there's more! state = "fastpath function call" will almost never occur in practice because, to quote the docs, "this interface is somewhat obsolete, as one can achieve similar performance and greater functionality by setting up a prepared statement to define the function call," so there is really only one possible value of state that you're not grabbing. Saying you want to highlight these three values seems silly when not only are there only four realistically possible values, but the fourth, "idle in transaction (aborted)", indicates an error and is one that you really, really want to know about.
tl;dr: state will in practice only take four possible values. You're trying to grab three of those (so there's no need to "highlight" anything), and you really should want to know about the fourth as well.
You need to check your return type versus what you are actually returning. Basically the example function is trying to put two variables into a single return variable.
I'm not sure everything you are trying to do in this function, but using a RECORD type instead of text may help.
Also, you may want to just use a SQL function instead of a PL/pgSQL function if all you are doing is returning a query result, but that goes back to me not being sure what logic you will have in there.
Hope this helps.
The intended API for custom ordering in indexes is to define an operator class. In your case I would consider creating a custom type that stores the data in an easy to process manner and then define comparison operators and the associated btree opclass using those ops.
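A sketch of the shape of a btree opclass, assuming a custom type mytype whose comparison operators and a support function mytype_cmp(mytype, mytype) returning int already exist:

CREATE OPERATOR CLASS mytype_ops
    DEFAULT FOR TYPE mytype USING btree AS
        OPERATOR 1 <  ,
        OPERATOR 2 <= ,
        OPERATOR 3 =  ,
        OPERATOR 4 >= ,
        OPERATOR 5 >  ,
        FUNCTION 1 mytype_cmp(mytype, mytype);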
pg_receivexlog will allow you to stream WAL without setting up a standby. You can even set it to replicate synchronously, guaranteeing no lost transactions in case of a single failure.
Note that this behavior has changed in 9.5:

> Use assignment cast behavior for data type conversions in PL/pgSQL assignments, rather than converting to and from text (Tom Lane)

> This change causes conversions of booleans to strings to produce true or false, not t or f.
The docs seem to cover it pretty well:

> -g, --globals-only
> Dump only global objects (roles and tablespaces), no databases.

ref: http://www.postgresql.org/docs/9.3/static/app-pg-dumpall.html
You can do your own global dump and check the sections.
If you downloaded the EnterpriseDB Windows installer, it asks you to set the database administrator's password during the install. This is the password for the database user 'postgres' and has access to all databases that you create. You will need this password to connect via psql or pgAdmin III, until you set up additional user accounts. Normally you would set up at least one user level account with which to create and access individual databases. Like most database servers PostgreSQL maintains a set of users and corresponding passwords to control access to the database, which is completely separate from the Windows login system.
If this is a completely new install, you could just rerun the installer, start over and enter a new 'postgres' user admin password.
Or, if you want to keep this install for some reason, you can change the rules for who's allowed to log in by editing the file C:\Program Files\PostgreSQL\9.4\data\pg_hba.conf. Here are instructions on how to do that. It's a little tricky, but not terrible. Beware of course that if you circumvent the password rules it will potentially leave your database open to access by anyone, so use a little caution.
Here is a guide on regex; it takes some getting used to, but it's pretty powerful and has a whole host of uses in cleaning up weird data. REGEXP_MATCHES won't quite work because of what it returns, which isn't a text string. You can use the global flag ('g') as the 4th argument to get it to replace all instances of the expression.
REGEXP_REPLACE(room, '[A-Za-z]', '', 'g')
This statement will turn ENG into an empty string (''), which is not an integer. I would suggest fooling around with the statement and seeing if there is some means by which you can make the replace statement do what you want. You can also nest REGEXP_REPLACE statements to further refine the string. For example, I could change 'Foo1 Bim2 3Bar' to 123 using the following (yes, there is a much better regex for this).
REGEXP_REPLACE(REGEXP_REPLACE('Foo1 Bim2 3Bar', '[A-Za-z]', '', 'g'), ' ', '', 'g')
Additionally, you don't need to replace things with empty strings:
REGEXP_REPLACE('Cat', '(Cat)', 'Dog')
I'd review the documentation on string operations and see what you can come up with.
> I have managed to track down the reason moving our production DB to the 9.3 server isn't working is because template1 has been modified.
Pretty sure that's not possible. Convince me.
For fault isolation, I'd suggest you dump and restore less, and that you dump as plain text. Look at the --schema-only and --format=plain options for pg_dump. Watch for warnings on stderr.
Also look at pg_dumpall; it can dump roles and globals.
EXCEPTION WHEN OTHERS in situations like this is very bad form. I'd strongly advise figuring out the specific exception you want to catch (you can use the literal condition name from this page). Probably that'd be invalid_text_representation.
Personally, if it was me, I'd also be validating against a regex instead of involving a subtransaction (an internal exception handler). Subtransactions are more expensive than they look on the surface.
Also, I tend to prefer to just have the function return the converted value instead of a bool, reserving null for failed validations.
Finally, you should declare this function IMMUTABLE, since the output is unambiguously determined from the inputs alone.
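A minimal sketch of the suggested shape (hypothetical function name; regex validation, no exception handler, NULL reserved for failed validations, declared IMMUTABLE):

CREATE FUNCTION try_to_int(s text) RETURNS int
LANGUAGE sql IMMUTABLE AS $$
    SELECT CASE WHEN s ~ '^-?[0-9]+$' THEN s::int END;  -- NULL when not an integer
$$;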
So I've seen you mention that you did a full vacuum a couple of times now. Do you mean to say you are doing a VACUUM FULL? And if so, why a VACUUM FULL over, say, a regular VACUUM?
From what I can tell of your workload, lots and lots of updates, a regular VACUUM should be sufficient to free up space available to postgres while allowing selects/updates/etc to continue working. A VACUUM FULL is only useful if you need to free up disk space available to the OS, which doesn't seem to be your case.
I'd guess tweaking the autovacuum settings as others have suggested would solve your issue. But there is no reason for Postgres to need that 8 hour downtime.
As always the PG docs are great. VACUUM
To be clear, the problem's not PostgreSQL, the problem is my usage of it, lol.
I'm having a tough time extracting pertinent info from the installation guide, I'm not confident that I made correct/complete changes to config files, and it has the most steps and the steps that are the most complicated. Though I feel like I made it successfully through each section of the installation using the command lines, I didn't open PGAdminIII (mostly because I couldn't figure out how).
Also contact the companies listed here http://www.postgresql.org/support/professional_support/.
Most offer casual support by the hour. We once contracted one of them for $100 per hour for 2 hours, during which time we went through our configuration and HA setup. Worth every dollar.
> Postgres does break SQL spec and offer a proprietary syntax to support case sensitive object names.
No, it doesn't.
Double quoted identifiers are called delimited identifiers. They've been a part of SQL standards since the dark ages. Seriously, SQL1992 might be a real eye-opener for you.
As for the PostgreSQL devs, their advice on delimited identifiers is
> If you want to write portable applications you are advised to always quote a particular name or never quote it.)
PostgreSQL does break with SQL standards in folding unquoted identifiers to lower case. SQL standards require folding unquoted identifiers to upper case.
I looked at the source code, and all answers in this thread are, while good guesses, incorrect.
The error message happens within set_pglocale_pgservice() (which sets PGSYSCONFDIR and PGLOCALEDIR; see Environment Variables) when looking up the path to psql, a step which includes resolution of symlinks. And resolving a symlink apparently requires changing to the directory of the executable. And after that you wish to restore it so \o, \i, etc. will use the original working directory (this step is what fails due to permissions). The comment below explains why changing directory is necessary, but I am not sure why we need to follow the symlink in the first place, or if the comment is actually correct.
/*
 * To resolve a symlink properly, we have to chdir into its directory and
 * then chdir to where the symlink points; otherwise we may fail to
 * resolve relative links correctly (consider cases involving mount
 * points, for example). After following the final symlink, we use
 * getcwd() to figure out where the heck we're at.
 *
 * One might think we could skip all this if path doesn't point to a
 * symlink to start with, but that's wrong. We also want to get rid of
 * any directory symlinks that are present in the given path. We expect
 * getcwd() to give us an accurate, symlink-free path.
 */
Edit: Improved wording.
> No we can't upgrade it, I have to work with it.
There's nothing wrong with PostgreSQL 8.4… except it is end of life'd. Make sure someone with the authority to perform upgrades is aware that 8.4 is unsupported and no longer receiving updates.
(This is besides the fact that you're missing five years of PostgreSQL development and you're five major releases behind.)
Sure. Just setup a copy of your database from a backup. Using pg_basebackup is probably the simplest.
For anything faster/more efficient you need cooperation from your storage subsystem. For instance with the btrfs filesystem you could keep a streaming replication standby on your staging system, switch it into backup mode, make a copy using cp --reflink=always, edit postgresql.conf to change the port and start the copy up as a fork.
For forks from older states use backups, xlog archive and recovery_target_time.
The single reliable possibility is to use an exclusion constraint (EXCLUDE), which is available since 9.0, and store the whole path as an array or ltree.
http://www.postgresql.org/docs/9.3/static/sql-createtable.html
Neither a CHECK constraint nor a trigger can solve your problem reliably in the presence of concurrent updates.
The other way is to take an advisory lock in the transaction, i.e. serialize it by hand. Or, perhaps, the SERIALIZABLE isolation level, available since 9.1, could help.
If you're going to start out with 9.4, you may want to look at tweaking the trigger to store the changes in jsonb format instead of hstore.
I don't know any way of retrospectively getting that info (not to say there isn't...) but if you don't mind a bit of scripting you can probably get it as it happens. You'll notice that the lock wait log messages appear twice - once when they've had to wait a second, and then once when they've finally completed. So you could write a script to watch for the initial warning, then automatically run the diagnostics /u/mgonzo linked to and log their results.
FYI, this page lists some of the operations that can acquire locks of the various levels, to give you some hints. Now I think about it, I've seen some lock wait issues with autovacuum in the past, so it's probably worth logging those (log_autovacuum_min_duration = 0) too, to see if there's any correlation.
Good luck!
Edit: on the off-chance it really is a long-running query, you could always enable slow query logging, and make sure the log prefix includes the transaction ID (formatting options are documented in the config file). That should correspond with the transaction ID mentioned in the lock wait if it's to blame.
I would agree with you on all columns except MONEY datatypes, since many aggregates and math functions do not work as one would expect on MONEY columns. Our primary system uses money datatypes and it is standard practice to convert them to numerics. MONEY WAS a deprecated data type but they removed that designation (which was a bad decision IMHO) - http://www.postgresql.org/message-id/[email protected] - they should have left it dead IMO.
http://www.postgresql.org/docs/9.1/static/functions-aggregate.html
Note how most aggregate functions cannot accept a MONEY datatype as an expression.
Not everyone can add or modify indexes and such, I always try to approach things from the view that the user does not have the ability to change the source data structure in any way.
I also prefer to share help in the SQL standard format where applicable, which is why I didn't mention short-form casting in Postgres - I use it myself at work because of the nature of our work and the slim chance of moving to another RDBMS, but that is not always the case for others, and code portability is pretty important for many.
I am in complete agreement, though I'll go a bit further. psql is a console for running SQL in the same way that bash is a console for running processes – yes, that's its fundamental purpose, but it's capable of saving you much more effort than that description implies.
You get a ton of useful meta-commands: \h to describe any syntax you've forgotten, \d to describe any schema you've forgotten, \e to edit a complex query in your favorite editor, \! to drop to a shell, \ef, \i, \s, \copy, \watch…
You get readline, which is not specific to psql but does more than you might think to save you effort.
You get variables, SQL interpolation, and several mechanisms for moving {data|commands} {in|out} of {files|pipes}. Plus, it all works over the network, not only meaning you can ask psql to connect to a remote machine, but also that you can run psql on a remote machine over SSH, so any psql skills you learn are always available.
So: yeah, I also encourage you to use psql :-)
Are you sure the problem is the comparison of the timestamps and not some other aspect of the join? Have you tried changing the query to SELECT t1.*, t2.timestamp FROM agent_log AS t1, ( ...? I do not know what the query is supposed to do so I cannot look for other possible bugs on my own.
A couple of unrelated suggestions after looking at your code:

- Use the inet type to store IP addresses, or the ip4r extension if you handle them a lot. This means validation, smaller storage, faster lookups, and more operators and functions.
- OIDS=FALSE should be the default in your database, so you should not have to specify it. Check the default_with_oids config option.
- The ORDER BY does not do anything useful in t2.
Change the value of unix_socket_directories (or unix_socket_directory pre-9.3) in the config to /tmp or somewhere else where you are allowed to write. As explained in the relevant documentation, it is used to specify where to put the lock file.
I hate it when stupid advice ends up first on Google.
EDIT: As the guy replying in /r/Debian said, you should also be able to set the socket directory with postgres -k DIRECTORY.
What you want is pg_xlog_replay_pause. If you call this on a streaming standby, it will become a stable backup source until you call pg_xlog_replay_resume.
Full documentation in System Administration Functions.
Read the more-than-reasonable PostgreSQL docs, at least those covering client application "psql". No, you're not going to memorize it all in one pass, but there's a chance you'll recall having seen something somewhere. Gotta start somewhere.
That said, many many meta-commands have been added to psql over the years, so separating your wheat from the chaff may be a challenge.
Check the related functions in the manual. You probably want pg_last_xact_replay_timestamp(). On the master you can also get information about the slaves from the pg_stat_replication view.
For those who are interested, they have a feature matrix that really shows off what they've added. And of course, the what's new in 9.2 Wiki entry is pretty handy, too.
Really, 9.2 is turning out to be a monster of a release.
"Currently, only B-tree indexes can be declared unique".
And, just to elaborate, all indexes are B-tree unless you explicitly ask for something else.
PostgreSQL does not support filtering on the key value of an object inside an array yet. (AFAIK)
I guess...

select substring('https://www.amazon.com/Barbie-FWV25-Travel-Doll/dp/B07GLMHSS8/ref=sr_1_1', '(?<=/)B.{9}');

returns Barbie-FWV because that's the first place the string has a / followed by B and nine more characters. But

select substring('https://www.amazon.com/Barbie-FWV25-Travel-Doll/dp/B07GLMHSS8/ref=sr_1_1', '(?<=/)B[A-Z0-9]{9}');

returns B07GLMHSS8.
First off, I'm sorry you're angry. Have you considered getting a new face? I just checked Amazon, and it looks like you could become a superhero if you wanted.
Much of the older data will not be accessed again, but we don't know ahead of time what will be needed. We are required to keep the data for just over a year.
Because we are changing our structure by a lot, I don't have a good answer to the size question until we start cramming it in the database. Our current educated guess for a week would be roughly 800GB - 1.25TB.
The records are all structured the same way. There are a few types of records, though, so we are considering putting each type on a different server. We would most likely still need a partitioning (or similar) solution.
I found a guide online that helped me connect without using SSH tunneling. Digital Ocean's guides have been super helpful in setting up my server: https://www.digitalocean.com/community/tutorials/how-to-secure-postgresql-against-automated-attacks
I think it's pretty secure. I still have SSH tunneling available. But my main concern now is that the connection will break if the IP address on my computer changes. May have to look into giving myself a static IP address (haven't done this in years)