Ciro Santilli $$ Sponsor Ciro $$ 中国独裁统治 China Dictatorship 新疆改造中心、六四事件、法轮功、郝海东、709大抓捕、2015巴拿马文件 邓家贵、低端人口、西藏骚乱
We have some runnable SQL examples with assertion under the sequelize/raw directory.
These examples are written in the Sequelize library using raw queries.
Sequelize is used minimally, just to feed raw queries in transparently to any underlying database, and get minimally parsed results out for us, which we then assert with standard JavaScript. The queries themselves are all written by hand.
By default the examples run on SQLite. Just like the examples from sequelize example, you can set the database at runtime as:
  • ./index.js or ./index.js l: SQLite
  • ./index.js p: PostgreSQL. You must manually create a database called tmp and ensure that peer authentication works for it
Here we list only examples which we believe are standard SQL, and should therefore work across different SQL implementations:
Have a look at some interesting examples under nodejs/sequelize/raw/many_to_many.js.
Each transaction isolation level specifies what can or cannot happen when two queries are being run in parallel, i.e.: the memory semantics of the system.
Remember that queries can affects thousands of rows, and database systems like PostgreSQL can run multiple such queries at the same time.
Implementation specifics:
Example where this level is sufficient: nodejs/sequelize/raw/parallel_update_async.js.
nodejs/sequelize/raw/parallel_create_delete_empty_tag.js is an example which experimentally seems to be solved by REAPEATABLE READ, although we are not sure that this is truly the case and why. What is clear is that that example is not solved by the SQL READ COMMITTED isolation level.
In PostgreSQL, this is the first isolation level which can lead to postgreSQL serialization failures, this does not happen to SQL READ COMMITTED isolation level in that DBMS. You then have to retry the transaction.
nodejs/sequelize/raw/parallel_update_worker_threads.js contains a base example that can be used to test what can happen when queries are being run in parallel. But it is broken due to a sqlite3 Node.js package bug: github.com/mapbox/node-sqlite3/issues/1381...
nodejs/sequelize/raw/parallel_update_async.js is an async version of it. It should be just parallel enough to allow observing the same effects.
This is an example of a transaction where the SQL READ COMMITTED isolation level if sufficient.
These examples run queries of type:
UPDATE "MyInt" SET i = i + 1
Sample execution:
node --unhandled-rejections=strict ./parallel_update_async.js p 10 100
which does:
The fear then is that of a classic read-modify-write failure.
But as www.postgresql.org/docs/14/transaction-iso.html page makes very clear, including with an explicit example of type UPDATE accounts SET balance = balance + 100.00 WHERE acctnum = 12345;, that the default isolation level, SQL READ COMMITTED isolation level, already prevents any problems with this, as the update always re-reads selected rows in case they were previously modified.
If the first updater commits, the second updater will ignore the row if the first updater deleted it, otherwise it will attempt to apply its operation to the updated version of the row
Since in PostgreSQL "Read uncommitted" appears to be effectively the same as "Read committed", we won't be able to observe any failures on that database system for this example.
nodejs/sequelize/raw/parallel_create_delete_empty_tag.js contains an example where things can actually blow up in read committed.
This example is similar to nodejs/sequelize/raw/parallel_update_async.js, but now we are doing a separate SELECT, later followed by an update:
  • SELECT FROM to get i
  • update on Js code newI = i + 1
  • UPDATE SET the newI
Although this specific example is useless in itself, as we could just use UPDATE "MyInt" SET i = i + 1 as in nodejs/sequelize/raw/parallel_update_async.js, which automatically solves any concurrency issue, this kind of code could be required for example if the update was a complex function not suitably implemented in SQL, or if the update depends on some external data source.
Sample execution:
node --unhandled-rejections=strict ./parallel_select_and_update.js p 2 10 'READ COMMITTED'
which does:
Another one:
node --unhandled-rejections=strict ./parallel_select_and_update.js p 2 10 'READ COMMITTED' 'FOR UPDATE'
this will run SELECT FOR UPDATE rather than just SELECT
Observed behaviour under different SQL transaction isolation levels:
  • READ COMMITTED: fails. Nothing in this case prevents:
    • thread 1: SELECT, obtains i = 0
    • thread 2: SELECT, obtains i = 0
    • thread 2: newI = 1
    • thread 2: UPDATE i = 1
    • thread 1: newI = 1
    • thread 1: UPDATE i = 1
  • REPEATABLE READ: works. the manual mentions that if multiple concurrent updates would happen, only the first commit succeeds, and the following ones fail and rollback and retry, therefore preventing the loss of an update.
  • READ COMMITTED + SELECT FOR UPDATE: works. And does not do rollbacks, which probably makes it faster. With p 10 100, REPEATABLE READ was about 4.2s and READ COMMITTED + SELECT FOR UPDATE 3.2s on Lenovo ThinkPad P51 (2017).
    SELECT FOR UPDATE should be enough as mentioned at: www.postgresql.org/docs/13/explicit-locking.html#LOCKING-ROWS
    FOR UPDATE causes the rows retrieved by the SELECT statement to be locked as though for update. This prevents them from being locked, modified or deleted by other transactions until the current transaction ends. That is, other transactions that attempt UPDATE, DELETE, SELECT FOR UPDATE, SELECT FOR NO KEY UPDATE, SELECT FOR SHARE or SELECT FOR KEY SHARE of these rows will be blocked until the current transaction ends; conversely, SELECT FOR UPDATE will wait for a concurrent transaction that has run any of those commands on the same row, and will then lock and return the updated row (or no row, if the row was deleted). Within a REPEATABLE READ or SERIALIZABLE transaction, however, an error will be thrown if a row to be locked has changed since the transaction started. For further discussion see Section 13.4.
A non-raw version of this example can be seen at: nodejs/sequelize/parallel_select_and_update.js.
This example contains a deterministic demo of when postgreSQL serialization failures may happen.
Tested on PostgreSQL 13.5.
In this example, posts have tags. When a post is deleted, we check to see if there are now any empty tags, and now we want to delete any empty tags that the post deletion may have created.
If we are creating and deleting posts concurrently, a naive implementation might wrongly delete the tags of a newly created post.
This could be due to a concurrency issue of the following types.
Failure case 1:
  • thread 2: delete old post
  • thread 2: find all tags with 0 posts. Finds tag0 from the deleted old post which is now empty.
  • thread 1: create new post, which we want to have tag tag0
  • thread 1: try to create a new tag tag0, but don't because it already exists, this is done using SQLite's INSERT OR IGNORE INTO or PostgreSQL's INSERT ... ON CONFLICT DO NOTHING
  • thread 1: assign tag0 to the new post by adding an entry to the join table
  • thread 2: delete all tags with 0 posts. It still sees from its previous search that tag0 is empty, and deletes it, which then cascades into the join table
which would result in the new post incorrectly not having the tag0.
Failure case 2:
  • thread 2: delete old post
  • thread 2: find all tags with 0 posts
  • thread 1: create new post
  • thread 1: try to create a new tag tag0, but don't because it already exists
  • thread 2: delete all tags with 0 posts. It still sees from its previous search that tag0 is empty, and deletes it
  • thread 1: assign tag0 to the new post
which leads to a foreign key failure, because the tag does not exist anymore when the assignment happens.
Failure case 3:
  • thread 2: delete old post
  • thread 1: create new post, which we want to have tag tag0
  • thread 1: try to create a new tag tag0, and succeed because it wasn't present
  • thread 2: find all tags with 0 posts, finds the tag that was just created
  • thread 2: delete all tags with 0 posts, deleting the new tag
  • thread 1: assign tag0 to the new post
which leads to a foreign key failure, because the tag does not exist anymore when the assignment happens.
Sample executions:
  • node --unhandled-rejections=strict ./parallel_create_delete_empty_tag.js p 9 1000 'READ COMMITTED': PostgreSQL, 9 tags, DELETE/CREATE the tag0 test tag 1000 times, use READ COMMITTED
    Execution often fails, although not always. The failure is always:
    error: insert or update on table "PostTag" violates foreign key constraint "PostTag_tagId_fkey"
    because the:
    INSERT INTO "PostTag"
    tries to insert a tag that was deleted in the other thread, as it didn't have any corresponding posts, so this is the foreign key failure.
    TODO: we've never managed to observe the failure case in which tag0 is deleted. Is it truly possible? And if not, by which guarantee?
  • node --unhandled-rejections=strict ./parallel_create_delete_empty_tag.js p 9 1000 'READ COMMITTED' 'FOR UPDATE': do a SELECT ... FOR UPDATE before trying to INSERT.
    This is likely correct and the fastest correct method according to our quick benchmarking, about 20% faster than REPEATABLE READ.
    We are just now 100% sure it is corret becase we can't find out if the SELECT in the DELETE subquery could first select some rows, which are then locked by the tag creator, and only then locked by DELETE after selection. Or does it re-evaludate the SELECT even though it is in a subquery?
  • node --unhandled-rejections=strict ./parallel_create_delete_empty_tag.js p 9 1000 'REPEATABLE READ': repeatable read
    We've never observed any failures with this level. This should likely fix the foreign key issue according to the PostgreSQL docs, since:
    • the DELETE "Post" commit cannot start to be seen only in the middle of the thread 1 transaction
    • and then if DELETE happened, the thread 1 transaction will detect it, ROLLBACK, and re-run. TODO how does it detect the need rollback? Is it because of the foreign key? It is very hard to be sure about this kind of thing, just can't find the information. Related: postgreSQL serialization failure.
  • node --unhandled-rejections=strict ./parallel_create_delete_empty_tag.js p 9 1000 'SERIALIZABLE': serializable
  • node --unhandled-rejections=strict ./parallel_create_delete_empty_tag.js p 9 1000 'NONE': magic value, don't use any transaction. Can blow up of course, since even less restrictions than READ COMMITTED
All executions use 2 threads.
Some theoretical notes:
  • Failure case 3 is averted by a READ COMMITTED transaction, because thread 2 won't see the uncommitted tag that thread 1 created, and therefore won't be able to delete it
stackoverflow.com/questions/10935850/when-to-use-select-for-update from SELECT FOR UPDATE also talks about a similar example, and has relevant answers.
UPSERT is extremely handy, and reduces the number of find, check on server, update loops. But RETURNING is a fundamental part of that (to get the updated/existing) ID. Can't believe SQL hasn't standardized it yet as of 2022. But both SQLite and Postgres support it with similar syntax thankfully.
It is mind blowing that this is not possible... the only way to avoid ambiguity in JOINs with column name conflicts is to give aliases to each column...
An example where SELECT FOR UPDATE is a good solution to an use case can be seen at: nodejs/sequelize/raw/parallel_select_and_update.js.
A quick look at PostgreSQL's compliance notes: www.postgresql.org/docs/13/features.html shows the complete utter mess that this standard is. Multiple compliance levels that no one fully implements and optional features everywhere.
Create user for further logins without sudo askubuntu.com/questions/915585/how-to-login-mysql-shell-when-mysql-have-no-password/1325689#1325689:
sudo mysql -e "CREATE USER $USER"
Create test user with password:
sudo mysql -e 'CREATE USER user0 IDENTIFIED WITH mysql_native_password BY "a"'
sudo mysql -e 'GRANT ALL PRIVILEGES ON database_name.* TO "user0"'
and login as that user:
mysql -u user0 -p
Login with password given on the command line:
mysql -u user0 -pmypassword
The IDENTIFIED WITH mysql_native_password part is to overcome "Client does not support authentication protocol requested by server" when connecting from Node.js.
List users:
sudo mysql -e 'SELECT * FROM mysql.user'
View permissions for each user on each DB: serverfault.com/questions/263868/how-to-know-all-the-users-that-can-access-a-database-mysql
sudo mysql -e 'SELECT * FROM mysql.db'
List databases:
sudo mysql -e 'SHOW DATABASES'
Create database:
sudo mysql -e 'CREATE DATABASE mydb0'
Destroy database:
sudo mysql -e 'DROP DATABASE mydb0'
Show tables in database:
sudo mysql -e 'SHOW TABLES' mydb0
or:
sudo mysql -e 'SHOW TABLES FROM mydb0'
Dude's a legend. Sells company for a few million. Then forks the open source project next year. Love it.
PostgreSQL feels good.
Had a look at the source tree, and also felt good.
If Oracle is the Microsoft of database, Postgres is the Linux, and MySQL (or more precisely MariaDB) is the FreeBSD (i.e. the one that got delayed by legal issues). Except that their software licenses were accidentally swapped.
The only problem with Postgres is its name. PostgreSQL is so unpronounceable and so untypeable that you should just call it "Postgres" like everyone else.
On Ubuntu 20.10 PostgreSQL 12.6, login with psql on my default username without sudo fails with: stackoverflow.com/questions/11919391/postgresql-error-fatal-role-username-does-not-exist
This is the one that worked on Ubuntu 21.04: stackoverflow.com/questions/11919391/postgresql-error-fatal-role-username-does-not-exist/38444152#38444152
sudo -u postgres createuser -s $(whoami)
createdb $(whoami)
Explanation:
  • sudo -u postgres uses the postgres user via peer authentication
  • -s in createuser -s: make it a superuser
  • createdb: TODO why do we have to create a table with the same name as the user? Otherwise login fails.
You can now run psql without any password. This works without password due to peer authentication,
sudo cat /etc/postgresql/12/main/pg_hba.conf
shows that peer authentication is available to all users apparently:
local   all             postgres                                peer

# TYPE  DATABASE        USER            ADDRESS                 METHOD

# "local" is for Unix domain socket connections only
local   all             all                                     peer
List users:
psql -c '\du'
output:
                                    List of roles
  Role name  |                         Attributes                         | Member of 
-------------+------------------------------------------------------------+-----------
 ciro        | Superuser, Create role, Create DB                          | {}
 owning_user |                                                            | {}
 postgres    | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
Delete user later on:
psql -c 'DROP USER username;'
Create a database:
createdb testdb0
Help toplevel:
help
Get help for Postgres commands such as \h and so on:
\?
List supported SQL commands:
\h
Show syntax for one type of command:
\h SELECT
List all databases:
psql -c '\l'
which shows:
    Name     |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   
-------------+----------+----------+-------------+-------------+-----------------------
 ciro        | postgres | UTF8     | en_GB.UTF-8 | en_GB.UTF-8 |
 postgres    | postgres | UTF8     | en_GB.UTF-8 | en_GB.UTF-8 |
 template0   | postgres | UTF8     | en_GB.UTF-8 | en_GB.UTF-8 | =c/postgres          +
             |          |          |             |             | postgres=CTc/postgres
 template1   | postgres | UTF8     | en_GB.UTF-8 | en_GB.UTF-8 | =c/postgres          +
             |          |          |             |             | postgres=CTc/postgres
 testdb0     | postgres | UTF8     | en_GB.UTF-8 | en_GB.UTF-8 |
(6 rows)
Delete a database:
psql -c 'DROP DATABASE "testdb0";'
If you didn't give a database from the command line e.g.:
psql
you can do that afterwards with:
\c testdb0
Let's create a table and test that it is working:
psql testdb0 -c 'CREATE TABLE table0 (int0 INT, char0 CHAR(16));'
List tables, no special tables:
psql testdb0 -c '\dt'
gives:
        List of relations
 Schema |  Name  | Type  | Owner
--------+--------+-------+-------
 public | table0 | table | ciro
(1 row)
View table schema: stackoverflow.com/questions/109325/postgresql-describe-table
psql testdb0 -c '\d+ table0'
output:
                                      Table "public.table0"
 Column |     Type      | Collation | Nullable | Default | Storage  | Stats target | Description 
--------+---------------+-----------+----------+---------+----------+--------------+-------------
 int0   | integer       |           |          |         | plain    |              | 
 char0  | character(16) |           |          |         | extended |              | 
Insert some data into it and get the data out:
psql testdb0 -c "INSERT INTO table0 (int0, char0) VALUES (2, 'two'), (3, 'three'), (5, 'five'), (7, 'seven');"
psql testdb0 -c 'SELECT * FROM table0;'
output:
 int0 |      char0
------+------------------
    2 | two
    3 | three
    5 | five
    7 | seven
(4 rows)
Delete the table:
psql testdb0 -c 'DROP TABLE table0;'
In order to create a test user with password instead of peer authentication, let's create test user:
createuser -P user0
createdb user0
-P makes it prompt for the users password.
Alternatively, to create the password non-interactively stackoverflow.com/questions/42419559/postgres-createuser-with-password-from-terminal:
psql -c "create role NewRole with login password 'secret'"
Can't find a way using the createuser helper.
We can then login with that password with:
psql -U user0 -h localhost
which asks for the password we've just set, because the -h option turns off peer authentication, and turns off password authentication.
The password can be given non-interactively as shown at stackoverflow.com/questions/6405127/how-do-i-specify-a-password-to-psql-non-interactively with the PGPASSWORD environment variable:
PGPASSWORD=a psql -U user0 -h localhost
Now let's create a test database which user0 can access with an existing superuser account:
createdb user0db0
psql -c 'GRANT ALL PRIVILEGES ON DATABASE user0db0 TO user0'
We can check this permission with:
psql -c '\l'
which now contains:
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
 user0db0  | ciro     | UTF8     | en_GB.UTF-8 | en_GB.UTF-8 | =Tc/ciro             +
           |          |          |             |             | ciro=CTc/ciro        +
           |          |          |             |             | user0=CTc/ciro
The permission letters are explained at:
user0 can now do the usual table operations on that table:
PGPASSWORD=a psql -U user0 -h localhost user0db0 -c 'CREATE TABLE table0 (int0 INT, char0 CHAR(16));'
PGPASSWORD=a psql -U user0 -h localhost user0db0 -c "INSERT INTO table0 (int0, char0) VALUES (2, 'two'), (3, 'three'), (5, 'five'), (7, 'seven');"
PGPASSWORD=a psql -U user0 -h localhost user0db0 -c 'SELECT * FROM table0;'
Uses the name of the current Linux user to login without a password.
Ubuntu 21.10 has a certain default level of logging by default to:
/var/log/postgresql/postgresql-13-main.log
but it does not log everything, only/mostly errors it seems.
Setting:
log_statement = 'all'
under:
/etc/postgresql/13/main/postgresql.conf
and then restarting the server:
sudo service restart postgresql
just works.
When using SQL REPEATABLE READ isolation level and SQL SERIALIZABLE isolation level, concurrent transactions may fail with a serialization failure, and then you might need to retry them. You server code or your ORM must always account for that.
A good way to explore when it happens is to use the example
Often known simply as SQL Server, a terrible thing that makes it impossible to find portable SQL answers on Google! You just have to Google by specific SQL implementation unfortunately to find anything about the open source ones.
The minimalism, serverlessness/lack of temporary caches/lack of permission management, Hipp's religious obsession with efficiency, the use of their own pure Fossil version control[ref]. Wait, scrap that last one. Pure beauty!
Official Git mirror: github.com/sqlite/sqlite
Create a table
sqlite3 db.sqlite3 "
CREATE TABLE 'IntegerNames' (int0 INT, char0 CHAR(16));
INSERT INTO 'IntegerNames' (int0, char0) VALUES (2, 'two'), (3, 'three'), (5, 'five'), (7, 'seven');
"
List tables:
sqlite3 db.sqlite3 '.tables'
output:
IntegerNames
Show schema of a table:
sqlite3 db.sqlite3 '.schema IntegerNames'
outputs the query that would generate that table:
CREATE TABLE IF NOT EXISTS 'IntegerNames' (int0 INT, char0 CHAR(16));
Show all data in a table:
sqlite3 db.sqlite3 'SELECT * FROM IntegerNames'
output:
2|two
3|three
5|five
7|seven
____ The default isolation level for SQLite is SERIALIZABLE ____
It does not appear possible to achieve the other two levels besides SERIALIZABLE and READ UNCOMMITED
Includes its own copy of sqlite3, you don't use the system one, which is good to ensure compatibility. The version is shown at: github.com/mapbox/node-sqlite3/blob/918052b538b0effe6c4a44c74a16b2749c08a0d2/deps/common-sqlite.gypi#L3 SQLite source is tracked compressed in-tree: github.com/mapbox/node-sqlite3/blob/918052b538b0effe6c4a44c74a16b2749c08a0d2/deps/sqlite-autoconf-3360000.tar.gz horrendous. This explains why it takes forever to clone that repository. People who don't believe in git submodules, there's even an official Git mirror at: github.com/sqlite/sqlite
It appears to spawn its own threads via its C extension (since JavaScript is single threaded and and SQLite is not server-based), which allows for parallel queries using multiple threads: github.com/mapbox/node-sqlite3/blob/v5.0.2/src/threading.h
Hello world example: nodejs/node-sqlite3/index.js.
As of 2021, this had slumped back a bit, as maintainers got tired. Unmerged pull requests started piling more, and better-sqlite3 Node.js package started pulling ahead a little.
As claimed on their README, their operation truly appears to be 10x faster than the node-sqlite package!! It is insane!! How can that other package still exist at all?
The only big problem was the lack of ORM, but people are looking into that by adding it to Sequelize:
As a tree:
  • Root 1
    • Child 1.1
      • Child 1.1.1
      • Child 1.1.2
    • Child 1.2
      • Child 1.2.1
      • Child 1.2.2
As the sets:
 __________________________________________________________________________
|  Root 1                                                                  |
|   ________________________________    ________________________________   |
|  |  Child 1.1                     |  |  Child 1.2                     |  |
|  |   ___________    ___________   |  |   ___________    ___________   |  |
|  |  |  C 1.1.1  |  |  C 1.1.2  |  |  |  |  C 1.2.1  |  |  C 1.2.2  |  |  |
1  2  3___________4  5___________6  7  8  9___________10 11__________12 13 14
|  |________________________________|  |________________________________|  |
|__________________________________________________________________________|

Ancestors