пятница, 6 января 2023 г.

Creating a simple foreign data wrapper for PostgreSQL

Today I'd like to describe how to create own minimal usable foreign data wrapper (FDW) for PostgreSQL.

Common information about FDW interfaces

I suppose you know C and have basic PostgreSQL DBA experience. Foreign data wrapper (FDW) is a mean of accesssing another data sources from PostgreSQL DBMS. In this article I'll walk you through writing minimal usable FDW (mufdw). Going through mufdw code, I'll show you how you can use PostgreSQL FDW API (its description you can find in the official documentation).

To access another data source you define FDW by specifying its name, validator function and handler function like this

  HANDLER mufdw_handler
  VALIDATOR mufdw_validator;
Both handler and validator are SQL functions. A validator is called when user mapping, foreign server or foreign table is created or modified to check if option is valid for the object. A handler just returns a struct, containing methods defining FDW (FdwRoutine). Most FDW will do foreign data wrapper creation behind the scenes when you create extension, so for administrator this could look like
Next you should define a foreign server, which basically specifies a remote server host in one or another way. For example, you can define host name, port and database name in foreign server options. Our sample mufdw allows to query another table in the same database. So server definition doesn't define any options and you can create server simply by
Now we have to define user mapping - how local user should access remote data source. Usually user mapping definition includes authentication information - like remote user name and password. As our minimal usable fdw will just wrap local tables and use the same user for scanning them (via SPI interface), our user mapping will be formal:
Let's create a plain table which will be used as a source for our foreign table.
CREATE TABLE players(id int primary key, nick text, score int);
INSERT INTO players SELECT i, 'nick_'||i, i*10 from generate_series(1,10) i;
Foreign table is actually an object which behaves mostly like normal table, but provides access to remote data. In foreign table definition we provide options, necessary to identify this data. In our case it's local table and schema name.
CREATE FOREIGN TABLE f_players (id int, nick text, score int) server loopback options (table_name 'players', schema_name 'public');
Now we can query data from our foreign table.
SELECT * FROM players ;
 id |  nick   | score 
  1 | nick_1  |    10
  2 | nick_2  |    20
  3 | nick_3  |    30
  4 | nick_4  |    40
  5 | nick_5  |    50
  6 | nick_6  |    60
  7 | nick_7  |    70
  8 | nick_8  |    80
  9 | nick_9  |    90
 10 | nick_10 |   100
(10 rows)

What happens when you query a foreign table? PostgreSQL planner firstly looks if a table used in query is a foreign table and looks for its foreign server FdwRoutine. Pointer to FdwRoutine is recorded in RelOptInfo structure, which is used in planner to represent relation. This work is done in get_relation_info() function. Firstly planner access our fdw methods when looks for relation size in set_rel_size() (which is called for each base relation in the begining of planning (make_one_rel()). set_foreign_size() calls GetForeignRelSize() to find out relation size after applying restriction clauses. When planner generate possible paths to access a base relation in set_rel_pathlist(), it calls GetForeignPaths() function from RelOptInfo->fdwroutine to generate foreign path. Access paths are used by planner to find out possible strategy to access relation. Usually for foreign tables there will be no another paths besides foreign path. For join relation there could be several paths - foreign join path and several local join paths (for example, using nestloop or hash join method). If foreign path is the best access path for a particular relation, GetForeignPlan() function from its fdwroutine will be called to generate ForeignScan plan in create_foreignscan_plan(). Executor executes ForeignScan plan. When it initializes foreign scan state, BeginForeignScan() is called. Later tuples are fetched as needed by executor in IterateForeignScan(). If it's necessary to restart foreign scan from begining, ReScanForeignScan() is called. When relation scan is ended, EndForeignScan() is called.

Mufdw implementation

Let's look at how the minimal set of functions needed by read-only FDW is implemented in mufdw.

The basic logic behind mufdw is simple. For each foreign table we provide table name and table schema as foreign table options. When user queries remote table we open SPI cursor and fetch data from local table, identified by these schema and name. Query text is saved in foreign scan private field during planning. In BeginForeignScan() we create cursor for later use. IterateForeignScan() fetches one tuple from it and saves in executor state node scan slot. Here node is a basic "class"-like hierarchical structures used in PostgreSQL source code.

GetForeignRelSize() function

void GetForeignRelSize(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid)

This methods obtains relation size estimates for a foreign table. Here root is the planner information about the query, baserel - information about the table and foreigntableid is the Oid of the foreign table. In this function we have to fill in baserel->tuples and baserel->rows. baserel->fdw_private can be initialized and used for private fdw purposes.

In mufdw it is implemented as mufdwGetForeignRelSize(). mufdwGetForeignRelSize() basically allocates memory for relation fdw_private structure, searches in foreign table options for "table_schema" and "table_name" and saves them for further use. It also makes minimal effort to estimate relation size, but in fact it gives default estimations, as we don't gather any statistic for mufdw foreign table.

GetForeignPaths() function

void GetForeignPaths(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid)

This function should create foreign path and add it to baserel->pathlist. It's recommended to use create_foreignscan_path to build the ForeignPath. Arguments are the same.

In mufdw it is implemented as mufdwGetForeignPaths(). It creates basic foreign scan path and adds it to the relation path list. Cost is estimated as a sequential scan. These estimates are not accurate in any way.

GetForeignPlan() function

ForeignScan *GetForeignPlan(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid, ForeignPath *best_path, List *tlist, List *scan_clauses, Plan *outer_plan)

This function is called in the end of query planning and should create a ForeignScan plan node. It is passed foreign path, generated by GetForeignPaths(), GetForeignJoinPaths() or GetForeignUpperPaths(). We also get the target list, which should be emitted by plan node, restriction clauses to be enforced and outer subplan of the ForeignScan.

In mufdw it is implemented as mufdwGetForeignPlan(). We put all scan clauses in the plan node qual list for recheck as mufdw doesn't perform "remote" filtering. Function gets scan clauses as list of RestrinctInfo nodes, but should use expression list for plan quals. extract_actual_clauses() is used to perform this transformation. Also we construct simple "SELECT *" query to extract data from plain table. fdw_private foreignscan field should be a list of nodes, so that copyObject() could copy them. So we wrap our C string, representing query, into String node and create fdw_private as single-member list.

BeginForeignScan() function

void BeginForeignScan(ForeignScanState *node, int eflags)

This function is called to begin foreign scan execution. It can, for example, establish connection to DBMS. Note that function will also be called when running explain for a query (in that case (eflags & EXEC_FLAG_EXPLAIN_ONLY) is true). When called from explain, function should avoid doing any externally visible actions.

In mufdw it is implemented as mufdwBeginForeignScan(). Here we create cursor for the query, constructed during plan creation and initialize internal scan state with its name. Our local open_new_cursor() function opens cursor for query using SPI and returns name of corresponding portal. The issue here is that SPI context is transient, we can't use memory, allocated in this context, in other parts of program, so should copy it to our old context.

IterateForeignScan function

TupleTableSlot *IterateForeignScan(ForeignScanState *node)

This function is used to fetch one tuple and return it in ForeignScanState's tuple slot.

In mufdw it is implemented as mufdwIterateForeignScan(). Firstly we clear tuple slot, search for our SPI cursor and fetch one tuple from it. If there's no data, it will be enough. If there is some data, we deform SPI tuple to slot's array of values and nulls and call ExecStoreVirtualTuple() to store it. The only issue is that it should be materialized, as SPI tuple will be released when SPI context is destroyed.

ReScanForeignScan function

void ReScanForeignScan(ForeignScanState *node)

This function is used to restart scan from the begining.

In mufdw it is implemented as mufdwReScanForeignScan(). It just closes old cursor and opens new one for our query, stored in foreign scan internal state.

EndForeignScan function

void EndForeignScan(ForeignScanState *node)

This function should just end scan and release resources.

In mufdw it is an empty function. We don't explicitly release resources as they will be released when transaction ends.

fdw_handler function

fdw_handler is just a SQL function which should return pointer to FdwRoutine structure, which contains references to FDW functions, which we've just described.

In mufdw it's implemeted as mufdw_handler(). Only functions, necessary for qeurying foreign tables, are implemented.

fdw_validator function

fdw_validator checks options, used during CREATE or ALTER statements for foreign data wrapeprs, foreign servers, user mappings and foreign tables. It gets list of options as an array of text as the first argument and OID representing the type of object the options are associated with. OID can be ForeignDataWrapperRelationId, ForeignServerRelationId, UserMappingRelationId, or ForeignTableRelationId.

In mufdw it's implemented as mufdw_validator. It allows only "table_name" and "schema_name" options for foreign tables and doesn't check options for other objects.

SQL code

We need a bit of SQL code to glue this all together. We should create sql-level functions for validator and handler and create foreign data wrapper. For mufdw it's done in its extension script.


We looked at how you can create a simple read-only foreign data wrapper in C, analyzing mufdw sample foreign data wrapper. I hope, this was useful, or at least interesting.

четверг, 3 марта 2022 г.

Ukraine: Why it's happening

I'm not a politician or an analyst, but I wanted to share my point of view. I read BBC news and see nothing related to reality. However, FoxNews gave some adequate explanations, but I'm not sure they were heard. So, briefly, what's happening and why it's happening. Perhaps, at least my EU or US friends will read this. And as Russian citizen I'm not pretending to be objective.

  • Politics has always been dirty and cinic. Peace is only possible when there is a balance of forces in the region or when one state suppresses all other. In later case peace is usually not stable (we can recollect Iraq or Serbia here - tragedies there were only possible because one state considered itself (and was) a super-power). Moreover, peace in Europe for 70 yeears was possible only because major competitors in the Cold War had nuclear weapons and so had to tolerate each other.
  • A war is always terrible. Especially when it's a war with brother nation. And a lot of Russian people do have relatives and friends in Ukraine, so war with Ukraine is very painful for all sane Russians.
  • But this is not a war against Ukraine, although Ukraine suffers most from this war. It's a conflict with USA, NATO and EU.
  • This conflict has begun not on 24th or 21st of February, and even not in 2014. It began in 2008, when NATO targeted at incorporating Georgia and Ukraine.
  • Ukraine has chosen a side in this conflict not in 2022, but in 2013, when legal pro-Russian government was deposed. Pro-western government was established. Likely this government was popular that days. A course for EU integration and collaboration with NATO was chosen.
  • Russia, seeing this as a threat, secured Sebastopol Naval base by annexing Crimea (and local people appreciated this). The second action was to support separatists in Donbass. They were Russians, and we protected them, and this weakened now pro-western Ukraine. If you want to know more about how Ukraine treated them, you can look at the document linked below.
  • During last 6-7 years Russia has made attempts to arrange an agreement with US (and NATO) on the Ukranian crisis. Minsk agreements were signed in 2015, but were not respected by Ukraine or its partners.
  • Escalation of this crisis has begun in 2021. Not sure why. Likely during this period NATO started to ship war drones to Ukraine (besides other weapon). This made our government concerned. And instead of negotiations, we got several provocations - like UK warship in our waters. What would happened if our border guards flooded it? Don't want to consider. But the one sending it there was likely mad.
  • Presence of NATO forces in Georgia and Ukraine is considered an existencial threat for Russia. And our government has shown that it's ready to military actions if this possibility becomes more real in 2008 during crisis in Georgia.
  • NATO declares that Ukraine is not considered as a possible NATO member. But current world-wide support contradicts this. Ukraine de-facto is treated as NATO member.
  • It's clear that both sides (USA and Russia) have been preparing for this conflict for long (it seems to me, since last December).
  • The last straw (it appears) was the declaration of Ukraine president that his country will leave Budapest Memorandum. Besides all, Budapest Memorandum forbids Ukraine to have nuclear weapon.

Everyone can see what happened next. But don't consider that this war has begun in February. Everything is much deeper and harder. Absence of good will from NATO (and first of all, from USA) and irresponsible behavior of Ukranian government has lead to situation which could be avoided. In Russia we like to blame Stalin that he didn't prepare for the WWII. So, for Russian people this is the 21st June. A strike back just a day (or, likely, a decade) before NATO strike. This is not a liberation of Ukraine, but a war for safety of our borders. So in this situation we will not listen for calls for peace until Ukraine will not be a threat for Russia. Again, by no means, this is not a war against Ukranian people. This is a war agains their anti-russian elite and government. This is a war for peace near our borders. And as most Russian people see current Ukranian government an immediate threat to Russia, this war will continue until it reaches its objectives - destruction of nationalists in Ukraine. Once again, we didn't want to bring war to Ukranian civilians. It's a tragedy that they are suffering. It's a tragedy to see Ukranian cities damaged. I'm sure we'll win soon. But the sooner West stops supporting the current Ukraine government, the sooner peace will come to that land. And I hope, when everything ends, we'll help Ukranians to reconstruct their cities. But this will not ressurect dead.

For people, who want more context, I'm attaching a Russian Foreign ministry brochure - what was happening for 8 years in Ukraine and what Western countries didn't want to listen about.

вторник, 21 мая 2019 г.

One big Oracle ASM adventure

Well, this has happened long ago, but as I'm going to decomission http://dbseminar.r61.net, I think this information should be moved here...

Originally posted on on Tue, 04/07/2009

I had very unpleasant experience with RAC and ASM. I'd like to share it. Yesterday after VMware ESX Server upgrade (and maybe unsuccessfull live motion operations on one of two RAC nodes) I had the following records in asm instance alert log:
Mon Apr  6 15:48:52 2009
NOTE: cache registered group DATA number=1 incarn=0x3ad84292
NOTE: cache registered group FRA number=2 incarn=0x3b084293
NOTE: cache registered group LOGS number=3 incarn=0x3b084294
Mon Apr  6 15:48:52 2009
ERROR: no PST quorum in group 1: required 2, found 0
Mon Apr  6 15:48:52 2009
NOTE: cache dismounting group 1/0x3AD84292 (DATA)
NOTE: dbwr not being msg'd to dismount
ERROR: diskgroup DATA was not mounted
Mon Apr  6 15:48:52 2009
ERROR: no PST quorum in group 2: required 2, found 0
Mon Apr  6 15:48:52 2009
NOTE: cache dismounting group 2/0x3B084293 (FRA)
NOTE: dbwr not being msg'd to dismount
ERROR: diskgroup FRA was not mounted
Mon Apr  6 15:48:52 2009
ERROR: no PST quorum in group 3: required 2, found 0
Mon Apr  6 15:48:52 2009
NOTE: cache dismounting group 3/0x3B084294 (LOGS)
NOTE: dbwr not being msg'd to dismount
ERROR: diskgroup LOGS was not mounted

Diskgroups disappeared, and asm didn't want to see them. However, I could see /dev/sd* disks, where diskgroups were placed, and permissions were right: disks were owned by oracle user. select * from v$asm_diskgroup ; gave nothing and select path,header_status from v$asm_disk ; said that all disks were in provisioned state. It was awful, but in the end I've found sollution. I've noticed that kfed sees diskgroup and other attribute on the affected disks, but parameter kfdhdb.acdb.ub2spare is not 0 (as I found in Internet, it was its usual state). So I've dumped header info:
$ kfed read /dev/sdf > /tmp/sdf.noop.mod
$ vi /tmp/sdf.noop.mod  #Changed kfdhdb.acdb.ub2spare to 0
$ kfed op=write dev=/dev/sdf text=/tmp/sdf.noop.mod CHKSUM=YES
After this I could do "alter diskgroup fra mount" in asm instance, repeated this procedure for DATA asm diskgroup, after that I could mount diskgroups and recover my database. In conclusion I wish say that we are going to move our fra to OCFS2 fs the next weekend.

суббота, 20 октября 2018 г.

What is my sftp server doing?

Well, I'm not familiar with DTrace, but sometimes want to find, what some application is doing. In this case I wanted to monitor my sftp server. Luckily, most illumos distributions provide dtrace patch (coming from Oracle Solaris) to find this out. Unluckily, I haven't found any documentation on it, just source code. After reading Translators chapter of DTrace Guide and looking at /usr/lib/dtrace/sftp.d I've come to this:
dtrace -n 'sftp*:::transfer-done { printf ("%d: %s %s %s %d", pid, xlate <sftpinfo_t *>((sftpproto_t*)arg0)->sfi_pathname, xlate <sftpinfo_t *>((sftpproto_t*)arg0)->sfi_user, xlate <sftpinfo_t *>((sftpproto_t*)arg0)->sfi_operation, xlate <sftpinfo_t *>((sftpproto_t*)arg0)->sfi_nbytes  ); }'

dtrace: description 'sftp*:::transfer-done ' matched 8 probes
CPU     ID                    FUNCTION:NAME
  1  80412      process_read:transfer-done 7409: /export/home/user/1.pp user read 1808
  1  80412      process_read:transfer-done 7409: /export/home/user/1.pp user read 0
  1  80411     process_write:transfer-done 7409: /export/home/user/1.pp user write 1808
  1  80412      process_read:transfer-done 7409: /export/home/user/dtrace/poll.d user read 53
  1  80412      process_read:transfer-done 7409: /export/home/user/dtrace/poll.d user read 53

Seems rather interesting to me.

четверг, 30 августа 2018 г.

Quest: creating one hundred zones

Well, I need to create about one hundred zones once again. You could probably use ansible for this, but an old-fashioned man will do everything in shell. So: we have one "golden image" and have to create 100 zones like it. We could clone it, but with clones you receive wonderful issue - beadm activate fails in zone. So we create zones and do send/receive manually. This looks like this:
set -e

for i in $(seq 1 100); do 

    #Creating interface for the zone
    dladm create-vnic -l e1000g1 hnet$i

    #Creating initial config   

    cat > $TEMPFILE <<EOF
create -b
set zonepath=/zones/h$i
set autoboot=true
set ip-type=exclusive
add net
set physical=hnet$i
add capped-memory
set physical=2G
add rctl
set name=zone.max-swap
add value (priv=privileged,limit=2147483648,action=deny)
add rctl
set name=zone.max-locked-memory
add value (priv=privileged,limit=536870912,action=deny)

    zonecfg -z h$i -f $TEMPFILE
    zfs send -R data/zones/h0@initial | zfs recv -F data/zones/h$i
    # Zone tools should know that zone is in installed state, not configured
    # Also during installation zoneadm assigns uuid to zone (last field). We do this manually.
    gsed -i  -e "/^h${i}:/ s/\$/${uuid}/" -e "/^h${i}:/ s/configured/installed/" /etc/zones/index
    zoneadm -z h$i mount

    # We known that golden image ip address  ends in 254 and change it
    sed -i -e "s:hnet0:hnet$i:g" -e "s:\.254:.$addr:g" /zones/h$i/root/etc/ipadm/ipadm.conf
    zoneadm -z h$i unmount
    zfs destroy data/zones/h$i@initial
    rm $TEMPFILE
    zoneadm -z h$i boot

суббота, 10 февраля 2018 г.