pgAdmin 1.4 online documentation

32.9. C-Language Functions

User-defined functions can be written in C (or a language that can be made compatible with C, such as C++). Such functions are compiled into dynamically loadable objects (also called shared libraries) and are loaded by the server on demand. The dynamic loading feature is what distinguishes “C language” functions from “internal” functions; the actual coding conventions are essentially the same for both. (Hence, the standard internal function library is a rich source of coding examples for user-defined C functions.)

Two different calling conventions are currently used for C functions.
The newer “version 1” calling convention is indicated by writing
a

32.9.1. Dynamic Loading

The first time a user-defined function in a particular
loadable object file is called in a session,
the dynamic loader loads that object file into memory so that the
function can be called.

The following algorithm is used to locate the shared object file
based on the name given in the

If this sequence does not work, the platform-specific shared
library file name extension (often

The user ID the PostgreSQL server runs as must be able to traverse the path to the file you intend to load. Making the file or a higher-level directory not readable and/or not executable by the postgres user is a common mistake.

In any case, the file name that is given in the
Note: PostgreSQL will not compile a C function
automatically. The object file must be compiled before it is referenced
in a

After it is used for the first time, a dynamically loaded object file is retained in memory. Future calls in the same session to the function(s) in that file will only incur the small overhead of a symbol table lookup. If you need to force a reload of an object file, for example after recompiling it, use the LOAD command or begin a fresh session.

It is recommended to locate shared libraries either relative to

Before PostgreSQL release 7.2, only
exact absolute paths to object files could be specified in

32.9.2. Base Types in C-Language Functions

To know how to write C-language functions, you need to know how PostgreSQL internally represents base data types and how they can be passed to and from functions. Internally, PostgreSQL regards a base type as a “blob of memory”. The user-defined functions that you define over a type in turn define the way that PostgreSQL can operate on it. That is, PostgreSQL will only store and retrieve the data from disk and use your user-defined functions to input, process, and output the data.

Base types can have one of three internal formats:

By-value types can only be 1, 2, or 4 bytes in length
(also 8 bytes, if

/* 4-byte integer, passed by value */
typedef int int4;

On the other hand, fixed-length types of any size may be passed by-reference. For example, here is a sample implementation of a PostgreSQL type:

/* 16-byte structure, passed by reference */
typedef struct
{
    double x, y;
} Point;
Only pointers to such types can be used when passing
them in and out of PostgreSQL functions.
To return a value of such a type, allocate the right amount of
memory with

Finally, all variable-length types must also be passed by reference. All variable-length types must begin with a length field of exactly 4 bytes, and all data to be stored within that type must be located in the memory immediately following that length field. The length field contains the total length of the structure, that is, it includes the size of the length field itself.

Warning: Never modify the contents of a pass-by-reference input value. If you do so you are likely to corrupt on-disk data, since the pointer you are given may well point directly into a disk buffer. The sole exception to this rule is explained in Section 32.10, “User-Defined Aggregates”.

As an example, we can define the type

typedef struct {
    int4 length;
    char data[1];
} text;

Obviously, the data field declared here is not long enough to hold all possible strings. Since it's impossible to declare a variable-size structure in C, we rely on the knowledge that the C compiler won't range-check array subscripts. We just allocate the necessary amount of space and then access the array as if it were declared the right length. (This is a common trick, which you can read about in many textbooks about C.)

When manipulating
variable-length types, we must be careful to allocate
the correct amount of memory and set the length field correctly.
For example, if we wanted to store 40 bytes in a

#include "postgres.h"
...
char buffer[40]; /* our source data */
...
text *destination = (text *) palloc(VARHDRSZ + 40);
destination->length = VARHDRSZ + 40;
memcpy(destination->data, buffer, 40);
...
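The allocate-then-set-length pattern can also be tried outside the server. The following is a minimal standalone sketch, not PostgreSQL code: `vtext`, `VTEXT_HDRSZ`, `vtext_from_bytes`, and `vtext_payload_len` are hypothetical names, `malloc` stands in for `palloc`, and the struct assumes a compiler with C99 flexible array members rather than the `data[1]` declaration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for VARHDRSZ: the size of the 4-byte length field. */
#define VTEXT_HDRSZ ((int32_t) sizeof(int32_t))

/* Hypothetical model of a variable-length value; not the server's text type. */
typedef struct
{
    int32_t length;   /* total size in bytes, including this field */
    char    data[];   /* payload starts immediately after the length field */
} vtext;

/* Copy len bytes from src into a newly allocated vtext (release with free()). */
vtext *
vtext_from_bytes(const char *src, int32_t len)
{
    vtext *v;

    assert(len >= 0);
    v = malloc((size_t) VTEXT_HDRSZ + (size_t) len);
    if (v == NULL)
        return NULL;
    v->length = VTEXT_HDRSZ + len;      /* the header counts toward the length */
    memcpy(v->data, src, (size_t) len);
    return v;
}

/* Number of payload bytes: total length minus the header. */
int32_t
vtext_payload_len(const vtext *v)
{
    return v->length - VTEXT_HDRSZ;
}
```

With this model, `vtext_from_bytes("hello", 5)` produces a value whose `length` field is 9: four header bytes plus five payload bytes.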
Table 32.1, “Equivalent C Types for Built-In SQL Types” specifies which C type
corresponds to which SQL type when writing a C-language function
that uses a built-in type of PostgreSQL.
The “Defined In” column gives the header file that
needs to be included to get the type definition. (The actual
definition may be in a different file that is included by the
listed file. It is recommended that users stick to the defined
interface.) Note that you should always include

Table 32.1. Equivalent C Types for Built-In SQL Types

Now that we've gone over all of the possible structures for base types, we can show some examples of real functions.

32.9.3. Calling Conventions Version 0 for C-Language Functions

We present the “old style” calling convention first; although this approach is now deprecated, it's easier to get a handle on initially. In the version-0 method, the arguments and result of the C function are just declared in normal C style, but being careful to use the C representation of each SQL data type as shown above.

Here are some examples:

#include "postgres.h"
#include <string.h>

/* by value */

int
add_one(int arg)
{
    return arg + 1;
}

/* by reference, fixed length */

float8 *
add_one_float8(float8 *arg)
{
    float8 *result = (float8 *) palloc(sizeof(float8));
    *result = *arg + 1.0;
    return result;
}

Point *
makepoint(Point *pointx, Point *pointy)
{
    Point *new_point = (Point *) palloc(sizeof(Point));
    new_point->x = pointx->x;
    new_point->y = pointy->y;
    return new_point;
}

/* by reference, variable length */

text *
copytext(text *t)
{
    /*
     * VARSIZE is the total size of the struct in bytes.
     */
    text *new_t = (text *) palloc(VARSIZE(t));
    VARATT_SIZEP(new_t) = VARSIZE(t);
    /*
     * VARDATA is a pointer to the data region of the struct.
     */
    memcpy((void *) VARDATA(new_t), /* destination */
           (void *) VARDATA(t),     /* source */
           VARSIZE(t)-VARHDRSZ);    /* how many bytes */
    return new_t;
}

text *
concat_text(text *arg1, text *arg2)
{
    int32 new_text_size = VARSIZE(arg1) + VARSIZE(arg2) - VARHDRSZ;
    text *new_text = (text *) palloc(new_text_size);
    VARATT_SIZEP(new_text) = new_text_size;
    memcpy(VARDATA(new_text), VARDATA(arg1), VARSIZE(arg1)-VARHDRSZ);
    memcpy(VARDATA(new_text) + (VARSIZE(arg1)-VARHDRSZ),
           VARDATA(arg2), VARSIZE(arg2)-VARHDRSZ);
    return new_text;
}

Supposing that the above code has been prepared in file

CREATE FUNCTION add_one(integer) RETURNS integer
AS '

Notice that we have specified the functions as “strict”, meaning that the system should automatically assume a null result if any input value is null. By doing this, we avoid having to check for null inputs in the function code. Without this, we'd have to check for null values explicitly, by checking for a null pointer for each pass-by-reference argument. (For pass-by-value arguments, we don't even have a way to check!)

Although this calling convention is simple to use,
it is not very portable; on some architectures there are problems
with passing data types that are smaller than

32.9.4. Calling Conventions Version 1 for C-Language Functions

The version-1 calling convention relies on macros to suppress most of the complexity of passing arguments and results. The C declaration of a version-1 function is always

Datum funcname(PG_FUNCTION_ARGS)

In addition, the macro call

PG_FUNCTION_INFO_V1(funcname);

must appear in the same source file. (Conventionally, it's
written just before the function itself.) This macro call is not
needed for

In a version-1 function, each actual argument is fetched using a

Here we show the same functions as above, coded in version-1 style:

#include "postgres.h"
#include <string.h>
#include "fmgr.h"

/* by value */

PG_FUNCTION_INFO_V1(add_one);

Datum
add_one(PG_FUNCTION_ARGS)
{
    int32 arg = PG_GETARG_INT32(0);
    PG_RETURN_INT32(arg + 1);
}

/* by reference, fixed length */

PG_FUNCTION_INFO_V1(add_one_float8);

Datum
add_one_float8(PG_FUNCTION_ARGS)
{
    /* The macros for FLOAT8 hide its pass-by-reference nature. */
    float8 arg = PG_GETARG_FLOAT8(0);
    PG_RETURN_FLOAT8(arg + 1.0);
}

PG_FUNCTION_INFO_V1(makepoint);

Datum
makepoint(PG_FUNCTION_ARGS)
{
    /* Here, the pass-by-reference nature of Point is not hidden. */
    Point *pointx = PG_GETARG_POINT_P(0);
    Point *pointy = PG_GETARG_POINT_P(1);
    Point *new_point = (Point *) palloc(sizeof(Point));
    new_point->x = pointx->x;
    new_point->y = pointy->y;
    PG_RETURN_POINT_P(new_point);
}

/* by reference, variable length */

PG_FUNCTION_INFO_V1(copytext);

Datum
copytext(PG_FUNCTION_ARGS)
{
    text *t = PG_GETARG_TEXT_P(0);
    /*
     * VARSIZE is the total size of the struct in bytes.
     */
    text *new_t = (text *) palloc(VARSIZE(t));
    VARATT_SIZEP(new_t) = VARSIZE(t);
    /*
     * VARDATA is a pointer to the data region of the struct.
     */
    memcpy((void *) VARDATA(new_t), /* destination */
           (void *) VARDATA(t),     /* source */
           VARSIZE(t)-VARHDRSZ);    /* how many bytes */
    PG_RETURN_TEXT_P(new_t);
}

PG_FUNCTION_INFO_V1(concat_text);

Datum
concat_text(PG_FUNCTION_ARGS)
{
    text *arg1 = PG_GETARG_TEXT_P(0);
    text *arg2 = PG_GETARG_TEXT_P(1);
    int32 new_text_size = VARSIZE(arg1) + VARSIZE(arg2) - VARHDRSZ;
    text *new_text = (text *) palloc(new_text_size);
    VARATT_SIZEP(new_text) = new_text_size;
    memcpy(VARDATA(new_text), VARDATA(arg1), VARSIZE(arg1)-VARHDRSZ);
    memcpy(VARDATA(new_text) + (VARSIZE(arg1)-VARHDRSZ),
           VARDATA(arg2), VARSIZE(arg2)-VARHDRSZ);
    PG_RETURN_TEXT_P(new_text);
}

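To make concrete what it means for accessor macros to hide the pass-by-reference nature of a type, here is a simplified standalone model. It is a sketch under stated assumptions, not PostgreSQL's actual definitions: `datum_t` and every helper are hypothetical names, and `malloc` stands in for `palloc`. One generic value type carries either a small value directly or the address of a larger one, and the function bodies look the same in both cases.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical generic value: holds a small value or a pointer to a big one. */
typedef union
{
    int32_t  by_value;      /* small value stored directly */
    void    *by_reference;  /* address of a separately allocated value */
} datum_t;

/* By value: the integer itself travels inside the datum. */
datum_t
int32_get_datum(int32_t v)
{
    datum_t d;

    d.by_value = v;
    return d;
}

int32_t
datum_get_int32(datum_t d)
{
    return d.by_value;
}

/* By reference: the datum holds the address of a freshly allocated copy. */
datum_t
float8_get_datum(double v)
{
    datum_t d;
    double *copy = malloc(sizeof(double));

    if (copy == NULL)
        abort();
    *copy = v;
    d.by_reference = copy;
    return d;
}

double
datum_get_float8(datum_t d)
{
    assert(d.by_reference != NULL);
    return *(const double *) d.by_reference;
}

/* Both functions share one signature; the helpers hide how values travel. */
datum_t
add_one(datum_t arg)
{
    return int32_get_datum(datum_get_int32(arg) + 1);
}

datum_t
add_one_float8(datum_t arg)
{
    /* The input is only read; the result lives in a new allocation. */
    return float8_get_datum(datum_get_float8(arg) + 1.0);
}
```

Note that `add_one_float8` never writes through the pointer it receives, which matches the rule that pass-by-reference inputs must not be modified.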
At first glance, the version-1 coding conventions may appear to
be just pointless obscurantism. They do, however, offer a number
of improvements, because the macros can hide unnecessary detail.
An example is that in coding

One big improvement in version-1 functions is better handling of null
inputs and results. The macro

Other options provided in the new-style interface are two
variants of the

Finally, the version-1 function call conventions make it possible
to return set results (Section 32.9.10, “Returning Sets from C-Language Functions”) and
implement trigger functions (Chapter 33, Triggers) and
procedural-language call handlers (Chapter 46, Writing A Procedural Language Handler). Version-1 code is also more
portable than version-0, because it does not break restrictions
on function call protocol in the C standard. For more details
see

32.9.5. Writing Code

Before we turn to the more advanced topics, we should discuss some coding rules for PostgreSQL C-language functions. While it may be possible to load functions written in languages other than C into PostgreSQL, this is usually difficult (when it is possible at all) because other languages, such as C++, FORTRAN, or Pascal often do not follow the same calling convention as C. That is, other languages do not pass argument and return values between functions in the same way. For this reason, we will assume that your C-language functions are actually written in C.

The basic rules for writing and building C functions are as follows:

32.9.6. Compiling and Linking Dynamically-Loaded Functions

Before you are able to use your PostgreSQL extension functions written in C, they must be compiled and linked in a special way to produce a file that can be dynamically loaded by the server. To be precise, a shared library needs to be created.

For information beyond what is contained in this section
you should read the documentation of your
operating system, in particular the manual pages for the C compiler,

Creating shared libraries is generally analogous to linking executables: first the source files are compiled into object files, then the object files are linked together. The object files need to be created as position-independent code (PIC), which conceptually means that they can be placed at an arbitrary location in memory when they are loaded by the executable. (Object files intended for executables are usually not compiled that way.) The command to link a shared library contains special flags to distinguish it from linking an executable (at least in theory; on some systems the practice is much uglier).

In the following examples we assume that your source code is in a
file

Tip: If this is too complicated for you, you should consider using GNU Libtool, which hides the platform differences behind a uniform interface.

The resulting shared library file can then be loaded into
PostgreSQL. When specifying the file name
to the

Refer back to Section 32.9.1, “Dynamic Loading” about where the server expects to find the shared library files.

32.9.7. Extension Building Infrastructure

If you are thinking about distributing your PostgreSQL extension modules, setting up a portable build system for them can be fairly difficult. Therefore the PostgreSQL installation provides a build infrastructure for extensions, called PGXS, so that simple extension modules can be built simply against an already installed server. Note that this infrastructure is not intended to be a universal build system framework that can be used to build all software interfacing to PostgreSQL; it simply automates common build rules for simple server extension modules. For more complicated packages, you need to write your own build system.

To use the infrastructure for your extension, you must write a
simple makefile. In that makefile, you need to set some variables
and finally include the global PGXS makefile.
Here is an example that builds an extension module named

MODULES = isbn_issn
DATA_built = isbn_issn.sql
DOCS = README.isbn_issn

PGXS := $(shell pg_config --pgxs)
include $(PGXS)

The last two lines should always be the same. Earlier in the file, you assign variables or add custom make rules.

The following variables can be set:

or at most one of these two:

The following can also be set:

Put this makefile as

32.9.8. Composite-Type Arguments in C-Language Functions

Composite types do not have a fixed layout like C structures. Instances of a composite type may contain null fields. In addition, composite types that are part of an inheritance hierarchy may have different fields than other members of the same inheritance hierarchy. Therefore, PostgreSQL provides a function interface for accessing fields of composite types from C.

Suppose we want to write a function to answer the query

SELECT name, c_overpaid(emp, 1500) AS overpaid
FROM emp
WHERE name = 'Bill' OR name = 'Sam';
Using call conventions version 0, we can define
#include "postgres.h"
#include "executor/executor.h" /* for GetAttributeByName() */

bool
c_overpaid(HeapTupleHeader t, /* the current row of emp */
           int32 limit)
{
    bool isnull;
    int32 salary;

    salary = DatumGetInt32(GetAttributeByName(t, "salary", &isnull));
    if (isnull)
        return false;
    return salary > limit;
}

In version-1 coding, the above would look like this:

#include "postgres.h"
#include "executor/executor.h" /* for GetAttributeByName() */

PG_FUNCTION_INFO_V1(c_overpaid);

Datum
c_overpaid(PG_FUNCTION_ARGS)
{
    HeapTupleHeader t = PG_GETARG_HEAPTUPLEHEADER(0);
    int32 limit = PG_GETARG_INT32(1);
    bool isnull;
    Datum salary;

    salary = GetAttributeByName(t, "salary", &isnull);
    if (isnull)
        PG_RETURN_BOOL(false);
    /* Alternatively, we might prefer to do PG_RETURN_NULL() for null salary. */
    PG_RETURN_BOOL(DatumGetInt32(salary) > limit);
}

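The null-aware lookup used by c_overpaid can be modelled without the server. The sketch below is a standalone illustration under stated assumptions, not the PostgreSQL API: `field`, `row`, `get_int_by_name`, and `overpaid` are hypothetical names, every field is an integer, and a field that is not found is simply reported as null, which the real lookup function is not claimed to do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one nullable integer field in a row. */
typedef struct
{
    const char *name;
    bool        isnull;
    int32_t     value;   /* meaningful only when isnull is false */
} field;

/* Hypothetical model of a row: fields are found by name, not by offset. */
typedef struct
{
    size_t       nfields;
    const field *fields;
} row;

/*
 * Look a field up by name. *isnull is set to true when the field is null
 * or (an assumption of this sketch) when no field has that name.
 */
int32_t
get_int_by_name(const row *r, const char *name, bool *isnull)
{
    size_t i;

    for (i = 0; i < r->nfields; i++)
    {
        if (strcmp(r->fields[i].name, name) == 0)
        {
            *isnull = r->fields[i].isnull;
            return r->fields[i].isnull ? 0 : r->fields[i].value;
        }
    }
    *isnull = true;
    return 0;
}

/* Same decision logic as c_overpaid: a null salary is never "overpaid". */
bool
overpaid(const row *emp, int32_t limit)
{
    bool    isnull;
    int32_t salary = get_int_by_name(emp, "salary", &isnull);

    if (isnull)
        return false;
    return salary > limit;
}
```

The caller must test the null flag before trusting the returned value, which is why the flag is checked first in both the model and the functions it imitates.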
There is also

The following command declares the function

CREATE FUNCTION c_overpaid(emp, integer) RETURNS boolean
AS '

Notice we have used

32.9.9. Returning Rows (Composite Types) from C-Language Functions

To return a row or composite-type value from a C-language function, you can use a special API that provides macros and functions to hide most of the complexity of building composite data types. To use this API, the source file must include:

#include "funcapi.h"

There are two ways you can build a composite data value (henceforth
a “tuple”): you can build it from an array of Datum values,
or from an array of C strings that can be passed to the input
conversion functions of the tuple's column data types. In either
case, you first need to obtain or construct a

Several helper functions are available for setting up the needed

TypeFuncClass get_call_result_type(FunctionCallInfo fcinfo,
                                   Oid *resultTypeId,
                                   TupleDesc *resultTupleDesc)

passing the same

Note: Older, now-deprecated functions for obtaining

TupleDesc RelationNameGetTupleDesc(const char *relname)

to get a

TupleDesc TypeGetTupleDesc(Oid typeoid, List *colaliases)

to get a

Once you have a

TupleDesc BlessTupleDesc(TupleDesc tupdesc)

if you plan to work with Datums, or

AttInMetadata *TupleDescGetAttInMetadata(TupleDesc tupdesc)

if you plan to work with C strings. If you are writing a function
returning set, you can save the results of these functions in the

When working with Datums, use

HeapTuple heap_form_tuple(TupleDesc tupdesc, Datum *values, bool *isnull)

to build a

When working with C strings, use

HeapTuple BuildTupleFromCStrings(AttInMetadata *attinmeta, char **values)

to build a

Once you have built a tuple to return from your function, it
must be converted into a

HeapTupleGetDatum(HeapTuple tuple)

to convert a

An example appears in the next section.

32.9.10. Returning Sets from C-Language Functions

There is also a special API that provides support for returning
sets (multiple rows) from a C-language function. A set-returning
function must follow the version-1 calling conventions. Also,
source files must include

A set-returning function (SRF) is called
once for each item it returns. The SRF must
therefore save enough state to remember what it was doing and
return the next item on each call.
The structure

typedef struct
{
    /*
     * Number of times we've been called before
     *
     * call_cntr is initialized to 0 for you by SRF_FIRSTCALL_INIT(), and
     * incremented for you every time SRF_RETURN_NEXT() is called.
     */
    uint32 call_cntr;

    /*
     * OPTIONAL maximum number of calls
     *
     * max_calls is here for convenience only and setting it is optional.
     * If not set, you must provide alternative means to know when the
     * function is done.
     */
    uint32 max_calls;

    /*
     * OPTIONAL pointer to result slot
     *
     * This is obsolete and only present for backwards compatibility, viz,
     * user-defined SRFs that use the deprecated TupleDescGetSlot().
     */
    TupleTableSlot *slot;

    /*
     * OPTIONAL pointer to miscellaneous user-provided context information
     *
     * user_fctx is for use as a pointer to your own data to retain
     * arbitrary context information between calls of your function.
     */
    void *user_fctx;

    /*
     * OPTIONAL pointer to struct containing attribute type input metadata
     *
     * attinmeta is for use when returning tuples (i.e., composite data types)
     * and is not used when returning base data types. It is only needed
     * if you intend to use BuildTupleFromCStrings() to create the return
     * tuple.
     */
    AttInMetadata *attinmeta;

    /*
     * memory context used for structures that must live for multiple calls
     *
     * multi_call_memory_ctx is set by SRF_FIRSTCALL_INIT() for you, and used
     * by SRF_RETURN_DONE() for cleanup. It is the most appropriate memory
     * context for any memory that is to be reused across multiple calls
     * of the SRF.
     */
    MemoryContext multi_call_memory_ctx;

    /*
     * OPTIONAL pointer to struct containing tuple description
     *
     * tuple_desc is for use when returning tuples (i.e. composite data types)
     * and is only needed if you are going to build the tuples with
     * heap_form_tuple() rather than with BuildTupleFromCStrings(). Note that
     * the TupleDesc pointer stored here should usually have been run through
     * BlessTupleDesc() first.
     */
    TupleDesc tuple_desc;
} FuncCallContext;

An SRF uses several functions and macros that
automatically manipulate the

SRF_IS_FIRSTCALL()

to determine if your function is being called for the first or a subsequent time. On the first call (only) use

SRF_FIRSTCALL_INIT()

to initialize the

SRF_PERCALL_SETUP()

to properly set up for using the

If your function has data to return, use

SRF_RETURN_NEXT(funcctx, result)

to return it to the caller.

SRF_RETURN_DONE(funcctx)

to clean up and end the SRF.

The memory context that is current when the SRF is called is
a transient context that will be cleared between calls. This means
that you do not need to call

A complete pseudo-code example looks like the following:

Datum
my_set_returning_function(PG_FUNCTION_ARGS)
{
    FuncCallContext *funcctx;
    Datum result;
    MemoryContext oldcontext;

A complete example of a simple SRF returning a composite type looks like:

PG_FUNCTION_INFO_V1(retcomposite);

Datum
retcomposite(PG_FUNCTION_ARGS)
{
    FuncCallContext *funcctx;
    int call_cntr;
    int max_calls;
    TupleDesc tupdesc;
    AttInMetadata *attinmeta;

    /* stuff done only on the first call of the function */
    if (SRF_IS_FIRSTCALL())
    {
        MemoryContext oldcontext;

        /* create a function context for cross-call persistence */
        funcctx = SRF_FIRSTCALL_INIT();

        /* switch to memory context appropriate for multiple function calls */
        oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);

        /* total number of tuples to be returned */
        funcctx->max_calls = PG_GETARG_UINT32(0);

        /* Build a tuple descriptor for our result type */
        if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
            ereport(ERROR,
                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                     errmsg("function returning record called in context "
                            "that cannot accept type record")));

        /*
         * generate attribute metadata needed later to produce tuples from raw
         * C strings
         */
        attinmeta = TupleDescGetAttInMetadata(tupdesc);
        funcctx->attinmeta = attinmeta;

        MemoryContextSwitchTo(oldcontext);
    }

    /* stuff done on every call of the function */
    funcctx = SRF_PERCALL_SETUP();

    call_cntr = funcctx->call_cntr;
    max_calls = funcctx->max_calls;
    attinmeta = funcctx->attinmeta;

    if (call_cntr < max_calls) /* do when there is more left to send */
    {
        char **values;
        HeapTuple tuple;
        Datum result;

        /*
         * Prepare a values array for building the returned tuple.
         * This should be an array of C strings which will
         * be processed later by the type input functions.
         */
        values = (char **) palloc(3 * sizeof(char *));
        values[0] = (char *) palloc(16 * sizeof(char));
        values[1] = (char *) palloc(16 * sizeof(char));
        values[2] = (char *) palloc(16 * sizeof(char));

        snprintf(values[0], 16, "%d", 1 * PG_GETARG_INT32(1));
        snprintf(values[1], 16, "%d", 2 * PG_GETARG_INT32(1));
        snprintf(values[2], 16, "%d", 3 * PG_GETARG_INT32(1));

        /* build a tuple */
        tuple = BuildTupleFromCStrings(attinmeta, values);

        /* make the tuple into a datum */
        result = HeapTupleGetDatum(tuple);

        /* clean up (this is not really necessary) */
        pfree(values[0]);
        pfree(values[1]);
        pfree(values[2]);
        pfree(values);

        SRF_RETURN_NEXT(funcctx, result);
    }
    else /* do when there is no more left */
    {
        SRF_RETURN_DONE(funcctx);
    }
}

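The call-once-per-item protocol can be illustrated with a small standalone model. This is a sketch, not the SRF API: `call_context` and `next_multiple` are hypothetical names that mirror only the `call_cntr` and `max_calls` fields of the structure described earlier, and `malloc`/`free` stand in for the server's memory contexts. State is created on the first call, consulted on every call, and released when the items run out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical cross-call state, modelled on call_cntr and max_calls. */
typedef struct
{
    uint32_t call_cntr;   /* number of items returned so far */
    uint32_t max_calls;   /* total number of items to return */
} call_context;

/*
 * Produce base, 2*base, 3*base, ... one item per call, count items in all.
 * *ctxp must be NULL before the first call. Returns true and stores an item
 * in *result while items remain; returns false, frees the state, and resets
 * *ctxp to NULL once the sequence is finished (or if allocation fails).
 */
bool
next_multiple(call_context **ctxp, uint32_t count, int32_t base, int32_t *result)
{
    call_context *ctx;

    if (*ctxp == NULL)
    {
        /* first call only: set up state that must survive between calls */
        ctx = malloc(sizeof(call_context));
        if (ctx == NULL)
            return false;
        ctx->call_cntr = 0;
        ctx->max_calls = count;
        *ctxp = ctx;
    }

    /* every call: pick up the saved state */
    ctx = *ctxp;

    if (ctx->call_cntr < ctx->max_calls)
    {
        /* more left to send: advance the counter and hand back one item */
        ctx->call_cntr++;
        *result = (int32_t) ctx->call_cntr * base;
        return true;
    }

    /* nothing left: clean up and signal the end of the set */
    free(ctx);
    *ctxp = NULL;
    return false;
}
```

A caller drives the model with a loop such as `while (next_multiple(&ctx, 3, 10, &v))`, which plays the role the executor plays for a real set-returning function.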

One way to declare the retcomposite function in SQL is:

CREATE TYPE __retcomposite AS (f1 integer, f2 integer, f3 integer);
CREATE OR REPLACE FUNCTION retcomposite(integer, integer)
    RETURNS SETOF __retcomposite
    AS '

A different way is to use OUT parameters:

CREATE OR REPLACE FUNCTION retcomposite(IN integer, IN integer,
    OUT f1 integer, OUT f2 integer, OUT f3 integer)
    RETURNS SETOF record
    AS '

Notice that in this method the output type of the function is formally
an anonymous

The directory

32.9.11. Polymorphic Arguments and Return Types

C-language functions may be declared to accept and
return the polymorphic types

For example, suppose we want to write a function to accept a single element of any type, and return a one-dimensional array of that type:

PG_FUNCTION_INFO_V1(make_array);

Datum
make_array(PG_FUNCTION_ARGS)
{
    ArrayType *result;
    Oid element_type = get_fn_expr_argtype(fcinfo->flinfo, 0);
    Datum element;
    int16 typlen;
    bool typbyval;
    char typalign;
    int ndims;
    int dims[MAXDIM];
    int lbs[MAXDIM];

    if (!OidIsValid(element_type))
        elog(ERROR, "could not determine data type of input");

    /* get the provided element */
    element = PG_GETARG_DATUM(0);

    /* we have one dimension */
    ndims = 1;
    /* and one element */
    dims[0] = 1;
    /* and lower bound is 1 */
    lbs[0] = 1;

    /* get required info about the element type */
    get_typlenbyvalalign(element_type, &typlen, &typbyval, &typalign);

    /* now build the array */
    result = construct_md_array(&element, ndims, dims, lbs,
                                element_type, typlen, typbyval, typalign);

    PG_RETURN_ARRAYTYPE_P(result);
}

The following command declares the function
CREATE FUNCTION make_array(anyelement) RETURNS anyarray
AS '
Note the use of