AWKA-ELM(5) AWKA EXTENDED LIBRARY METHODS AWKA-ELM(5)

awka-elm - Awka Extended Library Methods

Awka is a translator of AWK programs to ANSI-C code, and a library (libawka.a) against which the code is linked to create executables. Awka is described in the awka manpage.

The Extended Library Methods (ELM) provide a way of adding new functions to the AWK language, so that they appear in your AWK code as if they were builtin functions such as substr() or index().

ELM code interfaces with the internal Awka variable structures and functions, and is suitable for anyone with some experience and proficiency in C programming.

This document is a step-by-step introduction to how the ELM works, so that by the end of it you can write your own libraries to extend the AWK programming language using Awka. For example, you could write an interface to allow AWK programs to communicate with ODBC databases, or solve the travelling salesman problem given input of town locations - whatever you require AWK to do should now be possible.

The C code produced by awka from AWK programs is heavily populated with calls to functions in the awka library (libawka). Hence after it is compiled, this code must be linked to the library to produce a working executable.

When parsing an AWK program, awka checks to see if each function call in the program is (a) a core builtin function, (b) a call to a user-defined AWK function in the program, or (c) a call to one of the extended builtin functions. The above order of priority is applied, so a user-defined function (b) overrides (c), and (a) overrides (b) to avoid conflicts.

If none of these prove to be true, the function call is written in the code in the format of a user-defined function, even though awka knows of no such function. Awka assumes that by link time you will provide another object file or library that contains the missing function and resolves the call.

So if I pass awka the following code:

BEGIN { print mymath(3,4) }

The call it generates will look like this...

mymath_fn(awka_arg2(a_TEMP, _litd0_awka, _litd1_awka))

So all we need to do is write the mymath_fn() function, and link it with the awka-generated code, and bingo! AWK has been extended by you, to do what you want. And the only restrictions on what a function like mymath_fn() might do are those imposed by the C language!

So, you write the function, compile it into a library, use it in your AWK program, translate it, link it in, and you're away - it's that simple (fingers crossed).

Ok, the first thing to notice is that the function name in the AWK code, mymath, has had _fn appended to it in the C code. This happens with all unresolved AWK function calls (and also with user-defined function names, but that doesn't matter here). It's done to avoid unintentional conflicts with functions in other libraries.

The prototype of any such function is:

a_VAR * funcname_fn( a_VARARG * )

Ugh! What's this a_VARARG thingy? Yes, learned reader, the time has come to get acquainted with the dreaded Awka data structures. Well, they're pretty simple, actually. The two you need to know about are a_VAR and a_VARARG, and as the latter contains an array of pointers to the former, I'll deal with a_VAR first.


The a_VAR Structure

typedef struct {
  double dval;          /* the variable's numeric value */
  char * ptr;           /* pointer to string, array or RE structure */
  unsigned int slen;    /* length of string ptr as per strlen */
  unsigned int allc;    /* space mallocated for string ptr */
  char type;            /* records current cast of variable */
  char type2;           /* special flag for dual-type variables */
  char temp;            /* TRUE if a temporary variable */
} a_VAR;

These are used prolifically throughout libawka, and are at the heart of how it manipulates data. Remember, AWK variables are essentially typeless, as they can be cast to number, string or regular expression at your whim throughout a program. The only thing you can't cast to & from is an array, as a variable is either an array or a scalar (one of the other types).
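The string side of an a_VAR relies on ptr, slen and allc staying consistent with one another. As a rough, standalone illustration of what those three fields mean, here is a hypothetical set_str() helper (the struct is copied from above so the sketch compiles without libawka). In real ELM code you would not manage ptr by hand like this: libawka owns that memory, so use the libawka API calls (see awka-elmref(5)) instead.

```c
#include <stdlib.h>
#include <string.h>

/* Copy of the a_VAR structure shown above, so this sketch builds alone. */
typedef struct {
  double dval;
  char *ptr;
  unsigned int slen;
  unsigned int allc;
  char type;
  char type2;
  char temp;
} a_VAR;

/* Hypothetical helper: store a copy of s in v, keeping slen (length as
 * per strlen) and allc (bytes allocated) consistent with ptr.
 * type/type2 are deliberately left alone here. Returns 0 on success. */
int set_str(a_VAR *v, const char *s)
{
  size_t len = strlen(s);
  if (v->allc < len + 1) {            /* grow only when the buffer is too small */
    char *p = realloc(v->ptr, len + 1);
    if (p == NULL)
      return -1;
    v->ptr = p;
    v->allc = (unsigned int)(len + 1);
  }
  memcpy(v->ptr, s, len + 1);         /* copies the terminating NUL too */
  v->slen = (unsigned int)len;
  return 0;
}
```

Note that a shorter string reuses the existing buffer, so allc can be larger than slen + 1; that is exactly why the structure records both.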

Recall our mymath example earlier. In the AWK code, we had "mymath(3,4)", but the C code was "mymath_fn(awka_arg2(a_TEMP, _litd0_awka, _litd1_awka))".

The numeric value 3 has been changed to _litd0_awka, and 4 to _litd1_awka. If you run awka with this example program & examine the output, you'll see that both _litd0_awka and _litd1_awka are pointers to a_VAR structures, each set to the appropriate numeric value. Hence, all data passed to our functions will be embodied inside a_VARs.

Confused? Yes? No? Take heart, it doesn't get much worse, and a few more examples should make things clearer. Looking at the call to mymath_fn above, you'll notice a call to awka_arg2(). Remember that mymath_fn only takes a pointer to an a_VARARG, so awka_arg2() obviously returns one of these.

What an a_VARARG contains is an array of pointers to a_VARs, and an integer showing how many of them are in use - that's all! Don't believe me? Then here's the structure in all its glory:


The a_VARARG Structure

typedef struct {
  a_VAR *var[256];
  int used;
} a_VARARG;

The a_VARARG structure gives us an easy means of passing a flexible number of a_VARs to functions, much as you'd use variable argument lists in a C program. If you don't know how those work and have some time, check the stdarg(3) manpage.

So, to conclude, awka_arg2() takes two a_VARs and packages them nicely into an a_VARARG to make life easy for our function. Another thing to note - the a_VARARG structure allows up to 256 arguments. No parameters, only arguments, and they always win them! Sorry, on with the serious stuff...
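To see what this packaging amounts to, here is a standalone sketch. It copies the a_VAR and a_VARARG definitions above so it compiles without libawka, and pack2() is a hypothetical stand-in for awka_arg2(): it ignores the a_TEMP argument the real function takes, and the real function may well store its a_VARARG differently.

```c
/* Copies of the structures shown above, so this sketch builds alone. */
typedef struct {
  double dval;
  char *ptr;
  unsigned int slen;
  unsigned int allc;
  char type;
  char type2;
  char temp;
} a_VAR;

typedef struct {
  a_VAR *var[256];
  int used;
} a_VARARG;

/* Hypothetical stand-in for awka_arg2(): record two argument pointers
 * in a caller-supplied a_VARARG and set the count of entries in use. */
a_VARARG *pack2(a_VARARG *va, a_VAR *a, a_VAR *b)
{
  va->var[0] = a;   /* only pointers are stored; the a_VARs are not copied */
  va->var[1] = b;
  va->used = 2;
  return va;
}
```

Because this sketch stores pointers only, the receiving function sees the caller's actual variables rather than copies.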

So when we come to write mymath_fn, what should it contain? Ok, let's assume we want mymath to add the two numbers it receives as arguments, then add the product of the two numbers, and return the result, i.e. (n1+n2)+n1*n2.

Well, here goes...

#include <libawka.h> 
a_VAR * 
mymath_fn( a_VARARG *va )
{ 
  a_VAR *ret = NULL;
  if (va->used < 2)
    awka_error("function mymath expecting 2 arguments, only got %d.\n",va->used);
  ret = awka_getdoublevar(FALSE);
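  /* awka_getd() casts each argument to a number if needed, then returns its dval */
  ret->dval = awka_getd(va->var[0]) + awka_getd(va->var[1]);
  /* the casts are complete once the statement above ends, so dval can be read directly */
  ret->dval += va->var[0]->dval * va->var[1]->dval;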
  return ret;
}

Ok, there's not a lot to it, so let's start at the top. You need to include libawka.h, as it defines the data structures plus the whole Awka API that you'll be calling.

The definition of mymath_fn is as described earlier. It will need to return a numeric value, but as we're in AWK (conceptually), this will need to be enclosed in an a_VAR, hence the existence of ret.

The incoming a_VARARG can contain any number of a_VARs - we only care about the first two, so we check whether these exist, and if not, report an error through the awka_error function (or you could use your own error handler). When writing your own functions, remember that any number of arguments could be passed in, and they could be of any type, so you'll need to check them.

So far, ret is NULL, so we need to create a structure to point it to. Better than that, we call awka_getdoublevar(), which gets us a temporary variable, already initialised to contain a numeric value. You guessed it, there's an awka_getstringvar() that we could use if our function was to return a string. The value of FALSE passed to awka_getdoublevar() means that we don't want to be responsible for freeing this structure, but prefer to leave it to libawka's internal garbage collection. I can't see any reason why you'd choose TRUE, but it's there just in case.

The next 2 lines do the core stuff. Ok, ret->dval is set, that makes sense. The expression refers to the contents of the a_VARARG->a_VAR array, again this is expected. At first, though, it calls awka_getd() for each of the arguments, but on the next line it references the dval value directly. Why the calls to awka_getd?

Because we can't be sure that the incoming variables are already cast to numbers, so these functions (actually macros) do the casting for us, and return the value of dval after the cast is done. Subsequently, we can look at dval directly, as we know it's been set to the current numeric value of the variable.

Lastly, we return ret.
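The casting done by awka_getd() can be pictured with a toy model. This is not libawka's code: the type codes T_NUM and T_STR and the function get_num() are hypothetical, and the real macro (see libawka.h) may, for example, keep the string value valid alongside the number. The sketch only shows the idea of converting on first use so that dval can be read directly afterwards.

```c
#include <stdlib.h>

/* Copy of the a_VAR structure shown above, so this sketch builds alone. */
typedef struct {
  double dval;
  char *ptr;
  unsigned int slen;
  unsigned int allc;
  char type;
  char type2;
  char temp;
} a_VAR;

/* Hypothetical type codes for this sketch only (not libawka's values). */
enum { T_NUM = 1, T_STR = 2 };

/* Toy cast-on-read: if v is not yet numeric, convert its string value
 * (anything else counts as 0, like an unset AWK variable), mark it
 * numeric, and return dval. */
double get_num(a_VAR *v)
{
  if (v->type != T_NUM) {
    v->dval = (v->type == T_STR && v->ptr != NULL) ? strtod(v->ptr, NULL) : 0.0;
    v->type = T_NUM;   /* dval is now current, so later reads can use it directly */
  }
  return v->dval;
}
```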

Alright, let's get this working. Follow these steps:

1. Create mymath.c containing mymath_fn(), exactly as it's written above.
2. Create mymath.h containing: a_VAR * mymath_fn( a_VARARG *va );
3. gcc -c mymath.c (or use whatever C compiler you have).
4. awka -i mymath.h 'BEGIN { print mymath(3,4) }' >test.c
5. gcc -I. test.c mymath.o -lawka -lm -o mytest
6. ./mytest

The output from running mytest should be 19. Magic!
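If you'd like to check the argument handling and arithmetic before libawka is involved, the function can be mocked up on its own. In this sketch num() is a hypothetical stand-in for awka_getd() that assumes its argument is already numeric, the result comes back as a plain double instead of an a_VAR from awka_getdoublevar(), and exiting on a bad argument count is this sketch's own choice. Reading each argument once into a local also makes the order of evaluation explicit.

```c
#include <stdio.h>
#include <stdlib.h>

/* Copies of the structures shown earlier, so this sketch builds alone. */
typedef struct {
  double dval;
  char *ptr;
  unsigned int slen;
  unsigned int allc;
  char type;
  char type2;
  char temp;
} a_VAR;

typedef struct {
  a_VAR *var[256];
  int used;
} a_VARARG;

/* Hypothetical stand-in for awka_getd(): no string-to-number cast here. */
static double num(a_VAR *v)
{
  return v->dval;
}

/* Same logic as mymath_fn, (n1+n2)+n1*n2, returning a plain double. */
double mymath_demo(a_VARARG *va)
{
  if (va->used < 2) {
    fprintf(stderr, "function mymath expecting 2 arguments, only got %d.\n", va->used);
    exit(1);
  }
  double n1 = num(va->var[0]);   /* each argument read once, in a fixed order */
  double n2 = num(va->var[1]);
  return (n1 + n2) + n1 * n2;
}
```

With 3 and 4 this gives (3+4)+3*4 = 19, matching the output of mytest.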

A more comprehensive example is the awkatk library available from the awka website. Hopefully you'll find it helpful, and who knows, you may even use it to write GUI interfaces from AWK!

Obviously, this is intended to extend the limits of the AWK universe, as you could introduce any functionality written in C as a new builtin function within AWK.

There may be complex functions you've written in AWK and use all the time that are just plain inefficient, even using Awka. They're stable, you have the skill to implement them in C, so now you can, and your AWK programs become shorter in the process. It's no longer a choice of C or AWK, now you can migrate sections to C as & when you like.

There are many functions in standard C libraries that AWK doesn't have, things like strcasecmp(), fread(), cbrt(), and so on. Now you can make them available to your AWK programs.

Lastly, I'd love to see Awka have functions to read & write proprietary formats like MS Excel, to communicate with ODBC databases, to perform complex mathematical or scientific operations, to implement true multi-dimensional arrays, to provide Fast Fourier Transform functions - I know it's possible. If you do develop something neat like this, it'd be very cool if you were to make it available for everyone to share. Just send an email to andrewsumner@yahoo.com, and I'd be happy to host it on, or link it from, the Awka website.

So you've created quite a few Awka-ELM functions that you've put together into a library. Let's say they calculate the time needed to build the Sydney Harbour Bridge given a volume of manpower and the number of supervisors. Internally, there are quite a few algorithms that take into account strikes by unions, material shortages, and casualties as workers fall off the bridge.

Because of this complexity, the functions within your library will need to call other functions. This is fine. What you must avoid is one API function calling another API function; instead, keep any functions they call hidden within the library, and make sure these internal functions do not use the awka_getdoublevar(), awka_getstringvar() or awka_tmpvar() calls.

Apart from keeping your library structure nice and hierarchical and your API simple, it avoids overloading awka's internal pool of temporary variables. If this pool is overloaded, random chaos will ensue, so please avoid it.
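One way to follow this advice is sketched below, with a deliberately silly, made-up formula. The helpers are static, take and return plain doubles, and never touch a_VARs, so they cannot eat into the temporary-variable pool; only the public function deals with a_VARs. To keep the sketch runnable without libawka, the caller passes in the result variable that a real bridge_fn() would obtain from awka_getdoublevar(FALSE). The names bridge_demo(), base_days() and supervision_factor() are all hypothetical.

```c
/* Copies of the structures shown earlier, so this sketch builds alone. */
typedef struct {
  double dval;
  char *ptr;
  unsigned int slen;
  unsigned int allc;
  char type;
  char type2;
  char temp;
} a_VAR;

typedef struct {
  a_VAR *var[256];
  int used;
} a_VARARG;

/* Internal helper (toy formula): plain doubles in and out, no a_VARs,
 * no libawka temporaries. 'static' keeps it out of the library's API. */
static double base_days(double workers)
{
  return 10000.0 / workers;
}

/* Internal helper: unsupervised crews take 50% longer (made-up rule). */
static double supervision_factor(double supervisors)
{
  return supervisors > 0.0 ? 1.0 : 1.5;
}

/* The only function that handles a_VARs. A real ELM function would be
 * bridge_fn(a_VARARG *) and get ret from awka_getdoublevar(FALSE); here
 * the caller supplies ret. Arguments are assumed to be numeric already. */
a_VAR *bridge_demo(a_VARARG *va, a_VAR *ret)
{
  if (va->used < 2 || va->var[0]->dval <= 0.0) {
    ret->dval = -1.0;   /* flag bad input instead of dividing by zero */
    return ret;
  }
  ret->dval = base_days(va->var[0]->dval) * supervision_factor(va->var[1]->dval);
  return ret;
}
```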

All global variables in your AWK program are accessible by your library functions. Herein lies the potential for great danger, so be careful!

Global variables are, of course, pointers to a_VAR structures, and their name is the same as in the AWK script, with _awk appended. So the variable 'myvar' in the script would be myvar_awk in the translated C code. If you know what the variable name is, you can put an extern declaration of it in your library code then work with it directly, but this may be very restrictive, as it would mean that every script that uses your library would need that variable name reserved. There are other methods.

One of the easiest is with arrays. You can pass them in as arguments to your functions, as their address is passed over rather than a copy of their contents. Scalars are not as easy. Say our function works with a global variable, but expects a string argument containing that variable's name, so it knows which variable to work with - this would make it pretty flexible.

You have available to you the gvar_struct variable _gvar (both described in awka-elmref(5)). This contains the name of every global variable in the script, and it's a simple matter to search down the list to find a pointer to the a_VAR structure of the variable you want to use.
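The search itself is an ordinary linear scan. The real gvar_struct layout is given in awka-elmref(5) and is not reproduced here, so this standalone sketch assumes a hypothetical table of name/pointer pairs ending in a NULL name; adapt the field names and the end-of-list test to the actual definition.

```c
#include <string.h>

/* Copy of the a_VAR structure shown earlier, so this sketch builds alone. */
typedef struct {
  double dval;
  char *ptr;
  unsigned int slen;
  unsigned int allc;
  char type;
  char type2;
  char temp;
} a_VAR;

/* Assumed layout for this sketch only (not the real gvar_struct):
 * a global's name paired with a pointer to its a_VAR. */
struct named_var {
  const char *name;
  a_VAR *var;
};

/* Walk the table until the NULL-name terminator; return the matching
 * variable, or NULL if the script has no global with that name. */
a_VAR *find_global(const struct named_var *tab, const char *name)
{
  for (; tab->name != NULL; tab++)
    if (strcmp(tab->name, name) == 0)
      return tab->var;
  return NULL;
}
```

A NULL result is worth reporting (for example through awka_error) rather than dereferencing, since the name comes from the AWK script and may be misspelt.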

Looking again at the a_VAR structure, you may note that it contains a char * pointer that can reference strings, arrays and regular expressions. There is no reason why you couldn't introduce your own custom data structure and attach it to a global variable within one of your functions, as long as you adhere to the following rules:

1. Don't set the variable to anything in AWK after you set it to your customised value, as libawka will try (and fail) to free the value up, causing all sorts of flow-on problems.

2. Don't use the AWK language to copy or compare this variable to others, even with two variables of the same custom type (i.e. custvar1 = custvar2), as libawka will have no idea how the copy should be done, and it will stuff it up. Instead, provide your own copy and comparison functions.

3. If your structures are memory intensive, you may consider providing a method of freeing the structures when they are no longer needed.

4. Document what your data structures and methods do, and how they should be used in the AWK script. Please, please do this, as it could save you a lot of grief later. If your library becomes publicly available this is especially necessary.
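Rules 2 and 3 boil down to giving each custom type its own copy, compare and free functions, which your AWK-facing _fn wrappers then call. The point3 type and its functions below are hypothetical, and the sketch deliberately leaves out how the pointer is attached to an a_VAR's ptr field, which this page doesn't cover.

```c
#include <stdlib.h>

/* Hypothetical custom type a library might hang off an a_VAR's ptr. */
typedef struct {
  double x, y, z;
} point3;

/* Rule 2: an explicit copy, used instead of AWK assignment.
 * Returns a new heap copy, or NULL if allocation fails. */
point3 *point3_copy(const point3 *src)
{
  point3 *p = malloc(sizeof *p);
  if (p != NULL)
    *p = *src;
  return p;
}

/* Rule 2: an explicit comparison; nonzero when the points are equal. */
int point3_equal(const point3 *a, const point3 *b)
{
  return a->x == b->x && a->y == b->y && a->z == b->z;
}

/* Rule 3: release a structure once the script no longer needs it. */
void point3_free(point3 *p)
{
  free(p);
}
```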

This has been a very brief introduction indeed, but hopefully enough to get you started. I recommend you refer to the awka-elmref(5) manpage for a listing of key libawka API functions and data definitions that are available for you to use (but hopefully not abuse). If you have any questions at all, don't be afraid to contact me (andrewsumner@yahoo.com). Put the word "awka" at the front of your subject line so I know it's not spam.

See Also

awka(1), awka-elmref(5), gcc(1)

Bugs

Bound to be plenty. Let me know if you find a bug with the libawka interface, or get stuck with a problem. I am not, though, in any way responsible for bugs that are introduced by your code, nor am I liable for any damages or expenses incurred as a result. Nor am I liable for anything you do using Awka.

I'll help where I can, and I'll usually help debug someone's library if I have a personal interest in it. If you're not sure, try me anyway, the worst I can do is say no, and I might be able to help. I really like folk who send fixes along with bug reports, though. And I love the folk who send cash inducements (at last count, um, zero folk). Oh well, enough rambling, time to finish.

Andrew Sumner, August 2000 (andrewsumner@yahoo.com).
Aug 8 2000 Version 0.7.x
