Monday, November 28, 2005

MySQL 5 general purpose routine library - II: Using arrays

As you know, MySQL 5 stored routines language does not provide any sort of array structure. While this is not a problem when dealing with normal SQL, since arrays are a symptom of non-normalized data, when using a programming language, it becomes quite a strain, and the programmer is often struggling to achieve results that would be really natural and simple using an array, but are made unnecessarily hard by lack of them.
Thus comes into existence this library of routines that emulate arrays in MySQL 5.

Implementation

Arrays, as part of the MySQL General Purpose Stored Routines Library are implemented as datasets, all belonging to one table, and identified by a unique name. The details of the implementation are hiddenby the interface, and you should not worry a lot about it.

General concepts about the array library

Whenever possible, each routine is implemented both as a procedure and as a function. Only when a function returns an array value or an array property, then there is no corresponding procedure. This is a design choice, so that you are free of calling the routine in the most convenient way. When using a procedure that has a function synonym, a user variable with the same name of the procedure is set with the function return value. For instance, calling function array_create() you get the array ID.

If you choose to call the procedure instead, then @array_create will contain the array ID.

Array creation

You can create an array using array_create(name, size). 'Name' is any string not yet used as an array name. If you want to be sure that the name is not taken, use the function array_exists(name). 'size' is the number of items that you want to create (they will be set to NULL).
Using a zero size will create an empty array.

call array_create('myfirst',0);
You can also create an array from string containing a comma separated list, with the function array_from_list(string, array_name).

If you want to use a different separator, then you may want to look at array_from_list_complete(string, array_name, separator).

   call array_from_list('one,two,three', 'myarray');
set @list = 'Doc:Grumpy:Happy:Sleepy:Bashful:Sneezy:Dopey';
call array_from_list_complete(@list, 'snow_white_friends', ':');

array_from_list returns the array name, so it's easy to use it combined with a routine that requires an array:


select array_size(array_from_list('one,two,three','myarray'));

Inserting data

You can treat these library arrays in four different ways:
normal arrays

Thus you can insert a value by index.
array_set_value_by_index(array_name,index,value) does this, with its universal alias array_set(array_name, index, value);

associative arrays

Also known as hashes in Perl or maps in C++. You can set the value by key rather than by index.
array_set_value_by_key(array_name, key, value) is the routine for this, and array_set(array_name,key, value) can be an alias for that routine as well. Notice that when you set a value by key, the array will still have its automatic index. So you can alway access an array by index, even if you set the values by key.

queues

First in, first out, is the principle of queues. If you want to do this, insert your values using array_unshift(array_name, value) which adds a value at the beginning of the array, and retrieve them using array_shift(array_name), which returns (and deletes) the last element in
the array.
Notice that array_unshift is the most expensive method in this collection, because all the array_indexes need to be decreased by 1.


stacks

Last in, first out. This is one of the most used device in programming. array_push(array_name,value) adds an item to the end of an array, and array_pop(array_name) retrieves the last item and deletes it.

retrieving data

No matter how you entered data into an array, you can always retrieve your items by index. The lowest index is zero, the highest one is returned by array_max_index(array_name). Knowing this, you can quickly create a loop to walk through an array, using array_get_value_by_index(array_name, index). If you inserted your values by key, you can retrieve them in the same way. array_get_value_by_key(array_name, key) does the job. For your convenience, there is also a array_get(array_name, key_or_index) function that you can use to get values by key or index. If you want to set a key for an existing array value, you can use array_set_key_by_index(array_name, index, key).

information on arrays

There are some convenience routines to get information about arrays.
array lists

to see the list of all available arrays, use array_list(), that prints a
list of all public arrays.
Private arrays are all arrays with a name starting with an underscore character
('_'). These arrays are usually created and used inside a procedure and should
not be shared.
If you want to see them as well, use array_full_list().

array contents

To see all the items in an array, call [RP]array_show(array_name) .




advanced array usage


inserting several items at once

The array routines were designed to allow flexibility in insertion. In several programming languages you can fill an array with just a statement, be it a declaration or an action. Of course, the quickest way is to use the array_from_list function, but if you have an already existing array and you need to add more items, you can do one of the following:

more array_set functions in a SELECT statement

SELECT array_set('name', 'key1', 'value1'),
array_set('name', 'key2', 'value2'),
array_set('name', 'key3', 'value2') \G

chain insertion calls using array_setn

array_setn returns the array name, so it can be used in each statement where an array name is required.

select array_setn( array_setn( array_setn(
'name', 'key1', 'value1'),
'key2', 'value2'),
'key3', 'value3');


Populate an array from a dataset

To achieve this result, we need the help from another part of the library. For example, if we want to insert into an array the list of USA cities as keys with their population as value, we can do this:

call for_each_table_value(
'world', -- the database
'City', -- table name
'Name', -- the first column to use
'Population', -- the second column to use
null, -- no need for a third column: null
'CountryCode="USA"', -- filter clause
'call array_set("USA_cities","$I1","$I2")');
In this statement we invoke a for_each_table_value engine to execute our query for each row in table City where CountryCode = 'USA'. The query to execute will call the array_set procedure, using the columns as arguments ($I1 and $I2). As a result, we'll have an array of 264 rows, containing all USA cities with their population.

call array_show('USA_cities');
+-------------+-------------------------+-------------+
| array_index | array_key | array_value |
+-------------+-------------------------+-------------+
| 0 | New York | 8008278 |
| 1 | Los Angeles | 3694820 |
| 2 | Chicago | 2896016 |
| 3 | Houston | 1953631 |
| 4 | Philadelphia | 1517550 |
... ... ...
| 258 | Fall River | 90555 |
| 259 | Kenosha | 89447 |
| 260 | Elgin | 89408 |
| 261 | Odessa | 89293 |
| 262 | Carson | 89089 |
| 263 | Charleston | 89063 |
+-------------+-------------------------+-------------+
264 rows in set (0.20 sec)

If your dataset is reasonably short, there is another trick you can pull, without using a 'for each' routine. Let's assume that you want to put into an array just the districts for Italy, taken from table 'City' in the example world database.


select array_from_list(
(select group_concat(distinct District)
from world.City where CountryCode='ITA'),
'italian_districts')\G

call array_show('italian_districts');
+-------------+-----------+----------------------+
| array_index | array_key | array_value |
+-------------+-----------+----------------------+
| 0 | NULL | Latium |
| 1 | NULL | Lombardia |
| 2 | NULL | Campania |
| 3 | NULL | Piemonte |
| 4 | NULL | Sicily |
| 5 | NULL | Liguria |
| 6 | NULL | Emilia-Romagna |
| 7 | NULL | Toscana |
| 8 | NULL | Apulia |
| 9 | NULL | Veneto |
| 10 | NULL | Friuli-Venezia Giuli |
| 11 | NULL | Calabria |
| 12 | NULL | Sardinia |
| 13 | NULL | Umbria |
| 14 | NULL | Abruzzi |
| 15 | NULL | Trentino-Alto Adige |
| 16 | NULL | Marche |
+-------------+-----------+----------------------+
Well, a few are missing, but not because of our routine, which seems to work fine. Notice that if the list of entries is larger than 1024 bytes, group_concat will not work as expected, and you need to increase a system variable to get correct results. But for short lists this is a trick to keep in mind.
Even better, using array_from_pair_list, you can populate an array with a list of key/value pairs.

set group_concat_max_len = 8000; -- necessary, or group_concat can't work
-- well in this case
select array_from_pair_list(
(select group_concat( concat(Name,"=>",population ))
from world.City where CountryCode ='USA'),
'USA_cities')\G
And USA_cities will get the same contents we have seen when using the for_each_table_value method.

What else can you do with arrays? Wonderful things, as you'll see in the next articles about this library.

5 comments:

Jason Suwala said...

array_from_list has problems dealing with empty values in the CSV parameter

eg: array_from_list('"","",""', "array_name" ) seems to give errors.

Plus a hard-found problem: all values in the CSV input MUST be quoted (even for integers)

David said...

Hi,

I really love your libraries and they have really saved me a lot of time. Especially the for_each_table.

I think that I have found a bug in your implementation of queue. Look at the order of the data-output in the following example. Notice that the order is the same whether i use stack or queue.

mysql> call array_create('ids_since_find_case', 0);
Query OK, 0 rows affected (0.05 sec)

mysql> call array_unshift('ids_since_find_case', 1);
Query OK, 0 rows affected (0.00 sec)

mysql> call array_unshift('ids_since_find_case', 2);
Query OK, 0 rows affected (0.00 sec)

mysql> select array_shift('ids_since_find_case');
+------------------------------------+
| array_shift('ids_since_find_case') |
+------------------------------------+
| 2 |
+------------------------------------+
1 row in set (0.00 sec)

mysql> select array_shift('ids_since_find_case');
+------------------------------------+
| array_shift('ids_since_find_case') |
+------------------------------------+
| 1 |
+------------------------------------+
1 row in set (0.00 sec)

mysql> call array_create('ids_since_find_case', 0);
Query OK, 0 rows affected (0.00 sec)

mysql> call array_push('ids_since_find_case', 1);
Query OK, 0 rows affected (0.00 sec)

mysql> call array_push('ids_since_find_case', 2);
Query OK, 0 rows affected (0.00 sec)

mysql> select array_pop('ids_since_find_case');
+----------------------------------+
| array_pop('ids_since_find_case') |
+----------------------------------+
| 2 |
+----------------------------------+
1 row in set (0.00 sec)

mysql> select array_pop('ids_since_find_case');
+----------------------------------+
| array_pop('ids_since_find_case') |
+----------------------------------+
| 1 |
+----------------------------------+
1 row in set (0.00 sec)

mysql>

David said...

Hi again,

I just thought I should mention another small detail. There is an inconsistency between stacks and queues when attempting to retreve a value from an empty list, one giving an error and the other one returning a "null" value.

Regards,

David

Solo said...

Right now I am working on converting a massive Oracle database to MySQL 5.6 and I'm having major trouble porting the Varrays to anything MySQL understands without hurting the performance too much.

Can you tell me if this library is actually still compatible with MySQL 5.6.9rc?

Thanks in advance,

Erwin

Giuseppe Maxia said...

This library was made for MySQl 5.0, and probably it is not even fully compatible with MySQL 5.5.
Besides, stored routines in MySQL are not performing very well.