Valentina KERNEL – Page 8 – Paradigma Software Blog

v5.6 Builds the Index of a String/VarChar Field 10-15 Times Faster!

Bench Description

Prepare Steps

We have a table T1 with one field of type String[120] or VarChar.
We add 1 million records with unique values using METHOD( ‘concat( rand_string(80), RecID )’ )

So we get a table with a single field and 1 million records, whose values are about 80-90 characters long.
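To make the shape of this data concrete, here is a minimal Python sketch of the same idea: 80 random characters followed by the record ID. It is not the Valentina implementation; the helper names and the alphabet (ASCII letters and digits) are assumptions made for illustration, and the sketch builds only 1000 values to stay quick.

```python
import random
import string

# Assumed alphabet; the real rand_string() may draw from a different one.
ALPHABET = string.ascii_letters + string.digits

def rand_string(rng, length):
    """Return `length` random characters from ALPHABET."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))

def make_values(count, rng):
    """Return `count` values: 80 random characters followed by the record ID."""
    return [rand_string(rng, 80) + str(rec_id) for rec_id in range(1, count + 1)]

values = make_values(1000, random.Random())

# The record ID suffix is what guarantees uniqueness, even if two random
# prefixes happened to collide.
unique_count = len(set(values))
shortest = min(len(v) for v in values)
longest = max(len(v) for v in values)
```

With 1 million records the suffix is 1 to 7 digits long, so the values are 81 to 87 characters, which matches the "about 80-90" estimate above.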

Benches

Build the index for this field.


New Docs Section – Valentina DB Indexes

We have rewritten from scratch the section about Valentina DB Indexes. You can read it here.

[NEW] Generating Test/Bench Data for Data Focused Apps (Part 2)

We have implemented the stored procedure ‘GenerateDataFor()’, which solves the task described in Part 1 of this article.

We have added a new section to the Valentina Wiki, where we will keep this and future stored procedures by Paradigma Software.

On this page you can find a link to the wiki page that describes the GenerateDataFor() procedure and contains a download link.

[NEW] SQL expression function RAND_REGEXP()

We added a new SQL function, RAND_REGEXP, in v5.5.6 (available in the night_build or the upcoming beta). This is a very powerful function that can generate random strings of any kind that match a given regular expression.

This function is useful for the test and bench data generation features that we will be talking more about.

Our wiki has coverage of this new function: RAND_REGEXP

[NEW] Generating Test/Bench Data for Data Focused Apps (Part 1)

There are several database tools available for generating records for table T with some random data. Usually these tools can…

generate the test data itself;
format the data for replication of some bug

Yes, both are very useful. But as speed junkies and test pilots, we also want to use this feature to

generate data for use in benchmarks

The difference between test and bench data is that for benchmarking today, tomorrow, and months or years later, we must generate the same records in a table. Otherwise, how can we compare the results of benchmarks as computer scientists? For tests it is okay to use random values in records, but benchmarks require exactness.
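The usual way to get "random but repeatable" records is a pseudo-random generator with a fixed seed. The Python sketch below only illustrates that principle; the function name, the record layout, and the seed values are hypothetical and say nothing about how the Valentina engine does it.

```python
import random

def generate_records(count, seed):
    """Return `count` pseudo-random records that depend only on `seed`."""
    rng = random.Random(seed)  # private generator, isolated from global state
    return [
        (rec_id, rng.randint(0, 1_000_000), rng.random())
        for rec_id in range(1, count + 1)
    ]

run_today = generate_records(5, seed=42)
run_later = generate_records(5, seed=42)   # same seed: identical records
other_data = generate_records(5, seed=43)  # different seed: different records
```

Note that Python guarantees this repeatability only loosely across interpreter versions, so a long-lived benchmark would also need to pin the generator algorithm itself.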

We were going to add such a feature to Valentina Studio, but then we started thinking about benchmarking the Valentina engine (written in C++). It became clear that we need such a feature right in the engine. So how to implement it?

[NEW] Valentina DB engine – SELECT … FOR JSON

We have made a first step toward the popular JSON format.

Valentina SQL already had the extensions

SELECT … FOR XML
SELECT … FOR REPORT

Now we are adding one more: SELECT … FOR JSON.

We have uploaded the 5.5b21 build, where this feature is introduced. It works in the same way as FOR XML: as the result you get a cursor with a single record and a single TEXT field.

Example:

SQL query ‘SELECT * FROM tblCustomer FOR JSON’ returns

{
      "name": "tblCustomer",
      "fields": ["fldFirstName","fldLastName","fldCountry","fldPhone"],
      "records": [
           ["Peter","Thomas","Germany","111111"],
           ["Brian","Hill","USA","222222"],
           ["Simon","Smith","Italy","333333"],
           ["Chris","Maxwell","France","444444"],
           ["Greg","Silver","France","555555"],
           ["Jerry","Lucas","USA","666666"],
           ["Mark","Lord","Canada","777777"]
      ]
}
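On the client side, the text from that single TEXT field can be parsed with any JSON library. The Python sketch below uses a shortened copy of the output above (first two records only) and turns it into one dictionary per record; how the text is fetched from the cursor is not shown here.

```python
import json

# Shortened copy of the FOR JSON output shown above.
result_text = """
{
      "name": "tblCustomer",
      "fields": ["fldFirstName","fldLastName","fldCountry","fldPhone"],
      "records": [
           ["Peter","Thomas","Germany","111111"],
           ["Brian","Hill","USA","222222"]
      ]
}
"""

doc = json.loads(result_text)

# Pair each record's values with the field names.
rows = [dict(zip(doc["fields"], record)) for record in doc["records"]]

for row in rows:
    print(row["fldFirstName"], row["fldCountry"])
```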

Valentina will now try to open read-only databases in an older format

One user of the Valentina ADK 4.9 told us that many of his users have databases on DVD in the 4.9 format. So if he moves his app to Valentina 5.0, how will those users be able to work with these read-only databases?

To resolve this, we have improved the Valentina engine. It will now attempt to work with read-only databases in an older format as is, without conversion.

It is important to note that this will work fine for a 5.x engine opening 4.x databases.

[NEW][VSQL] MAIL Command

We have added a new command to Valentina SQL.

The main goal is to be able to use Valentina Server as a generator of PDF and/or HTML reports and to send them by email directly from a VSERVER stored procedure. In addition, this command can be called by the VSERVER Event Scheduler or by a database or table trigger.

vext_mail
 : __MAIL
   __FROM character_string_literal_or_var 
   __TO character_string_literal_or_var 
   __SUBJECT character_string_literal_or_var
   __BODY character_string_literal_or_var
  [__ATTACH vext_attach_list]
      __SMTP character_string_literal_or_var
      __PORT character_string_literal_or_var
     [__USER character_string_literal_or_var,
      __PASSWORD character_string_literal_or_var]
     [__SSL truth_value_or_var]

vext_attach_list
 : character_string_literal_or_var AS character_string_literal_or_var , ...

character_string_literal_or_var
 : character_string_literal
 | variable_name

uint_or_var
 : UINT
 | variable_name

truth_value_or_var
 : truth_value
 | variable_name

truth_value
 : TRUE
 | FALSE


[NEW] DEFAULT clause extended by METHOD(‘const_expr’)

We have extended Valentina SQL with a non-standard feature. The DEFAULT clause now has the form DEFAULT METHOD(‘const_expr’).

This step increases the declarative power of the DDL part of VSQL and therefore lets you do less work later when working with inserts and updates.

In the expression you can use built-in Valentina functions and UDFs that do not depend on other fields. The most useful examples are:

* now()
* UUID()
* nextval( sequence_name )
* current_user_name()

Compatibility:
* This is not standard syntax.
* PostgreSQL has similar syntax and behavior, but it specifies the expression directly in the literal: DEFAULT STRING_LITERAL. This causes ambiguity.

[NEW] VKERNEL now can create journal at a specified location + Sandboxed Mac Apps with V4CC.

Frank contacted us with a request to add the ability for V4CC (Valentina for Cocoa) developers to specify the location of a database's journal. This is important for sandboxed applications, which by default can access only their sandbox folder and the files that the user specifies explicitly.

The problem is that if a user chooses a somedb.vdb file in the SelectFile dialog, the Valentina engine still needs to create a journal file next to the .vdb file. But for a sandboxed application this is prohibited by OS X. This is why the developer wants to be able to specify another location for the journal file.

Btw, this problem has existed for more than a year for the SQLite database, which is used e.g. in Apple's CoreData, when it is used by a sandboxed app. Strangely, the only advice from Apple is to disable the journal file.

We have spent a couple of days adding this feature to the C++ level and to the V4CC ADK. The rest of the ADKs will follow soon.

Now you can write the following in V4CC:

Continue reading [NEW] VKERNEL now can create journal at a specified location + Sandboxed Mac Apps with V4CC.

[NEW] Localisable ENUM Type

Ladies and Gentlemen!
For the first time in the world! 🙂
Localizable Enum Type in DBMS!

We have had a working ENUM type in the 5.0 branch of Valentina for many months already. Let me remind you that the ENUM type is not part of the SQL standard, so different DBs implement it in different ways, if they implement it at all. We have implemented it using the CREATE TYPE command of the SQL Standard, in a way similar to PostgreSQL, because that is the most correct approach: you CREATE TYPE ENUM once and later use it in all places of your database.

MySQL, in contrast, defines ENUM as part of a particular table, right in the CREATE TABLE command. This is not good, of course, because then you cannot use this type in other tables or for variables of stored procedures.

CREATE TABLE sizes ( name ENUM('small', 'medium', 'large') );

It is interesting that such a mature database as Oracle does not have an ENUM type.

All these existing implementations have one big problem from our point of view: such enum types contain string values in only one language. Below we will describe our solution.

Continue reading [NEW] Localisable ENUM Type

[NEW] 3 SQL Functions Added

At the request of users we have added 3 new SQL functions:

* UNIX_TIMESTAMP()
* FROM_UNIX_TIMESTAMP()
* MURMURHASH()

Join speed improved

After some benches we discovered that a loop of small joins (when there is only one record on the left and a few on the right) is not fast enough.

We have two major algorithms internally, and we discovered that the first takes only 50 seconds for 10000 loops, while the second takes hundreds of seconds. So the problem is in the second algorithm.

Improved.

Now the second algorithm takes 120 seconds. We can also add a condition that chooses the first algorithm for such small joins…

I think we will be able to improve the second algorithm further, to 70-80 seconds. And maybe, with more complex changes, it will be possible to speed up both after that…

Choosing an appropriate database segment size

Each Valentina volume consists of a set of segments; even internal service data is placed in such segments.

Database storage is implemented similarly to a file system. There are volumes (.vdb, .dat, .blb, .ind, .tmp), and there are embedded files on those volumes (all data, such as field data, indexes and so on, goes into those files). Each volume operates with its own segment map, so we can find (and allocate new) segments for a particular embedded file easily and quickly.


[NEW] ORDER BY in aggregate functions First()/Last()

The SQL Standard allows you to SELECT only fields mentioned in the GROUP BY and expressions based on aggregate functions. You cannot SELECT a normal field. But sometimes you may really want to do this. The question is what to do in this case.

In this new Valentina Wiki article, we describe this problem in detail and give THREE solutions. The third solution is new in v5.0, and it works 400 and 50 times faster than the first and second solutions, respectively, in tests on a customer's database.

The third solution actually uses an idea from the ORACLE database: FIRST()/LAST() functions with their own ORDER BY, applied inside each group. It seems MySQL and PostgreSQL do not have any way to solve this task as efficiently.
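To show what "FIRST()/LAST() with its own ORDER BY inside each group" means, here is a small Python sketch of the semantics only. The table, the field names, and the helper functions are hypothetical; this is not Valentina SQL syntax and says nothing about how the engine executes such a query.

```python
from collections import defaultdict

# Hypothetical sample rows: one record per order.
orders = [
    {"customer": "Peter", "day": 3, "amount": 30},
    {"customer": "Brian", "day": 1, "amount": 15},
    {"customer": "Peter", "day": 1, "amount": 10},
    {"customer": "Brian", "day": 2, "amount": 25},
]

def group_rows(rows, group_key):
    """Split rows into lists keyed by the GROUP BY value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    return groups

def first_in_group(rows, group_key, order_key, field):
    """Per group: value of `field` from the row that sorts first by `order_key`."""
    return {g: min(members, key=lambda r: r[order_key])[field]
            for g, members in group_rows(rows, group_key).items()}

def last_in_group(rows, group_key, order_key, field):
    """Per group: value of `field` from the row that sorts last by `order_key`."""
    return {g: max(members, key=lambda r: r[order_key])[field]
            for g, members in group_rows(rows, group_key).items()}

first_amount = first_in_group(orders, "customer", "day", "amount")
last_amount = last_in_group(orders, "customer", "day", "amount")
```

Here `amount` is a "normal" field that is neither grouped nor aggregated, yet each group still yields a well-defined value because the ordering inside the group picks exactly one row.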