martedì 23 agosto 2016

Summary of the work - GSoC

MathJax TeX Test Page Hello!

In this final blog post I want to summarize the work done so far for the GSoC. First of all I want to thank Octave community for their support:
when I had some problems I have always found someone in the mailing list or in the IRC channel willing to help me.

In my public repository I pushed three final commits:

https://bitbucket.org/Francesco_Faccio/octave-gsoc/commits/all

The first commit I pushed regards the work we did in the first part of the GSoC and it is a patch for Octave which allows to include IDA module of Sundials library as a dependence of Octave.

I wrote some configuration tests, checking for sundials_ida and sundials_nvecserial headers and functions and a macro to test whether sundials_ida has been configured with double precision realtype. I included a test for klu library in suitesparse, because it is required for sundials' sparse solver.
If all these conditions are satisfied, you can use ode15i and ode15s in Octave (you can see in my previous post how to configure sundials and octave correctly and how to run benchmark problems).


The second commit regards my work on solvers ode23 and ode45 which were already in Octave.
Since there were a lot of duplicate code, we decided to refactor the solvers, exporting the check of the options into two private functions:

$[defaults, classes, attributes] = odedefaults (n, t0, tf)$

is a function which returns a struct defaults, containing the default values of all the solvers, and structs classes and attributes, containing the arguments for validateattributes.

$options = odemergeopts (useroptions, options, classes, attributes, fun_name);$

is a function which loops over default fields and, if a user provides a new value, uses validateattributer or validatestring to test the value and overwrites the default one.

odeset.m and odeget.m have been modified in such a way that the user can add new special fields to an ODE option structure. They don't depend on private functions ode_struct_value_check and known_option_names, but use inputParser to perform input checking. This version of odeset doesn't include the option to catch a misspelled option name yet, so that 'Rel' doesn't become automatically 'RelTol'.

We wrote function decic, which can compute consistent initial conditions for ode15i, close to initial guesses y0 and yp0.


The third commit contains files ode15i.m, ode15s.m and a common interface __ode15__.cc


ode15i


The main functionalities of ode15i have been achieved. Here's a list of the options accepted by the solver:

RelTol: You can specify a relative error tolerance as a positive scalar. The default value is 1e-3.

AbsTol: You can specify an absolute error tolerance as a positive scalar or vector. The default value is 1e-6.

NormControl: Sundials controls the relative error using the norm of the solution at each step, so option NormControl is always "on".

OutputFcn, OutputSel: You can specify them to plot the components of the solution you want after each successful integration step. The default output function is $@odeplot$, but you can provide a custom output function.

Refine: You can increase the number of output points using the refine factor. __ode15__.cc uses the IDAGetDky function to reach the highest possible order of interpolation.

Stats: Only standard statistics are printed (number of successful steps, number of failed attempts and the number of function evaluations).

Events: If you provide an event function, it must be of the form $ [value,isterminal,direction] = EventFcn(t,y,yp)$. The solver localizes a change of sign on your event function and uses linear interpolation to extimate the time when the event occurs. A way to improve this functionality will be to use the rootfinding function of Sundials in order to have an higher order of interpolation.

MaxStep: Use this option to specify an upper bound for the integration step. The default value is $0.1*abs(tf - t0)$.

InitialStep: Use this option to specify the first integration step of the solver. Its default value depends on the tspan and the initial slope.

Jacobian: You can specify the Jacobian as a function which returns the modified Jacobian $[\frac{\partial F}{\partial y}, \frac{\partial F}{\partial y'}] = jac (t, y, y')$ or as a cell array containing the two constant Jacobian matrices $ \{[\frac{\partial F}{\partial y}], [\frac{\partial F}{\partial y'}] \}$.

In both cases, if the two parts of the Jacobian are sparse, the Jacobian will be stored in CCS format.
If you don't provide the Jacobian, it will be extimated internally by Sundials, using difference-quotient approximation.
In this implementation you can't provide only one part of the Jacobian yet, but you have to specify both.

JPattern, Vectorized: These options haven't been developed yet.

MaxOrder: You can specify an upper bound for the order of the solver. The default value is 5.


This solver can be extended in order to increase the compatibility with Matlab:
- completing the missing options
- allowing the user to solve equations with complex number
- allowing the user to use single precision variables.


ode15s


This solver uses the same dld-function of ode15i. Inside the m-file, the semi-explicit DAE is written as an implicit one and solve as ode15i.
The solver uses the same options of ode15i, but in addition, it has:

NonNegative: this option has not been implemented yet. It can be implemented using IDASetConstraint function.

Events: Same as ode15i, but it must have the form $[value,isterminal,direction] = EventsFcn(t,y)$.

Jacobian: You can provide a function which returns the Jacobian of the form $[\frac{\partial F}{\partial y}] = jac(t,y)$ or a constant Jacobian matrix.

Mass: You can provide a constant Mass matrix or a function handle of the form $M(t,y)$ or $M(t)$. If the Mass matrix depends on the state, the Jacobian Matrix will be computed internally from Sundials. If you provide a sparse Jacobian and a sparse Mass matrix, they will be stored in CCS format.

MStateDependence: Specify MStateDependence to "strong" or "weak" if the Mass matrix has the form $M(t,y)$. Use the value "none" if the Mass matrix has the form $M(t)$ or it is constant.

MvPattern: This option has not been developed yet.

MassSingular: This option has not been developed yet. The solver solves the problem both with and without a singular mass matrix in the same way.

InitialSlope: You can specify the initial slope of the problem. The default value is a vector of zeros.

BDF: this implementation of ode15s uses always BDF method.


This solver can be extended in order to increase the compatibility with Matlab:
- completing the missing options
- allowing the user to solve equations with complex number
- allowing the user to use single precision variables.


This code has not been merged yet, since Octave is concluding a release process.

Concluding, I really enjoied working on this project and there is still a lot of work to do. I will work in the next weeks to extend the functionalities of ode15i and ode15s. I plan to continue to contribute to Octave development, maybe both in ode and in some statistics project.

domenica 14 agosto 2016

How to replicate test

Hi!

Here's a little guide to replicate the test I made for ode15i

STEP 1

Download last version of Sundials (only IDA module is required (IDA version > 2.8.0), but SundialsTB, idas and kinsol will be used for some test). Follow the install guide of Sundials to build IDA module with KLU (you will have to specify the lib and include directories where you have previously installed KLU). Use SUNDIALS_PRECISION as double (default) in configuration.

STEP 2

Clone my octave repo you can find here: https://bitbucket.org/Francesco_Faccio/octave
and build Octave from source. KLU is already a Octave dependence, but you can specify other versions you have installed at configuration time.

STEP 3

Here you can find the scripts and C functions I used to test ode15i.
You can test my oct-file running the scripts in folder Octave and the same scripts in Matlab folder for testing Matlab's ode15i.

To test the C examples of Sundials compile the C file you can find in folder C, linking against libklu, libsundials_ida and libsundials_nvecserial. If you installed these libraries in default path, type:

$ gcc -c idaHeat2D_klu.c idaRoberts_dns.c
$ gcc -o idaHeat2D_klu idaHeat2D_klu.o  -lsundials_ida
 -lsundials_nvecserial -lklu -lm
$ gcc -o idaRoberts_dns idaRoberts_dns.o  -lsundials_ida
 -lsundials_nvecserial -lklu -lm
$ ./idaRoberts_dns
$ ./idaHeat2D_klu

Sundials provides some mex interfaces for solving the benchmark problems. In order to test that, you have to edit file SundialsTB/install_STB.m changing mex compiler (near row 20):

mexcompiler = 'mkoctfile --mex';

Install Octave and run from Octave install_STB.m script. Choose to compile IDAS and kinsol interface and install the Toolbox. A startup file is generated where you chose to install SundialsTB: run it.

Now you could be able to run the script in MEX folder, which uses Sundials mex interface for solving the dense problem.


For problem ida_klu in Octave and Matlab folder, you can choose to test the solver with a sparse jacobian function, a dense jacobian function, a cell jacobian dense or a sparse jacobian dense.

Since I added the cell jacobian option in ode15i recently and I changed some parameters in the benchmark problems, I have some new results:

Problem Robertson eq (tspan from zero to 4e10, abstol = [1e-8, 1e-14,1e-6], reltol = 1e-4, dense methods):

Matlab: 0.41s
Octave: 0.06s
C impl: 0.002s
Mex impl: 0.12s

Without jacobian:

Matlab: 0.45s
Octave: 0.07s
Mex impl: 0.15s

Problem Heat2D (tspan from zero to 10.24, reltol = 1e-5, abstol = 1e-8, sparse methods):

With a sparse jacobian function:

Matlab: 0.34s
Octave: 0.13s
C impl: 0.005s

With a dense jacobian function:

Matlab: 0.44s
Octave: 0.15s

With a dense jacobian cell:

Matlab: 0.45s
Octave: 0.08s

With a sparse jacobian cell:

Matlab: 0.34s
Octave: 0.07s

Without a jacobian:

Matlab: 0.56s
Octave: 0.47s


It seems a quite goode result, considering that C and Mex implementations are specificly designed for solving these problems. Since I use a m-interface before calling my oct-file, there is a little overhead, but it will be negligible for bigger more complex problems.

For these test I configured Octave with -O2 flag and all the results refer to the code till this commit

lunedì 1 agosto 2016

Performance test

Hello!

As klu library is now linked to ode15i, I could test the performance of the code both in the dense and sparse case. I tested my oct-file against Matlab ode15i m-file, the C implementation of sundials examples and the m-file relying on the mex interface of sundials. Although the code still contains unefficient use of memory, I think that the results are pretty interesting.

Before going into details, I have to specify that C and mex implementations are written for solving these specific problems, so their code is highly optimized (specific Jacobian matrices are constructed directly inside functions required by IDA, while in my oct-file I receive a function that when evaluated it returns the Jacobian matrices I have to pass to IDA). In other words, solving a problem requires more time than solving the problem.

The first test regards the solution of Robertson chemical kinetics problem, in which the Jacobian is a 3x3 dense matrix. I found a solution for RelTol = 1e-4, AbsTol = [1e-8, 1e-14, 1e-6], tspan = [0, 0.4] and I got (in mean):

0.015s for the oct-file
0.13s for the mex-file
0.002s for the C-implementation
0.19s for Matlab ode15i.

The second test regards the solution of a 2D heat equation. In this problem the Jacobian was a 100x100 sparse matrix with 330 non zero elements. I tested with RelTol = 1e-5, AbsTol = 1e-8 and tspan =[0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24] and got:

0.42s for the oct-file
0.005s for the C-implementation
0.45s for Matlab ode15i.

Unfortunately Sundials doesn't provide a mex interface for klu, so I couldn't test a sparse mex file version for this problem.

We can now make some considerations:
-Oct-file is 10 times faster than mex-file and this confirms that mex files cannot reach the performance of an equivalent (in this case, of a more general) otc-file
-Our solver is already faster than Matlab m-file
-The C implementation is the fastest, but code is designed to solve only a specific problem. We can try to get closer to this speed.

In order to improve the performance of the solver, the next step is to try to reduce looping and the use of memory and to write a DAE class, using functors and lambda functions to avoid the use of global pointers.