<?xml version="1.0" encoding="UTF-8"?>
<!--
 * Scilab ( http://www.scilab.org/ ) - This file is part of Scilab
 * Copyright (C) 2008 - Rainer von Seggern
 * Copyright (C) 2008 - Bruno Pincon
 * Copyright (C) 2009 - INRIA - Michael Baudin
 * Copyright (C) 2010-2011 - DIGITEO - Michael Baudin
 *
 * This file must be used under the terms of the CeCILL.
 * This source file is licensed as described in the file COPYING, which
 * you should have received as part of this distribution.  The terms
 * are also available at
 * http://www.cecill.info/licences/Licence_CeCILL_V2-en.txt
 *
 -->
<refentry version="5.0-subset Scilab"
          xml:id="numderivative"
          xml:lang="en"
          xmlns="http://docbook.org/ns/docbook"
          xmlns:xlink="http://www.w3.org/1999/xlink"
          xmlns:svg="http://www.w3.org/2000/svg"
          xmlns:ns4="http://www.w3.org/1999/xhtml"
          xmlns:mml="http://www.w3.org/1998/Math/MathML"
          xmlns:db="http://docbook.org/ns/docbook">
    <refnamediv>
        <refname>numderivative</refname>
        <refpurpose>approximate derivatives of a function (Jacobian or Hessian)</refpurpose>
    </refnamediv>
    <refsynopsisdiv>
        <title>Calling Sequence</title>
        <synopsis>
            J = numderivative(f, x)
            J = numderivative(f, x, h)
            J = numderivative(f, x, h, order)
            J = numderivative(f, x, h, order, H_form)
            J = numderivative(f, x, h, order, H_form, Q)
            [J, H] = numderivative(...)
        </synopsis>
    </refsynopsisdiv>
    <refsection>
        <title>Arguments</title>
        <variablelist>
            <varlistentry>
                <term>f</term>
                <listitem>
                    <para>
                        a function or a list, the function to differentiate.
                    </para>
                </listitem>
            </varlistentry>
            <varlistentry>
                <term>x</term>
                <listitem>
                    <para>
                        an n-by-1 or 1-by-n vector of doubles, real, the point where the derivatives are computed.
                    </para>
                </listitem>
            </varlistentry>
            <varlistentry>
                <term>h</term>
                <listitem>
                    <para>
                        a 1-by-1 or n-by-1 vector of doubles, the step used in the finite difference
                        approximations.
                        If <literal>h</literal> is not provided, then the default step is computed
                        depending on <literal>x</literal> and the <literal>order</literal>.
                        If <literal>h</literal> is a 1-by-1 matrix, it is expanded
                        to the same size as <literal>x</literal>.
                    </para>
                </listitem>
            </varlistentry>
            <varlistentry>
                <term>order</term>
                <listitem>
                    <para>
                        a 1-by-1 matrix of doubles, a positive integer: the order of the finite difference
                        formula (default <literal>order=2</literal>).
                        The available values of <literal>order</literal> are 1, 2 or 4.
                    </para>
                </listitem>
            </varlistentry>
            <varlistentry>
                <term>H_form</term>
                <listitem>
                    <para>
                        a string, the form in which the Hessian will be
                        returned (default <literal>H_form="default"</literal>).
                    </para>
                    <para>
                        The available values are "default", "blockmat" or "hypermat".
                        See the section "The shape of the Hessian" below for details on this option.
                    </para>
                </listitem>
            </varlistentry>
            <varlistentry>
                <term>Q</term>
                <listitem>
                    <para>
                        a real matrix of doubles, modifies the directions of differentiation (default is <literal>Q=eye(n,n)</literal>).
                        The matrix <literal>Q</literal> is expected to be orthogonal.
                        <literal>Q</literal> makes it possible to remove the arbitrariness of using the
                        canonical basis when approximating the derivatives of a function.
                    </para>
                </listitem>
            </varlistentry>
            <varlistentry>
                <term>J</term>
                <listitem>
                    <para>
                        an m-by-n matrix of doubles, the approximated Jacobian.
                        The row <literal>J(k, :)</literal> approximates the gradient of <literal>fk</literal>,
                        for <literal>k = 1, 2, ..., m</literal>.
                    </para>
                </listitem>
            </varlistentry>
            <varlistentry>
                <term>H</term>
                <listitem>
                    <para>
                        a matrix of doubles, the approximated Hessian.
                        The elements of <literal>H</literal> approximate the second-order partial derivatives of <literal>f</literal>.
                    </para>
                </listitem>
            </varlistentry>
        </variablelist>
    </refsection>
    <refsection>
        <title>Description</title>
        <para>
            Computes the approximated Jacobian and Hessian matrices of a function with finite differences.
        </para>
        <para>
            The function <literal>f</literal> takes as
            input argument <literal>x</literal>, an <literal>n-by-1</literal> vector,
            and returns <literal>y</literal>, an <literal>m-by-1</literal> vector.
            In the following, the k-th component of <literal>f</literal>
            is denoted by <literal>fk</literal> and the Hessian matrix of
            <literal>fk</literal> is denoted by <literal>Hk</literal>,
            for <literal>k = 1, 2, ..., m</literal>.
        </para>
        <para>
            Any optional argument equal to the empty matrix <literal>[]</literal>
            is replaced by its default value, as the following sketch illustrates.
        </para>
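        <para>
            A minimal sketch of the default mechanism: passing the empty matrix for
            <literal>h</literal> and <literal>order</literal> is expected to be equivalent
            to omitting them (the function <literal>fdemo</literal> below is only an
            illustration, not part of <literal>numderivative</literal>):
        </para>
        <programlisting role="example">
            <![CDATA[
// A minimal sketch: [] falls back to the default value,
// so the two calls below are expected to return the same Jacobian.
function y = fdemo(x)
  y = x(1)^2 + x(2)^2
endfunction
x = [1; 2];
J1 = numderivative(fdemo, x)
J2 = numderivative(fdemo, x, [], 2) // explicit default step and order
]]>
        </programlisting>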
        <para>
            In general, the <literal>order=1</literal> formula provides the
            largest error (least accurate), the <literal>order=2</literal> formula provides an average
            error and the <literal>order=4</literal> formula provides the lowest error (most accurate).
            However, there are cases for which this is not true: see the section "Accuracy issues" below
            for more details on this topic.
        </para>
        <para>
            The second derivatives are computed by composition of first-order derivatives.
        </para>
    </refsection>
    <refsection>
        <title>The function</title>
        <para>
            The function <literal>f</literal> must have the following header:
        </para>
        <screen>
            y = f (x)
        </screen>
        <para>where</para>
        <variablelist>
            <varlistentry>
                <term>x</term>
                <listitem>
                    <para>an n-by-1 vector of doubles, the current point</para>
                </listitem>
            </varlistentry>
            <varlistentry>
                <term>y</term>
                <listitem>
                    <para>an m-by-1 vector of doubles, the function value</para>
                </listitem>
            </varlistentry>
        </variablelist>
        <para>
            It might happen that the function requires additional arguments to be evaluated.
            In this case, we can use the following feature.
            The argument <literal>f</literal> can also be the list <literal>(myfun, a1, a2, ...)</literal>.
            In this case, <literal>myfun</literal>, the first element in the list, must be a function and must
            have the header:
            <screen>
                y = myfun (x, a1, a2, ...)
            </screen>
            where the input arguments <literal>a1, a2, ...</literal>
            are automatically appended at the end of the calling sequence.
        </para>
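        <para>
            A minimal sketch of this mechanism, where <literal>myfun</literal> and the extra
            argument <literal>a</literal> are illustrative names introduced here:
        </para>
        <programlisting role="example">
            <![CDATA[
// The extra argument a is appended after x when myfun is called.
function y = myfun(x, a)
  y = a * sum(x.^2)
endfunction
a = 3;
J = numderivative(list(myfun, a), [1; 2])
]]>
        </programlisting>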
    </refsection>
    <refsection>
        <title>The shape of the Hessian</title>
        <para>
            The <literal>H_form</literal> option changes the dimensions of the output argument
            <literal>H</literal>.
            This manages the general case where <literal>m</literal> is different from <literal>1</literal>,
            that is, when there are several functions to differentiate at the same time.
        </para>
        <para>
            The possible values of <literal>H_form</literal> are:
        </para>
        <variablelist>
            <varlistentry>
                <term>H_form = "default":</term>
                <listitem>
                    <para>
                        H is an <literal>m-by-(n^2)</literal> matrix; in this form,
                        the row <literal>H(k, :)</literal> contains <literal>Hk</literal>:
                    </para>
                    <screen>
                        H(k, :) == Hk(:)'
                    </screen>
                </listitem>
            </varlistentry>
            <varlistentry>
                <term>H_form = "blockmat":</term>
                <listitem>
                    <para>
                        H is an <literal>(mn)-by-n</literal> matrix: the <literal>n-by-n</literal> Hessian
                        matrices of each component of <literal>f</literal> are stacked row-by-row, that is:
                    </para>
                    <screen>
                        H == [H1 ; H2 ; ... ; Hm]
                    </screen>
                </listitem>
            </varlistentry>
            <varlistentry>
                <term>H_form = "hypermat":</term>
                <listitem>
                    <para>
                        H is an n-by-n matrix for <literal>m=1</literal>, and an n-by-n-by-m hypermatrix otherwise.
                        The matrix <literal>H(:, :, k)</literal> is the Hessian matrix of the k-th
                        component of <literal>f</literal>, i.e.
                    </para>
                    <screen>
                        H(:, :, k) == Hk
                    </screen>
                </listitem>
            </varlistentry>
        </variablelist>
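        <para>
            The following sketch makes the three shapes concrete for a function with
            <literal>m = 2</literal> components of <literal>n = 2</literal> variables
            (the function <literal>fshape</literal> is only an illustration):
        </para>
        <programlisting role="example">
            <![CDATA[
function y = fshape(x)
  y = [x(1)^2 + x(2); x(1)*x(2)]
endfunction
x = [1; 2];
[J, H]  = numderivative(fshape, x)                     // H  is 2-by-4
[J, Hb] = numderivative(fshape, x, [], [], "blockmat") // Hb is 4-by-2
[J, Hh] = numderivative(fshape, x, [], [], "hypermat") // Hh is 2-by-2-by-2
// In the default form, row k stores Hk(:)'; recover H1 with:
H1 = matrix(H(1, :), 2, 2)
]]>
        </programlisting>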
    </refsection>
    <refsection>
        <title>Performances</title>
        <para>
            If the problem is correctly scaled, increasing the order of the finite difference formula may reduce
            the error of approximation but requires more function evaluations.
            The following list presents the number of function evaluations required to
            compute the Jacobian depending on the order of the formula and the dimension of <literal>x</literal>,
            denoted by <literal>n</literal>:
        </para>
        <itemizedlist>
            <listitem>
                <para>
                    <literal>order = 1</literal>, the number of function evaluations is <literal>n+1</literal>,
                </para>
            </listitem>
            <listitem>
                <para>
                    <literal>order = 2</literal>, the number of function evaluations is <literal>2n</literal>,
                </para>
            </listitem>
            <listitem>
                <para>
                    <literal>order = 4</literal>, the number of function evaluations is <literal>4n</literal>.
                </para>
            </listitem>
        </itemizedlist>
        <para>
            Computing the Hessian matrix requires the square of the number of function evaluations
            used for the Jacobian, plus the evaluations of the Jacobian itself,
            as detailed in the following list (the sketch after the list counts
            the evaluations directly).
        </para>
        <itemizedlist>
            <listitem>
                <para>
                    <literal>order = 1</literal>, the number of function evaluations is <literal>(n+1)^2</literal> (total is <literal>(n+1)^2+n+1</literal>),
                </para>
            </listitem>
            <listitem>
                <para>
                    <literal>order = 2</literal>, the number of function evaluations is <literal>4n^2</literal> (total is <literal>4n^2+2n</literal>),
                </para>
            </listitem>
            <listitem>
                <para>
                    <literal>order = 4</literal>, the number of function evaluations is <literal>16n^2</literal> (total is <literal>16n^2+4n</literal>).
                </para>
            </listitem>
        </itemizedlist>
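        <para>
            A sketch counting the evaluations with a global counter; the variable
            <literal>feval_count</literal> and the function <literal>fcount</literal>
            are illustrative names introduced here, not part of
            <literal>numderivative</literal>:
        </para>
        <programlisting role="example">
            <![CDATA[
global feval_count
function y = fcount(x)
  // Increment the counter at each evaluation.
  global feval_count
  feval_count = feval_count + 1;
  y = sum(x.^2);
endfunction
x = [1; 2; 3];
for ord = [1, 2, 4]
  feval_count = 0;
  J = numderivative(fcount, x, [], ord);      // Jacobian only
  nJ = feval_count;
  feval_count = 0;
  [J, H] = numderivative(fcount, x, [], ord); // Jacobian and Hessian
  mprintf("order = %d: %d (J), %d (J and H)\n", ord, nJ, feval_count);
end
]]>
        </programlisting>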
    </refsection>
    <refsection>
        <title>Examples</title>
        <programlisting role="example">
            <![CDATA[
// The function to differentiate
function y = f(x)
  f1 = sin(x(1)*x(2)) + exp(x(2)*x(3)+x(1))
  f2 = sum(x.^3)
  y = [f1; f2]
endfunction
// The exact gradient
function [g1, g2] = exactg(x)
  g1(1) = cos(x(1)*x(2))*x(2) + exp(x(2)*x(3)+x(1))
  g1(2) = cos(x(1)*x(2))*x(1) + exp(x(2)*x(3)+x(1))*x(3)
  g1(3) = exp(x(2)*x(3) + x(1))*x(2)
  g2(1) = 3*x(1)^2
  g2(2) = 3*x(2)^2
  g2(3) = 3*x(3)^2
endfunction
// The exact Hessian
function [H1, H2] = exactH(x)
  H1(1, 1) = -sin(x(1)*x(2))*x(2)^2 + exp(x(2)*x(3)+x(1))
  H1(1, 2) = cos(x(1)*x(2)) - sin(x(1)*x(2))*x(2)*x(1) + exp(x(2)*x(3)+x(1))*x(3)
  H1(1, 3) = exp(x(2)*x(3)+x(1))*x(2)
  H1(2, 1) = H1(1, 2)
  H1(2, 2) = -sin(x(1)*x(2))*x(1)^2 + exp(x(2)*x(3)+x(1))*x(3)^2
  H1(2, 3) = exp(x(2)*x(3)+x(1)) + exp(x(2)*x(3)+x(1))*x(3)*x(2)
  H1(3, 1) = H1(1, 3)
  H1(3, 2) = H1(2, 3)
  H1(3, 3) = exp(x(2)*x(3)+x(1))*x(2)^2
  //
  H2(1, 1) = 6*x(1)
  H2(1, 2) = 0
  H2(1, 3) = 0
  H2(2, 1) = H2(1, 2)
  H2(2, 2) = 6*x(2)
  H2(2, 3) = 0
  H2(3, 1) = H2(1, 3)
  H2(3, 2) = H2(2, 3)
  H2(3, 3) = 6*x(3)
endfunction

// Compute the approximate Jacobian, the Hessian
x = [1; 2; 3];
J = numderivative(f, x)
[J, H] = numderivative(f, x)

// Compare with exact derivatives
[g1, g2] = exactg(x);
Jexact = [g1'; g2']
[H1, H2] = exactH(x);
Hexact = [H1(:)'; H2(:)']

// Configure the step
J = numderivative(f, x, 1.e-1)

// Configure the order
J = numderivative(f, x, [], 4)

// Configure Hessian shapes
[J, H] = numderivative(f, x, [], [], "blockmat")
[J, H] = numderivative(f, x, [], [], "hypermat")

// Configure Q
n = 3;
Q = qr(rand(n, n))
J = numderivative(f, x, [], [], [], Q)

// Test order 1, 2 and 4 formulas.
// For the order 4 formula, there are some entries in H
// which are computed as nonzero.
[g1, g2] = exactg(x);
[H1, H2] = exactH(x);
Jexact = [g1'; g2'];
Hexact = [H1(:)'; H2(:)'];
for i = [1, 2, 4]
  [J, H] = numderivative(f, x, [], i);
  dJ = floor(min(assert_computedigits(J, Jexact)));
  dH = floor(min(assert_computedigits(H, Hexact)));
  mprintf("order = %d, dJ = %d, dH = %d \n", i, dJ, dH);
end
]]>
        </programlisting>
    </refsection>
    <refsection>
        <title>Passing extra parameters</title>
        <para>
            In the following example, we use a function which requires the extra argument
            <literal>p</literal>.
            Hence, we use the feature that the argument <literal>f</literal>
            can be a list, where the first element is the function
            <literal>G</literal> and the remaining elements are automatically passed
            to the function.
        </para>
        <programlisting role="example">
            <![CDATA[
function y = G(x, p)
  f1 = sin(x(1)*x(2)*p) + exp(x(2)*x(3)+x(1))
  f2 = sum(x.^3)
  y = [f1; f2]
endfunction

x = [1; 2; 3];
p = 1;
[J, H] = numderivative(list(G, p), x)
]]>
        </programlisting>
    </refsection>
    <refsection>
        <title>The Taylor formula</title>
        <para>
            If <literal>H</literal> is given in its default form, then the Taylor series of
            <literal>f(x)</literal> up to terms of second order is given by:
        </para>
        <para>
            <latex>
                <![CDATA[
f(x+h)\approx f(x)+J(x) h + \frac{1}{2} h^T H(x) h
]]>
            </latex>
        </para>
        <para>
            In the following script, we check this formula numerically: each successive
            residual is expected to be smaller than the previous one.
        </para>
        <programlisting role="example">
            <![CDATA[
// The function to differentiate
function y = f(x)
  f1 = sin(x(1)*x(2)) + exp(x(2)*x(3)+x(1))
  f2 = sum(x.^3)
  y = [f1; f2]
endfunction
x = [1; 2; 3];
dx = 1e-3*[1; 1; -1];
[J, H] = numderivative(f, x);
f(x+dx)
f(x+dx)-f(x)
f(x+dx)-f(x)-J*dx
f(x+dx)-f(x)-J*dx-1/2*H*(dx .*. dx)
]]>
        </programlisting>
        <para>
            In the following example, we use a function which requires three extra arguments
            <literal>A</literal>, <literal>p</literal> and <literal>w</literal>.
        </para>
        <programlisting role="example">
            <![CDATA[
// A trivial example
function y = f(x, A, p, w)
  y = x'*A*x + p'*x + w;
endfunction
// with Jacobian and Hessian given
// by J(x) = x'*(A+A')+p' and H(x) = A+A'.
A = rand(3, 3);
p = rand(3, 1);
w = 1;
x = rand(3, 1);
h = 1;
[J, H] = numderivative(list(f, A, p, w), x, h, [], "blockmat")
]]>
        </programlisting>
    </refsection>
    <refsection>
        <title>Accuracy issues</title>
        <para>
            When the step <literal>h</literal> is not provided, the <literal>numderivative</literal>
            function tries to compute a step which provides a sufficient accuracy.
            This step depends on the degree of the derivative (Jacobian or Hessian),
            the order of the formula and the point <literal>x</literal>.
            More precisely, if <literal>x</literal> is nonzero,
            then the default step <literal>h</literal> is a vector with the
            same size as <literal>x</literal>, scaled with the absolute value
            of <literal>x</literal>.
        </para>
        <para>
            In the following example, we compute the numerical derivative of the square root
            function and plot the relative error of the approximation as a function of the step.
            We can see that there is an optimal step which minimizes the relative error.
        </para>
        <programlisting role="example">
            <![CDATA[
// The exact derivative of the square root function
function y = mydsqrt (x)
    y = 0.5 * x^(-0.5)
endfunction

x = 1.0;
n = 1000;
logharray = linspace (-16, 0, n);
logearray = zeros(1, n);
for i = 1:n
    h = 10^(logharray(i));
    expected = mydsqrt(x);
    computed = numderivative (sqrt, x, h);
    relerr = abs(computed - expected) / abs(expected);
    logearray (i) = log10 (relerr);
end
scf();
plot (logharray, logearray);
xtitle("Relative error of numderivative (x = 1.0)", ..
  "log(h)", "log(RE)");
]]>
        </programlisting>
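        <para>
            A sketch locating the optimum: for a first-order formula at
            <literal>x = 1</literal>, the classical rule of thumb predicts an optimal
            step near the square root of the machine epsilon. This rule is an
            assumption used for illustration, not necessarily the exact rule
            implemented in <literal>numderivative</literal>:
        </para>
        <programlisting role="example">
            <![CDATA[
// Rule-of-thumb optimal step for a first-order formula (assumption).
hopt = sqrt(%eps)
computed = numderivative(sqrt, 1.0, hopt, 1);
relerr = abs(computed - 0.5) / 0.5 // expected to be near sqrt(%eps)
]]>
        </programlisting>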
        <para>
            The strategy in <literal>numderivative</literal> provides a sufficient accuracy in many
            cases, but can fail to be accurate in some cases.
            In fact, the optimal step also depends on the function value <literal>f(x)</literal>
            and its second derivative, both of which are unknown at the time when the
            default step is computed.
            See "Practical optimization", by Gill, Murray and Wright, Academic Press, 1981,
            for more details on this topic.
            The relevant sections are "4.6.1. Finite-difference Approximations to First
            Derivatives" and "8.6 Computing finite differences".
        </para>
    </refsection>
    <refsection>
        <title>History</title>
        <revhistory>
            <revision>
                <revnumber>5.5.0</revnumber>
                <revdescription>
                    Function <literal>numderivative</literal> introduced.
                </revdescription>
            </revision>
        </revhistory>
    </refsection>
</refentry>