RTEMS-NFS
=========

An NFS-V2 client implementation for the RTEMS real-time
executive.

Author: Till Straumann <strauman@slac.stanford.edu>, 2002

Copyright 2002, Stanford University and
                Till Straumann <strauman@slac.stanford.edu>

Stanford Notice
***************

Acknowledgement of sponsorship
* * * * * * * * * * * * * * * *
This software was produced by the Stanford Linear Accelerator Center,
Stanford University, under Contract DE-AC03-76SF00515 with the Department
of Energy.


Contents
--------
I   Overview
  1) Performance
  2) Reference Platform / Test Environment

II  Usage
  1) Initialization
  2) Mounting Remote Server Filesystems
  3) Unmounting
  4) Unloading
  5) Dumping Information / Statistics

III Implementation Details
  1) RPCIOD
  2) NFS
  3) RTEMS Resources Used By NFS/RPCIOD
  4) Caveats & Bugs

IV  Licensing & Disclaimers

I  Overview
-----------

This package implements a simple non-caching NFS
client for RTEMS. Most of the system calls are
supported with the exception of 'mount', i.e. it
is not possible to mount another FS on top of NFS
(mostly because of the difficulty that arises when
mount points are deleted on the server). It
shouldn't be hard to do, though.

Note: this client supports NFS vers. 2 / MOUNT vers. 1;
      NFS version 3 and higher are NOT supported.

The package consists of two modules: RPCIOD and NFS
itself.

 - RPCIOD is a UDP/RPC multiplexor daemon. It takes
   RPC requests from multiple local client threads,
   funnels them through a single socket to multiple
   servers and dispatches the replies back to the
   (blocked) requestor threads.
   RPCIOD does packet retransmission and handles
   timeouts etc.
   Note, however, that it does NOT do any XDR
   marshalling - it is up to the requestor threads
   to do the XDR encoding/decoding. RPCIOD _is_ RPC
   specific, though, because its message dispatching
   is based on the RPC transaction ID.

 - The NFS package maps RTEMS filesystem calls
   to proper RPCs, it does the XDR work and
   hands marshalled RPC requests to RPCIOD.
   All of the calls are synchronous, i.e. they
   block until they get a reply.

1) Performance
- - - - - - - -
Performance sucks (due to the lack of
readahead/delayed write and caching). On a fast
(100Mb/s) ethernet, it takes about 20s to copy a
10MB file from NFS to NFS.  I found, however, that
vxWorks' NFS client doesn't seem to be any
faster...

Since there is no buffer cache with read-ahead
implemented, all NFS reads are synchronous RPC
calls. Every read operation involves sending a
request and waiting for the reply. As long as the
overhead (sending request + processing it on the
server) is significant compared to the time it
takes to transfer the actual data, increasing
the amount of data per request results in better
throughput. The UDP packet size limit imposes a
limit of 8k per RPC call, hence reading from NFS
in chunks of 8k is better than chunks of 1k [but
chunks >8k are not possible, i.e., simply not
honoured: read(a_nfs_fd, buf, 20000) returns
8192]. This is similar to the old linux days
(mount with rsize=8k).  You can let stdio take
care of the buffering or use 8k buffers with
explicit read(2) operations. Note that stdio
honours the file-system's st_blksize field
if newlib is compiled with HAVE_BLKSIZE defined.
In this case, stdio uses 8k buffers for files
on NFS transparently. The blocksize NFS 
reports can be tuned with a global variable
setting (see nfs.c for details).
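
If you do your own read(2) calls instead of relying on stdio, the
clamping can be done by hand. Below is a minimal sketch in plain
POSIX C (illustration only - 'xfer_size' and 'copy_blksize' are
made-up names, not part of the driver) that sizes each transfer
from fstat()'s st_blksize, capped at the 8k limit:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Clamp a transfer size to the 8k NFS-V2/UDP limit (made-up
 * helper; on NFS, st_blksize already reports the tuned value). */
size_t xfer_size(long st_blksize)
{
    if (st_blksize <= 0 || st_blksize > 8192)
        return 8192;
    return (size_t)st_blksize;
}

/* Copy a file with explicit read(2)/write(2) in chunks sized
 * from fstat()'s st_blksize; returns bytes copied or -1. */
long copy_blksize(const char *src, const char *dst)
{
    struct stat st;
    char *buf;
    ssize_t n;
    long total = 0;
    int in = open(src, O_RDONLY);
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (in < 0 || out < 0 || fstat(in, &st) != 0)
        return -1;
    buf = malloc(xfer_size(st.st_blksize));
    while ((n = read(in, buf, xfer_size(st.st_blksize))) > 0) {
        if (write(out, buf, (size_t)n) != n)
            break;                 /* short write; give up */
        total += n;
    }
    free(buf);
    close(in);
    close(out);
    return total;
}
```

On an NFS-mounted file this issues one READ RPC per 8k chunk, which
is the best a single synchronous reader can do.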

Further increase of throughput can be achieved
with read-ahead (issuing RPC calls in parallel
[send out request for block n+1 while you are
waiting for data of block n to arrive]).  Since
this is not handled by the file system itself, you
would have to code this yourself e.g., using
parallel threads to read from a single file from
interleaved offsets.
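
Such an interleaved-offset scheme could look roughly like the
following POSIX-threads sketch (illustration only - 'parallel_read',
'NREADERS' etc. are invented names; the speedup of course only
materializes when each pread() is a blocking NFS READ RPC rather
than a local disk read):

```c
#define _XOPEN_SOURCE 700   /* for pread() */
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define CHUNK    8192   /* one NFS-V2 READ per chunk */
#define NREADERS 3

/* Each reader covers every NREADERS-th 8k chunk of the file. */
struct reader_arg {
    int  fd;       /* shared descriptor, opened once */
    int  idx;      /* this reader's stripe: chunks idx, idx+N, ... */
    long nbytes;   /* bytes this reader fetched */
};

static void *reader(void *p)
{
    struct reader_arg *a = p;
    char buf[CHUNK];
    off_t off = (off_t)a->idx * CHUNK;
    ssize_t n;
    a->nbytes = 0;
    /* on NFS every pread() is one synchronous READ RPC; running
     * NREADERS of these threads keeps several requests in flight */
    while ((n = pread(a->fd, buf, CHUNK, off)) > 0) {
        a->nbytes += n;
        off += (off_t)NREADERS * CHUNK;  /* skip the others' chunks */
    }
    return NULL;
}

/* Read a whole file with NREADERS threads on interleaved offsets;
 * returns the total byte count (or -1 if the file can't be opened). */
long parallel_read(const char *path)
{
    pthread_t t[NREADERS];
    struct reader_arg a[NREADERS];
    long total = 0;
    int i, fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    for (i = 0; i < NREADERS; i++) {
        a[i].fd  = fd;
        a[i].idx = i;
        pthread_create(&t[i], NULL, reader, &a[i]);
    }
    for (i = 0; i < NREADERS; i++) {
        pthread_join(t[i], NULL);
        total += a[i].nbytes;
    }
    close(fd);
    return total;
}
```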

Another obvious improvement can be achieved if
processing the data takes a significant amount of
time. Then, having a pipeline of threads for
reading data and processing them makes sense
[thread b processes chunk n while thread a blocks
in read(chunk n+1)].
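
A minimal two-stage pipeline of this kind can be sketched with a
one-slot hand-off between a reader thread and a consumer (again
plain POSIX C with invented names; the "processing" here is just
a byte count):

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define CHUNK 8192

/* One-slot hand-off: the reader fetches chunk n+1 while the
 * consumer is still busy with chunk n. */
struct pipe1 {
    pthread_mutex_t lock;
    pthread_cond_t  cv;
    char  buf[CHUNK];
    int   len;   /* -1: slot empty, 0: EOF, >0: bytes valid */
    FILE *in;
};

static void *reader_stage(void *p)
{
    struct pipe1 *q = p;
    for (;;) {
        char tmp[CHUNK];
        int n = (int)fread(tmp, 1, CHUNK, q->in);  /* "the RPC" */
        pthread_mutex_lock(&q->lock);
        while (q->len != -1)               /* wait for a free slot */
            pthread_cond_wait(&q->cv, &q->lock);
        if (n > 0)
            memcpy(q->buf, tmp, (size_t)n);
        q->len = n;                        /* n == 0 signals EOF */
        pthread_cond_signal(&q->cv);
        pthread_mutex_unlock(&q->lock);
        if (n <= 0)
            return NULL;
    }
}

/* Pump a stream through the pipeline; returns bytes "processed". */
long pipeline_consume(FILE *in)
{
    struct pipe1 q;
    pthread_t t;
    long sum = 0;
    pthread_mutex_init(&q.lock, NULL);
    pthread_cond_init(&q.cv, NULL);
    q.len = -1;
    q.in  = in;
    pthread_create(&t, NULL, reader_stage, &q);
    for (;;) {
        char local[CHUNK];
        int n;
        pthread_mutex_lock(&q.lock);
        while (q.len == -1)                /* wait for a full slot */
            pthread_cond_wait(&q.cv, &q.lock);
        n = q.len;
        if (n > 0)
            memcpy(local, q.buf, (size_t)n);
        q.len = -1;                        /* hand the slot back */
        pthread_cond_signal(&q.cv);
        pthread_mutex_unlock(&q.lock);
        if (n <= 0)
            break;
        /* "process" chunk n outside the lock while the reader
         * is already fetching chunk n+1 */
        sum += n;
        (void)local;
    }
    pthread_join(t, NULL);
    pthread_mutex_destroy(&q.lock);
    pthread_cond_destroy(&q.cv);
    return sum;
}
```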

Some performance figures:
Software: src/nfsTest.c:nfsReadTest() [data not
          processed in any way].
Hardware: MVME6100
Network:  100baseT-FD
Server:   Linux-2.6/RHEL4-smp [dell precision 420]
File:     10MB

Results:
Single threaded ('normal') NFS read, 1k buffers: 3.46s (2.89MB/s)
Single threaded ('normal') NFS read, 8k buffers: 1.31s (7.63MB/s)
Multi  threaded; 2 readers, 8k buffers/xfers:    1.12s (8.9 MB/s)  
Multi  threaded; 3 readers, 8k buffers/xfers:    1.04s (9.6 MB/s)

2) Reference Platform
- - - - - - - - - - -
RTEMS-NFS was developed and tested on

 o RTEMS-ss20020301 (local patches applied)
 o PowerPC G3, G4 on Synergy SVGM series board
   (custom 'SVGM' BSP, to be released soon)
 o PowerPC 604 on MVME23xx
   (powerpc/shared/motorola-powerpc BSP)
 o Test Environment:
    - RTEMS executable running CEXP
    - rpciod/nfs dynamically loaded from TFTPfs
    - EPICS application dynamically loaded from NFS;
      the executing IOC accesses all of its files
      on NFS.

II Usage
---------

After linking into the system and proper initialization
(rtems-NFS supports 'magic' module initialization when
loaded into a running system with the CEXP loader),
you are ready for mounting NFSes from a server
(I avoid the term NFS filesystem because NFS already
stands for 'Network File System').

You should also read the

  - "RTEMS Resources Used By NFS/RPCIOD"
  - "CAVEATS & BUGS"

below.

1) Initialization
- - - - - - - - - 
NFS consists of two modules that must be initialized:

 a) the RPCIO daemon package; by calling

      rpcUdpInit();

    note that this step must be performed prior to
    initializing NFS.

 b) NFS is initialized by calling

      nfsInit( smallPoolDepth, bigPoolDepth );

    if you supply 0 (zero) values for the pool
    depths, the compile-time default configuration
    is used which should work fine.

NOTE: when using CEXP to load these modules into a
running system, initialization will be performed
automagically.

2) Mounting Remote Server Filesystems
- - - - - - - - - - - - - - - - - - -

There are two interfaces for mounting an NFS:

 - The (non-POSIX) RTEMS 'mount()' call:

     mount( &mount_table_entry_pointer,
            &filesystem_operations_table_pointer,
            options,
            device,
            mount_point )

    Note that you must specify a 'mount_table_entry_pointer'
    (use a dummy) - RTEMS' mount() doesn't grok a NULL for
    the first argument.

     o for the 'filesystem_operations_table_pointer', supply

         &nfs_fs_ops
   
     o options are constants (see RTEMS headers) for specifying
       read-only / read-write mounts.

     o the 'device' string specifies the remote filesystem
       that is to be mounted. NFS expects a string conforming
       to the following format (EBNF syntax):

         [ <uid> '.' <gid> '@' ] <hostip> ':' <path>

       The first optional part of the string allows you
       to specify the credentials to be used for all
       subsequent transactions with this server. If this
       part is omitted, the EUID/EGID of the executing
       thread (i.e. the thread performing the 'mount')
       are used - NFS 'remembers' these values and uses
       them for all future communication with this server.
       
       The <hostip> part denotes the server IP address
       in standard 'dot' notation. It is followed by
       a colon and the (absolute) path on the server.
       Note that the string must not contain extra
       characters or whitespace. Example 'device' strings
       are:

         "300.99@192.168.44.3:/remote/rtems/root"

         "192.168.44.3:/remote/rtems/root"

    o the 'mount_point' string identifies the local
      directory (most probably on IMFS) where the NFS
      is to be mounted. Note that the mount point must
      already exist with proper permissions.

 - Alternate 'mount' interface. NFS offers a more
   convenient wrapper taking three string arguments:

	nfsMount(uidgid_at_host, server_path, mount_point)

   This interface does DNS lookup (see reentrancy note
   below) and creates the mount point if necessary.
   
   o the first argument specifies the server and
     optionally the uid/gid to be used for authentication.
     The semantics are exactly as described above:

       [ <uid> '.' <gid> '@' ] <host>
     
     The <host> part may be either a host _name_ or
     an IP address in 'dot' notation. In the former
     case, nfsMount() uses 'gethostbyname()' to do
     a DNS lookup.

     IMPORTANT NOTE: gethostbyname() is NOT reentrant/
     thread-safe and 'nfsMount()' (if not provided with an
     IP/dot address string) is hence subject to race conditions.
 
   o the 'server_path' and 'mount_point' arguments
     are described above.
     NOTE: If the mount point does not exist yet,
           nfsMount() tries to create it.

   o if nfsMount() is called with a NULL 'uidgid_at_host'
     argument, it lists all currently mounted NFSes.
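
For illustration, the 'device'/host string format described above
could be parsed like this (a hypothetical helper - nfs.c has its
own parser; this sketch only accepts numeric dot-notation
credentials and insists on an absolute server path):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical parser for the 'device' string format
 *   [ <uid> '.' <gid> '@' ] <hostip> ':' <path>
 */
struct nfs_dev {
    int  has_cred;
    long uid, gid;
    char host[64];
    char path[128];
};

int parse_device(const char *dev, struct nfs_dev *out)
{
    const char *at = strchr(dev, '@');
    const char *colon;

    out->has_cred = 0;
    if (at) {
        /* leading "uid.gid@" credentials */
        if (sscanf(dev, "%ld.%ld@", &out->uid, &out->gid) != 2)
            return -1;
        out->has_cred = 1;
        dev = at + 1;
    }
    colon = strchr(dev, ':');
    if (!colon || colon == dev || colon[1] != '/')
        return -1;               /* need "<hostip>:/<abs-path>" */
    if ((size_t)(colon - dev) >= sizeof out->host
        || strlen(colon + 1) >= sizeof out->path)
        return -1;               /* exceeds this sketch's buffers */
    memcpy(out->host, dev, (size_t)(colon - dev));
    out->host[colon - dev] = '\0';
    strcpy(out->path, colon + 1);
    return 0;
}
```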

3) Unmounting
- - - - - - -
An NFS can be unmounted using the RTEMS 'unmount()'
call (yep, it is unmount() - not umount()):

  unmount(mount_point)

Note that you _must_ supply the mount point (string
argument). It is _not_ possible to specify the
'mountee' when unmounting. NFS implements no
convenience wrapper for this (yet), essentially because
(although this sounds unbelievable) it is non-trivial
to look up the path leading to an RTEMS filesystem
directory node.

4) Unloading
- - - - - - -
After unmounting all NFSes from the system, the NFS
and RPCIOD modules may be stopped and unloaded.
Just call 'nfsCleanup()' and 'rpcUdpCleanup()',
in this order. You should evaluate the return value
of these routines, which is non-zero if either
of them refuses to yield (e.g. because there are
still mounted filesystems).
Again, when unloading is done by CEXP this is
transparently handled.

5) Dumping Information / Statistics
- - - - - - - - - - - - - - - - - -

Rudimentary RPCIOD statistics are printed
to a file (stdout when NULL) by

  int rpcUdpStats(FILE *f)

A list of all currently mounted NFSes can be
printed to a file (stdout if NULL) using

  int nfsMountsShow(FILE *f)

For convenience, this routine is also called
by nfsMount() when supplying NULL arguments.

III Implementation Details
--------------------------

1) RPCIOD
- - - - -

RPCIOD was created to

a) avoid non-reentrant librpc calls.
b) support 'asynchronous' operation over a single
   socket.

RPCIOD is a daemon thread handling 'transaction objects'
(XACTs) through a UDP socket.  XACTs are marshalled RPC
calls/replies associated with RPC servers and requestor
threads.

requestor thread:                 network:

       XACT                        packet  
        |                            |
        V                            V
  | message queue |              ( socket )
        |                            |  ^
        ---------->          <-----  |  |
                     RPCIOD             |
                   /       --------------
           timeout/         (re) transmission
                         

A requestor thread drops a transaction into 
the message queue and goes to sleep.  The XACT is
picked up by RPCIOD, which listens for events from
three sources:

  o the request queue
  o packet arrival at the socket
  o timeouts

RPCIOD sends the XACT to its destination server and
enqueues the pending XACT into an ordered list of
outstanding transactions.

When a packet arrives, RPCIOD (based on the RPC transaction
ID) looks up the matching XACT and wakes up the requestor
who can then XDR-decode the RPC results found in the XACT
object's buffer.

When a timeout expires, RPCIOD examines the outstanding
XACT that is responsible for the timeout. If its lifetime
has not expired yet, RPCIOD resends the request. Otherwise,
the XACT's error status is set and the requestor is woken up.

RPCIOD dynamically adjusts the retransmission intervals
based on the average round-trip time measured (on a per-server
basis).
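
Such an adjustment can be pictured as an exponentially weighted
moving average of the measured round-trip time, e.g. (a
hypothetical fixed-point sketch with invented names - the actual
bookkeeping in rpcio.c differs):

```c
#include <assert.h>

/* Per-server round-trip averaging sketch. The smoothed RTT is
 * kept in fixed point, scaled by 8, so that
 *   srtt += (sample - srtt) / 8
 * needs no floating point. */
struct srv_stats {
    long srtt8;   /* smoothed round-trip time, in ticks << 3 */
};

void rtt_update(struct srv_stats *s, long sample_ticks)
{
    if (s->srtt8 == 0)
        s->srtt8 = sample_ticks << 3;                /* first sample */
    else
        s->srtt8 += sample_ticks - (s->srtt8 >> 3);  /* EWMA, alpha = 1/8 */
}

/* Retransmit after roughly twice the smoothed RTT, never < 1 tick. */
long retrans_interval(const struct srv_stats *s)
{
    long t = (s->srtt8 >> 3) * 2;
    return t < 1 ? 1 : t;
}
```

A fast server thus gets retried quickly while a slow one is given
proportionally more time before the request is resent.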

Keeping the requestors event driven (rather than blocking,
e.g., on a semaphore) is geared towards having many
different requestors (one synchronization object per
requestor would be needed otherwise).

Requestors who want to do asynchronous IO need a different
interface which will be added in the future.

1.a) Reentrancy
- - - - - - - - 
RPCIOD does no non-reentrant librpc calls.

1.b) Efficiency
- - - - - - - - 
We shouldn't bother about efficiency until pipelining (read-ahead/
delayed write) and caching are implemented. The round-trip delay
associated with every single RPC transaction clearly is a big
performance killer.

Nevertheless, I could not resist the temptation to eliminate
the extra copy step involved with socket IO:

A user data object has to be XDR encoded into a buffer.
The buffer is then given to the socket, where it is copied
into MBUFs. (The network chip driver might even do more copying.)

Likewise, on reception 'recvfrom' copies MBUFS into a user
buffer which is XDR decoded into the final user data object.

Eliminating the copying into (possibly multiple) MBUFs by
'sendto()' is actually a piece of cake. RPCIOD uses the
'sosend()' routine [properly wrapped], supplying a single
MBUF header that directly points to the marshalled buffer
:-)

Getting rid of the extra copy on reception was (only a little)
harder: I derived an 'XDR-mbuf' stream from SUN's xdr_mem which
allows for XDR-decoding out of an MBUF chain obtained from
soreceive().

2) NFS
- - - -
The actual NFS implementation is straightforward and essentially
'passive' (no threads created). Any RTEMS task executing a
filesystem call dispatched to NFS (such as 'opendir()', 'lseek()'
or 'unlink()') ends up XDR-encoding arguments, dropping an
XACT into RPCIOD's message queue and going to sleep.
When woken up by RPCIOD, the XACT is decoded (using the XDR-mbuf
stream mentioned above) and the properly cooked-up results are
returned.

3) RTEMS Resources Used By NFS/RPCIOD
- - - - - - - - - - - - - - - - - - -

The RPCIOD/NFS package uses the following resources. Some
parameters are compile-time configurable - consult the
source files for details.

RPCIOD:
 o 1 task 
 o 1 message queue
 o 1 socket/filedescriptor
 o 2 semaphores (a third one is temporarily created during
   rpcUdpCleanup()).
 o 1 RTEMS EVENT (by default RTEMS_EVENT_30).
   IMPORTANT: this event is used by _every_ thread executing
              NFS system calls and hence is RESERVED.
 o 3 events used only by RPCIOD itself, i.e. these must not
   be sent to RPCIOD by any other thread (except for the intended
   use, of course). The events involved are 1, 2 and 3.
 o preemption disabled sections:      NONE
 o sections with interrupts disabled: NONE
 o NO 'timers' are used (timer code would run in IRQ context)
 o memory usage: n/a

NFS:
 o 2 message queues
 o 2 semaphores
 o 1 semaphore per mounted NFS
 o 1 slot in driver entry table (for major number)
 o preemption disabled sections:      NONE
 o sections with interrupts disabled: NONE
 o 1 task + 1 semaphore temporarily created when
   listing mounted filesystems (rtems_filesystem_resolve_location())

4) CAVEATS & BUGS
- - - - - - - - -
Unfortunately, some bugs crawl around in the filesystem generics.
(Some of them might already be fixed in versions later than
rtems-ss-20020301).
I recommend using the patch distributed with RTEMS-NFS.

 o RTEMS uses/used (Joel said it has been fixed already) a 'short'
   ino_t which is not enough for NFS.
   The driver detects this problem and enables a workaround. In rare
   situations (mainly involving 'getcwd()') improper inode comparison
   may result (due to the restricted size, stat() returns st_ino modulo
   2^16). In most cases, however, st_dev is compared along with st_ino,
   which will give correct results (different files may yield identical
   st_ino but they will have different st_dev). However, there is
   code (in getcwd(), for example) that assumes that files residing
   in one directory must be hosted by the same device and hence omits
   the st_dev comparison. In such a case, the workaround will fail.
 
   NOTE: changing the size (sys/types.h) of ino_t from 'short' to 'long'
         is strongly recommended. It is NOT included in the patch, however
         as this is a major change requiring ALL of your sources to
         be recompiled.

   THE ino_t SIZE IS FIXED IN GCC-3.2/NEWLIB-1.10.0-2 DISTRIBUTED BY
   OAR.

 o You may work around most filesystem bugs by observing the following
   rules:

    * never use chroot() (fixed by the patch)
    * never use getpwent(), getgrent() & friends - they are NOT THREAD
      safe (fixed by the patch)
    * NEVER use rtems_libio_share_private_env() - not even with the
      patch applied. Just DONT - it is broken by design.
    * All threads that have their own userenv (i.e. that have called
      rtems_libio_set_private_env()) SHOULD 'chdir("/")' before
      terminating. Otherwise (i.e. if their cwd is on NFS), it will
      be impossible to unmount the NFS involved.

 o The patch slightly changes the semantics of 'getpwent()',
   'getgrent()' & friends (to what is IMHO correct anyways - the patch is
   also needed to fix another problem, however): with the patch applied,
   the passwd and group files are always accessed from the 'current' user
   environment, i.e. a thread that has changed its 'root' or 'uid' might
   not be able to access these files anymore.
      
 o NOTE: RTEMS 'mount()' / 'unmount()' are NOT THREAD SAFE.

 o The NFS protocol has no 'append' or 'seek_end' primitive. The client
   must query the current file size (this client uses cached info) and
   change the local file pointer accordingly (in 'O_APPEND' mode).
   Obviously, this involves a race condition and hence multiple clients
   writing the same file may lead to corruption.

IV Licensing & Disclaimers
--------------------------

NFS is distributed under the SLAC License - consult the
separate 'LICENSE' file.

Government disclaimer of liability
- - - - - - - - - - - - - - - - -
Neither the United States nor the United States Department of Energy,
nor any of their employees, makes any warranty, express or implied,
or assumes any legal liability or responsibility for the accuracy,
completeness, or usefulness of any data, apparatus, product, or process
disclosed, or represents that its use would not infringe privately
owned rights.

Stanford disclaimer of liability
- - - - - - - - - - - - - - - - -
Stanford University makes no representations or warranties, express or
implied, nor assumes any liability for the use of this software.

Maintenance of notice
- - - - - - - - - - -
In the interest of clarity regarding the origin and status of this
software, Stanford University requests that any recipient of it maintain
this notice affixed to any distribution by the recipient that contains a
copy or derivative of this software.