RTEMS ALTIVEC SUPPORT 
=====================

1. History
----------

Altivec support was developed and maintained as a user-extension
outside of RTEMS. This extension is still available (unbundled)
from Till Straumann <strauman@slac.stanford.edu>; it is useful
if an application desires 'lazy switching' of the altivec context.

2. Modes
--------

Altivec support -- the unbundled extension, that is -- can be used
in two ways:

a. All tasks are implicitly AltiVec-enabled.

b. Only designated tasks are AltiVec-enabled. 'Lazy-context switching'
   is implemented to switch the AltiVec context.

Note that the code implemented in this directory supports mode 'a'
and mode 'a' ONLY. For mode 'b' you need the unbundled extension
(which is completely independent of this code).

Mode 'a' (All tasks are AltiVec-enabled)
- - - - - - - - - - - - - - - - - - - - -

The major disadvantage of this mode is that additional overhead is
involved: tasks that never use the vector unit still save/restore
the volatile vector registers (20 registers * 16 bytes each) across
every interrupt and all non-volatile registers (12 registers * 16 bytes
each) during every context switch.

However, saving/restoring e.g., the volatile registers is quite
fast -- on my 1GHz 7457 saving or restoring 20 vector registers
takes only about 1us or even less (if there are cache hits).
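
To put numbers on the overhead described above, here is a small sketch
that computes the size of the two save areas. The register counts and
width come from the text; the split into v0..v19 (volatile) and
v20..v31 (non-volatile) is the usual AltiVec ABI convention, and all
identifiers are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

enum {
    VEC_REG_BYTES   = 16,  /* each AltiVec register is 128 bits wide      */
    VEC_VOLATILE    = 20,  /* v0..v19: saved/restored across interrupts   */
    VEC_NONVOLATILE = 12   /* v20..v31: saved/restored on context switch  */
};

/* Bytes moved when saving (or restoring) 'nregs' vector registers. */
static size_t vec_save_bytes(unsigned nregs)
{
    assert(nregs <= 32);   /* there are only 32 vector registers */
    return (size_t)nregs * VEC_REG_BYTES;
}
```

vec_save_bytes(VEC_VOLATILE) yields 320 bytes per interrupt and
vec_save_bytes(VEC_NONVOLATILE) yields 192 bytes per context switch,
in each direction.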

The advantage is complete transparency to the user and full ABI
compatibility (except for ISRs and exception handlers), see below.

Mode 'b' (Only dedicated tasks are AltiVec-enabled)
- - - - - - - - - - - - - - - - - - - - - - - - - -

The advantage of this mode of operation is that the vector-registers
are only saved/restored when a different, altivec-enabled task becomes
ready to run. In particular, if there is only a single altivec-enabled
task then the altivec-context *never* is switched.

Note that this mode of operation is not supported by the code
in this directory -- you need the unbundled altivec extension
mentioned above.
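
The idea behind lazy switching can be modelled in a few lines of
portable C. This is only a sketch of the ownership bookkeeping, NOT
the code of the unbundled extension; the task_t type, the variables
and lazy_dispatch() are hypothetical names, and the actual register
save/restore is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task descriptor; not an RTEMS type. */
typedef struct {
    const char *name;
    bool        altivec_enabled;
} task_t;

static const task_t *vec_owner;     /* task whose data lives in v0..v31 */
static unsigned      vec_switches;  /* number of save/restore pairs     */

/* Called whenever 'next' is about to run. */
static void lazy_dispatch(const task_t *next)
{
    assert(next != NULL);

    /* Tasks without AltiVec, and the current owner, cost nothing. */
    if (!next->altivec_enabled || next == vec_owner)
        return;

    /* A different AltiVec task takes over: save the old owner's
     * registers and restore those of 'next' (counted, not done). */
    if (vec_owner != NULL)
        vec_switches++;

    vec_owner = next;
}
```

With a single AltiVec-enabled task vec_switches stays at zero no
matter how often other tasks run in between, which is the property
described above.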

3. Compiler Options
------------------- 

Three compiler options affect AltiVec: -maltivec, -mabi=altivec and
-mvrsave=yes/no.

-maltivec: This lets the cpp define the symbol __ALTIVEC__ and enables
           gcc to emit vector instructions. Note that gcc may use the
           AltiVec engine implicitly, i.e., **without you writing any
           vectorized code**.

-mabi=altivec: This option has two effects:
           i) It ensures 16-byte stack alignment required by AltiVec
              (even in combination with eabi which is RTEMS' default).
           ii) It allows vector arguments to be passed in vector registers.

-mvrsave=yes/no: Instructs gcc to emit code which sets the VRSAVE register
           indicating which vector registers are 'currently in use'.
           Because the altivec support does not use this information *) the
           option has no direct effect but it is desirable to compile with
           -mvrsave=no so that no unnecessary code is generated.

          *) The file vec_sup_asm.S conditionally disables usage of
             the VRSAVE information if the preprocessor symbol
             'IGNORE_VRSAVE' is defined, which is the default.

             If 'IGNORE_VRSAVE' is undefined then the code *does*
             use the VRSAVE information but I found that this does
             not execute noticeably faster.

IMPORTANT NOTES
===============

AFAIK, RTEMS uses the EABI which requires a stack alignment of only 8 bytes
which is NOT enough for AltiVec (which requires 16-byte alignment).

There are two ways for obtaining 16-byte alignment:

I)  Compile with -mno-eabi (ordinary SYSV ABI has 16-byte alignment)
II) Compile with -mabi=altivec (extension to EABI; maintains 16-byte alignment
    but also allows for passing vector arguments in vector registers)
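
When debugging alignment problems it can help to check addresses
explicitly. The helpers below are a minimal sketch with made-up
names; they only do the arithmetic and are not part of RTEMS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VEC_ALIGN 16u   /* alignment required by AltiVec loads/stores */

/* True if 'addr' is usable as a 16-byte aligned vector slot. */
static bool is_vec_aligned(uintptr_t addr)
{
    return (addr & (VEC_ALIGN - 1u)) == 0;
}

/* Round a stack address down to the next 16-byte boundary
 * (down, because the PowerPC stack grows towards lower addresses). */
static uintptr_t vec_align_down(uintptr_t addr)
{
    return addr & ~(uintptr_t)(VEC_ALIGN - 1u);
}
```

An address such as 0x1008 satisfies the 8-byte EABI requirement but
is_vec_aligned() rejects it; vec_align_down() maps it to 0x1000.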

Note that it is crucial to compile ***absolutely everything*** with the same
ABI options (or a linker error may occur). In particular, this includes

 - newlib multilib variant
 - RTEMS proper 
 - application + third-party code

IMO the proper compiler options for Mode 'a' would be

    -maltivec -mabi=altivec -mvrsave=no

Note that the -mcpu=7400 option also enables -maltivec and -mabi=altivec
but leaves -mvrsave at some 'default' which is probably 'no'.
Compiling with -mvrsave=yes does not produce incompatible code but
may have a performance impact (since extra code is produced to maintain
VRSAVE).

4. Multilib Variants
--------------------

The default GCC configuration for RTEMS contains a -mcpu=7400 multilib
variant which is the correct one to choose.

5. BSP 'custom' file.
---------------------

Now that you have the necessary newlib and libgcc etc. variants
you also need to build RTEMS accordingly.

In your BSP's make/custom/<bsp>.cfg file make sure the CPU_CFLAGS
select the desired variant:

for mode 'a':

   CPU_CFLAGS = ... -mcpu=7400

Note that since -maltivec globally defines __ALTIVEC__, RTEMS automatically
enables code that takes care of switching the AltiVec context as necessary.
This is transparent to application code.
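
Application code can key off the same preprocessor symbol. The
fragment below is a sketch; APP_HAS_ALTIVEC and app_has_altivec()
are names invented for this example, only __ALTIVEC__ itself is
defined by the compiler.

```c
#include <assert.h>

/* __ALTIVEC__ is defined by gcc when -maltivec (or a -mcpu option
 * that implies it, such as -mcpu=7400) is in effect. */
#if defined(__ALTIVEC__)
#define APP_HAS_ALTIVEC 1
#else
#define APP_HAS_ALTIVEC 0
#endif

/* Lets ordinary C code ask how this file was compiled. */
static int app_has_altivec(void)
{
    return APP_HAS_ALTIVEC;
}
```

Because every object file has to be built with the same ABI options,
the answer should be the same in all compilation units.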

6. BSP support
--------------

It is the BSP's responsibility to initialize MSR_VE, VSCR and VRSAVE
during early boot, ideally before any C-code is executed (because it
may, theoretically, use vector instructions).

The BSP must

 - set MSR_VE
 - clear VRSAVE; note that the probing algorithm for detecting
   whether -mvrsave=yes or 'no' was used relies on the BSP
   clearing VRSAVE during early start. Since no interrupts or
   context switches happen before the AltiVec support is initialized
   clearing VRSAVE is no problem even if it turns out that -mvrsave=no
   was in effect (eventually a value of all-ones will be stored
   in VRSAVE in this case).
 - clear VSCR
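
The three steps above might look roughly as follows. This is a
sketch under several assumptions and not the code of any existing
BSP: the function names are made up, MSR_VE is taken to be the
MSR[VEC] bit (0x02000000), VRSAVE is accessed as SPR 256, and the
hardware part is only compiled for a PowerPC target with AltiVec
enabled. A real BSP would normally do this in its assembly start
code, before any C code runs.

```c
#include <assert.h>

#define MSR_VE 0x02000000ul   /* MSR[VEC]: enables the vector unit */

/* Pure helper: MSR value with the vector unit enabled. */
static unsigned long msr_enable_altivec(unsigned long msr)
{
    return msr | MSR_VE;
}

#if defined(__PPC__) && defined(__ALTIVEC__)
/* Hypothetical early-init hook; must run in supervisor mode. */
static void bsp_altivec_early_init(void)
{
    unsigned long msr;

    /* 1. set MSR_VE */
    __asm__ volatile ("mfmsr %0" : "=r" (msr));
    msr = msr_enable_altivec(msr);
    __asm__ volatile ("mtmsr %0\n\tisync" : : "r" (msr) : "memory");

    /* 2. clear VRSAVE (SPR 256) */
    __asm__ volatile ("mtspr 256, %0" : : "r" (0));

    /* 3. clear VSCR via a zeroed vector register (clobbers v0) */
    __asm__ volatile ("vxor 0, 0, 0\n\tmtvscr 0" : : : "v0");
}
#endif
```

MSR_VE has to be set first since the vector instructions used to
clear VSCR would otherwise raise an 'AltiVec unavailable' exception.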

7. PSIM note
------------

PSIM supports the AltiVec instruction set with the exception of
the 'data stream' instructions for cache prefetching. The RTEMS
altivec support includes run-time checks to skip these instructions
when executing on PSIM.

Note that AltiVec support within PSIM must be enabled at 'configure'
time by passing the 'configure' option

--enable-sim-float=altivec

Note also that PSIM's AltiVec support has many bugs. It is recommended
to apply the patches filed as an attachment with gdb bug report #2461
prior to building PSIM.

The CPU type and corresponding multilib must be changed when
building RTEMS/psim:

  edit make/custom/psim.cfg and change

    CPU_CFLAGS = ... -mcpu=603e

  to

    CPU_CFLAGS = ... -mcpu=7400

This change must be performed *before* configuring RTEMS/psim.