-
Include Files
We used the "obvious" mappings for the length types. Fortunately, on
all environments of immediate interest, the native data types have the
correct sizes, so we did not have to invent a further abstraction for them.
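-
A minimal sketch of what such "obvious" mappings can look like, assuming a host whose native C types have the usual widths. The typedef names are the UDI fixed-size types. The choice of native types and the compile-time checks are our own illustration, not the reference implementation's headers.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical mapping onto native C types; a port whose compiler
 * used different widths would have to adjust these. */
typedef unsigned char  udi_ubit8_t;
typedef signed char    udi_sbit8_t;
typedef unsigned short udi_ubit16_t;
typedef signed short   udi_sbit16_t;
typedef unsigned int   udi_ubit32_t;
typedef signed int     udi_sbit32_t;
typedef size_t         udi_size_t;

/* Fail the build, rather than misbehave at run time, if an assumption is wrong. */
_Static_assert(sizeof(udi_ubit8_t) * CHAR_BIT == 8, "udi_ubit8_t must be 8 bits");
_Static_assert(sizeof(udi_ubit16_t) * CHAR_BIT == 16, "udi_ubit16_t must be 16 bits");
_Static_assert(sizeof(udi_ubit32_t) * CHAR_BIT == 32, "udi_ubit32_t must be 32 bits");
```

If one of these assertions ever fires on a new port, that is the point at which a further abstraction would become necessary.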
-
Opaque Types
The opaque types are defined as void * and are internally cast
to pointers into the correct context. This is fast, since the cast costs
nothing at run time, but future work may call for a more complete
abstraction of these types.
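-
A small sketch of the void * approach, using udi_channel_t as the example handle. The internal struct env_channel and the two conversion helpers are hypothetical names chosen for illustration.

```c
#include <stddef.h>

/* Public view: an opaque handle is just an untyped pointer. */
typedef void *udi_channel_t;

/* Private view (hypothetical): what the handle really points at. */
struct env_channel {
    int refcount;
    struct env_channel *other_end;
};

/* The casts cost nothing at run time, which is why this is fast. The
 * drawback: any void * converts silently, so the compiler cannot catch
 * a handle of one kind passed where another kind is expected. */
static struct env_channel *env_channel_from_handle(udi_channel_t h)
{
    return (struct env_channel *)h;
}

static udi_channel_t env_channel_to_handle(struct env_channel *ch)
{
    return (udi_channel_t)ch;
}
```

One possible "more complete abstraction" would wrap each handle kind in its own one-member struct, so that mixing handle kinds becomes a compile-time error instead of a silent conversion.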
-
Error Handling
Internal errors are detected via _OSDEP_ASSERT macros. These can
be #defined away to ignore errors, or defined to trap into a debugger or panic.
Region Kill is not yet implemented; once it is, most, but not all, of the
assertions could be converted into region kill operations.
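-
One way an _OSDEP_ASSERT macro can be made configurable at build time. The OSDEP_ASSERT_IGNORE and OSDEP_ASSERT_PANIC switches, the osdep_assert_failed hook, and the failure counter are hypothetical; the default mode here just logs and counts so the sketch can be exercised in user space.

```c
#include <stdio.h>
#include <stdlib.h>

#if defined(OSDEP_ASSERT_IGNORE)
/* Checks compiled away: note that side effects in expr disappear too. */
#  define _OSDEP_ASSERT(expr) ((void)0)
#else
#  define _OSDEP_ASSERT(expr) \
       ((expr) ? (void)0 : osdep_assert_failed(#expr, __FILE__, __LINE__))
#endif

static int osdep_assert_failures = 0;

/* Hypothetical failure hook: a kernel port might enter its debugger here. */
static void osdep_assert_failed(const char *expr, const char *file, int line)
{
    fprintf(stderr, "assertion failed: %s (%s:%d)\n", expr, file, line);
    osdep_assert_failures++;
#if defined(OSDEP_ASSERT_PANIC)
    abort();    /* user-space stand-in for a panic */
#endif
}
```

Once Region Kill exists, the failure hook is the natural place to convert a failed assertion into a kill of the offending region instead of a panic.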
-
Interrupt Vectors
All the current environments support shared interrupts.
-
Allocation Daemon
All the current environments have synchronous resource allocators.
The asynchronous interface is provided via the alloc daemon: a "work"
function does the bulk of the work, and a small OS-specific wrapper
handles thread/process creation, signalling, and teardown.
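-
A sketch of the work-function/wrapper split, assuming POSIX threads for the OS-specific part and malloc standing in for the synchronous allocator. All names (struct alloc_req, alloc_daemon_work, and so on) are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

struct alloc_req {                       /* one deferred allocation */
    size_t size;
    void (*callback)(void *context, void *mem);
    void *context;
    struct alloc_req *next;
};

struct alloc_daemon {
    pthread_mutex_t lock;
    pthread_cond_t wakeup;
    struct alloc_req *head, *tail;       /* FIFO of pending requests */
    bool stopping;
    pthread_t thread;
};

/* OS-independent "work" function: satisfy every queued request with the
 * synchronous allocator. Called with d->lock held; drops it around callbacks. */
static void alloc_daemon_work(struct alloc_daemon *d)
{
    while (d->head != NULL) {
        struct alloc_req *r = d->head;
        d->head = r->next;
        if (d->head == NULL)
            d->tail = NULL;
        pthread_mutex_unlock(&d->lock);
        r->callback(r->context, malloc(r->size));
        pthread_mutex_lock(&d->lock);
    }
}

/* OS-specific wrapper: the thread body, sleeping until signalled. */
static void *alloc_daemon_thread(void *arg)
{
    struct alloc_daemon *d = arg;

    pthread_mutex_lock(&d->lock);
    for (;;) {
        alloc_daemon_work(d);            /* drain the queue first */
        if (d->stopping)
            break;
        pthread_cond_wait(&d->wakeup, &d->lock);
    }
    pthread_mutex_unlock(&d->lock);
    return NULL;
}

static int alloc_daemon_start(struct alloc_daemon *d)
{
    pthread_mutex_init(&d->lock, NULL);
    pthread_cond_init(&d->wakeup, NULL);
    d->head = d->tail = NULL;
    d->stopping = false;
    return pthread_create(&d->thread, NULL, alloc_daemon_thread, d);
}

static void alloc_daemon_submit(struct alloc_daemon *d, struct alloc_req *r)
{
    r->next = NULL;
    pthread_mutex_lock(&d->lock);
    if (d->tail != NULL)
        d->tail->next = r;
    else
        d->head = r;
    d->tail = r;
    pthread_cond_signal(&d->wakeup);
    pthread_mutex_unlock(&d->lock);
}

/* Teardown: pending requests are still served before the thread exits. */
static void alloc_daemon_stop(struct alloc_daemon *d)
{
    pthread_mutex_lock(&d->lock);
    d->stopping = true;
    pthread_cond_signal(&d->wakeup);
    pthread_mutex_unlock(&d->lock);
    pthread_join(d->thread, NULL);
    pthread_cond_destroy(&d->wakeup);
    pthread_mutex_destroy(&d->lock);
}
```

Only the last four functions would need rewriting for a different host OS; alloc_daemon_work stays the same.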
-
Metalanguages and Mappers
The current environment has been ported to existing OSes with both
statically linked modules and fully dynamic modules. Development tends
to be more pleasant with the latter.
-
Control Block Management
-
Scratch Space Layout
We decided on an approach somewhere between those described in "inlined
scratch space" and "dynamic allocation". The scratch space is allocated as
part of the control block itself. When the cb enters a region on behalf of a
channel operation whose scratch requirement is larger than that of the current
cb, the cb is reallocated and the contents of its persistent fields are copied.
This happens only on the first such entry, so control blocks quickly
reach a quiescent size.
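-
A sketch of the grow-on-first-entry scheme, assuming the scratch area is a flexible array member at the end of the cb. The struct env_cb layout and both helper functions are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical control block: persistent fields, then inline scratch. */
struct env_cb {
    void  *context;            /* persistent */
    int    origin;             /* persistent */
    size_t scratch_size;       /* bytes currently available in scratch[] */
    unsigned char scratch[];   /* scratch lives in the same allocation */
};

static struct env_cb *env_cb_alloc(size_t scratch_size)
{
    struct env_cb *cb = calloc(1, sizeof(*cb) + scratch_size);
    if (cb != NULL)
        cb->scratch_size = scratch_size;
    return cb;
}

/* Called on region entry. Only the persistent fields are copied, since
 * scratch contents need not survive. On failure, returns NULL and leaves
 * the original cb untouched. */
static struct env_cb *env_cb_ensure_scratch(struct env_cb *cb, size_t needed)
{
    struct env_cb *bigger;

    if (needed <= cb->scratch_size)
        return cb;             /* common case once the cb has grown */
    bigger = env_cb_alloc(needed);
    if (bigger == NULL)
        return NULL;
    bigger->context = cb->context;
    bigger->origin = cb->origin;
    free(cb);
    return bigger;
}
```

Because the cb only ever grows, every later entry into the same regions takes the early return.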
-
Marshalling Area
The marshalling area is unioned into the scratch space. This is acceptable
because scratch is not preserved across a channel op, and only channel ops
need to marshal arguments. Contrary to what is described in the Implementer's
Guide, the environment _does_ know the marshalling needs (via the layout
templates) and can fold them into the scratch requirement described above.
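-
A sketch of folding marshalling needs into the scratch requirement. The layout codes here are a hypothetical, heavily simplified stand-in for real UDI layout templates; the point is only that, since the two areas are unioned, the cb needs the larger of the two sizes.

```c
#include <stddef.h>

/* Hypothetical simplified layout codes; real templates have many more kinds. */
enum lay_code { LAY_END = 0, LAY_UBIT8, LAY_UBIT16, LAY_UBIT32, LAY_POINTER };

static size_t lay_size(enum lay_code c)
{
    switch (c) {
    case LAY_UBIT16:  return 2;
    case LAY_UBIT32:  return 4;
    case LAY_POINTER: return sizeof(void *);
    default:          return 1;   /* LAY_UBIT8 */
    }
}

/* Bytes needed to marshal one template, with natural alignment per element. */
static size_t marshal_size(const enum lay_code *layout)
{
    size_t off = 0;

    for (; *layout != LAY_END; layout++) {
        size_t sz = lay_size(*layout);
        off = (off + sz - 1) / sz * sz;   /* round up to alignment */
        off += sz;
    }
    return off;
}

/* Scratch and marshalling share storage, so the cb needs the larger of the two. */
static size_t scratch_requirement(size_t driver_scratch, const enum lay_code *layout)
{
    size_t m = marshal_size(layout);
    return driver_scratch > m ? driver_scratch : m;
}
```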
-
Memory Management
Internally, the memory allocator is abstracted via the OSDEP_MEM_ALLOC
facility, which lets the environment decide whether a given allocation
may block.
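-
A user-space sketch of an allocation wrapper that lets the caller say whether it may block. The two-argument OSDEP_MEM_ALLOC signature, the ENV_MEM_NOBLOCK and ENV_MEM_ZERO flags, and env_mem_alloc are all assumptions; the real facility's interface is not shown here.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define ENV_MEM_NOBLOCK 0x1   /* hypothetical: caller must not sleep */
#define ENV_MEM_ZERO    0x2   /* hypothetical: zero-fill the result */

static void *env_mem_alloc(size_t size, unsigned flags)
{
    const struct timespec backoff = { 0, 1000000 };   /* 1 ms */
    void *mem;

    if (size == 0)
        size = 1;   /* keep malloc(0) from looking like a failure */
    /* A blocking caller waits and retries; a kernel port would sleep on
     * its allocator instead. A non-blocking caller gets NULL at once. */
    while ((mem = malloc(size)) == NULL && !(flags & ENV_MEM_NOBLOCK))
        nanosleep(&backoff, NULL);
    if (mem != NULL && (flags & ENV_MEM_ZERO))
        memset(mem, 0, size);
    return mem;
}

/* Hypothetical signature for the facility named in the text. */
#define OSDEP_MEM_ALLOC(size, flags) env_mem_alloc((size), (flags))
```

Code running where sleeping is forbidden (an interrupt handler, say) would pass ENV_MEM_NOBLOCK and handle NULL, typically by handing the request to the alloc daemon.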
-
Inter-Module Communication
The IMC code currently takes advantage of the architectural flexibility
that allows regions to run in different address domains or at
different levels of privilege. The life cycles of channels (the "pipes"
between regions) are very carefully synchronized through the use of REGION_LOCK
and other mutual exclusion primitives.
-
Region life cycle
One of the subtler parts of the implementation is the synchronization
of region teardown. Because of the high level of queueing
in the UDI model, it is sometimes difficult to determine exactly when a region
is idle enough to be destroyed.
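-
A sketch of an "idle enough" test for teardown, with a pthread mutex standing in for REGION_LOCK. The field names are hypothetical, and the sketch assumes new work can reach a region only through an open channel.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical region bookkeeping. */
struct region {
    pthread_mutex_t lock;   /* stand-in for REGION_LOCK */
    int  queued;            /* ops waiting in the region's queue */
    bool active;            /* a thread is currently running in the region */
    int  open_channels;     /* channels that could still deliver ops */
    bool dying;             /* teardown has been requested */
};

static void region_init(struct region *r)
{
    pthread_mutex_init(&r->lock, NULL);
    r->queued = 0;
    r->active = false;
    r->open_channels = 0;
    r->dying = false;
}

/* All conditions are tested under one lock so no op slips in between
 * the checks. Once the region is dying with no open channels, nothing
 * new can be queued, so the answer cannot go stale after unlocking. */
static bool region_idle_for_teardown(struct region *r)
{
    bool idle;

    pthread_mutex_lock(&r->lock);
    idle = r->dying && !r->active && r->queued == 0 && r->open_channels == 0;
    pthread_mutex_unlock(&r->lock);
    return idle;
}
```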
-
PIO
-
Serialization
In UDI 1.01, the serialization domain argument was added to PIO map
(udi_pio_map). This allows a more realistic granularity of serialization:
only transaction lists in the same domain need to be serialized against each
other. The UDI Reference Implementation uses the corresponding entry in
udiprops.txt to allocate and use an appropriate number of serialization facilities.
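-
A sketch of per-domain serialization, assuming one pthread mutex per declared domain. struct pio_serial and its helpers are hypothetical, and the run callback stands in for whatever executes a trans list.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-device serialization state. */
struct pio_serial {
    unsigned ndomains;        /* number of serialization domains declared */
    pthread_mutex_t *locks;   /* one mutex per domain */
};

/* Allocate one lock per domain; ndomains must be at least 1. */
static int pio_serial_init(struct pio_serial *s, unsigned ndomains)
{
    unsigned i;

    s->ndomains = 0;
    s->locks = calloc(ndomains, sizeof(*s->locks));
    if (s->locks == NULL)
        return -1;
    for (i = 0; i < ndomains; i++)
        pthread_mutex_init(&s->locks[i], NULL);
    s->ndomains = ndomains;
    return 0;
}

/* Lists in different domains may run concurrently; lists in the same
 * domain are serialized against each other. */
static int pio_run_serialized(struct pio_serial *s, unsigned domain,
                              void (*run)(void *), void *arg)
{
    if (domain >= s->ndomains)
        return -1;
    pthread_mutex_lock(&s->locks[domain]);
    run(arg);
    pthread_mutex_unlock(&s->locks[domain]);
    return 0;
}

static void pio_serial_fini(struct pio_serial *s)
{
    unsigned i;

    for (i = 0; i < s->ndomains; i++)
        pthread_mutex_destroy(&s->locks[i]);
    free(s->locks);
    s->locks = NULL;
    s->ndomains = 0;
}
```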
-
Temporary registers
The temporary register set is sized based upon the number of serialization
domains and is stored in the PIO handle.
-
Endianness of temporary registers
The trans engine internally represents the temporary registers as arrays
of native words of native endianness, but the least significant word is
always at index zero. There is a large comment block inside udi_piotrans.c
that describes this.
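-
A sketch of the word ordering described above: native-endian words, least significant word at index zero. The 32-bit word size, the 8-word (32-byte) register width, and the helper names are assumptions for illustration; udi_piotrans.c remains the authority on the real layout.

```c
#include <stdint.h>

#define PIO_REG_WORDS 8   /* assumed: 8 x 32 bits = a 32-byte register */

/* Each word is in native byte order; w[0] is the least significant word. */
typedef struct { uint32_t w[PIO_REG_WORDS]; } pio_reg;

static void pio_reg_set64(pio_reg *r, uint64_t v)
{
    for (int i = 0; i < PIO_REG_WORDS; i++)
        r->w[i] = 0;
    r->w[0] = (uint32_t)v;           /* low half */
    r->w[1] = (uint32_t)(v >> 32);   /* high half */
}

/* Multi-word add: the carry propagates upward from index 0. */
static void pio_reg_add(pio_reg *dst, const pio_reg *a, const pio_reg *b)
{
    uint64_t carry = 0;

    for (int i = 0; i < PIO_REG_WORDS; i++) {
        uint64_t sum = (uint64_t)a->w[i] + b->w[i] + carry;
        dst->w[i] = (uint32_t)sum;
        carry = sum >> 32;
    }
}
```

Keeping the least significant word at index zero on every host means arithmetic loops like this one are the same on big- and little-endian machines; only the byte order inside each word varies.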
-
Interrupts
-
Shared Interrupts
All the target OSes so far have supported shared interrupts and the ability
to associate a void * with the interrupt handler, so none of the reference
environments has had to implement any of the unsightly options described.
-
Interrupt Preprocessing
We had an interesting challenge implementing interrupt preprocessing.
On all the existing OSes that have UDI ports, the interrupt handlers are
synchronous. Typically the process looks something like this:
-
OS gets an interrupt from the hardware.
-
OS masks interrupts of that type.
-
OS calls the driver code.
-
Driver code looks at the hardware. If its device is interrupting, it
services the hardware, performs other work, and returns a "claimed" flag.
-
OS tests the return value to see if that interrupt was claimed or not.
-
If the interrupt wasn't claimed, OS calls the next driver (which may be the same
one again) on the list of drivers that may be servicing this particular interrupt,
and the above steps repeat.
-
OS unmasks interrupt source.
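-
The host-side loop above can be sketched as follows. Everything here is hypothetical: the chain entries pair a handler with its void * cookie, and irq_mask/irq_unmask are stubs for the host's PIC operations. This version makes a single pass and stops at the first claim.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical host-OS interrupt chain entry. */
struct isr_entry {
    bool (*handler)(void *cookie);   /* returns true if it claimed the interrupt */
    void *cookie;                    /* the void * associated with the handler */
    struct isr_entry *next;
};

/* Stubs for the PIC operations; the counter lets a test check balance. */
static int irq_mask_count;
static void irq_mask(int irq)   { (void)irq; irq_mask_count++; }
static void irq_unmask(int irq) { (void)irq; irq_mask_count--; }

static bool host_dispatch_irq(int irq, struct isr_entry *chain)
{
    bool claimed = false;

    irq_mask(irq);
    for (struct isr_entry *e = chain; e != NULL && !claimed; e = e->next)
        claimed = e->handler(e->cookie);   /* synchronous: answer is known now */
    irq_unmask(irq);
    return claimed;
}
```

The whole loop depends on each handler knowing, before it returns, whether it claimed the interrupt.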
-
Our problems begin because the reference implementation really didn't
want to change the host OS, and the host OS driver model is meant
for, well, drivers, not for driver infrastructure like ours.
(The other place where this limitation of a layered approach shows
up is in the bridge mapper, when we want to walk the bus.)
-
Interrupt preprocessing really is about separating the hardware mechanics
from the "other work" above. The "look at hardware" step is
performed by a PIO transaction list provided by the driver.
-
The catch is that udi_pio_trans is specified to be asynchronous.
Unfortunately, this means that when udi_pio_trans returns, the trans list's
callback may not have run yet, so we don't know the value it produced. Thus
we don't know what status to return to the OS interrupt code, and we'll probably
end up spinning in the host OS loop, wildly thrashing the interrupt handlers.
-
This was "solved" in the reference implementation by making sure that udi_pio_trans
was never really asynchronous, which admittedly negates the parallelization and
scheduling opportunities afforded us by the UDI specification.
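-
A sketch of why the synchronous workaround lets the handler answer the host. run_trans_sync, env_isr, and the convention that a nonzero result means "claimed" are hypothetical; the "trans list" is modelled as a plain function that reads device status.

```c
#include <stdbool.h>

typedef void (*trans_cb_t)(void *ctx, unsigned result);

/* Hypothetical engine that always runs the list to completion, so its
 * callback has fired by the time it returns. */
static void run_trans_sync(unsigned (*list)(void *dev), void *dev,
                           trans_cb_t cb, void *ctx)
{
    unsigned result = list(dev);   /* executed inline, never queued */
    cb(ctx, result);
}

struct isr_state { bool done; unsigned status; };

static void isr_trans_done(void *ctx, unsigned result)
{
    struct isr_state *s = ctx;
    s->done = true;
    s->status = result;
}

/* Environment-side handler registered with the host OS. */
static bool env_isr(unsigned (*list)(void *dev), void *dev)
{
    struct isr_state s = { false, 0 };

    run_trans_sync(list, dev, isr_trans_done, &s);
    /* Valid only because the engine is synchronous; with a truly
     * asynchronous udi_pio_trans, s.done could still be false here. */
    return s.done && s.status != 0;
}
```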
-
What we really want, eventually, is for the host OS interrupt loop to know
about this model and leave the interrupt active but masked at the PIC/APIC
level until the callback from the PIO transaction arrives.
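-
A sketch of that desired model, assuming a host that accepts a hypothetical ISR_PENDING answer meaning "masked, result to follow". The PIC mask state and the asynchronous engine are simulated; every name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

enum isr_result { ISR_UNCLAIMED, ISR_CLAIMED, ISR_PENDING };   /* hypothetical */

static bool line_masked;                  /* simulated PIC/APIC mask state */
static void pic_mask(void)   { line_masked = true; }
static void pic_unmask(void) { line_masked = false; }

/* Simulated asynchronous engine: the callback runs later, in run_pending(). */
static void (*pending_cb)(void *, unsigned);
static void *pending_ctx;
static unsigned pending_result;

static void trans_start_async(unsigned result, void (*cb)(void *, unsigned), void *ctx)
{
    pending_cb = cb;
    pending_ctx = ctx;
    pending_result = result;
}

static void run_pending(void)
{
    void (*cb)(void *, unsigned) = pending_cb;
    pending_cb = NULL;
    if (cb != NULL)
        cb(pending_ctx, pending_result);
}

/* Trans-list callback: only now is the claim known and the source unmasked. */
static void intr_trans_done(void *ctx, unsigned status)
{
    int *claimed = ctx;
    *claimed = status != 0;
    pic_unmask();
}

/* Handler: mask the line, start the list, and tell the host to wait. */
static enum isr_result env_isr_deferred(unsigned device_status, int *claimed)
{
    pic_mask();
    trans_start_async(device_status, intr_trans_done, claimed);
    return ISR_PENDING;   /* host must neither unmask nor re-poll yet */
}
```

With this split the host loop no longer needs an immediate answer, so udi_pio_trans could stay genuinely asynchronous.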