|Published (Last):||3 December 2011|
Mainline GCC support has never worked for it, but there is a modified compiler available that does work and that is able to generate Crunch-accelerated Debian packages. Discussion specific to it usually happens on the linux-cirrus mailing list. The revision of a chip is printed as the 5th and 6th characters of the second line of text on the chip housing. The now rare D0 revision has a more extensive range of hardware bugs than the later revisions; from D1 to E2 no further modifications were made to the design of the Maverick unit.
Here we only attempt to work around the bugs in the later series. Cirrus stopped development of its ARM devices on 1st April (no joke!).

Registers

It has sixteen 64-bit registers, which can be treated as single- or double-precision floating point values, or as 32- or 64-bit integers. Single-precision floats live in the top 32 bits of the register and, when they are written, the lower 32 bits are zeroed. It also has four 72-bit multiply-accumulate integer registers, which are not used by GCC.

Instruction set

It provides instructions to add, subtract, multiply, compare, negate and give absolute value for all these types, to shift the registers in the two integer modes, and to convert between the data types.
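The single-precision register layout described above can be sketched in a few lines. This is only a model, assuming the register is viewed as a 64-bit unsigned integer; the helper names `single_to_register` and `register_to_single` are hypothetical and are not part of any Cirrus or GCC interface.

```python
import struct

def single_to_register(value: float) -> int:
    """Model the 64-bit register image after a single-precision write.

    Per the description above, the float occupies the top 32 bits and
    the lower 32 bits are zeroed. (Hypothetical helper, illustration only.)
    """
    (bits32,) = struct.unpack(">I", struct.pack(">f", value))
    return bits32 << 32

def register_to_single(reg: int) -> float:
    """Read the single-precision value back from the top 32 bits."""
    top = (reg >> 32) & 0xFFFFFFFF
    (value,) = struct.unpack(">f", struct.pack(">I", top))
    return value

if __name__ == "__main__":
    print(hex(single_to_register(1.0)))
```

The low word of the result is always zero in this model, which is why code that reads the register as a 64-bit integer after a single-precision write sees the float shifted into the upper half.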
These operations can only be done between Maverick registers, but data can be copied between Maverick and ARM registers and between Maverick registers and main memory.

Operating modes

The FPU can operate in several modes, controlled by bits in its status register:

ISAT: Deselects saturating arithmetic for integer operations and selects the usual C-like overflowing. The default is saturating, which is wrong for C.
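To make the difference between the two ISAT settings concrete, here is a small sketch of 32-bit signed addition in both styles. It assumes that "C-like overflowing" means two's-complement wraparound; the function names are hypothetical and the code models the arithmetic only, not the status register itself.

```python
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1

def add32_saturating(a: int, b: int) -> int:
    """Clamp the sum to the 32-bit signed range (the hardware default)."""
    return max(INT32_MIN, min(INT32_MAX, a + b))

def add32_wrapping(a: int, b: int) -> int:
    """Two's-complement wraparound, assumed here to be what C code expects."""
    total = (a + b) & 0xFFFFFFFF
    return total - (1 << 32) if total & 0x80000000 else total

if __name__ == "__main__":
    print(add32_saturating(INT32_MAX, 1), add32_wrapping(INT32_MAX, 1))
```

The two functions agree whenever the true sum fits in 32 bits and differ only on overflow, which is why the wrong mode can go unnoticed in light testing.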
UI: Unsigned integer: in comparisons between integers, the values are considered signed or unsigned when they are compared, unlike the ARM, FPA and VFP comparisons, which set the condition codes that are then considered signed or unsigned when a decision is made.
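The point about UI can be illustrated with a sketch in which the signedness is fixed at the moment of comparison. The function `compare32` and its `unsigned` parameter are hypothetical; the sketch models only the ordering result, not the actual condition flag values the hardware produces.

```python
def to_signed32(bits: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement signed value."""
    bits &= 0xFFFFFFFF
    return bits - (1 << 32) if bits & 0x80000000 else bits

def compare32(a_bits: int, b_bits: int, unsigned: bool) -> int:
    """Return -1, 0 or 1; signedness is chosen when the compare is done."""
    if unsigned:
        a, b = a_bits & 0xFFFFFFFF, b_bits & 0xFFFFFFFF
    else:
        a, b = to_signed32(a_bits), to_signed32(b_bits)
    return (a > b) - (a < b)

if __name__ == "__main__":
    # The same bit patterns order differently depending on the mode.
    print(compare32(0xFFFFFFFF, 1, unsigned=False))
    print(compare32(0xFFFFFFFF, 1, unsigned=True))
```

Because the interpretation is baked into the comparison, a later conditional instruction cannot reinterpret the result as the other signedness the way it can after an ordinary ARM compare.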
The default is signed. The default is asynchronous i. LAME gains 2. The default is non-forwarding.

Instruction format

MaverickCrunch instructions are 32-bit words that are interleaved with the regular ARM instruction stream. It appears as co-processors 4, 5 and 6 and its instruction words in hexadecimal match the regular expression 0x. In GCC output, this is further restricted to 0xe[cde] Most crucially, it fails to take proper account of the way that the FPU sets the condition code registers after a comparison, so the code it generates sometimes gets floating point and bit integer comparisons wrong, as well as failing to account for several of the hardware bugs.
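The instruction-format description above (coprocessors 4, 5 and 6) can be turned into a rough bit-field check. This sketch relies on the generic ARM coprocessor instruction layout, in which bits 27:24 are 0xC, 0xD or 0xE and bits 11:8 hold the coprocessor number; it does not reconstruct the regular expression from the text, and the function names are hypothetical.

```python
MAVERICK_COPROCESSORS = {4, 5, 6}
COPROCESSOR_OPCODE_NIBBLES = {0xC, 0xD, 0xE}  # load/store and data/register transfer

def is_maverick_word(word: int) -> bool:
    """Guess whether a 32-bit ARM instruction word targets coprocessor 4, 5 or 6."""
    opcode_nibble = (word >> 24) & 0xF
    coprocessor = (word >> 8) & 0xF
    return (opcode_nibble in COPROCESSOR_OPCODE_NIBBLES
            and coprocessor in MAVERICK_COPROCESSORS)

def is_unconditional(word: int) -> bool:
    """True when the condition field is 0xE ("always"), as in the 0xe[cde] prefix."""
    return (word >> 28) & 0xF == 0xE

if __name__ == "__main__":
    for word in (0xEE000400, 0xE1A00000, 0xED900A00):
        print(hex(word), is_maverick_word(word), is_unconditional(word))
```

The sample words are synthetic bit patterns chosen to exercise the fields, not disassembled Maverick instructions.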
GCC does not use: the bit integer operations. It performs these in ARM registers as usual. It has a -mcirrus-fix-invalid-insns flag, which is supposed to ensure that the two instructions following a branch are not Cirrus ones (but fails to do so) and that every cfldrd, cfldr64, cfstrd or cfstr64 is followed by one non-Cirrus instruction, which should fix bugs 1 and 2. There are three versions of it, all based on gcc Some real-life programs compiled with it do seem to work though.
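The second rule that the flag is meant to enforce (one non-Cirrus instruction after every cfldrd, cfldr64, cfstrd or cfstr64) can be sketched as a pass over a list of assembly lines. This is an illustration, not GCC's implementation: it assumes that every Maverick mnemonic starts with "cf" and that ";" starts a comment, and the function names are hypothetical.

```python
TWO_WORD_OPS = ("cfldrd", "cfldr64", "cfstrd", "cfstr64")

def mnemonic(line: str) -> str:
    """Return the lower-cased mnemonic of an assembly line, or '' if none."""
    parts = line.split(";")[0].split()
    return parts[0].lower() if parts else ""

def is_cirrus(line: str) -> bool:
    """Assumption: Maverick mnemonics are exactly those starting with 'cf'."""
    return mnemonic(line).startswith("cf")

def pad_two_word_ops(lines):
    """Insert a nop after each two-word load/store that a Cirrus op follows."""
    out = []
    for i, line in enumerate(lines):
        out.append(line)
        has_next = i + 1 < len(lines)
        if (mnemonic(line).startswith(TWO_WORD_OPS)
                and has_next and is_cirrus(lines[i + 1])):
            out.append("nop")
    return out

if __name__ == "__main__":
    code = ["cfldrd mvd0, [r8, 8]", "cfaddd mvd0, mvd1, mvd0", "cfstrd mvd0, [r8, 8]"]
    print("\n".join(pad_two_word_ops(code)))
```

A two-word operation at the very end of the list is left alone here because the sketch cannot see what follows it; a real pass would have to look across basic-block boundaries.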
The modifications are published as a megabyte tarball from which a single monolithic patch can be derived by diffing it against the mainline source releases. What a crock!

Futaris patches

futaris patches for gcc

Futaris' strategy includes disabling all conditional instructions other than branch and all 64-bit integer operations.
Here is how to build a futaris-patched compiler, a summary of their merits, and some benchmarks. It disables all 64-bit integer operations, which appear to have more unidentified hardware bugs, as shown by the openssl testsuite. The -mcirrus-di flag enables them; caveat emptor. The unpublished futaris patches for 4. This thread on the binutils mailing list explains why unwind support is needed. As you can see in Sec 9. The above patch incorrectly calls the iWMMXt pop functions. A new Pop MV registers instruction needs to be added to the table, along with changes to Sec 7.
At the moment, only the development branch git of libunwind supports ARM processors. Joseph S. That illustrates the sort of thing that needs changing to implement unwind support for a new coprocessor. Obviously you need to get the unwind specification in the official ARM EABI documents first before implementing it in GCC, and binutils will also need to support generating correct information given.
Hardware bugs

See cirrus. The following is from the EP rev E2 errata:

Definitions

1. A branch is taken and it is one of the two instructions in the branch delay slot. An exception occurs. An interrupt occurs. In the sample I have tested (a TS) it is not operating in serialised mode by these criteria because no exceptions are enabled.
Source: dspsc. Arithmetic into accumulators: cfmadd32 , cfmadda32 , cfmsub32 , cfmsuba Effects: in a bit register load, the top 32 bits are loaded with junk; in a bit memory store, an extra 32 bits of junk are written to memory in the word following the bits that were correctly written.
An instruction may be nonexecuted because it is conditional and the condition is false, e. GCC does not emit conditional Maverick instructions, and the branch case would be covered by mainline's -mcirrus-fix-invalid-insns flag if that code were not broken: in fact it turns b;cfxxx;non-cirrus into b;nop;cfxxx;non-cirrus thereby causing the bug to occur!
Futaris and Cirrus remove this flag. A test program tickles the bug in both ways on revision E1 silicon. Let the second instruction be an instruction with the same target, but not be executed. Execute a third instruction at least one of whose operands is the target of the previous two instructions.
For example, assume no pipeline interlocks other than the dependencies involving register c0 in the following instruction sequence:

    cfadd32 c0, c1, c2
    cfsub32ne c0, c3, c4 ; assume this does not execute
    cfstr32 c0, [r2, 0x0]

In this particular case, the incorrect value stored at the address in r2 is the previous value in c0, not the expected one resulting from the cfadd32.

Suggested fix:

    cfadd32 c0, c1, c2
    nop ; inserted extra instruction here
    nop ; inserted extra instruction here
    cfsub32ne c0, c3, c4 ; assume this does not execute
    nop ; inserted extra instruction here
    nop ; inserted extra instruction here
    nop ; inserted extra instruction here
    cfstr32 c0, [r2, 0x0]

The exact interval for safe operation is uncertain.
GCC doesn't emit conditional Maverick instructions and the jump case should be fixed by mainline's -mcirrus-fix-invalid-insns. Let the first instruction be a serialized instruction that does not execute. For an instruction to be serialized, at least one of the following must be true: The processor must be operating in serialized mode. Let the immediately following instruction be a two-word coprocessor load or store. In the case of a load, only the lower 32 bits (the first word) will be loaded into the target register.
Workaround:

    cfadd32ne c0, c1, c2 ; assume this does not execute
    nop ; inserted extra instruction here
    cfldr64 c3, [r2, 0x0]

    ; store sequence
    cfadd32ne c4, c5, c6 ; assume this does not execute
    nop ; inserted extra instruction here
    cfstr64 c3, [r2, 0x0]

The real-world CPUs I've tested are not running in serialized mode, and GCC does not emit cfmv32sc or cfmvsc If there are serialized ones out there, GCC does not emit conditional Maverick instructions, which just leaves the case of a Maverick instruction being in one of the two slots after a branch that is taken, which is covered by -mcirrus-fix-invalid-insns.
Execute an instruction that is a data operation (not a move between ARM and coprocessor registers) whose destination is one of the general purpose registers c0 through c15. Execute an instruction that is a two-word coprocessor store (either cfstr64 or cfstrd), where the destination register of the first instruction is the source of the store instruction; that is, the second instruction stores the result of the first one to memory.
Finally, the first and second instruction must appear to the coprocessor with the correct relative timing; this timing is not simply proportional to the number of intervening instructions and is difficult to predict in general. The result is that the lower 32 bits of the result stored to memory will be correct, but the upper 32 bits will be wrong.
The value appearing in the target register will still be correct. Examples from LAME:

    cfmuld mvd1, mvd1, mvd0
    mov r2, r7
    mov r3, r5
    mov r0, r8
    ldr r1, [pc, ]
    cfstrd mvd1, [sp, 8]

    cfmuld mvd1, mvd1, mvd0
    mov r0, r8
    mov r3, r4
    ldr r1, [pc, ]
    cfstrd mvd1, [sp]

    cfldrd mvd0, [r8, 8]
    cfaddd mvd0, mvd1, mvd0
    cfstrd mvd0, [r8, 8]

but a sample system was not operating with forwarding enabled.
Under Linux on the sample board I use, forwarding is disabled by default. With forwarding enabled in a test program on revision E1 hardware, I have been unable to get this bug to bite.
The instructions shift by an unpredictable amount, but cause no other side effects. Possible workarounds include:

- Disable interrupts when executing cfldr32 or cfmv64lr instructions.
- Avoid executing these two instructions.
- Do not depend on the sign extension to occur; that is, ignore the upper word in any calculations involving data loaded using these instructions.
- Add extra code to sign extend the lower word after it is loaded by explicitly forcing the upper word to be all zeroes or all ones, as appropriate.
- It is possible to do this selectively in exception or interrupt handler code. If the instruction preceding the interrupted instruction can be determined, and it is a cfldr32 or cfmv64lr, the instruction may be re-executed or explicitly sign extended before returning from interrupt or exception.
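The explicit sign-extension workaround above amounts to rebuilding the upper word from bit 31 of the lower word. A minimal sketch, treating the register as a 64-bit unsigned integer; the function name is hypothetical and real code would do this with ARM or Maverick instructions rather than in Python.

```python
def sign_extend_low_word(reg: int) -> int:
    """Discard the upper 32 bits and rebuild them from the low word's sign bit."""
    low = reg & 0xFFFFFFFF
    if low & 0x80000000:
        return 0xFFFFFFFF00000000 | low  # negative: upper word all ones
    return low                           # non-negative: upper word all zeroes

if __name__ == "__main__":
    # The upper word may hold junk after an interrupted load; only the low word matters.
    print(hex(sign_extend_low_word(0xDEADBEEF80000000)))
```

The upper word of the input is ignored entirely, so the result is correct whether or not the hardware sign extension actually happened.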
Mainline GCC does not emit cfldr32, and use of cfmv64lr is disabled as buggy. In three places it is used as the first of a two-instruction sequence: in all cases the top 32 bits are either overwritten or ignored by the second instruction. Verdict: Not a problem. This error can occur if the following is true: The first instruction must be a coprocessor compare instruction, one of cfcmp32, cfcmp64, cfcmps, and cfcmpd.
The second instruction: has an accumulator as a destination. GCC does not use the accumulator instructions. This error will occur under the following conditions: The first instruction: must update a coprocessor accumulator. The second instruction is not a coprocessor data path instruction. Coprocessor data path instructions include any instruction that does not move data to or from memory or to or from the ARM registers.
The second consecutive instruction: is a coprocessor load or store. When the error occurs, the result is either coprocessor register or memory corruption. Here are several examples:

    cfstr64ne c0, [r0, 0x0] ; assume does not execute
    cfldrs c2, [r2, 0x8] ; could corrupt c2!

The software workaround involves avoiding a pair of consecutive instructions with these properties. For example, if a conditional coprocessor two-word load or store appears, ensure that the following instruction is not a coprocessor load or store:

    cfstr64ne c0, [r0, 0x0] ; assume does not execute
    nop ; separate two instructions
    cfldrs c2, [r2, 0x8] ; c2 will be ok

Another workaround is to ensure that the first instruction is not conditional:

    cfstr64 c0, [r0, 0x0] ; executes
    cfldrs c2, [r2, 0x8] ; c2 will be ok

Note: If both instructions depend on the same condition code, the error should not occur, as either both or neither will execute.
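The pairing rule in the workaround above (a conditional two-word load or store immediately followed by a coprocessor load or store on a different condition) lends itself to a simple scan of an assembly listing. The sketch below is an assumption-laden illustration: the mnemonic list is limited to the load/store forms named in this text, condition suffixes are taken to be the standard ARM two-letter codes, and the function names are hypothetical.

```python
CONDITIONS = {"eq", "ne", "cs", "hs", "cc", "lo", "mi", "pl",
              "vs", "vc", "hi", "ls", "ge", "lt", "gt", "le"}
LOAD_STORE = ("cfldrs", "cfldrd", "cfldr32", "cfldr64",
              "cfstrs", "cfstrd", "cfstr32", "cfstr64")
TWO_WORD = {"cfldrd", "cfldr64", "cfstrd", "cfstr64"}

def split_load_store(line: str):
    """Return (base, condition) for a Maverick load/store line, else None."""
    parts = line.split(";")[0].split()
    word = parts[0].lower() if parts else ""
    for base in LOAD_STORE:
        rest = word[len(base):]
        if word.startswith(base) and (rest == "" or rest in CONDITIONS):
            return base, rest
    return None

def find_hazards(lines):
    """Indices i where lines[i] and lines[i + 1] form the risky pair."""
    hazards = []
    for i in range(len(lines) - 1):
        first, second = split_load_store(lines[i]), split_load_store(lines[i + 1])
        if first is None or second is None:
            continue
        base, cond = first
        # Risky: conditional two-word op, next load/store not on the same condition.
        if base in TWO_WORD and cond and cond != second[1]:
            hazards.append(i)
    return hazards

if __name__ == "__main__":
    print(find_hazards(["cfstr64ne c0, [r0, 0x0]", "cfldrs c2, [r2, 0x8]"]))
```

Conditions are compared as text, so equivalent spellings such as "cs" and "hs" are treated as different and reported; that errs on the side of flagging too much rather than too little.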
GCC does not emit conditional Maverick instructions. The sign is unaffected. The Cirrus crunch softfloat library has integer asm code to check for denorm values before these operations e.
Making fast floating point math work on the Cirrus MaverickCrunch floating point unit
The MaverickCrunch is a floating point math coprocessor core intended for digital audio. Plagued with hardware bugs and poor compiler support, it was seldom used in any of the devices based on those chips and the product line was discontinued on April 1. The coprocessor has 16 64-bit registers which can be used for 32- or 64-bit integer and floating point operations, and its floating point format is based on the IEEE 754 standard. It has its own instruction set which performs floating point addition, subtraction, multiplication, negation, absolute value, and comparisons as well as addition, multiplication and bit shifts on integers. It also has four 72-bit registers on which it can perform a 32-bit multiply-and-accumulate instruction and a status register, as well as conversions between integer and floating point values and instructions to move data between itself and the ARM registers or memory. It operates in parallel with the main processor, both processors receiving their instructions from a single 32-bit instruction stream. Thus, to use it efficiently, integer and floating point instructions must be interleaved so as to keep both processors busy.