*MACfns* v4.0 is a set of **257 ultra-fast Assembler
functions** which you can use to speed up your APL+Win applications.

Extensively documented, *MACfns* executes faster, in less space, and with greater accuracy than APL primitives, avoiding
WS FULL and LIMIT ERRORs and often returning smaller results. A parallel suite of APL analog
functions is also included for study and migration.

*MACfns* requires:

- a version of APL+Win at least equal to v3.6+ (or APL+DOS 6.0)

Download a sample *MACfns* APL+Win workspace:

You can download this workspace and try the Assembler functions it contains with any version of APL+Win greater or equal to APL+Win v3.6+.

Note: The MACFNS1.W3 workspace is delivered "for trial only purposes": you are not allowed to include these Assembler functions in sofwtare you distribute or sell!

The *MACfns* product itself comes as a workspace (**173KB MACFNS.w3**) and an APL component file (**1623KB MACFNS.sf**), both
packaged into a ZIP file (**653KB MACfns40.zip**).

No installation is necessary: simply unzip into any folder accessible to APL.

MACfnsin any way. What you are licensingMACfnsconsists of Software and Documentation. The Software consists of Compiled Assembly Code, APL cover functions, and APL analog functions. The terms of this Agreement pertain to both the Software and the Documentation. This Agreement also covers both the initial version you receive, and any subsequent versions or updates you may receive. Sykes Systems, Inc. is the author and owner of all title, rights, and interest inMACfns. Your License Sykes Systems, Inc. hereby grants _____________________ ("You") a nonexclusive license to useMACfnsSoftware and Documentation on your own computers, and to incorporate the Software (but not the Documentation) into products distributed to others. You may modify the Documentation, APL cover functions, and APL analog functions as needed for your purposes. You may modify the Compiled Assembly Code, but only in the ways described in the Documentation (under "Customization"). You may make as many copies of the Software and Documentation as needed, but you must safeguard them as you would your own proprietary and confidential information.MACfnsis protected by United States copyright laws and international treaties, and you must treat it accordingly. What you may not do You may not distribute the Documentation ofMACfnsto others outside your organization under any circumstances. You may not distribute the Software ofMACfnsto others outside your organization, except as incorporated into and embedded within your products. In particular, the Software of MACfns must not be directly accessible or examinable by users of your products. You may not place any component ofMACfnsso that it is accessible via a public network such as the Internet. You may not modify the Compiled Assembly Code except as described in the Documentation (under "Customization"). You may not reverse engineer, disassemble, decompile, or make any attempt to discover the source code of theMACfnsCompiled Assembly Code, nor allow others to do so. You may not sublicense, rent, lease, or lend any component ofMACfnsto others. You may not transfer this License to another person or legal entity without written authorization from Sykes Systems, Inc. Limited Warranty Sykes Systems, Inc. warrants that for a period of six months after delivery ofMACfnsto you that the Software will perform in substantial accordance with the Documentation. We do not warrant the merchantability or fitness ofMACfnsfor any particular purpose. We will not be liable for any direct, incidental, or consequential damages arising from the use of, or inability to use,MACfns, nor for claims from another party. Term and Termination This License Agreement takes effect upon your receipt of MACfns and remains effective until terminated. You may terminate it at any time by destroying all copies ofMACfnsin your possession. It will also automatically terminate if you fail to comply with any term or condition of this License Agreement. You agree on termination to destroy all copies of the Software and Documentation in your possession. ConfidentialityMACfnscontains trade secrets and proprietary know-how that belong to Sykes Systems, Inc. and it is being made available to you in strict confidence. You must ensure the protection and confidentiality ofMACfns. Any use or disclosure of the Software or Documentation, or of its algorithms or protocols, other than in strict accordance with this License Agreement, may be actionable as a violation of our trade secret rights. General Provisions This written License Agreement is the exclusive agreement between us concerningMACfns. It may be modified only in writing signed by both of us. In the event of litigation between us concerningMACfns, the prevailing party in the litigation will be entitled to recover attorney fees and expenses from the other party. This License agreement is governed by the laws of California. Your Name: _________________________ _________________________ (Print) (Signature) Company: _________________________ _________________________ (Date) Title: _________________________ Please return to Lescasse Consulting Eric Lescasse 18 rue de la Belle Feuille 92100 Boulogne France

MACfnsin any way. What you are licensingMACfnsconsists of Software and Documentation. The Software consists of Compiled Assembly Code, APL cover functions, and APL analog functions. The terms of this Agreement pertain to both the Software and the Documentation. This Agreement also covers both the initial version you receive, and any subsequent versions or updates you may receive. Sykes Systems, Inc. is the author and owner of all title, rights, and interest inMACfns. Your License Sykes Systems, Inc. hereby grants _____________________ ("You") a nonexclusive license to useMACfnsSoftware and Documentation on your own computers. You are an individual person, and will not share the Software or Documentation with other persons. You may modify the Documentation, APL cover functions, and APL analog functions as needed for your purposes. You may modify the Compiled Assembly Code, but only in the ways described in the Documentation (under "Customization"). You may make as many copies of the Software and Documentation as needed, but you must safeguard them as you would your own proprietary and confidential information.MACfnsis protected by United States copyright laws and international treaties, and you must treat it accordingly. What you may not do You may not distribute the Documentation ofMACfnsto others under any circumstances. You may not distribute the Software ofMACfnsto others under any circumstances. You may not place any component ofMACfnsso that it is accessible via a public network such as the Internet. You may not modify the Compiled Assembly Code except as described in the Documentation (under "Customization"). You may not reverse engineer, disassemble, decompile, or make any attempt to discover the source code of theMACfnsCompiled Assembly Code, nor allow others to do so. You may not sublicense, rent, lease, or lend any component ofMACfnsto others. You may not transfer this License to another person or legal entity without written authorization from Sykes Systems, Inc. Limited Warranty Sykes Systems, Inc. warrants that for a period of six months after delivery ofMACfnsto you that the Software will perform in substantial accordance with the Documentation. We do not warrant the merchantability or fitness ofMACfnsfor any particular purpose. We will not be liable for any direct, incidental, or consequential damages arising from the use of, or inability to use,MACfns, nor for claims from another party. Term and Termination This License Agreement takes effect upon your receipt of MACfns and remains effective until terminated. You may terminate it at any time by destroying all copies ofMACfnsin your possession. It will also automatically terminate if you fail to comply with any term or condition of this License Agreement. You agree on termination to destroy all copies of the Software and Documentation in your possession. ConfidentialityMACfnscontains trade secrets and proprietary know-how that belong to Sykes Systems, Inc. and it is being made available to you in strict confidence. You must ensure the protection and confidentiality ofMACfns. Any use or disclosure of the Software or Documentation, or of its algorithms or protocols, other than in strict accordance with this License Agreement, may be actionable as a violation of our trade secret rights. General Provisions This written License Agreement is the exclusive agreement between us concerningMACfns. It may be modified only in writing signed by both of us. In the event of litigation between us concerningMACfns, the prevailing party in the litigation will be entitled to recover attorney fees and expenses from the other party. This License agreement is governed by the laws of California. Your Name: _________________________ _________________________ (Print) (Signature) _________________________ (Date) Please return to: Lescasse Consulting Eric Lescasse 18 rue de la Belle Feuille 92100 Boulogne France

*MACfns* is a large suite of APL functions (250) which uses
assembly language to achieve extraordinary speed and unmatched
precision. We will discuss some of the characteristics,
advantages, and complexities of using assembly language, with
code examples and illustrations of the development and usage of
*MACfns*.

Assembly language is a [truly] primitive computer language which translates [barely] human-readable code directly into the low-level CPU (central processing unit) instructions which the computer executes, called machine code. Essentially, it is one-to-one mapping of mnemonics into machine instructions, the language of the hardware. Programming at any level beneath assembly language requires a soldering iron (although a yet lower level, called microcode, is accessible to the implementers of the microprocessor chips themselves). Therefore, assembly language is intimately connected with the machine upon which it executes. The assembly language for an IBM(r) PowerPC(r) chip is completely different from that for a z9(r) mainframe, an UltraSPARC(r) chip, an Intel(r) Itanium(r) chip, or an Intel Pentium(r) processor.

Fortunately, Intel Corporation has diligently maintained a strategy of backwards compatibility in what it calls Intel Architecture (IA), which is the instruction set and underlying logical structure for the x86 family of processors (which does not include the Itanium). This family, started almost three decades ago, is the broadest and the most commercially successful microprocessor architecture ever invented. Starting with the 8086/8087/8088, it evolved through the 80286/80287, the i386(r)/387, the i486(r), the Pentium and its several generations, and continues today with the Core Duo(r) processors. It is an amazing testament to Intel that most assembly language programs written in 1978 for a 16-bit 8088 processor will execute successfully in 2006 on a 64-bit Pentium Extreme Edition or Core 2 Duo machine. (In fact, with minimal translation, 8-bit assembly programs written for a 1973-release Intel 8080 will also work.) Furthermore, other manufacturers, notably AMD(r), have maintained close compatibility with IA, in some cases even leading the way to new extensions.

For the balance of this paper, we will refer to "assembly
language" exclusively as that for the instruction set of IA, in
which *MACfns* is written. We will further base our discussion
only on the single-CPU chips on which all APL implementations
to date are based, and upon which we can comment knowledgeably.
A minor nomenclature issue: humans write "assembly code" in
"assembly language"; an "assembler" translates assembly code
into "machine code", which is what machines execute. Sometimes
assembly code is called "source code", "assembler code", or
confusingly, just "assembler" (meaning the human-authored code,
not the translator). The machine code is sometimes called
"object code", but even more confusingly, is also called
"assembler code"; we have been guilty of this imprecision in
*MACfns* documentation. However, in this paper we will use and
maintain the distinction between the terms assembly language,
assembly code, and machine code.

Assembly language has instructions for the basic arithmetic and logical operations, data movement, comparisons, branching, shifting, flag manipulation, low-level I/O, and miscellaneous operations. A second set of assembly instructions manipulates the floating point unit (FPU), which provides a richer set of mathematical operations. And yet a third set, having several subsets, provides access to various SIMD (single-instruction multiple-data) capabilities of the latest generations of processors. Intel has made tremendous strides in the underlying implementation of these instructions over the years, with names like pipelining, prefetching, superscalar execution, instruction reordering, speculative execution, shadow registers, retirement buffers, and branch prediction. While these optimizations benefit the speed of programs, and often affect the optimal selection and arrangement of assembly instructions in a program, they do not affect the correctness of the program -- it executes precisely the way it was programmed.

As in APL, instructions in assembly language are interpretive. An instruction cannot safely execute without having the results of prior instructions. Most instructions execute in only one or two cycles, but some of the more complicated can take 50 or more, and others can be executed in multiples per cycle. (The cycle time of the machine is the reciprocal of its frequency rating, measured in hertz; a 3 GHz machine runs at three billion cycles per second.) While it was once possible to predict the execution time of an assembly language program from its emitted instructions and the speed of the machine in which it executed, this is no longer true; one must understand the optimizations underlying IA to be able to even approximate the execution time. Furthermore, the interactions of multiprocessing and multiple threads executing concurrently introduces unpredictability, such as the untoward flushing of memory caches.

For our purposes, however, we shall concentrate on what
differs between APL and assembly language, and why *MACfns* is
implemented in assembly language rather than in a higher-level
language such as C (in which the APL+Win(r) interpreter itself is
implemented). Consider the following, typed in desk calculator
mode:

⎕←A←121212+192947 314159

It seems trivial, but literally thousands of machine instructions are executed for this calculation, taking a few microseconds. The user is prompted and the machine goes into a wait state awaiting input. Each character entered is translated and recorded in a screen buffer; the text is rationalized (backspaces and cursor movements resolved). Upon receipt of a carriage return (Enter), storage is allocated for the statement; the line is parsed and tokenized; the symbol table is updated; the digits are converted into numbers via a series of multiplications and additions, then classified as Boolean, integer, or floating point; and (depending on the APL system) the line is evaluated syntactically. Finally, APL is ready to execute the line.

Since the line has no leading ∇, ), or ], APL interprets it as an APL statement. The constants are moved to an execution buffer. The interpreter branches to an entry for the dyadic + routine, which evaluates the rank, length, and type of its arguments (checking for RANK, LENGTH, and DOMAIN errors), allocates storage for the result (checking for WS FULL), and runs a [one-iteration] loop which adds the two numbers (checking for overflow and LIMIT error), storing the result in the temporary memory entry. The A← assignment then, after assuring that A is neither a function nor label, attaches the temporary memory entry to A, freeing any storage previously associated with A. Finally, the ⎕← assignment is evaluated by allocating storage for a character vector, formatting the value via a series of divisions and subtractions, outputting each digit to the screen or terminal, and freeing the storage. Then APL outputs a new line and six-space indent and returns to a wait state.

Whew! The exact sequence differs slightly in different APL implementations, but you get the picture.

Now consider the following assembly code:

MOV EAX,121212 ADD EAX,192947

This moves the 32-bit constant 121212 into the register EAX (of which there are only eight [16 in 64-bit processors]), then adds another 32-bit constant to it. At the same time, it sets flags indicating the sign of the result, the parity of its low 8 bits, and whether it overflowed 4 bits (for BCD arithmetic), 31 bits (signed doubleword arithmetic), or 32 bits (unsigned arithmetic). That's all it does. There is no input or output; there are no changes to memory. It also executes in less than a nanosecond.

During Sykes Systems' development of *MACfns*, we have
developed tools to let us assemble and execute such instructions
on the fly (naturally, the diamond separates statements):

RUN 'MOV EAX,121212 ⋄ ADD EAX,192947' 10 bytes code: EAX=314159 EBX=0 ECX=0 EDX=0 EBP=0 ESI=0 EDI=0 FLG=514 FLAGS: JA, JG [no flags]

This displays the seven general-purpose registers and the flags register upon completion. These eight values are the explicit result of ⎕CALL. The stack register ESP is not included, nor is the instruction pointer EIP, segment registers, descriptor table and status registers, nor other special-purpose registers.

Perhaps we want to examine the machine code itself, and then run it separately:

⎕←MX←CODE 'MOV EAX,121212 ⋄ ADD EAX,192947' +|Ù⍀¨∣⍙€

This gibberish can be clarified by using ∆AF from *MACfns* (atomic
function, here used as (⎕AV⍳MX)-⎕IO):

∆AF MX 184 124 217 1 0 5 179 241 2 0 256⊥0 1 217 124 121212 256⊥0 2 241 179 192947 2 HEX ∆AF MX ⍝ for the cognoscenti, or masochists B8 7C D9 01 00 05 B3 F1 02 00

We see that the constants are indeed embedded in the code (backwards, the much-debated "little endian" characteristic of Intel Architecture). We can infer that 184 (B8) is the opcode (operation or instruction code) for MOV EAX and 5 is the opcode for ADD EAX (it's a little more complicated, however).

Now that we have assembled our teeny assembly code program into machine code MX, let's try to run it using APL+Win's ⎕CALL, which enables us to execute the machine code directly:

⎕CALL MX DOMAIN ERROR ⎕CALL MX ^

We are missing two crucial elements. The first is that ⎕CALL requires the first four bytes to be a system-dependent signature value (it differs between APL+Win and APL+DOS) to help discourage inappropriate use. Its absence is the cause of the DOMAIN ERROR. The second is that after our snippet of code runs, it doesn't know where to go; it fact it will scamper off into the workspace, executing all manner of nonsense until, in a few nanoseconds, APL tosses us out on the street (i.e., Windows(r)) for our misbehavior.

The signature value is easily obtained from any MACfns machine code (what we loosely refer to as assembler code in our documentation):

↑⍙AF 2000042035 82 ⎕DR ↑⍙AF ⍝ show it as characters 386w

The final instruction executed must be a return-from-procedure instruction (which pops the system stack and jumps to a memory address which ⎕CALL initially placed there for a graceful exit). Here is a proper assembly program for APL+Win (DD defines the signature as a doubleword [four bytes], and RETN returns from a near procedure, which characterizes all MACfns):

AC←'DD 2000042035 ⋄ MOV EAX,121212 ⋄ ADD EAX,192947 ⋄ RETN'

which we can assemble into machine code,

∆AF ⎕←MC←CODE AC 386w+|Ù⍀¨∣⍙€Ã 51 56 54 119 184 124 217 1 0 5 179 241 2 0 195

and execute,

⎕CALL MC ⍝ 121212+192947 314159 0 0 0 0 0 0 514

Voilà! Our first working assembly code program.

Herein lies a major advantage of assembly code and the resultant machine code for MACfns: it is extremely lean. The overhead of ⎕CALL MC is about the same as that for A+B when A and B are integer scalars and the expression has already been parsed, tokenized, and evaluated for syntax. There is no interpretive overhead or analysis, storage allocation or movement of data (except for the result of ⎕CALL itself), or interface to external or asynchronous processes. Furthermore, for small programs assembly code can be simpler and less verbose than C code.

If we are extremely careful, we can modify and run machine code directly. (If we make a mistake, we could lose our APL session when we execute it.) For example, before we noticed that the embedded constant 121212 was encoded as ∆AF 124 217 1 0, which is now the sixth-ninth bytes of MC. We can change the constant in variable MC,

∆AF MC[6 7 8 9] 124 217 1 0 MC[6 7 8 9]←82 ⎕DR 7053 ⍝ or ∆AF⌽(4⍴256)⊤7053

and rerun the code,

⎕CALL MC ⍝ 7053+192947 200000 0 0 0 0 0 0 530

to see the effect. This is the basis of the mechanism MACfns offers to customize the machine code for user-settable defaults and other characteristics which extend its flexibility and utility. As you might guess, the smallest machine code we can run in APL+Win is

⎕CALL 2000042035 195 ⍝ or ⎕CALL '386wÃ' 0 0 0 0 0 0 0 530

which is solely the signature value and a RETN instruction, five bytes in total.

So far the code we've produced does nothing useful. Below
is a more substantial program with practical utility. While it
would never be released in *MACfns* (it's far too limited), it does
illustrate our style of writing assembly code. Observe that
assembly code uses the semicolon instead of a lamp for comments.

GET 'GCDI' ⍝ The code resides in an APL component file. ; ∇ Z←L ∆GCDI R ; [1] Z←↑(L,R)⎕CALL ⍙GCDI ∇ CPULEV EQU 0 ; Greatest common divisor of integers in EAX and EBX saved to EAX. ; The result is positive unless both arguments are 0 or ¯2*31. ; For Boolean arguments, (L ∆GCDI¨R)≡L∨R ; Copyright Sykes Systems, Inc. 9Nov2006/Roy ⊂MACfns⊃ NEG EAX JG SHORT POS JZ SHORT ZER NEG EAX POS: NEG EBX ; 1≤EAX≤2147483647 or EAX=¯2147483648 JG SHORT LP JZ SHORT XIT NEG EBX LP: XOR EDX,EDX ;⊤Euclidean algorithm calculates EAX=L GCD R DIV EBX ;∣(EAX EDX)←0 EBX⊤EAX (ignore quotient EAX) MOV EAX,EBX ;∣divisor is result or next dividend EAX TEST EDX,EDX ;∣EDX is remainder ÷ MOV EBX,EDX ;∣remainder is next divisor if nonzero EBX JNZ SHORT LP ;⊥12-byte/6-inst. loop RETN ; exit ZER: MOV EAX,EBX ; EAX was 0, so return ∣EBX NEG EBX JL SHORT XIT XCHG EAX,EBX XIT: RETN ; exit

Here is how we would assemble GCDI into code to be released
in *MACfns*. The machine code is stored in a global variable named
⍙GCDI, which is called by an APL function named ∆GDCI:

CPL 'GCDI' ⍝ Compile (assemble) code as global ⍙GCDI. 37 0 3 40 ⍝ The result is some statistics about the size of the code. ⍙GCDI ⍝ the generated machine code 2000042035 75487479 ¯654895244 75488247 ¯604563852 ¯201862575 ¯762985589 ¯193602933 ¯138179645 ¯1828619045 195 1229210439 538976288 2006110900 ⎕←S←¯3↑⍙GCDI ⍝ its last three integers 1229210439 538976288 2006083100 (82 ⎕DR 2⍴S),(0,3⍴100)⊤↑⌽S ⍝ decipher the suffix GCDI 2006 8 31 0

The assembly process automatically inserts the leading signature
value. It also appends the root name of the function, its
timestamp, and its required CPU level (here for any IA-32 chip,
such as an i386), all of which are documented standards of *MACfns*.
The trailing 195, the machine code for RETN, also happens to be
visible in ⍙GCDI; it usually is not.

⍙GDCI is an integer vector rather than a character vector.
As long as its right argument is simple and homogeneous (numeric
or character), ⎕CALL does not care about the rank, shape, or
datatype of what it executes; it simply points the machine (via
the instruction pointer) to the fifth byte of data (i.e., that
immediately following the signature value) and lets it run.
We chose the integer vector representation for *MACfns* because
Booleans are too sparse, characters display sloppily (as we saw
in MX and MC above) and are fragile (editing them can change them
unexpectedly), and floating point can contain values that are not
real numbers (such as infinities and NaN's, not-a-number) which
can freeze APL when displayed or be changed unexpectedly by APL
operations.

Below would be the APL cover function in *MACfns*. It has
standard naming and calling conventions, cryptic documentation
solely to assist memory (the full documentation is in a separate
file, and would be named dGCDI), and a distinctively-constructed
copyright notice on line [2] ('⍝∇' and 'Copyright' are separated
by ∆AF 8 255 (or (¯1↓⎕AV),⎕TCBS), invisible here but available to
code management systems).

∇ Z←L ∆GCDI R [1] ⍝∇Greatest common divisor of two integer scalars. [2] Z←↑(L,R)⎕CALL ⍙GCDI⍝∇Copyright 2006 Sykes Systems, Inc. 9Nov2006 ⊂MACfns⊃ ∇

Atypically, the numeric arguments in ∆GCDI are presented directly
to ⎕CALL, and the result is [the first element of] that of ⎕CALL.
Since ⎕CALL only allows up to seven integers in its left argument,
and returns only eight, this method is extremely limiting.
Furthermore, there is no error checking beyond that which ⎕CALL
itself performs. Later we shall see how *MACfns* typically passes
and checks data.

All assembly-based functions in *MACfns* also have one or more
APL analog functions, which can be used for comparison, testing,
study, or in migrations to other APL systems. Here is what the
one for ∆GCDI might look like:

∇ Z←L aGCDI R [1] ⍝∇Greatest common divisor of two integer scalars. [2] ⍝∇Copyright 2006 Sykes Systems, Inc. 9Nov2006 ⊂MACfns⊃ [3] ⍝ The result is an integer scalar, positive [4] ⍝ unless both arguments are 0 or ¯2147483648. [5] ⍝ ⎕ERROR(2≠⎕NC'L')/'VALENCE ERROR' [6] ⍝ ⎕ERROR(~1 1≡(⍴1/L),⍴1/R)/'RANK ERROR' [7] ⍝ ⎕ERROR(323≠⎕DR 2,L,R)/'DOMAIN ERROR' [8] L←↑L ⋄ Z←↑R [9] :repeat [10] R←Z [11] :until 0=L←(Z←L)∣R [12] :if Z≠¯2147483648 ⍝ ¯2*31 [13] Z←∣Z [14] :endif ∇

We find they work identically, here used with each (¨):

L←149+⍳11 ⊃L (L ∆GCDI¨180) (L aGCDI¨180) 150 151 152 153 154 155 156 157 158 159 160 30 1 4 9 2 5 12 1 2 3 20 30 1 4 9 2 5 12 1 2 3 20

but even in this small application ∆GCDI is about three times faster than aGCDI. For less numerically tractable arguments, such as

L←506750217 2140522190 2025047149 1220210790 1857457613 R←235578269 1766677554 1169568863 953647643 712205927 L ∆GCDI¨R 1 2 1 1 1

∆GCDI is over fifteen times faster. Actually, the machine code is hundreds of times faster, but the ∆GCDI cover function and its ⎕CALL, catenate (L,R), and first (↑) take the vast majority of the time here; the each (¨) masks the difference even more.

This brings us to a second compelling reason we use assembly language: testing and iteration are hundreds to thousands of times faster in machine code than in APL, and several times faster than in compiled languages. Often no explicit tests at all are needed, the flags set by preceding instructions providing all the information needed. Special-case testing, for which the cost in APL must be carefully weighed against the expected gains, becomes far more feasible in assembly language. ∆UNTAB is typical of a function with many useful special cases for which the APL testing is far more expensive (to the point of not being justified) than the testing in assembly language.

More individualized treatment of particular cases also enables greater accuracy and speed. For example, our original prototype assembly code for the hyperbolic sine (∆SINH or 5○R) was only about twice as fast as APL and no more accurate. The time was dominated by the mathematical computation ((*R)-*-R)÷2, but careful numerical analysis disclosed a wealth of individual optimizations for different ranges (almost two dozen in all). Assembly language allows us to test scalar-by-scalar for all these cases, and invoke much faster and more accurate code for each. The result was that the minimum speedup increased to over four times, with speedups of ten or more times not uncommon, and rare cases exceeding 1,000 times.

Another powerful reason we use assembly language is that it affords us direct access to the data structures of APL variables and the internal representation of data. For example, ∆NELMe R computes the number of elements in each item of a nested (↑¨⍴¨,¨R). It is up to 20-400 times faster than APL not only because it can iterate vastly faster than each (¨), but also because the information is already available internally in each APL item. APL code must either ravel each item to obtain the number of elements (thus potentially copying the entire array), or perform expensive multiplications via ×/¨⍴¨R, which moreover fails if R is empty. ∆NELMe also moves far less data (see below).

Similarly, ∆REVA (which reverses all axes of an array) can be more than 100 times faster than APL on Boolean arrays because it can simultaneously manipulate the data at the bit, byte (8 bits), word (16 bits), or doubleword (32 bits) level. Likewise, ∆F2LOG R (⌊2⍟∣R) does not actually compute any expensive logarithms; instead, it directly analyzes the internal binary exponent of floating point numbers to achieve its 25-150 times speedup (you can do the same thing in APL, albeit more slowly, by using ⎕DR).

A fourth reason to use assembly language is avoidance, or anticipation, of data movement. Modern processors are increasingly memory-bound, which means the CPU is idling awaiting data to be fetched from main memory. The main architectural chip optimizations are prefetching anticipated data and the proliferation of caches -- primary, secondary, and even tertiary, both on-chip and off-chip -- to stage that data closer to the CPU. ∆DIVID (divide increment by decrement, or (R+1)÷R-1) moves data only once, whereas APL must move it thrice. The typical speed ratio of 2.5-5 times essentially reflects this reduced memory access more than any improvements we could make to division and addition. A substantial part of the speedup to ∆NELMe is also because large intermediate nested arrays (,¨ and ⍴¨) need not be created and destroyed.

A final reason we use assembly language is that it enables
algorithms which would actually be more difficult to code, and
much more expensive to run, in other languages, including APL.
These tend to be high-iteration or bit-fiddling kinds of problems,
such as encoding and decoding data for transmission, data
encryption and compression, and one we have implemented in *MACfns*,
∆CRC (cyclic redundancy check). aCRC is not a trivial exercise
in APL, but in assembly language the fundamental operation uses
only four machine instructions per byte of data, and the
resulting machine code runs 1,000-4,000 times faster than APL.
There are other cases where assembly language has access to
instructions, facilities, and hardware which are unavailable in
other languages or APL.

∆GCDI uses the values of its arguments directly as the left
argument to ⎕CALL; they therefore must be integers. Only eight
functions in *MACfns* do so, including ∆CHKTS, which checks a
single ⎕TS-form date or timestamp T, conveniently a seven-element
integer vector. This method is extremely lean and fast; in fact
the overhead of calling the APL cover function alone is about
double the cost of invoking the ⎕CALL and executing the machine
code. Thus, while ∆CHKTS T is about 3-4 times faster than even
the most highly-optimized APL code (APL analog function aCHKTS1
[some functions in *MACfns* have multiple APL analog functions,
some coded for clarity or pedagogy and some for speed]),
expression ↑T ⎕CALL ⍙CHKTS is about 9-12 times faster, and most
of that time is due to the overhead of the ⎕CALL and first (↑).

However, this method of passing arguments and returning
results directly is generally too restrictive. The normal way
*MACfns* does so is indirectly, by referencing the ⎕STPTR of the
variable names:

⎕STPTR'R Z L' ⍝ differs between workspaces 12 4 9

These values are arbitrary but unchanging and unique symbolic
handles for each name in a workspace, and are independent of the
class, referent, or value of the name. (They used to be symbol
table pointers in APL+PC, hence the name ⎕STPTR.) APL2000(r)
supplies a set of internal subroutine calls with APL+Win, called
Interpreter Support Services, which enable assembly programs to
use these indirect pointers to inquire about, establish, modify,
and erase variables. Here is how *MACfns* uses them:

L←6 ⋄ R←⍳10 ⍝ the arguments for upper triangular matrix (⎕STPTR'R Z L')⎕CALL ⍙UTRI ⍝ run the machine code 0 10 0 0 ¯4 37389060 37311392 583 Z ⍝ the result has been created: 1 2 3 4 5 6 7 8 9 10 0 2 3 4 5 6 7 8 9 10 0 0 3 4 5 6 7 8 9 10 0 0 0 4 5 6 7 8 9 10 0 0 0 0 5 6 7 8 9 10 0 0 0 0 0 6 7 8 9 10

The ⎕STPTR's of the names of the right argument, result, and left argument are presented to ⎕CALL. ⎕CALL executes the machine code in ⍙UTRI, passing the ⎕STPTR handles to it in registers EAX, EBX, and ECX. The machine code calls APL2000's Interpreter Support Services with these handles to reference the arguments L and R and create the result Z, which it then fills in before returning back to APL.

This is all rather clumsy, which is why *MACfns* has APL cover
functions to handle the interaction. Here is the one for ∆UTRI:

∇ Z←L ∆UTRI R [1] →⍙rzl⎕CALL ⍙UTRI⍝∇L-row upper triangular mat from vec R: R×[⎕IO+1](⍳L)∘.≤⍳⍴R [2] ⎕ERROR'VALENCE ERROR'⍝∇Copyright 2006 Sykes Systems, Inc. 31Aug2006 ⊂MACfns⊃ [3] ⎕ERROR'DOMAIN ERROR' [4] ⎕ERROR'RANK ERROR' [5] ⎕ERROR'WS FULL' [6] ⎕ERROR'NONCE ERROR' [7] ⎕ERROR'LIMIT ERROR' ∇

As in ∆GCDI, we see standard naming and documentation conventions, but here we encounter two new aspects. One is the use of ⍙rzl on line [1] instead of ⎕STPTR'R Z L' as the left argument of ⎕CALL; ⍙rzl is a global variable which is preset (usually during workspace initialization in ⎕LX) as

⍙rzl←⎕STPTR'R Z L

We use ⍙rzl because computing ⎕STPTR'R Z L' is fairly expensive, since it must look up (and if necessary enter) the names in the symbol table. In this case, executing ⎕STPTR'R Z L' takes about as much time as executing ⍙rzl ⎕CALL ⍙UTRI, so using ⍙rzl instead effectively increases the speed of ∆UTRI by about 50% in small cases (the balance being the overhead of the ∆UTRI function call, localizing its names, and the branch). To minimize the number of ⎕STPTR variables, the order of the ⎕STPTR's presented to MACfns is always the same: right argument, result, {left argument, {temporary local}}. Thus only a maximum of four are needed.

The other new aspect is the branch on line [1] and the error
signaling on lines [2] to [7]. Just before returning to APL,
⎕CALL loads a return code into the EAX register, which is the
first element of the result of ⎕CALL, to which APL branches,
normally to 0 (end the function). This lean mechanism is the
method *MACfns* uses to signal errors. *MACfns* also sometimes uses
the mechanism to branch to a line of APL code to complete a task,
or to deal with cases for which the machine code is not prepared.
A notable advantage of this method is that all possible errors
are listed visibly in the body of the APL cover function.

The Interpreter Support Services do have several weaknesses
which we try to circumvent as much as possible in *MACfns*. They
are relatively slow; they perform inadequate error checking; they
have undocumented behavior and return codes; they impose small
and arbitrary size limits; and they lack a number of useful
facilities. Nonetheless, on the whole they are adequate to our
needs for *MACfns* development.

There are two main disadvantages to assembly language:

- It is cumbersome, hard to write, slow (to write), and fraught
with mystery.
APL programmers would be aghast at the primitiveness of assembly language and its facilities. The following code fragment is a loop [which is the only way] to add two arrays:

L22: MOV EAX,[ESI] ; get an item from L ADD ESI,4 MOV EDX,[EBX] ; get an item from R ADD EBX,4 ADD EAX,EDX ; add them JO SHORT OVF ; jump if overflow MOV [EDI],EAX ; store the result in Z ADD EDI,4 DEC ECX JNZ SHORT L22

We need ten instructions for each item in this 22-byte loop. Six out of the seven general-purpose registers are used (only EBP is available) -- heaven forbid we need do something more complicated than addition. It only handles the case when both arrays and the result are of integer datatype (⎕DR type 323), and of the same size, which is but one of at least 14 simple cases which APL gracefully accommodates (not to mention nested arrays). The code does not show the machinations at label OVF needed to handle integer overflow (blowup to floating point).

As does APL, assembly language offers many ways of doing things. Below is another way of expressing the same loop:

L23: LODSD ; get an item from L ADD EAX,[EBX] ; add it to R JO SHORT OVF ; jump if overflow ADD EBX,4 STOSD ; store the result in Z LOOP L23

While this loop is only six instructions and 11 bytes, it still uses five of seven registers (EDX is now also free). However, it is also inexplicably (if one is not intimately familiar with the specifications of modern processors) much slower. Both loops would be prefaced in

*MACfns*by about 200 handcrafted assembly statements to initialize, check, and prepare everything, and the Interpreter Support Services called doubtless have many more. Finally, both loops shown above, while operational, have deficiencies; the actual loop in*MACfns*would be more complex (and faster).Of course, it's not quite as bad as it first seems. We have accumulated quite a set of tools and macros (bodies of drop-in code) during the development of

*MACfns*and our work for clients. Nonetheless, new challenges arrive constantly. Furthermore, one of the key advantage of*MACfns*is speed; simply-adequate working code (like the loops above) is not sufficient.A major mitigating factor in our development in assembly language is that we use APL. We have many tools and utilities written in APL (such as CODE, HEX, and RUN that we used above) which assist in our development;

*MACfns*itself is a vital component. We use APL in algorithm design, modeling, timing, testing, and verification. We use it to explore and verify properties of the chips, workspace structure, and data representations. We even use*MACfns*and APL to optimize our assembly language, bootstrapping their capabilities to enhance their capabilities. - Assembly language is machine dependent.
Although

*MACfns*is currently based on Intel Architecture, its dominance lessens our concern about migration to other platforms (no one has approached us yet). We provide APL analog functions (often more than one) for every function in*MACfns*to enable our clients to migrate to other APL systems and hardware architectures.By machine dependence we mean the evolution of the microprocessors on which IA runs. Each new optimization Intel or AMD introduces is one we must digest and consider incorporating into MACfns; the timing differences in the loops above are an illustration. Given limited resources, we can incorporate only some. As we alluded to in the beginning of this paper, we have not yet addressed hyperthreading or multi-core processors. The APL+Win platform itself, stable for many years, may introduce other changes with which we must cope; APLNext(r) is yet another consideration.

Were

*MACfns*written in a higher-level language, these changes would be much simpler, some automatically performed by a compiler. On the other hand, the utter reliability of*MACfns*(no one has ever reported a bug), its high-precision accuracy, its speed, and in some cases even its functionality, would be compromised by depending on notoriously fickle compilers.

Because it is based in IA-32 assembly language, MACfns runs only on 32-bit Intel Architecture machines, which includes Intel processors such as the Pentium, Celeron(r), Xeon(r), and Core Duo, and AMD processors such as the Athlon(r) and Opteron(r). It also runs on 64-bit IA machines using the APL+Win 32-bit interpreter. It does not run on RISC processors or mainframes.

Because *MACfns* uses APL+Win Interpreter Support Services,
and incorporates knowledge of the structure of variables in an
APL+Win workspace, it runs only under APL+Win (or APL+DOS). It
does not run under APLNext, APL+UNIX(r), APL2(r), APLX(r), or
Dyalog APL(r). However, *MACfns* has no direct dependencies on the
Windows operating system; thus (a slightly different, but fully
compatible) version of *MACfns* also runs in APL+DOS under the DOS
operating system.

*MACfns* has been developed over the last 19 years by Sykes
Systems, Inc., with its APL antecedents going back over 35 years.
It is written by and specifically for APL programmers, and is
thus easily understood by them. While the advantages of assembly
code are a major contributor to its extraordinary performance, it
is the thoughtful design, attention to detail, careful
programming, and literate documentation which truly distinguish
it. *MACfns* is in some ways a set of superior primitive functions
which improves both programmer and machine productivity. We can
summarize the benefits of *MACfns* as follows:

- Faster than APL+Win, typically from a few times to an order of magnitude or more. Also typically 2-8 times faster, more general, and handling larger arguments, than equivalent functions in the 'ASMFNS' workspace distributed with APL+Win.
- Uses less storage by detecting identities, avoiding numeric promotion, consolidating pointers, and using less (usually no) intermediate storage. Avoids WS FULL and storage thrashing.
- Sometimes accepts larger arguments and produces larger array results, and handles arrays of all rank and shape, avoiding LIMIT ERROR's; all size limits are documented, including those for corresponding APL expressions.
- Extended, and documented, domain and range for floating point calculations, avoiding LIMIT ERROR's and improving accuracy.
- Greater accuracy via improved algorithms, use of extended precision hardware, and careful attention to floating point precision considerations, intermediate calculations, and rounding.
- Generality in the definition of the utility functions to increase their usefulness in more situations.
- Extensive customization options, wherein documented locations within the machine code may be changed for useful effect; the APL cover functions may also be customized.
- Improved, consistent, and fully-documented error handling.
- Well-written APL analog functions to complement the APL cover functions and machine code. These are useful in testing and timing, during migration to other systems, and for studying APL technique. Multiple techniques are often provided.
- Extensive and complete documentation, including identities, limits, errors, related functions, and examples (430 pages in all).
- Reliable; never unexpected performance, undocumented errors, or catastrophic termination of the APL session. No bugs have ever been reported, although we have uncovered some and have reported them to our users.

We have described characteristics of assembly code, and how
and why we use it for *MACfns*. We encourage you to explore *MACfns*,
and to give us suggestions on future direction (the queue is
always growing, but the order changes). We thank those of you
who have purchased and are using *MACfns*. Your financial support
enables continuing improvement and new capabilities in the future.

Attached to this paper are several documents provided with
*MACfns* Release 3.0 (31Aug2006) which may be of general interest,
or which augment the information presented in this paper. The
papers in the APL2000 User Conference notebooks from 2002-2005
also provide particularly comprehensive overviews of *MACfns*.

-----------------------------------------------------------------

AMD, Athlon, and Opteron are trademarks of Advanced Micro Devices, Inc. APLNext, APL2000, APL+UNIX, and APL+Win are trademarks of APLNow LLC. Dyalog APL is a trademark of Dyalog Ltd. Celeron, Core Duo, i386, i486, Intel, Itanium, Pentium, and Xeon are trademarks of Intel Corporation. APL2, IBM, PowerPC, and z9 are trademarks of International Business Machines Corporation APLX is a trademark of Micro APL Ltd. Windows is a trademark of Microsoft Corporation. SPARC is a trademark if Sparc International, Inc. UNIX is a trademark of X/Open Company, Ltd.-----------------------------------------------------------------

∇ Z←aCHKTS1 R;A;B [1] ⍝∇Check ⎕TS-form timestamp. [2] ⍝∇Copyright 2006 Sykes Systems, Inc. 31Aug2006 ⊂MACfns⊃ [3] ⍝ Limits are 1800 1 1 0 0 0 0 to 2200 12 31 23 59 59 999 [4] ⍝ (algorithm okay from 1600 to 3599). [5] ⍝ aCHKTS1 implicitly checks and engenders errors (like ∆CHKTS). [6] ⍝ aCHKTS explicitly checks and signals errors (unlike ∆CHKTS). [7] ⍝ slowest→fastest: aCHKTS, aCHKTS1, ∆CHKTS, ↑R ⎕CALL ⍙CHKTS [8] (R B A)←3⍴Z←7⍴R ⎕CALL'386wÃ' ⍝ check rank, length, domain; pad [9] ⍝ absolute year limits 1600-3599: [10] :if Z←Z≡1800 1 1 0 0 0 0⌈2200 12 31 23 59 59 999⌊Z ⍝ assure range [11] :andif A>28 ⍝ of valid dates, done for 91.99% [12] :andif A>0 31 28 31 30 31 30 31 31 30 31 30 31[⎕IO+B] ⍝ 7.94% [13] Z←29 2 0≡A,B,=/×4 100 400∣R ⍝ .07% [14] :endif ∇

-----------------------------------------------------------------

-----------------------------------------------------------------MACfns31Aug2006 Z←∆AF R Atomic function maps characters in ⎕AV to integers 0-255 or vice versa. The name is derived from ⎕AF in IBM's APL2 product. ∆AF is useful in avoiding clumsy constructions in documentation and code. For example, instead of referring to the Euro character as ⎕AV[⎕IO+2] (€), or setting ⎕IO and using ⎕AV[2] or ⎕AV[3], or explaining elsewhere that ⎕IO has a particular value, ∆AF 2 always suffices. If R is character, ∆AF is equivalent to (⎕AV⍳R)-⎕IO or (⍴R)⍴↑82 323 ⎕DR R. If R is numeric, ∆AF is equivalent to ⎕AV[⎕IO+R] or (⍴R)⍴↑((⎕DR R),82)⎕DR R, so all items of R must be integers 0-255 (else ∆AF signals an INDEX or DOMAIN ERROR). The shape of the result is that of the argument, and its type is integer or character. ∆AF is equivalent to [the APL coded, not assembly coded] function AV from the 'ASMFNS' workspace supplied by APL2000 with APL+Win version 3.6.02. ∆AF differs from APL+Win version 3.6.02 in the following respects: 1. Integer tolerance for floating point is slightly different, and more consistent, in ∆AF than in APL. 2. ∆AF needs space only for its result, but the equivalent APL expressions need from 3% more to nine times the space. 3. ∆AF is up to 2.5-8 times faster than (⎕AV⍳R)-⎕IO for [almost] any nonempty character array. ∆AF is up to 4-8 times faster than ⎕AV[⎕IO+R] for 70 or more integers, 10-60 times for 50 or more Boolean items, and 10-25 times for three or more floating point items. ∆AF is always faster than AV. 4. APL and AV can process up to 214,748,352 items. ∆AF R can process more, depending on the datatype of R as shown in the following table, which assumes R is a vector: ∆AF Datatype of R Expression Alternative Maximum items character (⎕AV⍳R)-⎕IO ↑82 323 ⎕DR R 268,000,126 Boolean ⎕AV[⎕IO+R] ↑ 11 82 ⎕DR R 1,072,000,504 integer ⎕AV[⎕IO+R] ↑323 82 ⎕DR R 429,496,719 floating point ⎕AV[⎕IO+R] ↑645 82 ⎕DR R 238,609,288 Errors: DOMAIN ERROR The argument is nested or heterogeneous. An item is from 0 to 255 but is not an integer. INDEX ERROR An item is negative or exceeds 255. LIMIT ERROR The result size would exceed 1,072,000,528 bytes (see the table above). Related functions: ∆CDR (change data representation), ∆XLATE (L[⎕AV⍳R]), ∆AVEPS (⎕AV∊R). Examples: ∆AF 2 3⍴'MACfns' 77 65 67 102 110 115 ∆AF 77 65 67 102 110 115 33 MACfns! (Documentation for WS FULL, which is common to almost all MACfns functions, is described elsewhere, and is not repeated in the detailed documentation of each function. The documentation is generally formatted for 60-line pages.)

-----------------------------------------------------------------MACfns31Aug2006 Z←∆SINH R Hyperbolic sine is equivalent to 5○R, returning the hyperbolic sine of R. The magnitude of R must not exceed 1025×⍟2 (about 710.47586). The magnitude of the result is greater than or equal to that of the argument, and the sign of the result is that of the argument. The hyperbolic sine is the reciprocal of the hyperbolic cosecant (∆RECIP ∆CSCH). ∆SINH is the odd component of the exponential function. The even component is ∆COSH (6○R), so (for (∣R)≤1024×⍟2, about 709.7827) (∆SINH R)+∆COSH R approximates ∆EPOW R (*R). ∆SINH is the inverse of ∆ASINH (¯5○R); that is, R≡∆SINH ∆ASINH R and (for (∣R)≤1025×⍟2) R≡∆ASINH ∆SINH R. ∆SINH differs from APL+Win version 3.6.02 in the following respects: 1. ∆SINH is slightly more accurate than 5○R, and always preserves the identities (∆SINH R)=-∆SINH-R and (×∆SINH R)≡×R and (∆SINH∣R)≥∣R (5○R sometimes does not). It also uses a different algorithm for integers than for floating point; the integer algorithm is faster and returns slightly more accurate results. 2. 5○R returns 0 for values of magnitude less than 2*¯1022 (about 2.2251E¯308); ∆SINH returns these values. 3. 5○R signals a DOMAIN ERROR if the magnitude of any item exceeds 1025×⍟2; ∆SINH signals a LIMIT ERROR. 4. If R is empty, 5○R returns an integer result; otherwise, it returns a floating point result. If R is empty or Boolean and all 0's, ∆SINH returns Boolean 0's; otherwise, ∆SINH also returns a floating point result. 5. ∆SINH is faster than APL for 6-10 or more items, up to 4-12 times for floating point, and 10-16 times for integer or Boolean. It makes no copy (and uses no space) if R is all Boolean 0's, and so can exceed 1,000 times faster. 6. APL can process up to 134,217,723 floating point items, 178,956,964 integers, or 214,748,352 Boolean items. ∆SINH can process up to 134,000,000 items (unlimited if R is all Boolean 0's). APL can process nested arrays; ∆SINH cannot. Errors: DOMAIN ERROR The argument is character and not empty. LIMIT ERROR The argument has more than 134,000,000 items and is not all Boolean 0's. An item exceeds magnitude 710.47586007394386. NONCE ERROR The argument is nested or heterogeneous. Related functions: ∆CSCH (÷5○R), ∆COSH (6○R), ∆TANH (7○R), ∆ASINH (¯5○R), ∆ACSCH (¯5○÷R), ∆EPOW (*R), ∆GD (¯3○5○R). Examples: A←¯1 0 .881373587 1 1.443635475 2 ⊃A (∆SINH A) ¯1 0 0.881373587 1 1.443635475 2 ¯1.175201194 0 1 1.175201194 2 3.626860408 R←3 3.951613336 3.989326806 4 4.025670416 ⊃R (∆SINH R) 3 3.951613336 3.989326806 4 4.025670416 10.01787493 26 27 27.2899172 28 Z←(∆HALF-⌿∆EPOW 1 ¯1∘.×R) (∆HALF ∆SUBR ∆EPOW R) (-∆SINH-R) D←¯1+2×⍳85 ⍝ odd coefficients 1 3 5 ... 169 Z←Z,((R∘.*D)+.÷!D) ((∆EPOW R)-∆COSH R) (∆RECIP ∆CSCH R) Z∊⊂∆SINH R ⍝ identities 1 1 1 1 1 1 G←.5×1+5*.5 ⍝ Golden ratio H←.2 .5,G,2 5 ⋄ ⊃H (⍟H) (∆SINH ⍟H) 0.2 0.5 1.618033989 2 5 ¯1.609437912 ¯0.6931471806 0.4812118251 0.6931471806 1.609437912 ¯2.4 ¯0.75 0.5 0.75 2.4

-----------------------------------------------------------------MACfns31Aug2006 Z←L ∆UTRI R Upper triangular matrix creates an upper triangular matrix having L rows and ⍴R columns. L must be a nonnegative integer scalar or one-item vector, and R must be a simple numeric (or empty character) vector. When R is Boolean and all 1's, then ∆UTRI is equivalent to the following expression, which generates an upper triangular Boolean histogram: (⍳L)∘.≤⍳⍴R However, in general ∆UTRI multiplies each row of this histogram by its vector right argument, and is equivalent to the following: R×[⎕IO+1](⍳L)∘.≤⍳⍴R or ((⍳L)∘.≤⍳⍴R)×(L,⍴R)⍴R The shape of the matrix result is L,⍴R. The type of the result is Boolean if empty, that of the right argument otherwise (APL returns empty results as integer). If the result is not empty, ∆UTRI is always faster than the faster (which can differ) of the two expressions above in APL+Win version 3.6.02. L ∆UTRI R⍴1 is faster than (⍳L)∘.≤⍳⍴R if the result has 75 or more items (but usually less). Speedups vary widely, ranging from 5-7 times faster for 500-item results up to several hundred times for large Boolean results, and are greatest for Boolean right arguments and smallest for floating point. When L exceeds ⍴R, speedups grow as L increases relative to ⍴R. ∆UTRI needs space only for its result; APL needs 2-33 times the space. APL can return up to 214,748,352 items; ∆UTRI can return up to 134,000,000 items. APL can process nested arrays; ∆UTRI cannot. Errors: DOMAIN ERROR The left argument is not a nonnegative integer, or exceeds 2,147,483,647. The right argument is character and not empty. LIMIT ERROR The result would contain more than 134,000,000 items (134E6<L×⍴R). NONCE ERROR The right argument is nested or simple heterogeneous. RANK ERROR The left argument is not a scalar or one-item vector. The right argument is not a vector. VALENCE ERROR No left argument is supplied. Related functions: ∆opAND (L∘.^R). Examples: 5 ∆UTRI¨(5⍴1) (9 .2 30),⍳¨5 10 1 1 1 1 1 9 0.2 30 1 2 3 4 5 1 2 3 4 5 6 7 8 9 10 0 1 1 1 1 0 0.2 30 0 2 3 4 5 0 2 3 4 5 6 7 8 9 10 0 0 1 1 1 0 0 30 0 0 3 4 5 0 0 3 4 5 6 7 8 9 10 0 0 0 1 1 0 0 0 0 0 0 4 5 0 0 0 4 5 6 7 8 9 10 0 0 0 0 1 0 0 0 0 0 0 0 5 0 0 0 0 5 6 7 8 9 10

MACfns31Aug2006 ASMFNS SeveralMACfnsare replacements for functions contained in the 'ASMFNS' workspace supplied by APL2000 with APL+Win version 3.6.02. All are faster and handle larger arguments and return larger results; several offer other advantages such as greater generality or customizability. The following table lists the functions from the 'ASMFNS' workspace which have directMACfnsreplacements, and summarizes the differences. See the documentation in the individual MACfns detailed descriptions for more details. Those noted as "(APL)" are APL coded, not assembly coded, functions; these were at one time written in assembly language (either for APL+PC or APL+DOS), but were never rewritten for APL+Win. The numbers in the column entitled "Typical Speedup Ratio" represent the cpu time used by the 'ASMFNS' function divided by the cpu time used by theMACfnsfunction in a variety of common cases. These ratios tend to be conservative. The numbers in the column entitled "Maximum Result Size Ratio" are the maximum ⎕SIZE of the result of theMACfnsfunction divided by the maximum ⎕SIZE of the result of the 'ASMFNS' function. All ratios are rounded. In both cases, higher is better (MACfnsbeing more advantageous). Maximum Workspace Typical Result 'ASMFNS'MACfnsSpeedup Size Name Name Ratio Ratio Other benefits/differences --------- --------- ------- ------ -------------------------- AV (APL) ∆AF 2.5-8 1.1-5 uses less space DTBR ∆DTBR 5-25 5 customizable fill, (APL) identity detection INDEX1 ∆INDEX_ 1.5-10 .5-10 uses less space (APL) LJUSTIFY ∆LJUST 10-30 5 customizable fill, (APL) identity detection LOWERCASE ∆LCASEDOS 2-3 2.5 (none) MATtoSS ∆MATSS 2-3 8 specifiable (or no) fill, customizable defaults RJUSTIFY ∆RJUST 10-30 5 customizable fill, (APL) identity detection ⍙RPL ∆TXTRPL 3-8 5 uses less space (APL) SSLEN ∆SSLENS 4-8 1.25 uses less space (APL) SSSHAPE ∆SSSHAPE 4-6 (same) uses less space (APL) SStoMAT ∆SSMAT 1.5-2.5 8 specifiable fill, customizable options TEXTREPL ∆TXTRPL .2-4 8 result sometimes differs TRANSLATE ∆XLATE 1.5-3 2.5 (none, but ∆XLATE requires L to have 256 elements) :if L ⍙UCASEX ∆UCASEDOS 2-3 2.5 (none) UPPERCASE ∆UCASEDOS WHERE ∆INDS 1.3-6 1.25 handles all datatypes WHERE R∊0 ∆ZNDS WORDREPL ∆WRDRPL 5-20 5 uses less space, result (APL) often differs The following functions in 'ASMFNS' currently have no MACfns equivalent: APL coded: DLB DLTB DTB NBLENGTH SSASSIGN SSCAT SSCOMPRESS SSDEB SSDLB SSDLTB SSDROP SSDTB SSFIND SSINDEX SSTAKE SSUNIQUE TELPRINT assembler-based: DEB DIV OVER ROWFIND ∆∆VR We welcome suggestions as to which of these (or others) would be useful to you.

GetMACfns Get MACfns, assembler code, documentation, or APL analog. MACfnsFid Return name of file used by GetMACfns (var or niladic fn). ∆CPULEVEL CPU level for MACfns. ∆DOSHAND Convert tie number to DOS handle.

∆CHKTS Check ⎕TS-form timestamp. ∆DATEBASE Compute days since 1Jan1900 from YYYY MM DD dates. ∆DATEPACK Pack YYYY MM DD dates as scalars {per L={0-5}}. ∆DATEREP Compute YYYY MM DD dates from days since 1Jan1900. ∆DATEUNP Unpack scalars as YYYY MM DD dates {per L={0-5{,cuspyear}}}. ∆DATE2BAS Compute days since 1Jan1900 from two YYYY MM DD dates. ∆DATE2REP Compute YYYY MM DD dates from two days since 1Jan1900. ∆DOSTS Check and pack ⎕TS-form timestamp as scalar. ∆DOST Unpack scalar into ⎕TS-form timestamp. ∆CHK2DS Check two 3↑⎕TS-form datestamps. ∆DOS2DS Check and pack two 3↑⎕TS-form datestamps as scalar. ∆DOS2D Unpack scalar into two 3↑⎕TS-form datestamps.

∆ALLOC Allocate arbitrary array of ⎕DR type L and shape R. ∆AF Atomic function: ⎕AV[⎕IO+R] or (⎕AV⍳R)-⎕IO ∆CDR Change data representation: L ⎕DR R ⍝ singleton L ∆BREPI 32-column Boolean representation of integer: ⍉1=(32⍴2)⊤⍉R ∆IREPB Integer representation of 32-col Boolean: ⍉(0 ¯2,30⍴2)⊥⍉R ∆ENDIAN Reverse byte order of integer or floating point values. ∆COERCE Coerce numeric or empty R to type L∊11 323 645. ∆DEMOTE Demote numeric items to most compact representation. ∆CRC Cyclic redundancy 32-bit check for simple R. ∆DRe Data representation each: ⎕DR¨R ∆DRNe Data representation each, negating nested: (⎕DR¨R)×¯1*×≡¨R

∆AF Atomic function: ⎕AV[⎕IO+R] or (⎕AV⍳R)-⎕IO ∆BTCCS Blank terminal control characters. ∆LCASE Translate to lowercase. ∆LCASEDOS Translate to lowercase for DOS. ∆UCASE Translate to uppercase. ∆UCASEDOS Translate to uppercase for DOS. ∆XLATE Translate character R to 256-character L: L[⎕AV⍳R]

∆AF Atomic function: ⎕AV[⎕IO+R] or (⎕AV⍳R)-⎕IO ∆AVEPS Flag membership of all characters: ⎕AV∊R ∆AVNUB Distinct characters: ⎕AV~⎕AV~R ∆AVFREQ Frequency of all characters: +/⎕AV∘.≡,R ∆BEQ Flag blanks (' '=simple): R∊' ' ∆BNE Flag nonblanks (' '≠simple): ~R∊' ' ∆NLEQ Flag newlines (⎕TCNL=simple): R∊⎕TCNL ∆NLNE Flag non-newlines (⎕TCNL≠simple): ~R∊⎕TCNL ∆NULEQ Flag nulls (⎕TCNUL=simple): R∊⎕TCNUL ∆NULNE Flag non-nulls (⎕TCNUL≠simple): ~R∊⎕TCNUL ∆STS Flag start of matches of string R in string L: L ⎕SS R ∆STSNOV Flag start of nonoverlapped matches of string R in string L. ∆STSF First index of string R in string L or ↑↓/L if (2∊⍴L)^⍬≡0⍴L. ∆TXTRPL Replace strings L ('/old1/new1/old2/new2...') in string R. ∆WRDRPL Replace words L ('/old1/new1/old2/new2...') in string R. ∆UNTAB Replace tabs in string R {L={tabinc{,tab{,fill{,⊂dels}}}}}. ∆NLLF Insert LF's after NL's in string, or NL,LF's between rows. ∆XLATE Translate character R to 256-character L: L[⎕AV⍳R]

∆DTBR Delete trailing blank rows: (⌽∨\⌽R∨.≠' ')⌿R ∆CJUST Center justify character: (⌈.5×-⌿+/^\' '=⊃R(⌽R))⌽R ∆LJUST Left justify character: (+/^\' '=R)⌽R ∆RJUST Right justify character: (+/∨\' '≠⌽R)⌽R ∆MATDS Matrix to delimited string {L={delimiter{,{fill∣¯1}}}}. ∆MATSS Matrix to segmented string {L={delimiter{,{fill∣¯1}}}}. ∆MATNV Character matrix to nested vector {L={fill∣¯1}}. ∆DSMAT Delimited string to matrix {L={delimiter{,fill}}}. ∆DSMATSZ Delimited string to matrix size {L={delimiter{,ignored}}}. ∆SSMAT Segmented string to matrix {L={fill}}. ∆SSMATSZ Segmented string to matrix size {L=ignored}. ∆NVMAT Nested vector to character matrix {L=fill}: ⊃R ∆SSLENS Segment lengths in segmented string. ∆SSSHAPE Number of segments in segmented string: +/R=1↑R

∆I11FMT Integer 11-column formatting: 'I11'⎕FMT simple ∆I12FMT Integer 12-column formatting: 'I12'⎕FMT simple

∆CAT_ Catenate along the first axis: L⍪R ∆COLMAT Coalesce trailing (all but the first) axes to form matrix. ∆ROWMAT Coalesce leading (all but the last) axes to form matrix. ∆ONECOL Coalesce all axes to form one-column matrix: ,[⍬],R ∆ONEROW Coalesce all axes to form one-row matrix: ,[⎕IO-.5],R ∆REVA Reverse all axes (matrix ⊖⌽R): (⍴R)⍴⌽,R ∆EXTEND Truncate or pad raveled R with last item to shape L. ∆ENCLOSEe Enclose each (⊂[⍬]R): ⊂¨R ∆SOSNV Scalar or simple to nested vector: ⍎(1=≡R←1/R)/'R←,⊂R'

∆INDEX_ First axis indexing (matrix L[R;]): (⊂R)⌷[⎕IO]L ∆SQUAD_ First axis indexing (matrix R[L;]): (⊂L)⌷[⎕IO]R

∆INDS Indices of nonzeros in vector (bitvec R/⍳⍴R): (~R∊0)/⍳⍴,R ∆ZNDS Indices of zeros in vector (bitvec (~R)/⍳⍴R): (R∊0)/⍳⍴,R ∆ODOMETER Generalized odometer: ⊃,↑∘.,/,R ⍝ nested array of intvecs ∆DTBLENS Length of rows sans trailing blanks: +/∨\' '≠⌽R ∆ROWFIND Flag rows of character L containing vector R: ∨/R⍷1/L ∆ROWIOTA Locate rows of R in matrix L; ⎕IO-1 if not found ⍝ ⎕CT=0 ∆ROLL Random integers from ⍳¨R (?R) or of shape L from ⍳R (?L⍴R). ∆SSLENS Segment lengths in segmented string.

∆NOT Logical negation: ~R ⍝ bit ∆AND Logical and: L^R ⍝ bit ∆ANDNOT Logical and not: L>R ⍝ bit ∆OR Logical or: L∨R ⍝ bit ∆ORNOT Logical or not: L≥R ⍝ bit ∆NAND Logical not and: L⍲R ⍝ bit ∆NANDNOT Logical not and not: L≤R ⍝ bit ∆NOR Logical not or: L⍱R ⍝ bit ∆NORNOT Logical not or not: L<R ⍝ bit ∆XOR Logical exclusive or: L≠R ⍝ bit ∆NXOR Logical not exclusive or: L=R ⍝ bit ∆opAND Outer product logical and: L∘.^R ⍝ bit

∆MATCH Flag if arrays identical: L≡R ⍝ ⎕CT=0 ∆EMATCH Flag if arrays identical (L≡R with ⎕CT=0), including ⎕DR. ∆ALL Flag if all ones (^/,bit, ^/,1=simple): ~0∊R∊1 ⍝ ⎕CT=0 ∆ANY Flag if any ones: 1∊R ⍝ ⎕CT=0 ∆ANYe Flag if any ones in each (1=simple): 1∊¨R ⍝ ⎕CT=0 ∆NALL Flag if not all ones (~^/,bit, ∨/,1≠simple): 0∊R∊1 ⍝ ⎕CT=0 ∆NONE Flag if no ones: ~1∊R ⍝ ⎕CT=0 ∆ALLZ Flag if all zeros (~∨/,bit, ^/,0=simple): ~0∊R∊0 ∆ANYZ Flag if any zeros: 0∊R ∆NALLZ Flag if not all zeros (∨/,bit, ∨/,0≠simple): 0∊R∊0 ∆NOZ Flag if no zeros: ~0∊R ∆EVEN Flag even values: 0=2∣⌊∣R ⍝ ⎕CT=1<≡R ∆ODD Flag odd values: 0≠2∣⌊∣R ⍝ ⎕CT=1<≡R ∆ZEQ Flag zeros (0=simple): R∊0 ∆ZNE Flag nonzeros (0≠simple): ~R∊0 ∆ZGT Flag negative items: 0>R ⍝ 1≥≡R ∆ZLT Flag positive items: 0<R ⍝ 1≥≡R ∆ZLE Flag nonnegative items: 0≤R ⍝ 1≥≡R ∆ZGE Flag nonpositive items: 0≥R ⍝ 1≥≡R ∆BEQ Flag blanks (' '=simple): R∊' ' ∆BNE Flag nonblanks (' '≠simple): ~R∊' ' ∆NLEQ Flag newlines (⎕TCNL=simple): R∊⎕TCNL ∆NLNE Flag non-newlines (⎕TCNL≠simple): ~R∊⎕TCNL ∆NULEQ Flag nulls (⎕TCNUL=simple): R∊⎕TCNUL ∆NULNE Flag non-nulls (⎕TCNUL≠simple): ~R∊⎕TCNUL ∆NVEC End partition vector (flag end of runs); ⎕CT=1<≡R. ∆PVEC Partition vector (flag start of runs); ⎕CT=1<≡R.

∆SIGN Sign (¯1 if negative, 1 if positive, or 0): ×R ⍝ 1≥≡R ∆ABS Absolute value (magnitude): ∣R ⍝ 1≥≡R ∆NABS Negate absolute value (negated magnitude): -∣R ⍝ 1≥≡R ∆FABS Floor absolute value (integer magnitude): ⌊∣R ⍝ ⎕CT=1<≡R ∆NEG Negate (change sign): -R ⍝ 1≥≡R ∆NOT Logical negation: ~R ⍝ bit ∆NPOW Negative one power (1=even, ¯1=odd): ¯1*R ⍝ int ∆NNPOW Negate negative one power (¯1=even, 1=odd): -¯1*R ⍝ int ∆ONEMAX Higher of one or R: 1⌈R ⍝ 1≥≡R ∆ONEMIN Lower of one or R: 1⌊R ⍝ 1≥≡R ∆ZMAX Higher of zero or R (zero negative items): 0⌈R ⍝ 1≥≡R ∆ZMIN Lower of zero or R (zero positive items): 0⌊R ⍝ 1≥≡R ∆HILO Highest and lowest: (⌈/,R),⌊/,R ⍝ ⎕DR-based if empty

∆CEIL Ceiling (lowest integer not below): ⌈R ⍝ ⎕CT=1<≡R ∆FLOOR Floor (highest integer not above): ⌊R ⍝ ⎕CT=1<≡R ∆FABS Floor absolute value (integer magnitude): ⌊∣R ⍝ ⎕CT=1<≡R ∆FHALF Floor halve: ⌊R÷2 ⍝ ⎕CT=1<≡R ∆IRND Integer round: ⌊.5+R ⍝ ⎕CT=1<≡R ∆IRNDM Integer round magnitude: (×R)×⌊.5+∣R ⍝ ⎕CT=1<≡R ∆F2LOG Adjusted floor two log: (¯65536×0=R)+⌊2⍟∣R+0=R ⍝ ⎕CT=1<≡R ∆F10LOG Adjusted floor ten log: (1+0>R)+⌊10⍟1⌈∣R ⍝ ⎕CT=1<≡R

∆DEC Decrement by one: R-1 ⍝ 1≥≡R ∆DEN Decrement negation by one: ¯1-R ⍝ 1≥≡R ∆DECH Decrement by half: R-.5 ⍝ 1≥≡R ∆INC Increment by one: R+1 ⍝ 1≥≡R ∆INN Increment negation by one: 1-R ⍝ 1≥≡R ∆INCH Increment by half: R+.5 ⍝ 1≥≡R ∆NEG Negate (change sign): -R ⍝ 1≥≡R ∆NABS Negate absolute value (negated magnitude): -∣R ⍝ 1≥≡R ∆ADDR Add reciprocal ((1+R*2)÷R): R+÷R ⍝ 1≥≡R ∆SUBR Subtract reciprocal ((¯1+R*2)÷R): R-÷R ⍝ 1≥≡R

∆DOUB Double: R×2 ⍝ 1≥≡R ∆DECD Decrement double: ¯1+R×2 ⍝ 1≥≡R ∆HALF Halve: R÷2 ⍝ 1≥≡R ∆HINC Halve increment: (R+1)÷2 ⍝ 1≥≡R ∆FHALF Floor halve: ⌊R÷2 ⍝ ⎕CT=1<≡R ∆HSQAR Halve square: (R*2)÷2 ⍝ 1≥≡R ∆HDIV Half divided by: .5÷R ⍝ 1≥≡R ∆TWODIV Two divided by: 2÷R ⍝ 1≥≡R ∆RECIP Reciprocal: ÷R ⍝ 1≥≡R ∆SUMRECIP Sum reciprocals in simple R: +/,÷R ∆NEGR Negate reciprocal: -÷R ⍝ 1≥≡R ∆INCR Increment reciprocal ((R+1)÷R): 1+÷R ⍝ 1≥≡R ∆DECR Decrement reciprocal ((1-R)÷R): ¯1+÷R ⍝ 1≥≡R ∆INNR Increment negated reciprocal ((R-1)÷R): 1-÷R ⍝ 1≥≡R ∆DENR Decrement negated reciprocal ((¯1-R)÷R): ¯1-÷R ⍝ 1≥≡R ∆ADDR Add reciprocal ((1+R*2)÷R): R+÷R ⍝ 1≥≡R ∆SUBR Subtract reciprocal ((¯1+R*2)÷R): R-÷R ⍝ 1≥≡R ∆RINC Reciprocal increment: ÷R+1 ⍝ 1≥≡R ∆RDEC Reciprocal decrement: ÷R-1 ⍝ 1≥≡R ∆RINN Reciprocal incremented negation: ÷1-R ⍝ 1≥≡R ∆RDEN Reciprocal decremented negation: ÷¯1-R ⍝ 1≥≡R ∆DIVI Divide by increment: R÷R+1 ⍝ 1≥≡R ∆DIVD Divide by decrement: R÷R-1 ⍝ 1≥≡R ∆DIVN Divide by incremented negation: R÷1-R ⍝ 1≥≡R ∆DIVC Divide by decremented negation: R÷¯1-R ⍝ 1≥≡R ∆DIVDI Divide decrement by increment: (R-1)÷R+1 ⍝ 1≥≡R ∆DIVID Divide increment by decrement: (R+1)÷R-1 ⍝ 1≥≡R ∆DIVIN Divide increment by incremented negation: (R+1)÷1-R ⍝ 1≥≡R ∆DIVNI Divide incremented negation by increment: (1-R)÷R+1 ⍝ 1≥≡R ∆UTRI L-row upper triangular mat from vec R: R×[⎕IO+1](⍳L)∘.≤⍳⍴R

∆DIVPI Divide by pi: R÷○1 ⍝ 1≥≡R ∆DIV2PI Divide by twice pi: R÷○2 ⍝ 1≥≡R ∆DIVHPI Divide by half pi: R÷○.5 ⍝ 1≥≡R ∆PIDIV Pi divided by: ○÷R ⍝ 1≥≡R ∆PI Pi times: ○R ⍝ 1≥≡R ∆PIDOUB Pi times double: ○R×2 ⍝ 1≥≡R ∆PIHALF Pi times halve: ○R÷2 ⍝ 1≥≡R ∆PISQAR Pi times square: ○R*2 ⍝ 1≥≡R ∆SQRTDPI Square root of divide by pi: (R÷○1)*.5 ⍝ 1≥≡R ∆PIDTR Degrees to radians: ○R÷180 ⍝ 1≥≡R ∆PIRTD Radians to degrees: R÷○÷180 ⍝ 1≥≡R

∆ONEMOD One modulus (fractional part): 1∣R ⍝ ⎕CT=1<≡R ∆TWOMOD Two modulus: 2∣R ⍝ ⎕CT=1<≡R ∆EVEN Flag even values: 0=2∣⌊∣R ⍝ ⎕CT=1<≡R ∆ODD Flag odd values: 0≠2∣⌊∣R ⍝ ⎕CT=1<≡R ∆NPOW Negative one power (1=even, ¯1=odd): ¯1*R ⍝ int ∆NNPOW Negate negative one power (¯1=even, 1=odd): -¯1*R ⍝ int

∆RECIP Reciprocal: ÷R ⍝ 1≥≡R ∆SQAR Square: R*2 ⍝ 1≥≡R ∆HSQAR Halve square: (R*2)÷2 ⍝ 1≥≡R ∆RSQAR Reciprocal square: R*¯2 ⍝ 1≥≡R ∆PISQAR Pi times square: ○R*2 ⍝ 1≥≡R ∆SQRT Square root: R*.5 ⍝ 1≥≡R ∆SQRTA Square root absolute value: (∣R)*.5 ⍝ 1≥≡R ∆SQRTD Square root double: (R×2)*.5 ⍝ 1≥≡R ∆SQRTDPI Square root of divide by pi: (R÷○1)*.5 ⍝ 1≥≡R ∆RSQRT Reciprocal square root: R*¯.5 ⍝ 1≥≡R ∆RSQRTA Reciprocal square root absolute value: (∣R)*¯.5 ⍝ 1≥≡R ∆CUBE Cube: R*3 ⍝ 1≥≡R ∆CUBERT Cube root: (×R)×(∣R)*÷3 ⍝ 1≥≡R ∆FOUTHPW Fourth power: R*4 ⍝ 1≥≡R ∆FOUTHRT Fourth root: R*.25 ⍝ 1≥≡R

∆ELOG Base-e (natural) logarithm: ⍟R ⍝ 1≥≡R ∆SUMELOG Sum base-e (natural) logarithms in simple R: +/,⍟R ∆EPOW Base-e (natural) exponential: *R ⍝ 1≥≡R ∆SINH Hyperbolic sine: 5○R ⍝ 1≥≡R ∆COSH Hyperbolic cosine: 6○R ⍝ 1≥≡R ∆TWOLOG Base-two logarithm: 2⍟R ⍝ 1≥≡R ∆TWOPOW Base-two exponential: 2*R ⍝ 1≥≡R ∆TENLOG Base-ten logarithm: 10⍟R ⍝ 1≥≡R ∆TENPOW Base-ten exponential: 10*R ⍝ 1≥≡R ∆FAC Factorial: !R ⍝ int ∆F2LOG Adjusted floor two log: (¯65536×0=R)+⌊2⍟∣R+0=R ⍝ ⎕CT=1<≡R ∆F10LOG Adjusted floor ten log: (1+0>R)+⌊10⍟1⌈∣R ⍝ ⎕CT=1<≡R ∆NPOW Negative one power (1=even, ¯1=odd): ¯1*R ⍝ int ∆NNPOW Negate negative one power (¯1=even, 1=odd): -¯1*R ⍝ int

∆SIN Sine of radians: 1○R ⍝ 1≥≡R ∆SIND Sine of degrees: 1○○R÷180 ⍝ 1≥≡R ∆COS Cosine of radians: 2○R ⍝ 1≥≡R ∆COSD Cosine of degrees: 2○○R÷180 ⍝ 1≥≡R ∆TAN Tangent of radians: 3○R ⍝ 1≥≡R ∆TAND Tangent of degrees: 3○○R÷180 ⍝ 1≥≡R ∆ASIN Arcsine in radians: ¯1○R ⍝ 1≥≡R ∆ASIND Arcsine in degrees: (¯1○R)÷○÷180 ⍝ 1≥≡R ∆ACOS Arccosine in radians: ¯2○R ⍝ 1≥≡R ∆ACOSD Arccosine in degrees: (¯2○R)÷○÷180 ⍝ 1≥≡R ∆ATAN Arctangent in radians: ¯3○R ⍝ 1≥≡R ∆ATAND Arctangent in degrees: (¯3○R)÷○÷180 ⍝ 1≥≡R ∆CSC Cosecant of radians: ÷1○R ⍝ 1≥≡R ∆SEC Secant of radians: ÷2○R ⍝ 1≥≡R ∆COT Cotangent of radians: ÷3○R ⍝ 1≥≡R ∆ACSC Arccosecant in radians: ¯1○÷R ⍝ 1≥≡R ∆ASEC Arcsecant in radians: ¯2○÷R ⍝ 1≥≡R ∆ACOT Arccotangent in radians: ¯3○÷R ⍝ 1≥≡R ∆SINC Sine cardinal of radians: (1○R)÷R ⍝ 1≥≡R ∆SINCPI Sine cardinal pi times: (1○○R)÷○R ⍝ 1≥≡R ∆SINCOS Sine and cosine of radians: 1 2○⊂R ⍝ 1≥≡R ∆GD Gudermannian function: ¯3○5○R ⍝ 1≥≡R ∆AGD Inverse Gudermannian function: ¯5○3○R ⍝ 1≥≡R

∆PYTH Hypotenuse of unit tri. from side ((1+R*2)*.5): 4○R ⍝ 1≥≡R ∆APYTH Side of unit tri. from hypot. (R×(1-R*¯2)*.5): ¯4○R ⍝ 1≥≡R ∆APYTHA Side of unit tri. from ∣hypot. ((¯1+R*2)*.5): ¯4○∣R ⍝ 1≥≡R ∆ZPYTH Side of unit tri. from other side ((1-R*2)*.5): 0○R ⍝ 1≥≡R

∆SINH Hyperbolic sine: 5○R ⍝ 1≥≡R ∆COSH Hyperbolic cosine: 6○R ⍝ 1≥≡R ∆TANH Hyperbolic tangent: 7○R ⍝ 1≥≡R ∆ASINH Hyperbolic arcsine: ¯5○R ⍝ 1≥≡R ∆ACOSH Hyperbolic arccosine: ¯6○R ⍝ 1≥≡R ∆ATANH Hyperbolic arctangent: ¯7○R ⍝ 1≥≡R ∆CSCH Hyperbolic cosecant: ÷5○R ⍝ 1≥≡R ∆SECH Hyperbolic secant: ÷6○R ⍝ 1≥≡R ∆COTH Hyperbolic cotangent: ÷7○R ⍝ 1≥≡R ∆ACSCH Hyperbolic arccosecant: ¯5○÷R ⍝ 1≥≡R ∆ASECH Hyperbolic arcsecant: ¯6○÷R ⍝ 1≥≡R ∆ACOTH Hyperbolic arccotangent: ¯7○÷R ⍝ 1≥≡R ∆ELOG Base-e (natural) logarithm: ⍟R ⍝ 1≥≡R ∆EPOW Base-e (natural) exponential: *R ⍝ 1≥≡R ∆GD Gudermannian function: ¯3○5○R ⍝ 1≥≡R ∆AGD Inverse Gudermannian function: ¯5○3○R ⍝ 1≥≡R

∆GENB Generate random bits of shape R. ∆GENC Generate random characters of shape R. ∆GENI Generate random 4-byte integers of shape R. ∆GENP Generate random positive 4-byte integers of shape R. ∆ROLL Random integers from ⍳¨R (?R) or of shape L from ⍳R (?L⍴R).

∆RANKe Rank each: ↑¨⍴¨⍴¨R ∆SHAPEe Shape each: ⍴¨R ∆FAXe First axis each (1 if scalar): ↑¨(⍴¨R),¨1 ∆LAXe Last axis each (1 if scalar): ↑¨⌽¨1,¨⍴¨R ∆NELMe Number of elements each: ↑¨⍴¨,¨R ∆DRe Data representation each: ⎕DR¨R ∆DRNe Data representation each, negating nested: (⎕DR¨R)×¯1*×≡¨R ∆ENCLOSEe Enclose each (⊂[⍬]R): ⊂¨R ∆ANYe Flag if any ones in each (1=simple): 1∊¨R ⍝ ⎕CT=0

∆AVFREQ Frequency of all characters: +/⎕AV∘.≡,R ∆ODOMETER Generalized odometer: ⊃,↑∘.,/,R ⍝ nested array of intvecs ∆opAND Outer product logical and: L∘.^R ⍝ bit ∆SIGNS Signs summary (count neg,0,pos): +/¯1 0 1∘.=,×R ⍝ 1≥≡R ∆UTRI L-row upper triangular mat from vec R: R×[⎕IO+1](⍳L)∘.≤⍳⍴R

∆ALL Flag if all ones (^/,bit, ^/,1=simple): ~0∊R∊1 ⍝ ⎕CT=0 ∆ALLZ Flag if all zeros (~∨/,bit, ^/,0=simple): ~0∊R∊0 ∆ANY Flag if any ones: 1∊R ⍝ ⎕CT=0 ∆ANYZ Flag if any zeros: 0∊R ∆NALL Flag if not all ones (~^/,bit, ∨/,1≠simple): 0∊R∊1 ⍝ ⎕CT=0 ∆NALLZ Flag if not all zeros (∨/,bit, ∨/,0≠simple): 0∊R∊0 ∆NONE Flag if no ones: ~1∊R ⍝ ⎕CT=0 ∆NOZ Flag if no zeros: ~0∊R ∆MATCH Flag if arrays identical: L≡R ⍝ ⎕CT=0 ∆EMATCH Flag if arrays identical (L≡R with ⎕CT=0), including ⎕DR. ∆NESTED Flag if nested: 1<≡R ∆SIMPLE Flag if simple: 1≥≡R ∆SIGNS Signs summary (count neg,0,pos): +/¯1 0 1∘.=,×R ⍝ 1≥≡R ∆HILO Highest and lowest: (⌈/,R),⌊/,R ⍝ ⎕DR-based if empty ∆AVG Average items in simple R (mean): (+/,R)÷1⌈×/⍴R ∆SUM Sum items in simple R: +/,R ∆SUMABS Sum absolute values in simple R: +/,∣R ∆SUMELOG Sum base-e (natural) logarithms in simple R: +/,⍟R ∆SUMRECIP Sum reciprocals in simple R: +/,÷R ∆SUMSQAR Sum squares in simple R: +/,R*2 ∆SUMZEQ Number of zeros (+/,~bit, +/,0=simple): +/,R∊0 ∆SUMZNE Number of nonzeros (+/,bit, +/,0≠simple): +/,~R∊0 ∆SSSHAPE Number of segments in segmented string: +/R=1↑R ∆CRC Cyclic redundancy 32-bit check for simple R. ∆NVEC End partition vector (flag end of runs); ⎕CT=1<≡R. ∆PVEC Partition vector (flag start of runs); ⎕CT=1<≡R.

∆ALLZ Flag if all zeros (~∨/,bit, ^/,0=simple): ~0∊R∊0 ∆ANYZ Flag if any zeros: 0∊R ∆NALLZ Flag if not all zeros (∨/,bit, ∨/,0≠simple): 0∊R∊0 ∆NOZ Flag if no zeros: ~0∊R ∆ZEQ Flag zeros (0=simple): R∊0 ∆ZNE Flag nonzeros (0≠simple): ~R∊0 ∆SUMZEQ Number of zeros (+/,~bit, +/,0=simple): +/,R∊0 ∆SUMZNE Number of nonzeros (+/,bit, +/,0≠simple): +/,~R∊0 ∆ZNDS Indices of zeros in vector (bitvec (~R)/⍳⍴R): (R∊0)/⍳⍴,R ∆INDS Indices of nonzeros in vector (bitvec R/⍳⍴R): (~R∊0)/⍳⍴,R ∆ZMAX Higher of zero or R (zero negative items): 0⌈R ⍝ 1≥≡R ∆ZMIN Lower of zero or R (zero positive items): 0⌊R ⍝ 1≥≡R

© 2017 Lescasse Consulting. All rights reserved.