MACfns v4.0

MACfns v4.0 is a set of 257 ultra-fast Assembler functions which you can use to speed up your APL+Win applications.

Extensively documented, MACfns executes faster, in less space, and with greater accuracy than APL primitives, avoiding WS FULL and LIMIT ERRORs and often returning smaller results. A parallel suite of APL analog functions is also included for study and migration.

Requirements

MACfns requires:

  • APL+Win v3.6 or later (or APL+DOS 6.0)

Download

Download a sample MACfns APL+Win workspace:



You can download this workspace and try the Assembler functions it contains with APL+Win v3.6 or any later version.

Note: The MACFNS1.W3 workspace is delivered for trial purposes only: you are not allowed to include these Assembler functions in software you distribute or sell!

The MACfns product itself comes as a workspace (173KB MACFNS.w3) and an APL component file (1623KB MACFNS.sf), both packaged into a ZIP file (653KB MACfns40.zip).

No installation is necessary: simply unzip into any folder accessible to APL.

Version History

Check the recent changes made to MACfns:

Recent History »

Licenses

                  License Agreement for MACfns

                                           Date _________________

Read the following License Agreement carefully.  By signing it,
you consent to be bound by and become a party to this Agreement.
If you don't agree to all the terms of this Agreement, you may
not examine or use MACfns in any way.

What you are licensing

  MACfns consists of Software and Documentation.  The Software
  consists of Compiled Assembly Code, APL cover functions, and
  APL analog functions.  The terms of this Agreement pertain to
  both the Software and the Documentation.  This Agreement also
  covers both the initial version you receive, and any subsequent
  versions or updates you may receive.

  Sykes Systems, Inc. is the author and owner of all title,
  rights, and interest in MACfns.

Your License

  Sykes Systems, Inc. hereby grants _____________________ ("You")
  a nonexclusive license to use MACfns Software and Documentation
  on your own computers, and to incorporate the Software (but not
  the Documentation) into products distributed to others.

  You may modify the Documentation, APL cover functions, and APL
  analog functions as needed for your purposes.

  You may modify the Compiled Assembly Code, but only in the ways
  described in the Documentation (under "Customization").

  You may make as many copies of the Software and Documentation
  as needed, but you must safeguard them as you would your own
  proprietary and confidential information.

  MACfns is protected by United States copyright laws and
  international treaties, and you must treat it accordingly.

What you may not do

  You may not distribute the Documentation of MACfns to others
  outside your organization under any circumstances.

  You may not distribute the Software of MACfns to others outside
  your organization, except as incorporated into and embedded
  within your products.  In particular, the Software of MACfns
  must not be directly accessible or examinable by users of your
  products.

  You may not place any component of MACfns so that it is
  accessible via a public network such as the Internet.

  You may not modify the Compiled Assembly Code except as
  described in the Documentation (under "Customization").

  You may not reverse engineer, disassemble, decompile, or make
  any attempt to discover the source code of the MACfns Compiled
  Assembly Code, nor allow others to do so.

  You may not sublicense, rent, lease, or lend any component of
  MACfns to others.

  You may not transfer this License to another person or legal
  entity without written authorization from Sykes Systems, Inc.

Limited Warranty

  Sykes Systems, Inc. warrants that for a period of six months
  after delivery of MACfns to you that the Software will perform
  in substantial accordance with the Documentation.

  We do not warrant the merchantability or fitness of MACfns for
  any particular purpose.

  We will not be liable for any direct, incidental, or
  consequential damages arising from the use of, or inability to
  use, MACfns, nor for claims from another party.

Term and Termination

  This License Agreement takes effect upon your receipt of MACfns
  and remains effective until terminated.  You may terminate it
  at any time by destroying all copies of MACfns in your
  possession.  It will also automatically terminate if you fail
  to comply with any term or condition of this License Agreement.
  You agree on termination to destroy all copies of the Software
  and Documentation in your possession.

Confidentiality

  MACfns contains trade secrets and proprietary know-how that
  belong to Sykes Systems, Inc. and it is being made available
  to you in strict confidence.  You must ensure the protection
  and confidentiality of MACfns.  Any use or disclosure of the
  Software or Documentation, or of its algorithms or protocols,
  other than in strict accordance with this License Agreement,
  may be actionable as a violation of our trade secret rights.

General Provisions

  This written License Agreement is the exclusive agreement
  between us concerning MACfns.  It may be modified only in
  writing signed by both of us.

  In the event of litigation between us concerning MACfns, the
  prevailing party in the litigation will be entitled to recover
  attorney fees and expenses from the other party.

  This License agreement is governed by the laws of California.


Your Name:  _________________________   _________________________
            (Print)                     (Signature)

Company:    _________________________   _________________________
                                        (Date)

Title:      _________________________


Please return to
  Lescasse Consulting
  Eric Lescasse
  18 rue de la Belle Feuille
  92100 Boulogne
  France

            Personal Use License Agreement for MACfns

                                           Date _________________

Read the following License Agreement carefully.  By signing it,
you consent to be bound by and become a party to this Agreement.
If you don't agree to all the terms of this Agreement, you may
not examine or use MACfns in any way.

What you are licensing

  MACfns consists of Software and Documentation.  The Software
  consists of Compiled Assembly Code, APL cover functions, and
  APL analog functions.  The terms of this Agreement pertain to
  both the Software and the Documentation.  This Agreement also
  covers both the initial version you receive, and any subsequent
  versions or updates you may receive.

  Sykes Systems, Inc. is the author and owner of all title,
  rights, and interest in MACfns.

Your License

  Sykes Systems, Inc. hereby grants _____________________ ("You")
  a nonexclusive license to use MACfns Software and Documentation
  on your own computers.  You are an individual person, and will
  not share the Software or Documentation with other persons.

  You may modify the Documentation, APL cover functions, and APL
  analog functions as needed for your purposes.

  You may modify the Compiled Assembly Code, but only in the ways
  described in the Documentation (under "Customization").

  You may make as many copies of the Software and Documentation
  as needed, but you must safeguard them as you would your own
  proprietary and confidential information.

  MACfns is protected by United States copyright laws and
  international treaties, and you must treat it accordingly.

What you may not do

  You may not distribute the Documentation of MACfns to others
  under any circumstances.

  You may not distribute the Software of MACfns to others under
  any circumstances.

  You may not place any component of MACfns so that it is
  accessible via a public network such as the Internet.

  You may not modify the Compiled Assembly Code except as
  described in the Documentation (under "Customization").

  You may not reverse engineer, disassemble, decompile, or make
  any attempt to discover the source code of the MACfns Compiled
  Assembly Code, nor allow others to do so.

  You may not sublicense, rent, lease, or lend any component of
  MACfns to others.

  You may not transfer this License to another person or legal
  entity without written authorization from Sykes Systems, Inc.

Limited Warranty

  Sykes Systems, Inc. warrants that for a period of six months
  after delivery of MACfns to you that the Software will perform
  in substantial accordance with the Documentation.

  We do not warrant the merchantability or fitness of MACfns for
  any particular purpose.

  We will not be liable for any direct, incidental, or
  consequential damages arising from the use of, or inability to
  use, MACfns, nor for claims from another party.

Term and Termination

  This License Agreement takes effect upon your receipt of MACfns
  and remains effective until terminated.  You may terminate it
  at any time by destroying all copies of MACfns in your
  possession.  It will also automatically terminate if you fail
  to comply with any term or condition of this License Agreement.
  You agree on termination to destroy all copies of the Software
  and Documentation in your possession.

Confidentiality

  MACfns contains trade secrets and proprietary know-how that
  belong to Sykes Systems, Inc. and it is being made available
  to you in strict confidence.  You must ensure the protection
  and confidentiality of MACfns.  Any use or disclosure of the
  Software or Documentation, or of its algorithms or protocols,
  other than in strict accordance with this License Agreement,
  may be actionable as a violation of our trade secret rights.

General Provisions

  This written License Agreement is the exclusive agreement
  between us concerning MACfns.  It may be modified only in
  writing signed by both of us.

  In the event of litigation between us concerning MACfns, the
  prevailing party in the litigation will be entitled to recover
  attorney fees and expenses from the other party.

  This License agreement is governed by the laws of California.


Your Name:  _________________________   _________________________
            (Print)                     (Signature)

                                        _________________________
                                        (Date)

Please return to:
  Lescasse Consulting
  Eric Lescasse
  18 rue de la Belle Feuille
  92100 Boulogne
  France

Introduction to MACfns

What is MACfns?

MACfns is a large suite of APL functions (250) which uses assembly language to achieve extraordinary speed and unmatched precision. We will discuss some of the characteristics, advantages, and complexities of using assembly language, with code examples and illustrations of the development and usage of MACfns.

What is Assembly Language?

Assembly language is a [truly] primitive computer language which translates [barely] human-readable code directly into the low-level CPU (central processing unit) instructions which the computer executes, called machine code. Essentially, it is a one-to-one mapping of mnemonics into machine instructions, the language of the hardware. Programming at any level beneath assembly language requires a soldering iron (although a yet lower level, called microcode, is accessible to the implementers of the microprocessor chips themselves). Therefore, assembly language is intimately connected with the machine upon which it executes. The assembly language for an IBM(r) PowerPC(r) chip is completely different from that for a z9(r) mainframe, an UltraSPARC(r) chip, an Intel(r) Itanium(r) chip, or an Intel Pentium(r) processor.

Fortunately, Intel Corporation has diligently maintained a strategy of backwards compatibility in what it calls Intel Architecture (IA), which is the instruction set and underlying logical structure for the x86 family of processors (which does not include the Itanium). This family, started almost three decades ago, is the broadest and the most commercially successful microprocessor architecture ever invented. Starting with the 8086/8087/8088, it evolved through the 80286/80287, the i386(r)/387, the i486(r), the Pentium and its several generations, and continues today with the Core Duo(r) processors. It is an amazing testament to Intel that most assembly language programs written in 1978 for a 16-bit 8088 processor will execute successfully in 2006 on a 64-bit Pentium Extreme Edition or Core 2 Duo machine. (In fact, with minimal translation, 8-bit assembly programs written for a 1973-release Intel 8080 will also work.) Furthermore, other manufacturers, notably AMD(r), have maintained close compatibility with IA, in some cases even leading the way to new extensions.

For the balance of this paper, we will refer to "assembly language" exclusively as that for the instruction set of IA, in which MACfns is written. We will further base our discussion only on the single-CPU chips on which all APL implementations to date are based, and upon which we can comment knowledgeably. A minor nomenclature issue: humans write "assembly code" in "assembly language"; an "assembler" translates assembly code into "machine code", which is what machines execute. Sometimes assembly code is called "source code", "assembler code", or confusingly, just "assembler" (meaning the human-authored code, not the translator). The machine code is sometimes called "object code", but even more confusingly, is also called "assembler code"; we have been guilty of this imprecision in MACfns documentation. However, in this paper we will use and maintain the distinction between the terms assembly language, assembly code, and machine code.

Assembly language has instructions for the basic arithmetic and logical operations, data movement, comparisons, branching, shifting, flag manipulation, low-level I/O, and miscellaneous operations. A second set of assembly instructions manipulates the floating point unit (FPU), which provides a richer set of mathematical operations. And yet a third set, having several subsets, provides access to various SIMD (single-instruction multiple-data) capabilities of the latest generations of processors. Intel has made tremendous strides in the underlying implementation of these instructions over the years, with names like pipelining, prefetching, superscalar execution, instruction reordering, speculative execution, shadow registers, retirement buffers, and branch prediction. While these optimizations benefit the speed of programs, and often affect the optimal selection and arrangement of assembly instructions in a program, they do not affect the correctness of the program -- it executes precisely the way it was programmed.

As in APL, instructions in assembly language are interpretive. An instruction cannot safely execute without having the results of prior instructions. Most instructions execute in only one or two cycles, but some of the more complicated can take 50 or more, and others can be executed in multiples per cycle. (The cycle time of the machine is the reciprocal of its frequency rating, measured in hertz; a 3 GHz machine runs at three billion cycles per second.) While it was once possible to predict the execution time of an assembly language program from its emitted instructions and the speed of the machine in which it executed, this is no longer true; one must understand the optimizations underlying IA to be able to even approximate the execution time. Furthermore, the interactions of multiprocessing and multiple threads executing concurrently introduce unpredictability, such as the untoward flushing of memory caches.

The Difference Between Assembly Language and APL

For our purposes, however, we shall concentrate on what differs between APL and assembly language, and why MACfns is implemented in assembly language rather than in a higher-level language such as C (in which the APL+Win(r) interpreter itself is implemented). Consider the following, typed in desk calculator mode:

      ⎕←A←121212+192947
314159

It seems trivial, but literally thousands of machine instructions are executed for this calculation, taking a few microseconds. The user is prompted and the machine goes into a wait state awaiting input. Each character entered is translated and recorded in a screen buffer; the text is rationalized (backspaces and cursor movements resolved). Upon receipt of a carriage return (Enter), storage is allocated for the statement; the line is parsed and tokenized; the symbol table is updated; the digits are converted into numbers via a series of multiplications and additions, then classified as Boolean, integer, or floating point; and (depending on the APL system) the line is evaluated syntactically. Finally, APL is ready to execute the line.

Since the line has no leading ∇, ), or ], APL interprets it as an APL statement. The constants are moved to an execution buffer. The interpreter branches to an entry for the dyadic + routine, which evaluates the rank, length, and type of its arguments (checking for RANK, LENGTH, and DOMAIN errors), allocates storage for the result (checking for WS FULL), and runs a [one-iteration] loop which adds the two numbers (checking for overflow and LIMIT error), storing the result in the temporary memory entry. The A← assignment then, after assuring that A is neither a function nor label, attaches the temporary memory entry to A, freeing any storage previously associated with A. Finally, the ⎕← assignment is evaluated by allocating storage for a character vector, formatting the value via a series of divisions and subtractions, outputting each digit to the screen or terminal, and freeing the storage. Then APL outputs a new line and six-space indent and returns to a wait state.

Whew! The exact sequence differs slightly in different APL implementations, but you get the picture.

Now consider the following assembly code:

     MOV   EAX,121212
     ADD   EAX,192947

This moves the 32-bit constant 121212 into the register EAX (of which there are only eight [16 in 64-bit processors]), then adds another 32-bit constant to it. At the same time, it sets flags indicating the sign of the result, the parity of its low 8 bits, and whether it overflowed 4 bits (for BCD arithmetic), 31 bits (signed doubleword arithmetic), or 32 bits (unsigned arithmetic). That's all it does. There is no input or output; there are no changes to memory. It also executes in less than a nanosecond.

Writing and Running Assembly Language in APL+Win

During Sykes Systems' development of MACfns, we have developed tools to let us assemble and execute such instructions on the fly (naturally, the diamond separates statements):

      RUN 'MOV EAX,121212 ⋄ ADD EAX,192947'
10 bytes code:
EAX=314159       EBX=0            ECX=0            EDX=0
EBP=0            ESI=0            EDI=0            FLG=514
FLAGS:  JA, JG [no flags]

This displays the seven general-purpose registers and the flags register upon completion. These eight values are the explicit result of ⎕CALL. The stack register ESP is not included, nor is the instruction pointer EIP, segment registers, descriptor table and status registers, nor other special-purpose registers.
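
The flag effects of ADD can be mimicked in a few lines of Python. This is only an illustrative sketch and not part of MACfns: the name add32 is hypothetical, and the base value 514 is an assumption read off the FLG field above (bit 1 of the flags register is always set and bit 9 is the interrupt-enable flag, giving 2+512).

```python
MASK = 0xFFFFFFFF

# Bit values of the arithmetic flags within the flags register.
CF, PF, AF, ZF, SF, OF = 0x001, 0x004, 0x010, 0x040, 0x080, 0x800
BASE = 0x202  # assumption: reserved bit 1 plus interrupt-enable bit 9 (514)

def add32(a, b):
    """Emulate ADD on two 32-bit values; return (result, flags)."""
    a &= MASK
    b &= MASK
    total = a + b
    result = total & MASK
    flags = BASE
    if total > MASK:                              # carry out of bit 31 (unsigned)
        flags |= CF
    if bin(result & 0xFF).count("1") % 2 == 0:    # even parity of the low 8 bits
        flags |= PF
    if (a ^ b ^ result) & 0x10:                   # carry out of bit 3 (BCD)
        flags |= AF
    if result == 0:
        flags |= ZF
    if result & 0x80000000:                       # sign bit of the result
        flags |= SF
    if (~(a ^ b) & (a ^ result)) & 0x80000000:    # signed overflow
        flags |= OF
    return result, flags

print(add32(121212, 192947))   # (314159, 514): no arithmetic flags set
print(add32(7053, 192947))     # (200000, 530): the extra 16 is the auxiliary carry
```

The first call reproduces the EAX and FLG values in the display above; the second shows how a carry out of the low four bits alone changes the flags value.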

Perhaps we want to examine the machine code itself, and then run it separately:

      ⎕←MX←CODE 'MOV EAX,121212 ⋄ ADD EAX,192947'
+|Ù⍀¨∣⍙€

This gibberish can be clarified by using ∆AF from MACfns (atomic function, here used as (⎕AV⍳MX)-⎕IO):

      ∆AF MX
184 124 217 1 0 5 179 241 2 0
      256⊥0 1 217 124
121212
      256⊥0 2 241 179
192947
      2 HEX ∆AF MX   ⍝ for the cognoscenti, or masochists
B8 7C D9 01 00 05 B3 F1 02 00

We see that the constants are indeed embedded in the code (backwards, the much-debated "little endian" characteristic of Intel Architecture). We can infer that 184 (B8) is the opcode (operation or instruction code) for MOV EAX and 5 is the opcode for ADD EAX (it's a little more complicated, however).
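
For readers without APL+Win, the byte-level reading above can be reproduced in Python. This is a minimal sketch using the standard struct module; the variable names are illustrative, and the opcode meanings in the comments are the standard IA-32 encodings for the EAX forms of MOV and ADD with a 32-bit immediate.

```python
import struct

# The ten bytes shown by 2 HEX ∆AF MX.
mx = bytes.fromhex("B8 7C D9 01 00 05 B3 F1 02 00")

# Each instruction here is a one-byte opcode followed by a 32-bit
# immediate stored least significant byte first ("<" = little endian).
op1, imm1, op2, imm2 = struct.unpack("<BIBI", mx)

print(hex(op1), imm1)   # 0xb8 121212   (MOV EAX,imm32)
print(hex(op2), imm2)   # 0x5 192947    (ADD EAX,imm32)
```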

Now that we have assembled our teeny assembly code program into machine code MX, let's try to run it using APL+Win's ⎕CALL, which enables us to execute the machine code directly:

      ⎕CALL MX
DOMAIN ERROR
      ⎕CALL MX
      ^

We are missing two crucial elements. The first is that ⎕CALL requires the first four bytes to be a system-dependent signature value (it differs between APL+Win and APL+DOS) to help discourage inappropriate use. Its absence is the cause of the DOMAIN ERROR. The second is that after our snippet of code runs, it doesn't know where to go; in fact it will scamper off into the workspace, executing all manner of nonsense until, in a few nanoseconds, APL tosses us out on the street (i.e., Windows(r)) for our misbehavior.

The signature value is easily obtained from any MACfns machine code (what we loosely refer to as assembler code in our documentation):

      ↑⍙AF
2000042035
      82 ⎕DR ↑⍙AF    ⍝ show it as characters
386w

The final instruction executed must be a return-from-procedure instruction (which pops the system stack and jumps to a memory address which ⎕CALL initially placed there for a graceful exit). Here is a proper assembly program for APL+Win (DD defines the signature as a doubleword [four bytes], and RETN returns from a near procedure, which characterizes all MACfns):

      AC←'DD 2000042035 ⋄ MOV EAX,121212 ⋄ ADD EAX,192947 ⋄ RETN'

which we can assemble into machine code,

      ∆AF ⎕←MC←CODE AC
386w+|Ù⍀¨∣⍙€Ã
51 56 54 119 184 124 217 1 0 5 179 241 2 0 195

and execute,

      ⎕CALL MC       ⍝ 121212+192947
314159 0 0 0 0 0 0 514

Voilà! Our first working assembly code program.

A Key Reason Why MACfns Uses Assembly Language

Herein lies a major advantage of assembly code and the resultant machine code for MACfns: it is extremely lean. The overhead of ⎕CALL MC is about the same as that for A+B when A and B are integer scalars and the expression has already been parsed, tokenized, and evaluated for syntax. There is no interpretive overhead or analysis, storage allocation or movement of data (except for the result of ⎕CALL itself), or interface to external or asynchronous processes. Furthermore, for small programs assembly code can be simpler and less verbose than C code.

If we are extremely careful, we can modify and run machine code directly. (If we make a mistake, we could lose our APL session when we execute it.) For example, earlier we noticed that the embedded constant 121212 was encoded as ∆AF 124 217 1 0, which now occupies the sixth through ninth bytes of MC. We can change the constant in variable MC,

      ∆AF MC[6 7 8 9]
124 217 1 0
      MC[6 7 8 9]←82 ⎕DR 7053     ⍝ or ∆AF⌽(4⍴256)⊤7053

and rerun the code,

      ⎕CALL MC       ⍝ 7053+192947
200000 0 0 0 0 0 0 530

to see the effect. This is the basis of the mechanism MACfns offers to customize the machine code for user-settable defaults and other characteristics which extend its flexibility and utility. As you might guess, the smallest machine code we can run in APL+Win is

      ⎕CALL 2000042035 195        ⍝ or ⎕CALL '386wÃ'
0 0 0 0 0 0 0 530

which is solely the signature value and a RETN instruction, five bytes in total.
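
The byte layout of MC, and the in-place patch of its first constant, can be modeled in Python. This sketch only builds and edits the bytes; it cannot execute them, and the helper name assemble is hypothetical rather than one of the Sykes Systems tools.

```python
import struct

SIGNATURE = 2000042035   # APL+Win signature value quoted in the text

def assemble(first, second):
    """Bytes for: DD signature, MOV EAX,first, ADD EAX,second, RETN."""
    return bytearray(
        struct.pack("<I", SIGNATURE)            # the four characters 386w
        + b"\xB8" + struct.pack("<I", first)    # MOV EAX,imm32
        + b"\x05" + struct.pack("<I", second)   # ADD EAX,imm32
        + b"\xC3"                               # RETN
    )

mc = assemble(121212, 192947)
print(list(mc))

# Patch the first constant in place. Python indexes from 0, so the
# sixth through ninth bytes (MC[6 7 8 9] in APL) are the slice mc[5:9].
mc[5:9] = struct.pack("<I", 7053)
print(list(mc[5:9]))   # [141, 27, 0, 0]
```

After the patch, the bytes are the same as if the program had been assembled with 7053 in the first place, which is why this kind of customization needs no reassembly.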

Sample of MACfns Development

So far the code we've produced does nothing useful. Below is a more substantial program with practical utility. While it would never be released in MACfns (it's far too limited), it does illustrate our style of writing assembly code. Observe that assembly code uses the semicolon instead of a lamp for comments.

      GET 'GCDI'     ⍝ The code resides in an APL component file.
;   ∇ Z←L ∆GCDI R
; [1] Z←↑(L,R)⎕CALL ⍙GCDI ∇
CPULEV EQU 0
; Greatest common divisor of integers in EAX and EBX saved to EAX.
; The result is positive unless both arguments are 0 or ¯2*31.
; For Boolean arguments, (L ∆GCDI¨R)≡L∨R
; Copyright Sykes Systems, Inc. 9Nov2006/Roy ⊂MACfns⊃
     NEG   EAX
     JG    SHORT POS
     JZ    SHORT ZER
     NEG   EAX
POS: NEG   EBX       ; 1≤EAX≤2147483647 or EAX=¯2147483648
     JG    SHORT LP
     JZ    SHORT XIT
     NEG   EBX
LP:  XOR   EDX,EDX   ;⊤Euclidean algorithm calculates EAX=L GCD R
     DIV   EBX       ;∣(EAX EDX)←0 EBX⊤EAX  (ignore quotient EAX)
     MOV   EAX,EBX   ;∣divisor is result or next dividend    EAX
     TEST  EDX,EDX   ;∣EDX is remainder                       ÷
     MOV   EBX,EDX   ;∣remainder is next divisor if nonzero  EBX
     JNZ   SHORT LP  ;⊥12-byte/6-inst. loop
     RETN            ; exit
ZER: MOV   EAX,EBX   ; EAX was 0, so return ∣EBX
     NEG   EBX
     JL    SHORT XIT
     XCHG  EAX,EBX
XIT: RETN            ; exit
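
As a cross-check on the listing, here is a rough Python model of what it computes. It is a sketch under assumptions, not an instruction-by-instruction translation: the name gcdi is hypothetical, the sign handling is collapsed into abs, and the last line mimics how a 32-bit register holding 2*31 reads back as ¯2*31.

```python
def gcdi(l, r):
    """Model of GCDI for 32-bit signed integers l and r."""
    a, b = abs(l), abs(r)    # magnitudes, as the unsigned DIV sees them
    while b:                 # Euclidean loop: remainder becomes the next divisor
        a, b = b, a % b
    # A magnitude of 2**31 does not fit a signed doubleword; it wraps to -2**31.
    return a - 2**32 if a >= 2**31 else a

print([gcdi(x, 180) for x in range(150, 161)])
# [30, 1, 4, 9, 2, 5, 12, 1, 2, 3, 20]
print(gcdi(-2**31, 0))       # -2147483648
```

The second call shows the one family of non-positive results noted in the listing's header comment: when each argument is either 0 or ¯2*31.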

Here is how we would assemble GCDI into code to be released in MACfns. The machine code is stored in a global variable named ⍙GCDI, which is called by an APL function named ∆GCDI:

      CPL 'GCDI'     ⍝ Compile (assemble) code as global ⍙GCDI.
37 0 3 40
      ⍝ The result is some statistics about the size of the code.

      ⍙GCDI          ⍝ the generated machine code
2000042035 75487479 ¯654895244 75488247 ¯604563852 ¯201862575
      ¯762985589 ¯193602933 ¯138179645 ¯1828619045 195 1229210439
      538976288 2006110900

      ⎕←S←¯3↑⍙GCDI   ⍝ its last three integers
1229210439 538976288 2006110900
      (82 ⎕DR 2⍴S),(0,3⍴100)⊤↑⌽S  ⍝ decipher the suffix
GCDI     2006 11 9 0

The assembly process automatically inserts the leading signature value. It also appends the root name of the function, its timestamp, and its required CPU level (here for any IA-32 chip, such as an i386), all of which are documented standards of MACfns. The trailing 195, the machine code for RETN, also happens to be visible in ⍙GCDI; it usually is not.
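
The suffix can also be unpacked outside APL. The Python sketch below is illustrative only: the name decode_suffix is hypothetical, the input is the last three integers of the ⍙GCDI listing, and reading the four numeric fields as year, month, day, and CPU level is an inference from the description above rather than a documented layout.

```python
import struct

def decode_suffix(ints):
    """Split three trailing integers into (name, year, month, day, cpu_level)."""
    # The first two integers hold eight characters, four per doubleword.
    name = struct.pack("<2i", ints[0], ints[1]).decode("ascii").rstrip()
    # The third integer is a base-100 packing, like (0,3⍴100)⊤ in APL.
    stamp, cpu = divmod(ints[2], 100)
    stamp, day = divmod(stamp, 100)
    year, month = divmod(stamp, 100)
    return name, year, month, day, cpu

print(decode_suffix([1229210439, 538976288, 2006110900]))
# ('GCDI', 2006, 11, 9, 0)
```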

⍙GCDI is an integer vector rather than a character vector. As long as its right argument is simple and homogeneous (numeric or character), ⎕CALL does not care about the rank, shape, or datatype of what it executes; it simply points the machine (via the instruction pointer) to the fifth byte of data (i.e., that immediately following the signature value) and lets it run. We chose the integer vector representation for MACfns because Booleans are too sparse, characters display sloppily (as we saw in MX and MC above) and are fragile (editing them can change them unexpectedly), and floating point can contain values that are not real numbers (such as infinities and NaN's, not-a-number) which can freeze APL when displayed or be changed unexpectedly by APL operations.

Below would be the APL cover function in MACfns. It has standard naming and calling conventions, cryptic documentation solely to assist memory (the full documentation is in a separate file, and would be named dGCDI), and a distinctively-constructed copyright notice on line [2] ('⍝∇' and 'Copyright' are separated by ∆AF 8 255 (or (¯1↓⎕AV),⎕TCBS), invisible here but available to code management systems).

    ∇ Z←L ∆GCDI R
[1]   ⍝∇Greatest common divisor of two integer scalars.
[2]   Z←↑(L,R)⎕CALL ⍙GCDI⍝∇Copyright 2006 Sykes Systems, Inc.
      9Nov2006 ⊂MACfns⊃
    ∇

Atypically, the numeric arguments in ∆GCDI are presented directly to ⎕CALL, and the result is [the first element of] that of ⎕CALL. Since ⎕CALL only allows up to seven integers in its left argument, and returns only eight, this method is extremely limiting. Furthermore, there is no error checking beyond that which ⎕CALL itself performs. Later we shall see how MACfns typically passes and checks data.

All assembly-based functions in MACfns also have one or more APL analog functions, which can be used for comparison, testing, study, or in migrations to other APL systems. Here is what the one for ∆GCDI might look like:

    ∇ Z←L aGCDI R
[1]   ⍝∇Greatest common divisor of two integer scalars.
[2]   ⍝∇Copyright 2006 Sykes Systems, Inc. 9Nov2006 ⊂MACfns⊃
[3]   ⍝ The result is an integer scalar, positive
[4]   ⍝ unless both arguments are 0 or ¯2147483648.
[5]   ⍝ ⎕ERROR(2≠⎕NC'L')/'VALENCE ERROR'
[6]   ⍝ ⎕ERROR(~1 1≡(⍴1/L),⍴1/R)/'RANK ERROR'
[7]   ⍝ ⎕ERROR(323≠⎕DR 2,L,R)/'DOMAIN ERROR'
[8]    L←↑L ⋄ Z←↑R
[9]   :repeat
[10]      R←Z
[11]  :until 0=L←(Z←L)∣R
[12]  :if Z≠¯2147483648  ⍝ ¯2*31
[13]      Z←∣Z
[14]  :endif
    ∇

We find they work identically, here used with each (¨):

      L←149+⍳11
      ⊃L (L ∆GCDI¨180) (L aGCDI¨180)
 150 151 152 153 154 155 156 157 158 159 160
  30   1   4   9   2   5  12   1   2   3  20
  30   1   4   9   2   5  12   1   2   3  20

but even in this small application ∆GCDI is about three times faster than aGCDI. For less numerically tractable arguments, such as

      L←506750217 2140522190 2025047149 1220210790 1857457613
      R←235578269 1766677554 1169568863  953647643  712205927
      L ∆GCDI¨R
1 2 1 1 1

∆GCDI is over fifteen times faster. Actually, the machine code is hundreds of times faster, but the ∆GCDI cover function and its ⎕CALL, catenate (L,R), and first (↑) take the vast majority of the time here; the each (¨) masks the difference even more.

Other Reasons Why MACfns Uses Assembly Code

This brings us to a second compelling reason we use assembly language: testing and iteration are hundreds to thousands of times faster in machine code than in APL, and several times faster than in compiled languages. Often no explicit tests at all are needed, the flags set by preceding instructions providing all the information needed. Special-case testing, for which the cost in APL must be carefully weighed against the expected gains, becomes far more feasible in assembly language. ∆UNTAB is typical of a function with many useful special cases for which the APL testing is far more expensive (to the point of not being justified) than the testing in assembly language.

More individualized treatment of particular cases also enables greater accuracy and speed. For example, our original prototype assembly code for the hyperbolic sine (∆SINH or 5○R) was only about twice as fast as APL and no more accurate. The time was dominated by the mathematical computation ((*R)-*-R)÷2, but careful numerical analysis disclosed a wealth of individual optimizations for different ranges (almost two dozen in all). Assembly language allows us to test scalar-by-scalar for all these cases, and invoke much faster and more accurate code for each. The result was that the minimum speedup increased to over four times, with speedups of ten or more times not uncommon, and rare cases exceeding 1,000 times.
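
To see why range-specific code can be more accurate, consider very small arguments, where the two exponentials nearly cancel. The Python sketch below is only an illustration of the idea: the two-range split and the use of expm1 are assumptions chosen for the example, not the actual cases implemented in ∆SINH.

```python
import math

def sinh_naive(x):
    """Direct transcription of ((*R)-*-R)÷2."""
    return (math.exp(x) - math.exp(-x)) / 2

def sinh_ranged(x):
    """Hypothetical two-range version with a separate path for small |x|."""
    if abs(x) < 1.0:
        # expm1 computes exp(x)-1 without losing the low-order digits.
        return (math.expm1(x) - math.expm1(-x)) / 2
    return sinh_naive(x)

x = 1e-10
exact = math.sinh(x)
print(abs(sinh_naive(x) - exact) / exact)    # far above machine precision
print(abs(sinh_ranged(x) - exact) / exact)   # at or near machine precision
```

A per-scalar range test like the one in sinh_ranged is cheap in machine code but costly in interpreted APL, which is the trade-off the paragraph above describes.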

Another powerful reason we use assembly language is that it affords us direct access to the data structures of APL variables and the internal representation of data. For example, ∆NELMe R computes the number of elements in each item of a nested array (↑¨⍴¨,¨R). It is 20-400 times faster than APL not only because it can iterate vastly faster than each (¨), but also because the information is already available internally in each APL item. APL code must either ravel each item to obtain the number of elements (thus potentially copying the entire array), or perform expensive multiplications via ×/¨⍴¨R, which moreover fails if R is empty. ∆NELMe also moves far less data (see below).

Similarly, ∆REVA (which reverses all axes of an array) can be more than 100 times faster than APL on Boolean arrays because it can simultaneously manipulate the data at the bit, byte (8 bits), word (16 bits), or doubleword (32 bits) level. Likewise, ∆F2LOG R (⌊2⍟∣R) does not actually compute any expensive logarithms; instead, it directly analyzes the internal binary exponent of floating point numbers to achieve its 25-150 times speedup (you can do the same thing in APL, albeit more slowly, by using ⎕DR).
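
The exponent trick behind ⌊2⍟∣R can be sketched in Python. This assumes the numbers are 8-byte IEEE 754 doubles and handles only normal, nonzero, finite values; the name f2log is hypothetical and the code is an illustration of the idea, not the ∆F2LOG implementation.

```python
import struct

def f2log(x):
    """Floor of log2(|x|) for a normal, nonzero, finite double."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    exponent = (bits >> 52) & 0x7FF     # 11-bit biased exponent, sign masked off
    if exponent in (0, 0x7FF):
        raise ValueError("zero, denormal, infinity, or NaN")
    return exponent - 1023              # remove the IEEE 754 bias

print([f2log(v) for v in (1.0, 8.0, 0.75, -10.5)])   # [0, 3, -1, 3]
```

No logarithm is evaluated: the answer is read from the stored exponent field with a shift, a mask, and a subtraction.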

A fourth reason to use assembly language is avoidance, or anticipation, of data movement. Modern processors are increasingly memory-bound, which means the CPU is idling awaiting data to be fetched from main memory. The main architectural chip optimizations are prefetching anticipated data and the proliferation of caches -- primary, secondary, and even tertiary, both on-chip and off-chip -- to stage that data closer to the CPU. ∆DIVID (divide increment by decrement, or (R+1)÷R-1) moves data only once, whereas APL must move it thrice. The typical speed ratio of 2.5-5 times essentially reflects this reduced memory access more than any improvements we could make to division and addition. A substantial part of the speedup to ∆NELMe is also because large intermediate nested arrays (,¨ and ⍴¨) need not be created and destroyed.

A final reason we use assembly language is that it enables algorithms which would actually be more difficult to code, and much more expensive to run, in other languages, including APL. These tend to be high-iteration or bit-fiddling kinds of problems, such as encoding and decoding data for transmission, data encryption and compression, and one we have implemented in MACfns, ∆CRC (cyclic redundancy check). aCRC is not a trivial exercise in APL, but in assembly language the fundamental operation uses only four machine instructions per byte of data, and the resulting machine code runs 1,000-4,000 times faster than APL. There are other cases where assembly language has access to instructions, facilities, and hardware which are unavailable in other languages or APL.

Interpreter Support Services

∆GCDI uses the values of its arguments directly as the left argument to ⎕CALL; they therefore must be integers. Only eight functions in MACfns do so, including ∆CHKTS, which checks a single ⎕TS-form date or timestamp T, conveniently a seven-element integer vector. This method is extremely lean and fast; in fact, the overhead of calling the APL cover function alone is about double the cost of invoking ⎕CALL and executing the machine code. Thus, while ∆CHKTS T is about 3-4 times faster than even the most highly optimized APL code (the APL analog function aCHKTS1; some functions in MACfns have multiple APL analog functions, some coded for clarity or pedagogy and some for speed), the expression ↑T ⎕CALL ⍙CHKTS is about 9-12 times faster, and most of that time is due to the overhead of the ⎕CALL and first (↑).

However, this method of passing arguments and returning results directly is generally too restrictive. The normal way MACfns does so is indirectly, by referencing the ⎕STPTR of the variable names:

      ⎕STPTR'R Z L'  ⍝ differs between workspaces
12 4 9

These values are arbitrary but unchanging and unique symbolic handles for each name in a workspace, and are independent of the class, referent, or value of the name. (They used to be symbol table pointers in APL+PC, hence the name ⎕STPTR.) APL2000(r) supplies a set of internal subroutine calls with APL+Win, called Interpreter Support Services, which enable assembly programs to use these indirect pointers to inquire about, establish, modify, and erase variables. Here is how MACfns uses them:

      L←6 ⋄ R←⍳10    ⍝ the arguments for upper triangular matrix
      (⎕STPTR'R Z L')⎕CALL ⍙UTRI  ⍝ run the machine code
0 10 0 0 ¯4 37389060 37311392 583
      Z              ⍝ the result has been created:
 1 2 3 4 5 6 7 8 9 10
 0 2 3 4 5 6 7 8 9 10
 0 0 3 4 5 6 7 8 9 10
 0 0 0 4 5 6 7 8 9 10
 0 0 0 0 5 6 7 8 9 10
 0 0 0 0 0 6 7 8 9 10

The ⎕STPTR's of the names of the right argument, result, and left argument are presented to ⎕CALL. ⎕CALL executes the machine code in ⍙UTRI, passing the ⎕STPTR handles to it in registers EAX, EBX, and ECX. The machine code calls APL2000's Interpreter Support Services with these handles to reference the arguments L and R and create the result Z, which it then fills in before returning back to APL.
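For readers who want to check the result Z shown above, here is a minimal Python model of what the upper triangular function computes. It is a reference model of the result only, not of the machine code or the Interpreter Support Services; the name `utri` is hypothetical.

```python
def utri(l, r):
    """L-row upper triangular matrix from vector r:
    row i keeps r[j] where i <= j and holds 0 elsewhere."""
    if l < 0:
        raise ValueError("DOMAIN ERROR: left argument must be nonnegative")
    return [[x if i <= j else 0 for j, x in enumerate(r)] for i in range(l)]

z = utri(6, list(range(1, 11)))
for row in z:
    print(" ".join(str(v) for v in row))
```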

This is all rather clumsy, which is why MACfns has APL cover functions to handle the interaction. Here is the one for ∆UTRI:

    ∇ Z←L ∆UTRI R
[1]   →⍙rzl⎕CALL ⍙UTRI⍝∇L-row upper triangular mat from vec R: R×[⎕IO+1](⍳L)∘.≤⍳⍴R
[2]   ⎕ERROR'VALENCE ERROR'⍝∇Copyright 2006 Sykes Systems, Inc. 31Aug2006 ⊂MACfns⊃
[3]   ⎕ERROR'DOMAIN ERROR'
[4]   ⎕ERROR'RANK ERROR'
[5]   ⎕ERROR'WS FULL'
[6]   ⎕ERROR'NONCE ERROR'
[7]   ⎕ERROR'LIMIT ERROR'
    ∇

As in ∆GCDI, we see standard naming and documentation conventions, but here we encounter two new aspects. One is the use of ⍙rzl on line [1] instead of ⎕STPTR'R Z L' as the left argument of ⎕CALL; ⍙rzl is a global variable which is preset (usually during workspace initialization in ⎕LX) as

      ⍙rzl←⎕STPTR'R Z L'

We use ⍙rzl because computing ⎕STPTR'R Z L' is fairly expensive, since it must look up (and if necessary enter) the names in the symbol table. In this case, executing ⎕STPTR'R Z L' takes about as much time as executing ⍙rzl ⎕CALL ⍙UTRI, so using ⍙rzl instead effectively increases the speed of ∆UTRI by about 50% in small cases (the balance being the overhead of the ∆UTRI function call, localizing its names, and the branch). To minimize the number of ⎕STPTR variables, the order of the ⎕STPTR's presented to MACfns is always the same: right argument, result, {left argument, {temporary local}}. Thus only a maximum of four are needed.

The other new aspect is the branch on line [1] and the error signaling on lines [2] to [7]. Just before returning to APL, ⎕CALL loads a return code into the EAX register, which is the first element of the result of ⎕CALL, to which APL branches, normally to 0 (end the function). This lean mechanism is the method MACfns uses to signal errors. MACfns also sometimes uses the mechanism to branch to a line of APL code to complete a task, or to deal with cases for which the machine code is not prepared. A notable advantage of this method is that all possible errors are listed visibly in the body of the APL cover function.
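The branch-on-return-code mechanism can be modeled in Python as a small dispatch table. The line numbers and error names are those of the ∆UTRI cover function; the exception class `AplError`, the function name, and the handling of any other code are assumptions made for illustration.

```python
class AplError(Exception):
    """Stand-in for an error signaled by ⎕ERROR (hypothetical)."""

# Lines [2] to [7] of the ∆UTRI cover function and the error each signals.
ERROR_LINES = {
    2: "VALENCE ERROR",
    3: "DOMAIN ERROR",
    4: "RANK ERROR",
    5: "WS FULL",
    6: "NONCE ERROR",
    7: "LIMIT ERROR",
}

def branch_on_return_code(rc):
    """Model of the branch on line [1]: 0 ends the function normally,
    an error line number signals the error listed on that line."""
    if rc == 0:
        return None
    if rc in ERROR_LINES:
        raise AplError(ERROR_LINES[rc])
    raise AplError("unexpected return code %d" % rc)  # not described in the source
```

As in the cover function, every error the routine can signal is visible in one place.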

The Interpreter Support Services do have several weaknesses which we try to circumvent as much as possible in MACfns. They are relatively slow; they perform inadequate error checking; they have undocumented behavior and return codes; they impose small and arbitrary size limits; and they lack a number of useful facilities. Nonetheless, on the whole they are adequate to our needs for MACfns development.

The Disadvantages of Assembly Language

There are two main disadvantages to assembly language:

  1. It is cumbersome, hard to write, slow (to write), and fraught with mystery.

    APL programmers would be aghast at the primitiveness of assembly language and its facilities. The following code fragment is a loop [which is the only way] to add two arrays:

    L22: MOV   EAX,[ESI] ; get an item from L
         ADD   ESI,4
         MOV   EDX,[EBX] ; get an item from R
         ADD   EBX,4
         ADD   EAX,EDX   ; add them
         JO    SHORT OVF ; jump if overflow
         MOV   [EDI],EAX ; store the result in Z
         ADD   EDI,4
         DEC   ECX
         JNZ   SHORT L22
    

    We need ten instructions for each item in this 22-byte loop. Six out of the seven general-purpose registers are used (only EBP is available) -- heaven forbid we need do something more complicated than addition. It only handles the case when both arrays and the result are of integer datatype (⎕DR type 323), and of the same size, which is but one of at least 14 simple cases which APL gracefully accommodates (not to mention nested arrays). The code does not show the machinations at label OVF needed to handle integer overflow (blowup to floating point).

    As does APL, assembly language offers many ways of doing things. Below is another way of expressing the same loop:

    L23: LODSD           ; get an item from L
         ADD   EAX,[EBX] ; add it to R
         JO    SHORT OVF ; jump if overflow
         ADD   EBX,4
         STOSD           ; store the result in Z
         LOOP  L23
    

    While this loop is only six instructions and 11 bytes, it still uses five of seven registers (EDX is now also free). However, it is also inexplicably (if one is not intimately familiar with the specifications of modern processors) much slower. Both loops would be prefaced in MACfns by about 200 handcrafted assembly statements to initialize, check, and prepare everything, and the Interpreter Support Services called doubtless have many more. Finally, both loops shown above, while operational, have deficiencies; the actual loop in MACfns would be more complex (and faster).

    Of course, it's not quite as bad as it first seems. We have accumulated quite a set of tools and macros (bodies of drop-in code) during the development of MACfns and our work for clients. Nonetheless, new challenges arrive constantly. Furthermore, one of the key advantages of MACfns is speed; simply-adequate working code (like the loops above) is not sufficient.

    A major mitigating factor in our development in assembly language is that we use APL. We have many tools and utilities written in APL (such as CODE, HEX, and RUN that we used above) which assist in our development; MACfns itself is a vital component. We use APL in algorithm design, modeling, timing, testing, and verification. We use it to explore and verify properties of the chips, workspace structure, and data representations. We even use MACfns and APL to optimize our assembly language, bootstrapping their capabilities to enhance their capabilities.

  2. Assembly language is machine dependent.

    Although MACfns is currently based on Intel Architecture, its dominance lessens our concern about migration to other platforms (no one has approached us yet). We provide APL analog functions (often more than one) for every function in MACfns to enable our clients to migrate to other APL systems and hardware architectures.

    By machine dependence we mean the evolution of the microprocessors on which IA runs. Each new optimization Intel or AMD introduces is one we must digest and consider incorporating into MACfns; the timing differences in the loops above are an illustration. Given limited resources, we can incorporate only some. As we alluded to in the beginning of this paper, we have not yet addressed hyperthreading or multi-core processors. The APL+Win platform itself, stable for many years, may introduce other changes with which we must cope; APLNext(r) is yet another consideration.

    Were MACfns written in a higher-level language, these changes would be much simpler, some automatically performed by a compiler. On the other hand, the utter reliability of MACfns (no one has ever reported a bug), its high-precision accuracy, its speed, and in some cases even its functionality, would be compromised by depending on notoriously fickle compilers.
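The integer addition loops shown under the first disadvantage can be modeled in a few lines of Python, which makes the overflow path easier to see. This is a sketch under stated assumptions: the source says only that overflow leads to a "blowup to floating point" at label OVF, so redoing the whole result in floating point is an assumed behavior, and the name `add_int32` is hypothetical.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def add_int32(l, r):
    """Item-by-item add of two equal-length vectors of 32-bit integers.

    On the first overflow the whole result is recomputed in floating
    point, standing in for the jump to label OVF (assumed behavior).
    """
    if len(l) != len(r):
        raise ValueError("LENGTH ERROR")
    z = []
    for a, b in zip(l, r):
        s = a + b
        if s < INT32_MIN or s > INT32_MAX:    # the condition JO detects
            return [float(a) + float(b) for a, b in zip(l, r)]
        z.append(s)
    return z
```

Even this model covers only the one case the assembly loops cover: two integer arrays of the same size.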

Limitations of MACfns

Because it is based in IA-32 assembly language, MACfns runs only on 32-bit Intel Architecture machines, including those with Intel processors such as the Pentium, Celeron(r), Xeon(r), and Core Duo, and AMD processors such as the Athlon(r) and Opteron(r). It also runs on 64-bit IA machines using the APL+Win 32-bit interpreter. It does not run on RISC processors or mainframes.

Because MACfns uses APL+Win Interpreter Support Services, and incorporates knowledge of the structure of variables in an APL+Win workspace, it runs only under APL+Win (or APL+DOS). It does not run under APLNext, APL+UNIX(r), APL2(r), APLX(r), or Dyalog APL(r). However, MACfns has no direct dependencies on the Windows operating system; thus a (slightly different, but fully compatible) version of MACfns also runs in APL+DOS under the DOS operating system.

Benefits of MACfns

MACfns has been developed over the last 19 years by Sykes Systems, Inc., with its APL antecedents going back over 35 years. It is written by and specifically for APL programmers, and is thus easily understood by them. While the advantages of assembly code are a major contributor to its extraordinary performance, it is the thoughtful design, attention to detail, careful programming, and literate documentation which truly distinguish it. MACfns is in some ways a set of superior primitive functions which improves both programmer and machine productivity. We can summarize the benefits of MACfns as follows:

  1. Faster than APL+Win, typically from a few times to an order of magnitude or more. Also typically 2-8 times faster, more general, and handling larger arguments, than equivalent functions in the 'ASMFNS' workspace distributed with APL+Win.
  2. Uses less storage by detecting identities, avoiding numeric promotion, consolidating pointers, and using less (usually no) intermediate storage. Avoids WS FULL and storage thrashing.
  3. Sometimes accepts larger arguments and produces larger array results, and handles arrays of all rank and shape, avoiding LIMIT ERROR's; all size limits are documented, including those for corresponding APL expressions.
  4. Extended, and documented, domain and range for floating point calculations, avoiding LIMIT ERROR's and improving accuracy.
  5. Greater accuracy via improved algorithms, use of extended precision hardware, and careful attention to floating point precision considerations, intermediate calculations, and rounding.
  6. Generality in the definition of the utility functions to increase their usefulness in more situations.
  7. Extensive customization options, wherein documented locations within the machine code may be changed for useful effect; the APL cover functions may also be customized.
  8. Improved, consistent, and fully-documented error handling.
  9. Well-written APL analog functions to complement the APL cover functions and machine code. These are useful in testing and timing, during migration to other systems, and for studying APL technique. Multiple techniques are often provided.
  10. Extensive and complete documentation, including identities, limits, errors, related functions, and examples (430 pages in all).
  11. Reliable: no unexpected performance, undocumented errors, or catastrophic termination of the APL session. No bugs have ever been reported by users, although we have uncovered some ourselves and have reported them to our users.

Summary

We have described characteristics of assembly code, and how and why we use it for MACfns. We encourage you to explore MACfns, and to give us suggestions on future direction (the queue is always growing, but the order changes). We thank those of you who have purchased and are using MACfns. Your financial support enables continuing improvement and new capabilities in the future.

Attached to this paper are several documents provided with MACfns Release 3.0 (31Aug2006) which may be of general interest, or which augment the information presented in this paper. The papers in the APL2000 User Conference notebooks from 2002-2005 also provide particularly comprehensive overviews of MACfns.

-----------------------------------------------------------------

AMD, Athlon, and Opteron are trademarks of Advanced Micro Devices, Inc. APLNext, APL2000, APL+UNIX, and APL+Win are trademarks of APLNow LLC. Dyalog APL is a trademark of Dyalog Ltd. Celeron, Core Duo, i386, i486, Intel, Itanium, Pentium, and Xeon are trademarks of Intel Corporation. APL2, IBM, PowerPC, and z9 are trademarks of International Business Machines Corporation. APLX is a trademark of Micro APL Ltd. Windows is a trademark of Microsoft Corporation. SPARC is a trademark of Sparc International, Inc. UNIX is a trademark of X/Open Company, Ltd.

-----------------------------------------------------------------

Sample APL Analog Function From MACfns

    ∇ Z←aCHKTS1 R;A;B
[1]   ⍝∇Check ⎕TS-form timestamp.
[2]   ⍝∇Copyright 2006 Sykes Systems, Inc. 31Aug2006 ⊂MACfns⊃
[3]   ⍝ Limits are 1800 1 1 0 0 0 0 to 2200 12 31 23 59 59 999
[4]   ⍝   (algorithm okay from 1600 to 3599).
[5]   ⍝ aCHKTS1  implicitly checks and engenders errors (like ∆CHKTS).
[6]   ⍝ aCHKTS   explicitly checks and signals errors (unlike ∆CHKTS).
[7]   ⍝ slowest→fastest:  aCHKTS, aCHKTS1, ∆CHKTS, ↑R ⎕CALL ⍙CHKTS
[8]    (R B A)←3⍴Z←7⍴R ⎕CALL'386wÃ' ⍝ check rank, length, domain; pad
[9]                                 ⍝ absolute year limits 1600-3599:
[10]  :if Z←Z≡1800 1 1 0 0 0 0⌈2200 12 31 23 59 59 999⌊Z ⍝ assure range
[11]  :andif A>28                   ⍝ of valid dates, done for 91.99%
[12]  :andif A>0 31 28 31 30 31 30 31 31 30 31 30 31[⎕IO+B]   ⍝ 7.94%
[13]      Z←29 2 0≡A,B,=/×4 100 400∣R                         ⍝  .07%
[14]  :endif
    ∇
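The logic of the analog function above can be followed more easily in a Python model. The limits and the leap-year rule are taken from the listing; accepting three to seven items and padding with zeros is an assumption based on the "date or timestamp" wording and the "pad" comment, and the names `chk_ts` and `is_leap` are hypothetical.

```python
DAYS_IN_MONTH = [0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
LOW = [1800, 1, 1, 0, 0, 0, 0]
HIGH = [2200, 12, 31, 23, 59, 59, 999]

def is_leap(year):
    """Gregorian rule, the same test as the 4 100 400 residues on line [13]."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

def chk_ts(ts):
    """Return True if ts is a valid date or timestamp within the limits."""
    ts = list(ts)
    if not 3 <= len(ts) <= 7 or not all(isinstance(v, int) for v in ts):
        raise ValueError("expected 3 to 7 integers")   # assumed length rule
    ts += [0] * (7 - len(ts))                          # pad a date to 7 items
    if any(v < lo or v > hi for v, lo, hi in zip(ts, LOW, HIGH)):
        return False                                   # outside the range limits
    year, month, day = ts[:3]
    if day <= 28 or day <= DAYS_IN_MONTH[month]:
        return True                                    # the common cases
    return (day, month) == (29, 2) and is_leap(year)   # only 29 February remains
```

The order of the tests mirrors the listing: the range check first, then the cheap day tests that settle almost every case, and the leap-year arithmetic last.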

-----------------------------------------------------------------

Examples of Documentation from MACfns

MACfns                                                  31Aug2006

      Z←∆AF R                                     Atomic function

maps characters in ⎕AV to integers 0-255 or vice versa.  The name
is derived from ⎕AF in IBM's APL2 product.  ∆AF is useful in
avoiding clumsy constructions in documentation and code.  For
example, instead of referring to the Euro character as ⎕AV[⎕IO+2]
(€), or setting ⎕IO and using ⎕AV[2] or ⎕AV[3], or explaining
elsewhere that ⎕IO has a particular value, ∆AF 2 always suffices.

     If R is character, ∆AF is equivalent to (⎕AV⍳R)-⎕IO or
(⍴R)⍴↑82 323 ⎕DR R.  If R is numeric, ∆AF is equivalent to
⎕AV[⎕IO+R] or (⍴R)⍴↑((⎕DR R),82)⎕DR R, so all items of R must
be integers 0-255 (else ∆AF signals an INDEX or DOMAIN ERROR).
The shape of the result is that of the argument, and its type
is integer or character.  ∆AF is equivalent to [the APL coded,
not assembly coded] function AV from the 'ASMFNS' workspace
supplied by APL2000 with APL+Win version 3.6.02.  ∆AF differs
from APL+Win version 3.6.02 in the following respects:

1.  Integer tolerance for floating point is slightly different,
    and more consistent, in ∆AF than in APL.

2.  ∆AF needs space only for its result, but the equivalent APL
    expressions need from 3% more to nine times the space.

3.  ∆AF is up to 2.5-8 times faster than (⎕AV⍳R)-⎕IO for [almost]
    any nonempty character array.  ∆AF is up to 4-8 times faster
    than ⎕AV[⎕IO+R] for 70 or more integers, 10-60 times for 50
    or more Boolean items, and 10-25 times for three or more
    floating point items.  ∆AF is always faster than AV.

4.  APL and AV can process up to 214,748,352 items.  ∆AF R can
    process more, depending on the datatype of R as shown in the
    following table, which assumes R is a vector:
                                                        ∆AF
      Datatype of R   Expression    Alternative   Maximum items
      character       (⎕AV⍳R)-⎕IO  ↑82 323 ⎕DR R    268,000,126
      Boolean         ⎕AV[⎕IO+R]   ↑ 11 82 ⎕DR R  1,072,000,504
      integer         ⎕AV[⎕IO+R]   ↑323 82 ⎕DR R    429,496,719
      floating point  ⎕AV[⎕IO+R]   ↑645 82 ⎕DR R    238,609,288

Errors:
DOMAIN ERROR   The argument is nested or heterogeneous.
               An item is from 0 to 255 but is not an integer.
INDEX ERROR    An item is negative or exceeds 255.
LIMIT ERROR    The result size would exceed 1,072,000,528 bytes
                   (see the table above).

Related functions:  ∆CDR (change data representation),
                    ∆XLATE (L[⎕AV⍳R]), ∆AVEPS (⎕AV∊R).
Examples:

      ∆AF 2 3⍴'MACfns'
  77  65  67
 102 110 115

      ∆AF 77 65 67 102 110 115 33
MACfns!


(Documentation for WS FULL, which is common to almost all MACfns
functions, is described elsewhere, and is not repeated in the
detailed documentation of each function.  The documentation is
generally formatted for 60-line pages.)
-----------------------------------------------------------------
MACfns                                                  31Aug2006

      Z←∆SINH R                                   Hyperbolic sine

is equivalent to 5○R, returning the hyperbolic sine of R.
The magnitude of R must not exceed 1025×⍟2 (about 710.47586).
The magnitude of the result is greater than or equal to that of
the argument, and the sign of the result is that of the argument.
The hyperbolic sine is the reciprocal of the hyperbolic cosecant
(∆RECIP ∆CSCH).  ∆SINH is the odd component of the exponential
function.  The even component is ∆COSH (6○R), so (for
(∣R)≤1024×⍟2, about 709.7827) (∆SINH R)+∆COSH R approximates
∆EPOW R (*R).  ∆SINH is the inverse of ∆ASINH (¯5○R); that is,
R≡∆SINH ∆ASINH R and (for (∣R)≤1025×⍟2) R≡∆ASINH ∆SINH R.  ∆SINH
differs from APL+Win version 3.6.02 in the following respects:

1.  ∆SINH is slightly more accurate than 5○R, and always
    preserves the identities (∆SINH R)=-∆SINH-R and (×∆SINH R)≡×R
    and (∆SINH∣R)≥∣R (5○R sometimes does not).  It also uses a
    different algorithm for integers than for floating point;
    the integer algorithm is faster and returns slightly more
    accurate results.

2.  5○R returns 0 for values of magnitude less than 2*¯1022
    (about 2.2251E¯308); ∆SINH returns these values.

3.  5○R signals a DOMAIN ERROR if the magnitude of any item
    exceeds 1025×⍟2; ∆SINH signals a LIMIT ERROR.

4.  If R is empty, 5○R returns an integer result; otherwise,
    it returns a floating point result.  If R is empty or Boolean
    and all 0's, ∆SINH returns Boolean 0's; otherwise, ∆SINH also
    returns a floating point result.

5.  ∆SINH is faster than APL for 6-10 or more items, up to 4-12
    times for floating point, and 10-16 times for integer or
    Boolean.  It makes no copy (and uses no space) if R is all
    Boolean 0's, and so can exceed 1,000 times faster.

6.  APL can process up to 134,217,723 floating point items,
    178,956,964 integers, or 214,748,352 Boolean items.  ∆SINH
    can process up to 134,000,000 items (unlimited if R is all
    Boolean 0's).  APL can process nested arrays; ∆SINH cannot.

Errors:
DOMAIN ERROR   The argument is character and not empty.
LIMIT ERROR    The argument has more than 134,000,000 items and
                   is not all Boolean 0's.
               An item exceeds magnitude 710.47586007394386.
NONCE ERROR    The argument is nested or heterogeneous.

Related functions:  ∆CSCH (÷5○R), ∆COSH (6○R), ∆TANH (7○R),
                    ∆ASINH (¯5○R), ∆ACSCH (¯5○÷R), ∆EPOW (*R),
                    ∆GD (¯3○5○R).
Examples:

      A←¯1 0 .881373587 1 1.443635475 2
      ⊃A (∆SINH A)
 ¯1           0 0.881373587 1           1.443635475 2
 ¯1.175201194 0 1           1.175201194 2           3.626860408

      R←3 3.951613336 3.989326806 4 4.025670416
      ⊃R (∆SINH R)
  3           3.951613336  3.989326806  4          4.025670416
 10.01787493 26           27           27.2899172 28

      Z←(∆HALF-⌿∆EPOW 1 ¯1∘.×R) (∆HALF ∆SUBR ∆EPOW R) (-∆SINH-R)
      D←¯1+2×⍳85          ⍝ odd coefficients 1 3 5 ... 169
      Z←Z,((R∘.*D)+.÷!D) ((∆EPOW R)-∆COSH R) (∆RECIP ∆CSCH R)
      Z∊⊂∆SINH R          ⍝ identities
1 1 1 1 1 1

      G←.5×1+5*.5         ⍝ Golden ratio
      H←.2 .5,G,2 5 ⋄ ⊃H (⍟H) (∆SINH ⍟H)
  0.2          0.5          1.618033989  2            5
 ¯1.609437912 ¯0.6931471806 0.4812118251 0.6931471806 1.609437912
 ¯2.4         ¯0.75         0.5          0.75         2.4
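Several of the identities in the examples above can be checked independently with Python's standard `math` module. This sketch verifies the mathematics only; it says nothing about the accuracy or limits of ∆SINH itself, and the helper name `sinh_series` is hypothetical.

```python
import math

def sinh_series(x, terms=85):
    """Sum of x**d / d! over the first `terms` odd d (1, 3, 5, ...)."""
    return math.fsum(x**d / math.factorial(d) for d in range(1, 2 * terms, 2))

def sinh_from_exp(x):
    """Odd component of the exponential: half of exp(x) minus exp(-x)."""
    return 0.5 * (math.exp(x) - math.exp(-x))

golden = 0.5 * (1 + 5 ** 0.5)
h = [0.2, 0.5, golden, 2.0, 5.0]
via_log = [math.sinh(math.log(v)) for v in h]   # equals (v - 1/v) / 2
```

With 85 terms the largest factorial is 169!, which still fits in a double; the list `via_log` reproduces the last row of the golden ratio example to within rounding.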
-----------------------------------------------------------------
MACfns                                                  31Aug2006

      Z←L ∆UTRI R                         Upper triangular matrix

creates an upper triangular matrix having L rows and ⍴R columns.
L must be a nonnegative integer scalar or one-item vector, and R
must be a simple numeric (or empty character) vector.  When R is
Boolean and all 1's, then ∆UTRI is equivalent to the following
expression, which generates an upper triangular Boolean histogram:

      (⍳L)∘.≤⍳⍴R

However, in general ∆UTRI multiplies each row of this histogram
by its vector right argument, and is equivalent to the following:

      R×[⎕IO+1](⍳L)∘.≤⍳⍴R
or
      ((⍳L)∘.≤⍳⍴R)×(L,⍴R)⍴R

The shape of the matrix result is L,⍴R.  The type of the result
is Boolean if empty, that of the right argument otherwise (APL
returns empty results as integer).

     If the result is not empty, ∆UTRI is always faster than the
faster (which can differ) of the two expressions above in APL+Win
version 3.6.02.  L ∆UTRI R⍴1 is faster than (⍳L)∘.≤⍳⍴R if the
result has 75 or more items (but usually less).  Speedups vary
widely, ranging from 5-7 times faster for 500-item results up to
several hundred times for large Boolean results, and are greatest
for Boolean right arguments and smallest for floating point.
When L exceeds ⍴R, speedups grow as L increases relative to ⍴R.

     ∆UTRI needs space only for its result; APL needs 2-33 times
the space.  APL can return up to 214,748,352 items; ∆UTRI can
return up to 134,000,000 items.  APL can process nested arrays;
∆UTRI cannot.

Errors:
DOMAIN ERROR   The left argument is not a nonnegative integer,
                   or exceeds 2,147,483,647.
               The right argument is character and not empty.
LIMIT ERROR    The result would contain more than 134,000,000
                   items (134E6<L×⍴R).
NONCE ERROR    The right argument is nested or simple
                   heterogeneous.
RANK ERROR     The left argument is not a scalar or one-item
                   vector.
               The right argument is not a vector.
VALENCE ERROR  No left argument is supplied.

Related functions:  ∆opAND (L∘.^R).

Examples:

      5 ∆UTRI¨(5⍴1) (9 .2 30),⍳¨5 10
  1 1 1 1 1    9 0.2 30    1 2 3 4 5    1 2 3 4 5 6 7 8 9 10
  0 1 1 1 1    0 0.2 30    0 2 3 4 5    0 2 3 4 5 6 7 8 9 10
  0 0 1 1 1    0 0   30    0 0 3 4 5    0 0 3 4 5 6 7 8 9 10
  0 0 0 1 1    0 0    0    0 0 0 4 5    0 0 0 4 5 6 7 8 9 10
  0 0 0 0 1    0 0    0    0 0 0 0 5    0 0 0 0 5 6 7 8 9 10
-----------------------------------------------------------------
MACfns                                                  31Aug2006

                             ASMFNS


     Several MACfns are replacements for functions contained in
the 'ASMFNS' workspace supplied by APL2000 with APL+Win version
3.6.02.  All are faster and handle larger arguments and return
larger results; several offer other advantages such as greater
generality or customizability.

     The following table lists the functions from the 'ASMFNS'
workspace which have direct MACfns replacements, and summarizes
the differences.  See the documentation in the individual MACfns
detailed descriptions for more details.  Those noted as "(APL)"
are APL coded, not assembly coded, functions; these were at one
time written in assembly language (either for APL+PC or APL+DOS),
but were never rewritten for APL+Win.

     The numbers in the column entitled "Typical Speedup Ratio"
represent the cpu time used by the 'ASMFNS' function divided by
the cpu time used by the MACfns function in a variety of common
cases.  These ratios tend to be conservative.  The numbers in the
column entitled "Maximum Result Size Ratio" are the maximum ⎕SIZE
of the result of the MACfns function divided by the maximum ⎕SIZE
of the result of the 'ASMFNS' function.  All ratios are rounded.
In both cases, higher is better (MACfns being more advantageous).

                               Maximum
Workspace             Typical  Result
'ASMFNS'   MACfns     Speedup   Size
  Name      Name       Ratio    Ratio  Other benefits/differences
---------  ---------  -------  ------  --------------------------
AV (APL)   ∆AF        2.5-8     1.1-5  uses less space

DTBR       ∆DTBR        5-25    5      customizable fill,
(APL)                                  identity detection

INDEX1     ∆INDEX_    1.5-10    .5-10  uses less space
(APL)

LJUSTIFY   ∆LJUST      10-30    5      customizable fill,
(APL)                                  identity detection

LOWERCASE  ∆LCASEDOS    2-3     2.5    (none)

MATtoSS    ∆MATSS       2-3     8      specifiable (or no) fill,
                                       customizable defaults

RJUSTIFY   ∆RJUST      10-30    5      customizable fill,
(APL)                                  identity detection

⍙RPL       ∆TXTRPL      3-8     5      uses less space
(APL)

SSLEN      ∆SSLENS      4-8     1.25   uses less space
(APL)

SSSHAPE    ∆SSSHAPE     4-6    (same)  uses less space
(APL)

SStoMAT    ∆SSMAT     1.5-2.5   8      specifiable fill,
                                       customizable options

TEXTREPL   ∆TXTRPL     .2-4     8      result sometimes differs

TRANSLATE  ∆XLATE     1.5-3     2.5    (none, but ∆XLATE requires
                                        L to have 256 elements)

⍙UCASEX    ∆UCASEDOS    2-3     2.5    (none)
UPPERCASE  ∆UCASEDOS

WHERE      ∆INDS      1.3-6     1.25   handles all datatypes
WHERE R∊0  ∆ZNDS

WORDREPL   ∆WRDRPL      5-20    5      uses less space, result
(APL)                                  often differs

     The following functions in 'ASMFNS' currently have no MACfns
equivalent:

APL coded:  DLB       DLTB        DTB     NBLENGTH  SSASSIGN
            SSCAT     SSCOMPRESS  SSDEB   SSDLB     SSDLTB
            SSDROP    SSDTB       SSFIND  SSINDEX   SSTAKE
            SSUNIQUE  TELPRINT

assembler-based:  DEB  DIV  OVER  ROWFIND  ∆∆VR

We welcome suggestions as to which of these (or others) would be
useful to you.

MACfns Assembler Functions

MACfns and files

GetMACfns Get MACfns, assembler code, documentation, or APL analog.
MACfnsFid Return name of file used by GetMACfns (var or niladic fn).
∆CPULEVEL CPU level for MACfns.
∆DOSHAND  Convert tie number to DOS handle.

Dates

∆CHKTS    Check ⎕TS-form timestamp.
∆DATEBASE Compute days since 1Jan1900 from YYYY MM DD dates.
∆DATEPACK Pack YYYY MM DD dates as scalars {per L={0-5}}.
∆DATEREP  Compute YYYY MM DD dates from days since 1Jan1900.
∆DATEUNP  Unpack scalars as YYYY MM DD dates {per L={0-5{,cuspyear}}}.
∆DATE2BAS Compute days since 1Jan1900 from two YYYY MM DD dates.
∆DATE2REP Compute YYYY MM DD dates from two days since 1Jan1900.
∆DOSTS    Check and pack ⎕TS-form timestamp as scalar.
∆DOST     Unpack scalar into ⎕TS-form timestamp.
∆CHK2DS   Check two 3↑⎕TS-form datestamps.
∆DOS2DS   Check and pack two 3↑⎕TS-form datestamps as scalar.
∆DOS2D    Unpack scalar into two 3↑⎕TS-form datestamps.

Datatype manipulation and internal representation

∆ALLOC    Allocate arbitrary array of ⎕DR type L and shape R.
∆AF       Atomic function:  ⎕AV[⎕IO+R] or (⎕AV⍳R)-⎕IO
∆CDR      Change data representation:  L ⎕DR R ⍝ singleton L
∆BREPI    32-column Boolean representation of integer:  ⍉1=(32⍴2)⊤⍉R
∆IREPB    Integer representation of 32-col Boolean:  ⍉(0 ¯2,30⍴2)⊥⍉R
∆ENDIAN   Reverse byte order of integer or floating point values.
∆COERCE   Coerce numeric or empty R to type L∊11 323 645.
∆DEMOTE   Demote numeric items to most compact representation.
∆CRC      Cyclic redundancy 32-bit check for simple R.
∆DRe      Data representation each:  ⎕DR¨R
∆DRNe     Data representation each, negating nested:  (⎕DR¨R)ׯ1*×≡¨R
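The two expressions given for ∆BREPI and ∆IREPB describe a 32-bit two's complement representation: the radix vector 0 ¯2,30⍴2 gives the leading bit a weight of minus 2 to the 31st power. A minimal Python model for a single integer follows; the function names `brepi` and `irepb` are hypothetical, and the MACfns functions themselves operate on whole arrays.

```python
def brepi(n):
    """32 bits of n, most significant first, in two's complement."""
    if not -2**31 <= n < 2**31:
        raise ValueError("not a 32-bit integer")
    u = n & 0xFFFFFFFF
    return [(u >> k) & 1 for k in range(31, -1, -1)]

def irepb(bits):
    """Signed integer from 32 bits; the leading bit has weight -2**31."""
    if len(bits) != 32:
        raise ValueError("expected 32 bits")
    low = sum(b << k for b, k in zip(bits[1:], range(30, -1, -1)))
    return low - bits[0] * 2**31
```

The two functions are inverses of each other over the full 32-bit signed range.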

Character translation

∆AF       Atomic function:  ⎕AV[⎕IO+R] or (⎕AV⍳R)-⎕IO
∆BTCCS    Blank terminal control characters.
∆LCASE    Translate to lowercase.
∆LCASEDOS Translate to lowercase for DOS.
∆UCASE    Translate to uppercase.
∆UCASEDOS Translate to uppercase for DOS.
∆XLATE    Translate character R to 256-character L:  L[⎕AV⍳R]

Character searching and changing

∆AF       Atomic function:  ⎕AV[⎕IO+R] or (⎕AV⍳R)-⎕IO
∆AVEPS    Flag membership of all characters:  ⎕AV∊R
∆AVNUB    Distinct characters:  ⎕AV~⎕AV~R
∆AVFREQ   Frequency of all characters:  +/⎕AV∘.≡,R
∆BEQ      Flag blanks (' '=simple):  R∊' '
∆BNE      Flag nonblanks (' '≠simple):  ~R∊' '
∆NLEQ     Flag newlines (⎕TCNL=simple):  R∊⎕TCNL
∆NLNE     Flag non-newlines (⎕TCNL≠simple):  ~R∊⎕TCNL
∆NULEQ    Flag nulls (⎕TCNUL=simple):  R∊⎕TCNUL
∆NULNE    Flag non-nulls (⎕TCNUL≠simple):  ~R∊⎕TCNUL
∆STS      Flag start of matches of string R in string L:  L ⎕SS R
∆STSNOV   Flag start of nonoverlapped matches of string R in string L.
∆STSF     First index of string R in string L or ↑↓/L if (2∊⍴L)^⍬≡0⍴L.
∆TXTRPL   Replace strings L ('/old1/new1/old2/new2...') in string R.
∆WRDRPL   Replace words L ('/old1/new1/old2/new2...') in string R.
∆UNTAB    Replace tabs in string R {L={tabinc{,tab{,fill{,⊂dels}}}}}.
∆NLLF     Insert LF's after NL's in string, or NL,LF's between rows.
∆XLATE    Translate character R to 256-character L:  L[⎕AV⍳R]

Character data restructuring

∆DTBR     Delete trailing blank rows:  (⌽∨\⌽R∨.≠' ')⌿R
∆CJUST    Center justify character:  (⌈.5×-⌿+/^\' '=⊃R(⌽R))⌽R
∆LJUST    Left justify character:  (+/^\' '=R)⌽R
∆RJUST    Right justify character:  (+/∨\' '≠⌽R)⌽R
∆MATDS    Matrix to delimited string {L={delimiter{,{fill∣¯1}}}}.
∆MATSS    Matrix to segmented string {L={delimiter{,{fill∣¯1}}}}.
∆MATNV    Character matrix to nested vector {L={fill∣¯1}}.
∆DSMAT    Delimited string to matrix {L={delimiter{,fill}}}.
∆DSMATSZ  Delimited string to matrix size {L={delimiter{,ignored}}}.
∆SSMAT    Segmented string to matrix {L={fill}}.
∆SSMATSZ  Segmented string to matrix size {L=ignored}.
∆NVMAT    Nested vector to character matrix {L=fill}:  ⊃R
∆SSLENS   Segment lengths in segmented string.
∆SSSHAPE  Number of segments in segmented string:  +/R=1↑R
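
The justification entries are defined as rotations: ∆LJUST rotates each row left by its count of leading blanks, and ∆RJUST ends up moving the trailing blanks to the front. A minimal Python sketch of that idea for a single row follows; the function names are hypothetical and the code models the APL definitions only, not the Assembler implementation.

```python
def ljust_row(row):
    """Rotate leading blanks to the end of the row."""
    lead = len(row) - len(row.lstrip(' '))
    return row[lead:] + row[:lead]


def rjust_row(row):
    """Rotate trailing blanks to the front of the row."""
    keep = len(row.rstrip(' '))   # length sans trailing blanks
    return row[keep:] + row[:keep]


print(repr(ljust_row("  ab ")))   # 'ab   '
print(repr(rjust_row("ab   ")))   # '   ab'
```

Because the operation is a rotation, the row length never changes and an all-blank row comes back unchanged.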

Formatting

∆I11FMT   Integer 11-column formatting:  'I11'⎕FMT simple
∆I12FMT   Integer 12-column formatting:  'I12'⎕FMT simple

General data restructuring

∆CAT_     Catenate along the first axis:  L⍪R
∆COLMAT   Coalesce trailing (all but the first) axes to form matrix.
∆ROWMAT   Coalesce leading (all but the last) axes to form matrix.
∆ONECOL   Coalesce all axes to form one-column matrix:  ,[⍬],R
∆ONEROW   Coalesce all axes to form one-row matrix:  ,[⎕IO-.5],R
∆REVA     Reverse all axes (matrix ⊖⌽R):  (⍴R)⍴⌽,R
∆EXTEND   Truncate or pad raveled R with last item to shape L.
∆ENCLOSEe Enclose each (⊂[⍬]R):  ⊂¨R
∆SOSNV    Scalar or simple to nested vector:  ⍎(1=≡R←1/R)/'R←,⊂R'

Data selection

∆INDEX_   First axis indexing (matrix L[R;]):  (⊂R)⌷[⎕IO]L
∆SQUAD_   First axis indexing (matrix R[L;]):  (⊂L)⌷[⎕IO]R

Indices

∆INDS     Indices of nonzeros in vector (bitvec R/⍳⍴R):  (~R∊0)/⍳⍴,R
∆ZNDS     Indices of zeros in vector (bitvec (~R)/⍳⍴R):  (R∊0)/⍳⍴,R
∆ODOMETER Generalized odometer:  ⊃,↑∘.,/,R ⍝ nested array of intvecs
∆DTBLENS  Length of rows sans trailing blanks:  +/∨\' '≠⌽R
∆ROWFIND  Flag rows of character L containing vector R:  ∨/R⍷1/L
∆ROWIOTA  Locate rows of R in matrix L; ⎕IO-1 if not found ⍝ ⎕CT=0
∆ROLL     Random integers from ⍳¨R (?R) or of shape L from ⍳R (?L⍴R).
∆SSLENS   Segment lengths in segmented string.
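
As an illustration of ∆INDS and ∆ZNDS, the Python sketch below returns the positions of nonzero and zero items in a vector. The io parameter stands in for ⎕IO and defaults to 1; the function names and that default are assumptions made for the example.

```python
def inds(vec, io=1):
    """Positions of nonzero items, counted from index origin io."""
    return [i + io for i, x in enumerate(vec) if x != 0]


def znds(vec, io=1):
    """Positions of zero items, counted from index origin io."""
    return [i + io for i, x in enumerate(vec) if x == 0]


print(inds([0, 3, 0, 1]))        # [2, 4]
print(znds([0, 3, 0, 1]))        # [1, 3]
print(inds([0, 3, 0, 1], io=0))  # [1, 3]
```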

Logical functions

∆NOT      Logical negation:  ~R ⍝ bit
∆AND      Logical and:  L^R ⍝ bit
∆ANDNOT   Logical and not:  L>R ⍝ bit
∆OR       Logical or:  L∨R ⍝ bit
∆ORNOT    Logical or not:  L≥R ⍝ bit
∆NAND     Logical not and:  L⍲R ⍝ bit
∆NANDNOT  Logical not and not:  L≤R ⍝ bit
∆NOR      Logical not or:  L⍱R ⍝ bit
∆NORNOT   Logical not or not:  L<R ⍝ bit
∆XOR      Logical exclusive or:  L≠R ⍝ bit
∆NXOR     Logical not exclusive or:  L=R ⍝ bit
∆opAND    Outer product logical and:  L∘.^R ⍝ bit
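
Several entries above define a logical function through a relational primitive, for example L>R for "and not". These identities hold only for bit (0 or 1) arguments, which is what the "bit" annotation indicates. The Python sketch below checks each identity over the full truth table; the table layout and names are illustrative only.

```python
from itertools import product

# name: (logical definition, relational equivalent), both on bits 0/1
IDENTITIES = {
    "ANDNOT":  (lambda l, r: l & (1 - r),       lambda l, r: int(l > r)),
    "ORNOT":   (lambda l, r: l | (1 - r),       lambda l, r: int(l >= r)),
    "NANDNOT": (lambda l, r: 1 - (l & (1 - r)), lambda l, r: int(l <= r)),
    "NORNOT":  (lambda l, r: 1 - (l | (1 - r)), lambda l, r: int(l < r)),
    "XOR":     (lambda l, r: l ^ r,             lambda l, r: int(l != r)),
    "NXOR":    (lambda l, r: 1 - (l ^ r),       lambda l, r: int(l == r)),
}


def holds(name):
    """True if both definitions agree on every pair of bits."""
    logical, relational = IDENTITIES[name]
    return all(logical(l, r) == relational(l, r)
               for l, r in product((0, 1), repeat=2))


print(all(holds(name) for name in IDENTITIES))  # True
```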

Relational functions

∆MATCH    Flag if arrays identical:  L≡R ⍝ ⎕CT=0
∆EMATCH   Flag if arrays identical (L≡R with ⎕CT=0), including ⎕DR.
∆ALL      Flag if all ones (^/,bit, ^/,1=simple):  ~0∊R∊1 ⍝ ⎕CT=0
∆ANY      Flag if any ones:  1∊R ⍝ ⎕CT=0
∆ANYe     Flag if any ones in each (1=simple):  1∊¨R ⍝ ⎕CT=0
∆NALL     Flag if not all ones (~^/,bit, ∨/,1≠simple):  0∊R∊1 ⍝ ⎕CT=0
∆NONE     Flag if no ones:  ~1∊R ⍝ ⎕CT=0
∆ALLZ     Flag if all zeros (~∨/,bit, ^/,0=simple):  ~0∊R∊0
∆ANYZ     Flag if any zeros:  0∊R
∆NALLZ    Flag if not all zeros (∨/,bit, ∨/,0≠simple):  0∊R∊0
∆NOZ      Flag if no zeros:  ~0∊R
∆EVEN     Flag even values:  0=2∣⌊∣R ⍝ ⎕CT=1<≡R
∆ODD      Flag odd values:  0≠2∣⌊∣R ⍝ ⎕CT=1<≡R
∆ZEQ      Flag zeros (0=simple):  R∊0
∆ZNE      Flag nonzeros (0≠simple):  ~R∊0
∆ZGT      Flag negative items:  0>R ⍝ 1≥≡R
∆ZLT      Flag positive items:  0<R ⍝ 1≥≡R
∆ZLE      Flag nonnegative items:  0≤R ⍝ 1≥≡R
∆ZGE      Flag nonpositive items:  0≥R ⍝ 1≥≡R
∆BEQ      Flag blanks (' '=simple):  R∊' '
∆BNE      Flag nonblanks (' '≠simple):  ~R∊' '
∆NLEQ     Flag newlines (⎕TCNL=simple):  R∊⎕TCNL
∆NLNE     Flag non-newlines (⎕TCNL≠simple):  ~R∊⎕TCNL
∆NULEQ    Flag nulls (⎕TCNUL=simple):  R∊⎕TCNUL
∆NULNE    Flag non-nulls (⎕TCNUL≠simple):  ~R∊⎕TCNUL
∆NVEC     End partition vector (flag end of runs); ⎕CT=1<≡R.
∆PVEC     Partition vector (flag start of runs); ⎕CT=1<≡R.
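
The partition-vector entries can be pictured with a short Python sketch. It assumes that a "run" is a maximal sequence of equal adjacent items, so ∆PVEC flags each item that differs from its predecessor and ∆NVEC flags each item that differs from its successor. That reading, the exact-equality comparison, and the function names are assumptions; the index above does not spell them out.

```python
def pvec(v):
    """Flag the first item of every run of equal adjacent items."""
    return [int(i == 0 or v[i] != v[i - 1]) for i in range(len(v))]


def nvec(v):
    """Flag the last item of every run of equal adjacent items."""
    last = len(v) - 1
    return [int(i == last or v[i] != v[i + 1]) for i in range(len(v))]


print(pvec([1, 1, 2, 2, 2, 3]))  # [1, 0, 1, 0, 0, 1]
print(nvec([1, 1, 2, 2, 2, 3]))  # [0, 1, 0, 0, 1, 1]
```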

Sign manipulation

∆SIGN     Sign (¯1 if negative, 1 if positive, or 0):  ×R ⍝ 1≥≡R
∆ABS      Absolute value (magnitude):  ∣R ⍝ 1≥≡R
∆NABS     Negate absolute value (negated magnitude):  -∣R ⍝ 1≥≡R
∆FABS     Floor absolute value (integer magnitude):  ⌊∣R ⍝ ⎕CT=1<≡R
∆NEG      Negate (change sign):  -R ⍝ 1≥≡R
∆NOT      Logical negation:  ~R ⍝ bit
∆NPOW     Negative one power (1=even, ¯1=odd):  ¯1*R ⍝ int
∆NNPOW    Negate negative one power (¯1=even, 1=odd):  -¯1*R ⍝ int
∆ONEMAX   Higher of one or R:  1⌈R ⍝ 1≥≡R
∆ONEMIN   Lower of one or R:  1⌊R ⍝ 1≥≡R
∆ZMAX     Higher of zero or R (zero negative items):  0⌈R ⍝ 1≥≡R
∆ZMIN     Lower of zero or R (zero positive items):  0⌊R ⍝ 1≥≡R
∆HILO     Highest and lowest:  (⌈/,R),⌊/,R ⍝ ⎕DR-based if empty

Rounding

∆CEIL     Ceiling (lowest integer not below):  ⌈R ⍝ ⎕CT=1<≡R
∆FLOOR    Floor (highest integer not above):  ⌊R ⍝ ⎕CT=1<≡R
∆FABS     Floor absolute value (integer magnitude):  ⌊∣R ⍝ ⎕CT=1<≡R
∆FHALF    Floor halve:  ⌊R÷2 ⍝ ⎕CT=1<≡R
∆IRND     Integer round:  ⌊.5+R ⍝ ⎕CT=1<≡R
∆IRNDM    Integer round magnitude:  (×R)×⌊.5+∣R ⍝ ⎕CT=1<≡R
∆F2LOG    Adjusted floor two log:  (¯65536×0=R)+⌊2⍟∣R+0=R ⍝ ⎕CT=1<≡R
∆F10LOG   Adjusted floor ten log:  (1+0>R)+⌊10⍟1⌈∣R ⍝ ⎕CT=1<≡R
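
∆IRND and ∆IRNDM differ only for negative values that lie exactly halfway between two integers: ⌊.5+R rounds such values up toward zero, while (×R)×⌊.5+∣R rounds the magnitude and so moves away from zero. The Python sketch below models the two APL definitions with exact floating-point floor; it ignores comparison tolerance (⎕CT), and the function names are hypothetical.

```python
import math


def irnd(x):
    """Integer round: floor of x plus one half."""
    return math.floor(0.5 + x)


def irndm(x):
    """Integer round magnitude: round |x|, then restore the sign."""
    sign = (x > 0) - (x < 0)
    return sign * math.floor(0.5 + abs(x))


print(irnd(2.5), irndm(2.5))    # 3 3
print(irnd(-2.5), irndm(-2.5))  # -2 -3
```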

Addition and subtraction

∆DEC      Decrement by one:  R-1 ⍝ 1≥≡R
∆DEN      Decrement negation by one:  ¯1-R ⍝ 1≥≡R
∆DECH     Decrement by half:  R-.5 ⍝ 1≥≡R
∆INC      Increment by one:  R+1 ⍝ 1≥≡R
∆INN      Increment negation by one:  1-R ⍝ 1≥≡R
∆INCH     Increment by half:  R+.5 ⍝ 1≥≡R
∆NEG      Negate (change sign):  -R ⍝ 1≥≡R
∆NABS     Negate absolute value (negated magnitude):  -∣R ⍝ 1≥≡R
∆ADDR     Add reciprocal ((1+R*2)÷R):  R+÷R ⍝ 1≥≡R
∆SUBR     Subtract reciprocal ((¯1+R*2)÷R):  R-÷R ⍝ 1≥≡R

Multiplication, division, and reciprocal

∆DOUB     Double:  R×2 ⍝ 1≥≡R
∆DECD     Decrement double:  ¯1+R×2 ⍝ 1≥≡R
∆HALF     Halve:  R÷2 ⍝ 1≥≡R
∆HINC     Halve increment:  (R+1)÷2 ⍝ 1≥≡R
∆FHALF    Floor halve:  ⌊R÷2 ⍝ ⎕CT=1<≡R
∆HSQAR    Halve square:  (R*2)÷2 ⍝ 1≥≡R
∆HDIV     Half divided by:  .5÷R ⍝ 1≥≡R
∆TWODIV   Two divided by:  2÷R ⍝ 1≥≡R
∆RECIP    Reciprocal:  ÷R ⍝ 1≥≡R
∆SUMRECIP Sum reciprocals in simple R:  +/,÷R
∆NEGR     Negate reciprocal:  -÷R ⍝ 1≥≡R
∆INCR     Increment reciprocal ((R+1)÷R):  1+÷R ⍝ 1≥≡R
∆DECR     Decrement reciprocal ((1-R)÷R):  ¯1+÷R ⍝ 1≥≡R
∆INNR     Increment negated reciprocal ((R-1)÷R):  1-÷R ⍝ 1≥≡R
∆DENR     Decrement negated reciprocal ((¯1-R)÷R):  ¯1-÷R ⍝ 1≥≡R
∆ADDR     Add reciprocal ((1+R*2)÷R):  R+÷R ⍝ 1≥≡R
∆SUBR     Subtract reciprocal ((¯1+R*2)÷R):  R-÷R ⍝ 1≥≡R
∆RINC     Reciprocal increment:  ÷R+1 ⍝ 1≥≡R
∆RDEC     Reciprocal decrement:  ÷R-1 ⍝ 1≥≡R
∆RINN     Reciprocal incremented negation:  ÷1-R ⍝ 1≥≡R
∆RDEN     Reciprocal decremented negation:  ÷¯1-R ⍝ 1≥≡R
∆DIVI     Divide by increment:  R÷R+1 ⍝ 1≥≡R
∆DIVD     Divide by decrement:  R÷R-1 ⍝ 1≥≡R
∆DIVN     Divide by incremented negation:  R÷1-R ⍝ 1≥≡R
∆DIVC     Divide by decremented negation:  R÷¯1-R ⍝ 1≥≡R
∆DIVDI    Divide decrement by increment:  (R-1)÷R+1 ⍝ 1≥≡R
∆DIVID    Divide increment by decrement:  (R+1)÷R-1 ⍝ 1≥≡R
∆DIVIN    Divide increment by incremented negation:  (R+1)÷1-R ⍝ 1≥≡R
∆DIVNI    Divide incremented negation by increment:  (1-R)÷R+1 ⍝ 1≥≡R
∆UTRI     L-row upper triangular mat from vec R:  R×[⎕IO+1](⍳L)∘.≤⍳⍴R
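
The ∆UTRI definition multiplies vector R along the columns of the comparison table (⍳L)∘.≤⍳⍴R, so row i keeps R[j] wherever i≤j and holds zero elsewhere. A minimal Python sketch of that result, using a hypothetical utri function and a list of rows for the matrix:

```python
def utri(nrows, vec):
    """Build an nrows-row upper triangular matrix from vec."""
    return [[x if i <= j else 0 for j, x in enumerate(vec)]
            for i in range(nrows)]


for row in utri(3, [1, 2, 3]):
    print(row)
# [1, 2, 3]
# [0, 2, 3]
# [0, 0, 3]
```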

Pi multiplication and division

∆DIVPI    Divide by pi:  R÷○1 ⍝ 1≥≡R
∆DIV2PI   Divide by twice pi:  R÷○2 ⍝ 1≥≡R
∆DIVHPI   Divide by half pi:  R÷○.5 ⍝ 1≥≡R
∆PIDIV    Pi divided by:  ○÷R ⍝ 1≥≡R
∆PI       Pi times:  ○R ⍝ 1≥≡R
∆PIDOUB   Pi times double:  ○R×2 ⍝ 1≥≡R
∆PIHALF   Pi times halve:  ○R÷2 ⍝ 1≥≡R
∆PISQAR   Pi times square:  ○R*2 ⍝ 1≥≡R
∆SQRTDPI  Square root of divide by pi:  (R÷○1)*.5 ⍝ 1≥≡R
∆PIDTR    Degrees to radians:  ○R÷180 ⍝ 1≥≡R
∆PIRTD    Radians to degrees:  R÷○÷180 ⍝ 1≥≡R

Remainders, residue, and modulus

∆ONEMOD   One modulus (fractional part):  1∣R ⍝ ⎕CT=1<≡R
∆TWOMOD   Two modulus:  2∣R ⍝ ⎕CT=1<≡R
∆EVEN     Flag even values:  0=2∣⌊∣R ⍝ ⎕CT=1<≡R
∆ODD      Flag odd values:  0≠2∣⌊∣R ⍝ ⎕CT=1<≡R
∆NPOW     Negative one power (1=even, ¯1=odd):  ¯1*R ⍝ int
∆NNPOW    Negate negative one power (¯1=even, 1=odd):  -¯1*R ⍝ int

Roots and powers

∆RECIP    Reciprocal:  ÷R ⍝ 1≥≡R
∆SQAR     Square:  R*2 ⍝ 1≥≡R
∆HSQAR    Halve square:  (R*2)÷2 ⍝ 1≥≡R
∆RSQAR    Reciprocal square:  R*¯2 ⍝ 1≥≡R
∆PISQAR   Pi times square:  ○R*2 ⍝ 1≥≡R
∆SQRT     Square root:  R*.5 ⍝ 1≥≡R
∆SQRTA    Square root absolute value:  (∣R)*.5 ⍝ 1≥≡R
∆SQRTD    Square root double:  (R×2)*.5 ⍝ 1≥≡R
∆SQRTDPI  Square root of divide by pi:  (R÷○1)*.5 ⍝ 1≥≡R
∆RSQRT    Reciprocal square root:  R*¯.5 ⍝ 1≥≡R
∆RSQRTA   Reciprocal square root absolute value:  (∣R)*¯.5 ⍝ 1≥≡R
∆CUBE     Cube:  R*3 ⍝ 1≥≡R
∆CUBERT   Cube root:  (×R)×(∣R)*÷3 ⍝ 1≥≡R
∆FOUTHPW  Fourth power:  R*4 ⍝ 1≥≡R
∆FOUTHRT  Fourth root:  R*.25 ⍝ 1≥≡R

Logarithms and exponentials

∆ELOG     Base-e (natural) logarithm:  ⍟R ⍝ 1≥≡R
∆SUMELOG  Sum base-e (natural) logarithms in simple R:  +/,⍟R
∆EPOW     Base-e (natural) exponential:  *R ⍝ 1≥≡R
∆SINH     Hyperbolic sine:  5○R ⍝ 1≥≡R
∆COSH     Hyperbolic cosine:  6○R ⍝ 1≥≡R
∆TWOLOG   Base-two logarithm:  2⍟R ⍝ 1≥≡R
∆TWOPOW   Base-two exponential:  2*R ⍝ 1≥≡R
∆TENLOG   Base-ten logarithm:  10⍟R ⍝ 1≥≡R
∆TENPOW   Base-ten exponential:  10*R ⍝ 1≥≡R
∆FAC      Factorial:  !R ⍝ int
∆F2LOG    Adjusted floor two log:  (¯65536×0=R)+⌊2⍟∣R+0=R ⍝ ⎕CT=1<≡R
∆F10LOG   Adjusted floor ten log:  (1+0>R)+⌊10⍟1⌈∣R ⍝ ⎕CT=1<≡R
∆NPOW     Negative one power (1=even, ¯1=odd):  ¯1*R ⍝ int
∆NNPOW    Negate negative one power (¯1=even, 1=odd):  -¯1*R ⍝ int

Trigonometric functions

∆SIN      Sine of radians:  1○R ⍝ 1≥≡R
∆SIND     Sine of degrees:  1○○R÷180 ⍝ 1≥≡R
∆COS      Cosine of radians:  2○R ⍝ 1≥≡R
∆COSD     Cosine of degrees:  2○○R÷180 ⍝ 1≥≡R
∆TAN      Tangent of radians:  3○R ⍝ 1≥≡R
∆TAND     Tangent of degrees:  3○○R÷180 ⍝ 1≥≡R
∆ASIN     Arcsine in radians:  ¯1○R ⍝ 1≥≡R
∆ASIND    Arcsine in degrees:  (¯1○R)÷○÷180 ⍝ 1≥≡R
∆ACOS     Arccosine in radians:  ¯2○R ⍝ 1≥≡R
∆ACOSD    Arccosine in degrees:  (¯2○R)÷○÷180 ⍝ 1≥≡R
∆ATAN     Arctangent in radians:  ¯3○R ⍝ 1≥≡R
∆ATAND    Arctangent in degrees:  (¯3○R)÷○÷180 ⍝ 1≥≡R
∆CSC      Cosecant of radians:  ÷1○R ⍝ 1≥≡R
∆SEC      Secant of radians:  ÷2○R ⍝ 1≥≡R
∆COT      Cotangent of radians:  ÷3○R ⍝ 1≥≡R
∆ACSC     Arccosecant in radians:  ¯1○÷R ⍝ 1≥≡R
∆ASEC     Arcsecant in radians:  ¯2○÷R ⍝ 1≥≡R
∆ACOT     Arccotangent in radians:  ¯3○÷R ⍝ 1≥≡R
∆SINC     Sine cardinal of radians:  (1○R)÷R ⍝ 1≥≡R
∆SINCPI   Sine cardinal pi times:  (1○○R)÷○R ⍝ 1≥≡R
∆SINCOS   Sine and cosine of radians:  1 2○⊂R ⍝ 1≥≡R
∆GD       Gudermannian function:  ¯3○5○R ⍝ 1≥≡R
∆AGD      Inverse Gudermannian function:  ¯5○3○R ⍝ 1≥≡R
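
The Gudermannian entries compose two circular functions: ¯3○5○R is the arctangent of the hyperbolic sine, and ¯5○3○R is the hyperbolic arcsine of the tangent. The Python sketch below restates both with the standard math module and checks that they invert each other; it models the APL definitions only, and the names gd and agd are chosen for the example.

```python
import math


def gd(x):
    """Gudermannian function: arctan(sinh(x))."""
    return math.atan(math.sinh(x))


def agd(y):
    """Inverse Gudermannian function: arcsinh(tan(y))."""
    return math.asinh(math.tan(y))


print(gd(0.0))                            # 0.0
print(math.isclose(agd(gd(0.7)), 0.7))    # True
```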

Pythagorean functions

∆PYTH     Hypotenuse of unit tri. from side ((1+R*2)*.5):  4○R ⍝ 1≥≡R
∆APYTH    Side of unit tri. from hypot. (R×(1-R*¯2)*.5):  ¯4○R ⍝ 1≥≡R
∆APYTHA   Side of unit tri. from ∣hypot. ((¯1+R*2)*.5):  ¯4○∣R ⍝ 1≥≡R
∆ZPYTH    Side of unit tri. from other side ((1-R*2)*.5):  0○R ⍝ 1≥≡R
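
The parenthesized formulas in the Pythagorean entries translate directly to ordinary arithmetic. The Python sketch below restates them for real scalars, using hypothetical function names, and shows that ∆APYTH keeps the sign of a negative argument while ∆APYTHA works on the magnitude.

```python
import math


def pyth(r):
    """(1+R*2)*.5: hypotenuse from one side of a unit triangle."""
    return math.sqrt(1 + r * r)


def apyth(r):
    """R×(1-R*¯2)*.5: side from hypotenuse, sign of r preserved."""
    return r * math.sqrt(1 - r ** -2)


def apytha(r):
    """(¯1+R*2)*.5: side from the magnitude of the hypotenuse."""
    return math.sqrt(r * r - 1)


def zpyth(r):
    """(1-R*2)*.5: side from the other side."""
    return math.sqrt(1 - r * r)


print(pyth(0.75))     # 1.25
print(apytha(-1.25))  # 0.75
```

With these definitions apyth(-1.25) is approximately -0.75, the negative of apytha(-1.25).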

Hyperbolic functions

∆SINH     Hyperbolic sine:  5○R ⍝ 1≥≡R
∆COSH     Hyperbolic cosine:  6○R ⍝ 1≥≡R
∆TANH     Hyperbolic tangent:  7○R ⍝ 1≥≡R
∆ASINH    Hyperbolic arcsine:  ¯5○R ⍝ 1≥≡R
∆ACOSH    Hyperbolic arccosine:  ¯6○R ⍝ 1≥≡R
∆ATANH    Hyperbolic arctangent:  ¯7○R ⍝ 1≥≡R
∆CSCH     Hyperbolic cosecant:  ÷5○R ⍝ 1≥≡R
∆SECH     Hyperbolic secant:  ÷6○R ⍝ 1≥≡R
∆COTH     Hyperbolic cotangent:  ÷7○R ⍝ 1≥≡R
∆ACSCH    Hyperbolic arccosecant:  ¯5○÷R ⍝ 1≥≡R
∆ASECH    Hyperbolic arcsecant:  ¯6○÷R ⍝ 1≥≡R
∆ACOTH    Hyperbolic arccotangent:  ¯7○÷R ⍝ 1≥≡R
∆ELOG     Base-e (natural) logarithm:  ⍟R ⍝ 1≥≡R
∆EPOW     Base-e (natural) exponential:  *R ⍝ 1≥≡R
∆GD       Gudermannian function:  ¯3○5○R ⍝ 1≥≡R
∆AGD      Inverse Gudermannian function:  ¯5○3○R ⍝ 1≥≡R

Random values

∆GENB     Generate random bits of shape R.
∆GENC     Generate random characters of shape R.
∆GENI     Generate random 4-byte integers of shape R.
∆GENP     Generate random positive 4-byte integers of shape R.
∆ROLL     Random integers from ⍳¨R (?R) or of shape L from ⍳R (?L⍴R).

Each-based (¨)

∆RANKe    Rank each:  ↑¨⍴¨⍴¨R
∆SHAPEe   Shape each:  ⍴¨R
∆FAXe     First axis each (1 if scalar):  ↑¨(⍴¨R),¨1
∆LAXe     Last axis each (1 if scalar):  ↑¨⌽¨1,¨⍴¨R
∆NELMe    Number of elements each:  ↑¨⍴¨,¨R
∆DRe      Data representation each:  ⎕DR¨R
∆DRNe     Data representation each, negating nested:  (⎕DR¨R)ׯ1*×≡¨R
∆ENCLOSEe Enclose each (⊂[⍬]R):  ⊂¨R
∆ANYe     Flag if any ones in each (1=simple):  1∊¨R ⍝ ⎕CT=0

Outer product

∆AVFREQ   Frequency of all characters:  +/⎕AV∘.≡,R
∆ODOMETER Generalized odometer:  ⊃,↑∘.,/,R ⍝ nested array of intvecs
∆opAND    Outer product logical and:  L∘.^R ⍝ bit
∆SIGNS    Signs summary (count neg,0,pos):  +/¯1 0 1∘.=,×R ⍝ 1≥≡R
∆UTRI     L-row upper triangular mat from vec R:  R×[⎕IO+1](⍳L)∘.≤⍳⍴R

Summarization and reduction

∆ALL      Flag if all ones (^/,bit, ^/,1=simple):  ~0∊R∊1 ⍝ ⎕CT=0
∆ALLZ     Flag if all zeros (~∨/,bit, ^/,0=simple):  ~0∊R∊0
∆ANY      Flag if any ones:  1∊R ⍝ ⎕CT=0
∆ANYZ     Flag if any zeros:  0∊R
∆NALL     Flag if not all ones (~^/,bit, ∨/,1≠simple):  0∊R∊1 ⍝ ⎕CT=0
∆NALLZ    Flag if not all zeros (∨/,bit, ∨/,0≠simple):  0∊R∊0
∆NONE     Flag if no ones:  ~1∊R ⍝ ⎕CT=0
∆NOZ      Flag if no zeros:  ~0∊R
∆MATCH    Flag if arrays identical:  L≡R ⍝ ⎕CT=0
∆EMATCH   Flag if arrays identical (L≡R with ⎕CT=0), including ⎕DR.
∆NESTED   Flag if nested:  1<≡R
∆SIMPLE   Flag if simple:  1≥≡R
∆SIGNS    Signs summary (count neg,0,pos):  +/¯1 0 1∘.=,×R ⍝ 1≥≡R
∆HILO     Highest and lowest:  (⌈/,R),⌊/,R ⍝ ⎕DR-based if empty
∆AVG      Average items in simple R (mean):  (+/,R)÷1⌈×/⍴R
∆SUM      Sum items in simple R:  +/,R
∆SUMABS   Sum absolute values in simple R:  +/,∣R
∆SUMELOG  Sum base-e (natural) logarithms in simple R:  +/,⍟R
∆SUMRECIP Sum reciprocals in simple R:  +/,÷R
∆SUMSQAR  Sum squares in simple R:  +/,R*2
∆SUMZEQ   Number of zeros (+/,~bit, +/,0=simple):  +/,R∊0
∆SUMZNE   Number of nonzeros (+/,bit, +/,0≠simple):  +/,~R∊0
∆SSSHAPE  Number of segments in segmented string:  +/R=1↑R
∆CRC      Cyclic redundancy 32-bit check for simple R.
∆NVEC     End partition vector (flag end of runs); ⎕CT=1<≡R.
∆PVEC     Partition vector (flag start of runs); ⎕CT=1<≡R.
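
Two of the summaries above have details worth spelling out. ∆AVG divides by 1⌈×/⍴R, so an empty argument yields zero instead of a division error, and ∆SIGNS returns three counts in the order negative, zero, positive. The Python sketch below models both on a flat list; the function names are hypothetical and the code is not the MACfns implementation.

```python
def avg(items):
    """Mean of the items; the divisor is at least 1, so empty input gives 0."""
    return sum(items) / max(1, len(items))


def signs(items):
    """Counts of negative, zero, and positive items, in that order."""
    neg = sum(1 for x in items if x < 0)
    zero = sum(1 for x in items if x == 0)
    pos = sum(1 for x in items if x > 0)
    return [neg, zero, pos]


print(avg([1, 2, 3]))        # 2.0
print(avg([]))               # 0.0
print(signs([-2, 0, 5, 7]))  # [1, 1, 2]
```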

Zeros and nonzeros

∆ALLZ     Flag if all zeros (~∨/,bit, ^/,0=simple):  ~0∊R∊0
∆ANYZ     Flag if any zeros:  0∊R
∆NALLZ    Flag if not all zeros (∨/,bit, ∨/,0≠simple):  0∊R∊0
∆NOZ      Flag if no zeros:  ~0∊R
∆ZEQ      Flag zeros (0=simple):  R∊0
∆ZNE      Flag nonzeros (0≠simple):  ~R∊0
∆SUMZEQ   Number of zeros (+/,~bit, +/,0=simple):  +/,R∊0
∆SUMZNE   Number of nonzeros (+/,bit, +/,0≠simple):  +/,~R∊0
∆ZNDS     Indices of zeros in vector (bitvec (~R)/⍳⍴R):  (R∊0)/⍳⍴,R
∆INDS     Indices of nonzeros in vector (bitvec R/⍳⍴R):  (~R∊0)/⍳⍴,R
∆ZMAX     Higher of zero or R (zero negative items):  0⌈R ⍝ 1≥≡R
∆ZMIN     Lower of zero or R (zero positive items):  0⌊R ⍝ 1≥≡R