Saturday, July 16, 2011

Programmers Anonymous notes, 1100

Griffin wondered about how laser diodes work.
A laser diode is formed by doping a very thin layer on the surface of a crystal wafer. The crystal is doped to produce an n-type region and a p-type region, one above the other, resulting in a p-n junction, or diode.
...
When an electron and a hole are present in the same region, they may recombine or "annihilate" with the result being spontaneous emission — i.e., the electron may re-occupy the energy state of the hole, emitting a photon with energy equal to the difference between the electron and hole states involved. Spontaneous emission gives the laser diode below lasing threshold similar properties to an LED.
 The structure, a lasing medium between two conductive partial mirrors, is simple:
The physical chemistry is more complicated, and it took a lot of research to find out how to get it right so they work at room temperature, are cheap, and last a while.
This is a visible light micrograph of a laser diode taken from a CD-ROM drive. Visible are the P and N layers distinguished by different colours. Also visible are scattered glass fragments from a broken collimating lens.
The first laser diode to achieve continuous wave operation was a double heterostructure demonstrated in 1970 essentially simultaneously by Zhores Alferov and collaborators (including Dmitri Z. Garbuzov) of the Soviet Union, and Morton Panish and Izuo Hayashi working in the United States. However, it is widely accepted that Zhores I. Alferov and team reached the milestone first. For their accomplishment and that of their co-workers, Alferov and Kroemer shared the 2000 Nobel Prize in Physics.



David Byrne, Lead Singer of Talking Heads in 1987:
I don't think computers will have any important effect on the arts in 2007. When it comes to the arts they're just big or small adding machines. And if they can't "think," that's all they'll ever be. They may help creative people with their bookkeeping, but they won't help in the creative process.


Simultaneous localization and mapping (SLAM) on an iPad2:



The Singularity is Far: A Neuroscientist's View, including many interesting thoughts in the comments.



Art && Code: 3D

Kinect-Hacking Conference

Art && Code: 3D is a festival-conference about the artistic, technical, tactical and cultural potentials of 3D scanning and sensing devices — especially (but not exclusively) including the revolutionary Microsoft Kinect sensor. This highly interdisciplinary event will bring together, for the first time, tinkerers and hackers, computational artists and designers, industrial game developers, and leading researchers from the fields of computer vision, HCI and robotics. Half-maker’s festival, half-academic symposium, Art && Code: 3D will take place October 21-23 at Carnegie Mellon University in Pittsburgh, and will feature:
  • Hands-on workshops in programming interactive software with the Kinect, using popular arts-engineering toolkits such as Processing, openFrameworks, Cinder, Max/MSP/Jitter, Pure Data, and Microsoft’s own Kinect for Windows SDK in Silverlight.
  • Presentations by leading artists, designers, and researchers about their projects with depth cameras.
  • An interactive exhibition and live performance evening featuring artworks, robotics, games and other experiences using the Kinect.

Omek Raises $7 Million From Intel, Aims To Challenge Microsoft’s Kinect

Omek Interactive, a provider of tools that enables companies to incorporate gesture recognition and full body tracking into their applications and devices, has secured $7 million in financing in a round led by Intel Capital, TechCrunch has learned. The Series C round brings the company’s total funding raised to nearly $14 million.
Omek’s Beckon technology converts the raw depth map data from most major 3D cameras into an awareness of people and their movements or positions in front of the camera, enabling them to be converted into commands that control hardware or software.


Solving laundry at UC Berkeley (Willow Garage)


Nearly a million people have watched UC Berkeley's PR2 folding towels and sorting socks on YouTube, and it's easy to understand why: having a robot that can do your laundry is a fantasy that's been around since The Jetsons, and while we're not there yet, it's not nearly as far off a future as it was before the PR2 Beta Program. Since those demos, one of the research groups at Berkeley has been working on ways of making the laundry cycle faster, more efficient, and more complete, and for starters, they've taught their PR2 to reliably handle your pants.
The goal of Pieter Abbeel’s group is to teach a robot to solve the laundry problem. That is, to develop a system to enable a robot to go into a home it's never seen before, load and unload a washer and dryer, and then fold the clean clothes and put them away just like you would. The first aspect of this problem that the group tackled was folding, which is one of those things that seems trivial to us but is very difficult for a robot to figure out since clothes are floppy, unpredictable, and often decorated with tasteless and complicated colors and patterns.



In Search of a Robot More Like Us

Although robots have made great strides in manufacturing, where tasks are repetitive, they are still no match for humans, who can grasp things and move about effortlessly in the physical world. Designing a robot to mimic the basic capabilities of motion and perception would be revolutionary, researchers say, with applications stretching from care for the elderly to returning overseas manufacturing operations to the United States (albeit with fewer workers).
Yet the challenges remain immense, far higher than artificial intelligence hurdles like speaking and hearing.

 

7x21 pixel display, and is 2.5 x 7 feet in size, using an Arduino Diecimila board.

Monday, July 11, 2011

Programmers Anonymous notes, 1011

Distinguishing between sarcasm and irony is ironic. Expressing an opinion about the distinction is sarcastic.


Claire L. Evans, from Portland, OR, writes about (and does art/performance about) science and technology issues. Her sequence of blog posts about moon arts strikes a chord.



What gestures should a vision system understand? A prime requirement is that gestures should be easily learned by humans, but it is not clear what is most natural or effective. Here's a discussion of future gesture interfaces for the Kinect. It's anybody's guess as to what will work well enough for people. The best way to find out is to try things:




While dropping off Cosmo for a summer camp I had the chance to check out their exhibit of video games, Game On 2.0, at OMSI: I wasn't expecting much, but was very impressed. Not only are the games set up well to be played, there was an impressive range of platforms and hardware, from handheld to pinball. Mixed in was good information about the game industry, game development and some original art. The tacit message was that computer games are not just a bit of pointless fun, but a driver of the computer industry.
Play your way through the past, present, and future of global gaming. From Pong to Gran Turismo, Game On 2.0 is a hands-on experience of video game history and culture, and includes over 125 playable games, including Mario All Stars, Wii Sports, Gran Turismo, Halo Reach, Pacman, Zelda and Sonic the Hedgehog.

Explore over 40 years of gaming entertainment; from the very first commercial coin-op game to the latest in virtual reality and 3D technology. Game On 2.0 celebrates game design, development, and production including original concept and character art and history’s most influential arcade consoles.

RoboCup 2011 is finishing up today, furthering the goal of:
By mid-21st century, a team of fully autonomous humanoid robot soccer players shall win the soccer game, complying with the official rule of the FIFA, against the winner of the most recent World Cup.
While I think AI will advance enough to achieve this goal, I predict that the battery/power technology will be woefully inadequate to the task for many years, if not decades.  Electric cars have been around for more than 100 years now, and everyone agrees they are a good idea, but you still can't plug in a Prius.

And the best soccer team won't be humanoid, no matter how photogenic; we are just not that cute or functional on the pitch:

I think that the soccer playing premise is good, but I'm more interested in video game playing robots. Games are designed to tweak our interest and exercise our humanity, while soccer is largely a sport for spectators. When will a robot master Pong, PacMan, or World of Warcraft?

I don't know the answer, but this is what will close the gap in embodied intelligence. Below is a first step toward that goal, a three electro-mechanical relay bot that "plays" the Tower of Hanoi faster than a human, on a device designed to use human gestures (taps). While the sequence of moves is not found autonomously, it won't be long before robots will entertain themselves by game play.


In base 24 the first eight powers of five are palindromic (Wikipedia, palindromic numbers in other bases). Why? Is there likely to be a logical reason for this string of palindromes? (Exponents and results below are written in base 24.)
5^1 =         5
5^2 =        11
5^3 =        55
5^4 =       121
5^5 =       5A5
5^6 =      1331
5^7 =      5FF5
5^8 =     14641
5^A =    15AA51
5^C =   16FLF61
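
One possible reason, offered as a sketch rather than a proof: 5^2 = 25 = 24 + 1, which is written 11 in base 24, so the even powers of five are powers of 11 and their digits are rows of Pascal's triangle (1, 11, 121, 1331, 14641, ...) for as long as every entry stays below 24. Odd powers are five times such a row, which stays carry-free only while five times the largest entry is below 24. The C++ helpers below convert a number to base 24 and test for a palindrome, so the pattern can be checked directly. The function names are mine, and the digit alphabet that skips I and O is an assumption inferred from L standing for 20 in the list above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Digit alphabet for base 24. Assumption: I and O are skipped, so that
// L stands for 20, as in the 5^C entry of the list.
static const char kDigits24[] = "0123456789ABCDEFGHJKLMNP";

// Write n in base 24, most significant digit first.
std::string toBase24(std::uint64_t n) {
    if (n == 0) return "0";
    std::string s;
    while (n > 0) {
        s.insert(s.begin(), kDigits24[n % 24]);
        n /= 24;
    }
    return s;
}

// True if s reads the same forwards and backwards.
bool isPalindrome(const std::string& s) {
    return std::string(s.rbegin(), s.rend()) == s;
}

// 5 raised to the power e, computed exactly in 64 bits.
std::uint64_t powerOfFive(unsigned e) {
    assert(e <= 27);  // 5^28 no longer fits in 64 bits
    std::uint64_t p = 1;
    for (unsigned i = 0; i < e; ++i) p *= 5;
    return p;
}
```

Calling isPalindrome(toBase24(powerOfFive(e))) for e from 1 to 12 returns true for 1 through 8, 10 and 12, and false for 9 and 11, which is where the carries first spoil the symmetry.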


What do robots look like? Often it is assumed they will evolve toward some sort of human form, and many are wedged into this bipedal mold. What about real robots, those designed for practical use where there is less anthropomorphic social pressure?

Industrial robots have been in use for decades now, and their range of forms has settled down. They have one arm and no legs, and are typically bolted to the floor or mounted on a specialized gantry. Here's one marketed to foundries, the KUKA KR 1000 Titan:

While this is a larger model, its body plan is recognizable in a wide range of industrial bots. Most industrial robots are quite a bit larger than humans, even though they are typically used for human scale products like cars. They don't know their own strength, and don't pay much attention to humans, so photos of them with humans are not common. Here's a similar KUKA bot swinging around a couple humans like so much meat (Wikipedia, Robocoaster):

What about autonomous robots? They have much different design requirements and are still evolving. Often they have a more car-like body plan. After several iterations we now know what the near-future looks like, the most sophisticated semi-autonomous bot ever:
NASA Mars Science Laboratory rover, Curiosity, during mobility testing on June 3, 2011. The location is inside the Spacecraft Assembly Facility at NASA's Jet Propulsion Laboratory, Pasadena, Calif.
Preparations are on track for shipping the rover to NASA's Kennedy Space Center in Florida in June and for launch during the period Nov. 25 to Dec. 18, 2011.
JPL, a division of the California Institute of Technology in Pasadena, manages the Mars Science Laboratory mission for the NASA Science Mission Directorate, Washington. This mission will land Curiosity on Mars in August 2012. Researchers will use the tools on the rover to study whether the landing region has had environmental conditions favorable for supporting microbial life and favorable for preserving clues about whether life existed.


Saturday, July 2, 2011

Programmers Anonymous notes, 1010

Duo Adept: An 8-bit computer designed and built by Jack Eisenmann, a high school student.
Jack Eisenmann, a programmer who just graduated high school, has built his own 8-bit homebrew computer completely from scratch using an old keyboard, a television, and a ton of TTL logic chips. No, he didn't buy some computer parts and snap them together; he blueprinted every wire and connection and then built it, wire by wire. After he finished construction, he had to teach it how to communicate, so he created his own operating system and wrote some games for it.


Arduino: The Documentary


Obsolete collective's monthly chiptune showcase in downtown L.A.

In recent months, a pool of innovative L.A.-based artists who create music in an electronic subgenre called chiptune have formed the Obsolete collective, and have commenced throwing shows to celebrate their lo-bit love affair.


Tibetan singing bowls give up their chaotic secrets

The water-filled bowls, when rubbed with a leather-wrapped mallet, exhibit a lively dance of water droplets as they emit a haunting sound. Now slow-motion video has unveiled just what occurs in the bowls; droplets can actually bounce on the water's surface.

 

Science is Beauty


The Curfew: All Your Rights Are Belong to Us

The Curfew is terrific science fiction, pretty good cinema, a nicely designed defense of individual liberty, and an okay graphic adventure. It also won the Learning & Education award at the 2011 Games for Change Festival.
Designed by Kieron Gillen and funded by Britain's Channel 4, The Curfew takes place in 2027 in a UK dominated by "the Shepherd Party," which plays on fears of terrorism to impose near-absolute control over its citizens. They do so through gamification; you earn "citizen points" for obedience, and lose them through disobedience. Earn enough, and you can be a "Class A" citizen; it's not clear what this gets you, other than jumping the queue at fast food joints.

 

Northwest Passages: Ursula K. Le Guin

AIR DATE: Thursday, April 29th 2010
Oregon icon Ursula K. Le Guin writes science fiction, fantasy, poetry, and children's books. She has won multiple honors for dozens of works, starting with her early novel A Wizard of Earthsea, and including the Library of Congress' Living Legends award and several honors for lifetime achievement.
The Oregonian recently called her the "Queen Mother of Science Fiction." I asked her if she was, and she laughed.

Ursula Le Guin, Oregon Art Beat
She's been called the Queen Mother of science fiction, but today Ursula Le Guin is finishing a book of poems about Steens Mountains country, and working with photographer Roger Dorband. We join them in Harney County as she talks about her new book and reflects on her life as a writer.

Tuesday, June 28, 2011

Tau Day

Happy Tau Day!

Here's Helene, a moon of Saturn, with dimensions of about 36 by 32 by 30 kilometers. The Cassini robot took this picture recently. It's not quite spherical, but nothing is exactly spherical and tau is still a good approximation. There are even mountains on the moon! (Galileo)

(stereo pair, cross-view)

Sunday, June 26, 2011

"Blink of an eye" processing on Arduino

[This is a long, perhaps tedious, post with many variations of a program that finds the timing of Arduino loops and operations. The punchline is that in 1 second my system can do about 86,000 floating point (multiplication) operations, or about 700,000 integer (multiplication) operations. Division takes 3 (float) to 10 (int)  times longer.  All the rest is detail.]

I'd like to implement a simple active visual system using the photoresistor and LEDs on the Android ADK shield (see "Does not compute...") controlled by a program implemented on the Arduino Mega 2560 microcontroller. This requires flashing the LEDs, and they are bright enough that it is pretty darned irritating if the flashing is visible.

But if they flash fast enough we don't perceive it as flashing, just a bright light. Above the flicker fusion threshold of about 15 Hz a flashing light appears as a steady light. Movies and video have a 25 to 30 Hz frame rate to avoid flicker. There's a reason that 'flick' is shorthand for moving picture.

How much processing can be done in one cycle through the program loop on an Arduino if the time permitted is less than about 1/20 second = 50 ms (milliseconds)?

To get a feel for this, I could use a program with a loop containing one or several operations, to see how fast that runs. But without an oscilloscope I don't have a chance at measuring the frequency of the loop execution. Even with an oscilloscope it would not be clear which periodic signal corresponds to the loop frequency.

So I want to use the program itself to measure the timing of loop execution. To do this, it is convenient to write text to the serial monitor (see appendix for full code listings). This loop function gets and writes the time since the program started, in milliseconds:

void loop(){

    // Every time through the loop write time since starting to  
    // the serial monitor.
    Serial.println( millis() );
}

with the serial monitor output:


Each line corresponds to one pass through the loop, and there is a consistent 6 ms time difference between each of them. That's a good chunk of 50 ms, and worth worrying about.

This millisecond temporal resolution is good for a rough estimate, but we can do better with the microsecond (one millionth of a second) reporting function micros() instead of millis(). The corresponding output is:



The differences are still consistent, but larger:

9360 microseconds = 9.36 ms

It might be that the micros() function takes longer to execute than millis(), and/or it takes longer to send and print out seven digits than it does four. I measured the time needed to execute these time functions (see appendix), and while they are different (3.3 and 1.7 microseconds, respectively) they are small compared with the serial send and print operations.

The ratios of loop time to number of digits are similar, within the precision of the 6 ms measurement:

6/4    = 1.50 ms/character
9.36/7 = 1.34 ms/character

To test if the number of characters in the serial output is slowing things down (and by the way, serial means one bit at a time through a serial port), this loop has another serial write with a longer string:


void loop(){

    // At every loop cycle write time to the serial monitor.
    Serial.print( micros() );
    Serial.println( " microseconds up to this loop" );
}

with output:













Now the sequential differences are about 39500 microseconds = 39.5 ms, and the time per character is:
39.5/36 = 1.1 ms/character
In any case, about 1 ms is needed for sending each character to the serial port. Note that the serial communication rate was set to 9600 baud (lower right corner of the serial monitor). This means that an 8-bit character, assuming the bit rate is about the baud rate, was sent in about:

(1000 ms/s)*(8 bits/character)/ (9600 bits/s) =  .833 ms

and the rest of the time taken for the loop is due to executing the function calls.
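
The measured times can also be compared against what the serial line itself allows. The sketch below is plain C++ arithmetic rather than Arduino API code, and its function names are made up for illustration. It assumes the default 8N1 framing, where each character costs 10 bits on the wire (one start bit, eight data bits, one stop bit), and it counts the carriage return and line feed that println() appends to the visible text.

```cpp
#include <cassert>

// Milliseconds needed to clock nChars characters out of a serial port.
// Assumption: 8N1 framing, so each character costs 10 bits on the wire
// (1 start bit + 8 data bits + 1 stop bit).
double serialTimeMs(unsigned nChars, unsigned long baud) {
    assert(baud > 0);
    const double bitsPerChar = 10.0;
    return 1000.0 * bitsPerChar * nChars / baud;
}

// Characters actually sent by a println() call: the visible text plus
// the carriage return and line feed that println() appends.
unsigned printlnChars(unsigned visibleChars) {
    return visibleChars + 2;
}
```

Under these assumptions the 36 visible characters become 38 and take about 39.6 ms at 9600 baud, close to the measured 39.5 ms, and the 4-digit and 7-digit lines come out near 6.25 ms and 9.4 ms. If the assumptions hold, nearly all of the loop time is transmission time, with little left over for the function calls.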

Setting the serial rate to the maximum, 115200 baud, results in a faster loop:

Now it only takes about 3780 microseconds = 3.78 ms to send the 36 characters, more than 10 times as fast.







At this rate the loop frequency is:

(1000 ms/s) / (3.78 ms/loop) = 264 Hz (loop cycles/second)

I wonder if it can write faster if the serial monitor is not displayed? Perhaps the serial monitor needs to display a character or line before it will accept the next character or line of characters. At this baud rate the display update is jerky, displaying several lines with pauses in between. I don't know exactly why.


For estimating the speed of various operations I want to average loop time differences over many loops, and ignore the time it takes for the time and text output functions. This code increments a counter (++nLoopCount) each time through the loop, compares the count with a large number, and then reports information about the average time for a loop. Note that the time it takes for averaging and sending text to the serial monitor is not included in the reported timing:

int nLoopCount = 0;
int nLoopsReport = 32000;
  

// millis() returns an unsigned long, so the time variables use the same type.
unsigned long tLast = 0;
unsigned long tDelta = 0;
float flLoopRate = 0;

void loop(){ 

  // Don't do anything here, but this is where some 
  // operations would go.
  // ...

  // Every nLoopsReport times through the loop, 
  // write stats to the serial monitor.
  ++nLoopCount; 
  if ( nLoopCount == nLoopsReport ) {

    tDelta = millis() - tLast;


    Serial.print( tDelta );
    Serial.print( " milliseconds per " );
    Serial.print( nLoopsReport );
    Serial.println( " loops" );

    flLoopRate = (1000.0*nLoopsReport)/tDelta;
    Serial.print( flLoopRate, 0 );
    Serial.println( " loops/s (Hz)" );

    nLoopCount = 0;
    tLast = millis();
  } 
}

The serial monitor output looks like:


Now we're talking fast; half a million loops per second, or 500 KHz = .5 MHz! Compare this with the loop that made serial calls, which ran at only about 264 loops/s, about 2000 times slower.

Exactly what operations were done during one loop, that took 2 microseconds to complete?

execute the loop
increment the counter
compare the counter with a variable

The clock rate is 16 MHz, so it appears that about 32 clock ticks are needed to carry out these operations.

None of these short steps can be avoided, but can they be shortened? Yes.

Note that the somewhat arbitrary value 32000 was used for the number of loops to average. This could have been less (say 10000), but it could not be much more. The ATmega2560 on the Arduino Mega is an 8-bit microcontroller, but the Arduino compiler makes the type int 16 bits wide. A signed 16-bit integer, int, has the range [-32,768, 32,767] = [-2^15, (2^15) - 1].

More loops could be averaged if the unsigned int type was used (up to 65,535), or the long or unsigned long type (up to about 2,000,000,000 or 4,000,000,000 respectively). But if a longer integer type is used (for both the counter and the number of loops to average), the increment and compare operations take longer, resulting in about a 20% loss of speed, at 408 KHz.
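
The limits of each counter type can be checked on a desktop compiler. This is a sketch in portable C++, where the fixed-width types from <cstdint> stand in for the Arduino types (int16_t for int, uint16_t for unsigned int, and so on), since int is usually 32 bits wide on a PC. The helper names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Largest value a loop counter of type T can hold before it wraps.
template <typename T>
unsigned long maxCount() {
    return static_cast<unsigned long>(std::numeric_limits<T>::max());
}

// Advance an 8-bit counter as ++nLoopCount would for a byte:
// after 255 the value wraps around to 0.
std::uint8_t nextByteCount(std::uint8_t count) {
    return static_cast<std::uint8_t>(count + 1);
}

// Confirm the upper limit for each counter width.
void checkCounterRanges() {
    assert(maxCount<std::uint8_t>()  == 255UL);         // byte
    assert(maxCount<std::int16_t>()  == 32767UL);       // int on the Arduino
    assert(maxCount<std::uint16_t>() == 65535UL);       // unsigned int
    assert(maxCount<std::int32_t>()  == 2147483647UL);  // long
    assert(maxCount<std::uint32_t>() == 4294967295UL);  // unsigned long
    assert(nextByteCount(255) == 0);                    // wraparound
}
```

The value of nLoopsReport has to be no larger than the counter's maximum, otherwise the comparison in the loop can never succeed and no statistics are ever reported.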

There is an option for using a shorter integer type, byte or the equivalent unsigned char. But these unsigned types have a range of only [0,255]. It is marginally faster:



It loops at about 750 KHz  = .75 MHz, corresponding to about 1.3 microsecond/loop. That's as fast as I could execute a loop that periodically reports timing.
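
To compare the three counter types on the same footing, the measured loop rates can be converted to clock cycles per loop. The function below is a small portable C++ sketch with an illustrative name; the only inputs are the 16 MHz clock and the loop rates reported above.

```cpp
#include <cassert>

// Clock cycles spent per pass through loop(), given the CPU clock and
// the measured loop rate, both in Hz.
double cyclesPerLoop(double clockHz, double loopRateHz) {
    assert(loopRateHz > 0.0);
    return clockHz / loopRateHz;
}
```

With a 16 MHz clock, 750 KHz (byte counter) works out to about 21 cycles per loop, 500 KHz (int counter) to 32 cycles, and 408 KHz (long counter) to about 39 cycles.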









Now I can add math functions to the loop:

  // Do some math, see how much longer it takes:
  flFLOPvalue1 *= 1.0001;



An average of 86.4 KHz for the whole loop! More time is spent on this floating point operation (FLOP) than the loop/increment/compare used for reporting these statistics. The time for the loop/increment/compare is 1.33 microseconds (from above), and so the floating point operation takes about:

t(FLOP) = ( 2940 microseconds / 255 loops ) - 1.33 microseconds ~= 10.2 microseconds
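
The same subtraction can be written as two small helpers. This is a portable C++ sketch with illustrative names, not code from the timing program; it takes the total time for a batch of loops, the number of loops, and the empty-loop overhead, and returns the time per operation and the equivalent rate in thousands of operations per second.

```cpp
#include <cassert>

// Time for one operation in microseconds: the average time per loop
// minus the loop/increment/compare overhead measured with an empty loop.
double operationTimeUs(double totalUs, unsigned nLoops, double overheadUs) {
    assert(nLoops > 0);
    return totalUs / nLoops - overheadUs;
}

// Operations per second in thousands (the "K operations/s" figure).
double kiloOpsPerSecond(double opTimeUs) {
    assert(opTimeUs > 0.0);
    return 1000.0 / opTimeUs;
}
```

For 2940 microseconds over 255 loops and 1.33 microseconds of overhead, operationTimeUs gives about 10.2 microseconds, and kiloOpsPerSecond(10.2) gives about 98 K operations/s.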

You might think that a CPU can anticipate that every time through the loop the same numbers (flFLOPvalue1 and 1.0001) are needed, and so the CPU will not have to retrieve the number from memory by keeping it in the register. It is true that modern compilers can sometimes produce optimized machine code that recognizes these regularities, and a CPU cache can reduce the need to go to memory. I don't know what optimizations occur in this case, or if they occur at all.

But the compound multiplication operation *= can use a shortcut because only one register is needed for one of the arguments (flFLOPvalue1) and the result. A more general FLOP requires three registers, the two arguments and a result, like flA = flB*flC. Trying this out results in an almost identical result so it appears that this optimization is not used or doesn't result in a speed increase.

So this answers my basic question, how many (multiplication) operations can be performed in 1/20th of a second:

(1 s/20) * 86,400 float operations / s ~= 4300 FLOP in 1/20 second

The corresponding result for the integer operation, multiplication, is:

(1 s/20) * 700,000 integer operations / s ~= 35000 INTOP in 1/20 second

There is a huge performance hit for division and some other operations (see list in appendix). Floating point division takes about three times as long as floating point multiplication, and integer division takes about 10 times as long as integer multiplication.

That's a generous number of operations available in the blink of an eye. I think a robot with good visual comprehension is possible with the right programming, and efficient use of these operations. If the computational capacity is not exhausted using one eyespot (a photoresistor), then maybe several eyespots can be accommodated.
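
As a rough way to budget several eyespots, the per-operation times from the appendix can be plugged into a small calculation. This is a portable C++ sketch with made-up function names; the count of eight eyespots in the example is hypothetical, while the 111 microsecond analog read and 10.3 microsecond float multiplication come from the appendix.

```cpp
#include <cassert>

// Number of whole operations that fit in a time budget, given the time
// each operation takes. Both arguments are in microseconds.
unsigned long opsInBudget(double budgetUs, double opTimeUs) {
    assert(opTimeUs > 0.0);
    return static_cast<unsigned long>(budgetUs / opTimeUs);
}

// Time left in the budget after nReads analog reads, in microseconds.
double budgetAfterReads(double budgetUs, unsigned nReads, double readUs) {
    return budgetUs - nReads * readUs;
}
```

With a 50 ms (50000 microsecond) budget, eight analog reads use 888 microseconds, and opsInBudget(budgetAfterReads(50000, 8, 111), 10.3) still leaves room for 4768 float multiplications, ignoring loop overhead.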


Appendix:

Here's a short list of the time it takes to do particular operations, including integer operations. See operation_time_1 program below, which was used to find these values.

Floating point (float) result:

addition of two floating point numbers, assignment to a third:
10.3 microseconds, or 97 K operations/s

multiplication (*) of  two floating point numbers, assignment to a third:
10.3 microseconds, or 97 K operations/s

compound multiplication (*=) of floating point numbers
10.4 microseconds, or 96 K operations/s

compound multiplication (*=) of float with an explicit float
10.2 microseconds, or 98 K operations/s

division (/) of  two floating point numbers, assignment to a third:
32.7 microseconds, or 30.5 K operations/s


Integer result:

addition of 16-bit integers:
.88 microseconds, or 1,100 K operations/s

multiplication of 16-bit integers (int):
1.43 microseconds, or 700 K operations/s

division of 16-bit integers (and truncation to int):
15.7 microseconds, or 63.6 K operations/s

division of 16-bit integer by float (and truncation to int):
43.1 microseconds, or 23.2 K operations/s

addition of 16-bit integer and a float (and truncation to int):
20.6 microseconds, or 48.5 K operations/s

modulo (2) operation on 16-bit integer:
15.7 microseconds, or 63.8 K operations/s


Floating point math operations:

trigonometric floating point operation, cos()
113 microseconds, or 8.9 K operations/s

irrational math floating point operation, sqrt()
47 microseconds, or 21 K operations/s


Utilities:

analogRead( PHOTORESISTOR );
111 microseconds, or 9.0 K operations/s

From the function reference:
"It takes about 100 microseconds (0.0001 s) to read an analog input, so the maximum reading rate is about 10,000 times a second."

digitalWrite( LED_2_BLUE, HIGH );
6.7 microseconds, or 150 K operations/s

delayMicroseconds( n );
n microseconds, to within the precision of the reporting time estimate

millis();
1.7 microseconds, or 590 K operations/s

micros();
3.3 microseconds, or 300 K operations/s


To Do: comparison, casting (conversion) operations, more math, control structures
To Do: byte, double (no, same size as float) and long operations


This program was used for calculating the statistics for single operations:

//
// operation_time_1
//
// Estimate operation(s) time on an Arduino CPU.
// Average the time needed for an operation, or set of operations,
// over 255 loop cycles, and report statistics.
//
// USAGE:
//
//  Find and hardcode loop time with no operations. Then add
//  an operation and run again.
//
// HARDCODED:
//
//  The operations to be timed.
//
//  flTReport:   // Time required for reporting alone.
//               // Find by running this program with no function calls
//               // before the counter (and ignore stats that depend on
//               // this value).
//
// created by Mark Dow,  June 24, 2011
// modified by Mark Dow, June 26, 2011 (better reporting)
//
////////////
// hardcoded:
// Time (microseconds) required for reporting alone. Find by running this
// program with no operations before the counter (and ignore stats that
// depend on this value).
const float flTReport = 1.333;
////////////
                  
#define  LED_2_BLUE       6
#define  PHOTORESISTOR   A2
byte nLoopCount = 0; 
const byte nLoopsReport = 255;
// micros() returns an unsigned long, so the time variables use the same type.
unsigned long tLast = 0;
unsigned long tDelta = 0;
float flLoopRate = 0;
// Arbitrary operation variables:
float flFLOPvalue1 = 1.00001;
float flFLOPvalue2 = 1.00002;
float flFLOPvalue3 = 1.00003;
int nINTOPvalue1 = 2;
int nINTOPvalue2 = 3;
int nINTOPvalue3 = 5;
void setup(){
 
  Serial.begin( 9600 );
  pinMode( LED_2_BLUE, OUTPUT );
  pinMode( PHOTORESISTOR, INPUT );
}
void loop(){ 

  // Do some math, or any operation, to see how much longer it
  // takes than reporting alone. 

  flFLOPvalue1 = flFLOPvalue2 * flFLOPvalue3;  
//  flFLOPvalue1 = flFLOPvalue2 * flFLOPvalue3;
//  flFLOPvalue1 *= flFLOPvalue2; 
//  flFLOPvalue1 *= 1.0001;      
//  flFLOPvalue1 = flFLOPvalue2 / flFLOPvalue3; 
//  nINTOPvalue1 = nINTOPvalue2 + nINTOPvalue3;
//  nINTOPvalue1 = nINTOPvalue2 * nINTOPvalue3;
//  nINTOPvalue1 = nINTOPvalue2 / nINTOPvalue3;
//  nINTOPvalue1 = nINTOPvalue2 / flFLOPvalue1;
//  nINTOPvalue1 = nINTOPvalue2 * flFLOPvalue1;
//  nINTOPvalue1 = nINTOPvalue2 % 2;
//  flFLOPvalue1 = cos( flFLOPvalue2 ); 
//  flFLOPvalue1 = sqrt( flFLOPvalue2 );
//  analogRead( PHOTORESISTOR );         // analog read
//  digitalWrite( LED_2_BLUE, HIGH );    // digital write
//  delayMicroseconds( 50 );
//  nINTOPvalue1 = millis();
//  nINTOPvalue1 = micros(); 

  // Every nLoopsReport times through the loop, write stats to the serial 
  //monitor.
  ++nLoopCount;
  if ( nLoopCount == nLoopsReport ) {
   
    tDelta = micros() - tLast;
   
    Serial.print( tDelta );
    Serial.println( " microseconds per 255 loops" );
   
    flLoopRate = (1000000.0*nLoopsReport)/tDelta;
    Serial.print( flLoopRate, 0 );
    Serial.println( " loops/s (Hz)" );
   
    Serial.println( "Operation(s) without stats: " );
   
    // Subtract off time without any operations (this loop),
    // about 1.333 microseconds on Arduino Mega 2560.
    Serial.print( ( float(tDelta)/nLoopsReport ) - 1.333 );
    Serial.print( " microseconds, or " );
    Serial.print( 1000/(( float(tDelta)/nLoopsReport ) - 1.333) );
    Serial.println( " K operation(s)/s" );
   
    Serial.println( "" );
    nLoopCount = 0;
    tLast = micros();
  } 
}

This program, and variations of it, was used for finding statistics of loops that only report loop time:

//
// loop_timer_5
//
// Estimate loop time on the Arduino Mega 2560.
// Print to serial port the time required for many loops,
// each containing one increment and one control structure
// comparison.
//
//
// created by Mark Dow, June 23, 2011
//
unsigned long nLoopCount = 0;
//const byte nLoopsReport = 255;
//const int nLoopsReport = 32000;
//const unsigned int nLoopsReport = 64000;
const unsigned long nLoopsReport = 1000000;
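// millis() and micros() return an unsigned long, so the time variables use the same type.
unsigned long tLast = 0;
unsigned long tDelta = 0;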
float flLoopRate = 0;
void setup(){
 
  Serial.begin( 9600 );
}
void loop(){ 

  // Don't do anything here, but this is where some operations would go.
  // ...
  // Every nLoopsReport times through the loop, write stats to the serial 
  //monitor.
  ++nLoopCount;
  if ( nLoopCount == nLoopsReport ) {
  
    tDelta = millis() - tLast;
//    tDelta = micros() - tLast;
  
    Serial.print( tDelta );
    Serial.print( " milliseconds per " );   
    Serial.print( nLoopsReport );
    Serial.println( " loops" );  
   
    flLoopRate = (1000.0*nLoopsReport)/tDelta;
    Serial.print( flLoopRate, 0 );
    Serial.println( " loops/s (Hz)" );
    nLoopCount = 0;
    tLast = millis();
//    tLast = micros();
  } 
}
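
A note on timestamps: millis() and micros() return an unsigned long, which is 32 bits wide on the Arduino, and the counters eventually roll over to zero. Keeping timestamps in the same unsigned 32-bit type and subtracting them gives the right elapsed time even across a rollover. The sketch below shows the idea in portable C++, with uint32_t standing in for the Arduino's unsigned long; the function names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Elapsed time between two readings of a free-running 32-bit counter,
// such as the value returned by millis() or micros(). Unsigned
// subtraction wraps modulo 2^32, so the result stays correct even if
// the counter rolled over between the two readings.
std::uint32_t elapsed(std::uint32_t now, std::uint32_t last) {
    return now - last;
}

// Average loop rate in Hz from a loop count and an elapsed time in ms.
float loopRateHz(std::uint32_t nLoops, std::uint32_t elapsedMs) {
    assert(elapsedMs > 0);
    return (1000.0f * nLoops) / elapsedMs;
}
```

For example, a reading of 5 taken shortly after a reading of 0xFFFFFFF0 gives an elapsed value of 21, and a hypothetical 32000 loops in 64 ms corresponds to 500000 loops/s.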