Monday, November 29, 2010

Nell's half baked thoughts

NELL is a computer system that learns over time to read the web by making associations between words/phrases and a limited set of categories. NELL regularly uses twitter (cmunell) to elicit feedback, and to let the world know what it's thinking. While many of its 500K beliefs may be truthy, I think it got these two half-right:
I think "self-reference" is an #Emotion (http://bit.ly/9yUJCU)
I think "non-degree-seeking student" is an #AreaOfStudyWithinTheFieldOfMachineLearning (http://bit.ly/eeYNcC)
NELL regularly adds and removes hypotheses in an attempt to improve its confidence in the associations it has made. I wonder what it will find funny.

Thursday, September 16, 2010

Binary addition of very long numbers


"To non-mathematicians we point out that it is a commonplace of modern algebra for symbols to represent concepts other than the numbers", from An Algebra for Theoretical Genetics, Claude Shannon's 1940 PhD thesis at MIT


Computers represent integers as a series of bits, 32 bits in a row on my machine. Some programming languages offer longer integers. For example, in C++ the longer data types(1) use two of my 32-bit chunks for a total of 64 bits. This allows representation of 2^64 different integers -- perhaps the positive integers from 0 to 18,446,744,073,709,551,615. 

I recently found a good excuse (the Collatz conjecture) to add integers that are much longer than this, perhaps a thousand bits or digits long. No data types this big are provided, and built-in math operations like addition and multiplication only work on standard data types.

I could have found an arbitrary precision library for the programming language of my choice, but I know it would take me a while to learn the ins and outs of how to use the library. So I wrote my own function that does addition of arbitrarily long strings of characters. This was easy, fun and satisfying -- the sky's the limit when you can add numbers greater than 10^1000. The entire universe contains fewer than 10^100 fundamental particles, so I'll need something larger to count. Maybe I can count the number of ways that the particles of the universe can be arranged. Maybe not; do I have enough memory to store my long strings? After all, a bit of memory must require a few particles. Or does it?

It was easier than I thought. I used strings to represent the integers: one-dimensional arrays of the character (char) data type. They are easy to read (the number 1 reads as '1'), and easy to manipulate with built-in string functions. It is inefficient, because it requires 8 bits to represent each character and a function that can compare 8-bit characters, but that wasn't a concern. It only takes a few thousand operations to add a pair of 1000-bit numbers, and that is plenty fast for my purposes. I figure each addition takes about .00001 seconds, or 100,000 additions/second.

What's the algorithm? We learned this in third grade: line up the two numbers, fill in imaginary 0's in empty spots, and starting from the right add single digits (we memorized these single-digit sums in second grade), add any carry digit, and carry a 1 to the left if the sum is 10 or larger:

   11   (carry digits)
  1359
+ 0248
  ----
  1607

It would be no big deal to implement this algorithm on a string of characters from the set ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'], but I wanted to see the binary numbers and only needed two characters, '0' and '1'.

The algorithm for binary integers is slightly simpler because the addition table is shorter; we only need to know that:

if there is no carry bit (0):
0 + 0 = 0 carry 0
0 + 1 = 1 carry 0
1 + 0 = 1 carry 0
1 + 1 = 0 carry the 1

and if there is a carry bit (1), 
0 + 0 = 1 carry 0
0 + 1 = 0 carry the 1
1 + 0 = 0 carry the 1
1 + 1 = 1 carry the 1

Given two strings that represent two binary integers, for example '11011' and '110110', which represent the decimal integers 27 and 54:

 (1111100) carry bits
  0011011
+ 0110110
  -------
  1010001

The pseudocode is something like this:

{ given string arguments str1 and str2
  that represent binary integers

  // get the right-hand character of each input string
     b1 and b2
  // store the results, starting with an empty string
     answer = ''
  // and keep track of the carry bit, starting with 0
     carry = '0'

  while not at the left end of either string (padding the shorter with '0's) {
     
     If carry == '0' {
          if b1 == '0'{
               if b2 == '0' {
                    append '0' to the left of answer
                    carry = '0'
               }
               else b2 == '1' {
                    append '1' to the left of answer
                    carry = '0'
               }
          }
          else b1 == '1' {
                if b2 == '0' {
                    append '1' to the left of answer
                    carry = '0'
               }
               else b2 == '1' {
                    append '0' to the left of answer
                    carry = '1'
               }
          }
     }
     else carry == '1' {
          if b1 == '0' {
               if b2 == '0' {
                    append '1' to the left of answer
                     carry = '0'
               }
               else b2 == '1' {
                    append '0' to the left of answer
                    carry = '1'
               }
          }
          else b1 == '1'{
               if b2 == '0' {
                     append '0' to the left of answer
                     carry = '1'
               }
               else b2 == '1' {
                     append '1' to the left of answer
                    carry = '1'
               }
          }
     }
                    
         // get the next two bits from the two input strings
         next b1 and b2

   }

   // a leftover carry becomes a new leftmost bit
   if carry == '1' {
        append '1' to the left of answer
   }

   return answer
}
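In Python (a sketch of the same algorithm, not my actual program), the whole pseudocode collapses to a counting trick: a column's output bit is the parity of the 1's it holds, and it carries whenever it holds two or more:

```python
def add_binary(str1, str2):
    """Add two binary integers given as strings of '0' and '1' characters."""
    # pad the shorter string with leading zeros so the columns line up
    width = max(len(str1), len(str2))
    str1, str2 = str1.rjust(width, '0'), str2.rjust(width, '0')

    answer = ''
    carry = '0'
    # walk right to left, one column at a time, just like third grade
    for b1, b2 in zip(reversed(str1), reversed(str2)):
        ones = (b1, b2, carry).count('1')      # how many 1's in this column?
        answer = ('0' if ones % 2 == 0 else '1') + answer
        carry = '1' if ones >= 2 else '0'

    # a leftover carry becomes a new leftmost bit
    if carry == '1':
        answer = '1' + answer
    return answer

print(add_binary('11011', '110110'))   # '1010001', i.e. 27 + 54 = 81
```

Because Python strings have no length limit, the same function adds thousand-bit numbers without any changes.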

I was introduced to programming in the 1970s, in an era dominated by electronic calculators. For a long while I thought of computation as fundamentally about numbers. How could computers do anything with text, when they understand nothing about the meaning of letters and words? But I'm slowly changing my point of view. Now I think about calculation as pure symbol manipulation, where computers understand nothing about the meaning of numbers or text. 

Any meaning, with respect to numbers or text, is supplied by control statements (while..., if..else) and the conditional expressions they check (0 or 1, true or false, same or different). This comparison of two bits, with the result of same or different, is one of the few core things a computer knows how to do without software, as it is hardcoded into the CPU's hardware and microcode.

This program has made symbol manipulation more concrete for me: it takes letters (abstract symbols), adds them, and outputs letters (more abstract symbols). I could replace every '1' with 'a' and every '0' with 'b', and the program would still add two binary numbers:

    aabaa 
+  aabaab 
   ------
= ababbba

I could just as well replace the letters with two pictures and add those. The meaning of the algorithm would be the same. I think I will choose black and white square images. I'll call them pixels, and add strange arrays of pixels with ambiguous meaning.
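To make that concrete, here is a hypothetical variant of the adder that takes its two symbols as parameters, so it genuinely never knows which glyph "means" one:

```python
def add_symbols(s1, s2, zero, one):
    """Add two 'numbers' written in an arbitrary two-symbol alphabet."""
    width = max(len(s1), len(s2))
    s1, s2 = s1.rjust(width, zero), s2.rjust(width, zero)
    answer, carry = '', zero
    for b1, b2 in zip(reversed(s1), reversed(s2)):
        ones = (b1, b2, carry).count(one)      # count the 'one' symbols
        answer = (one if ones % 2 else zero) + answer
        carry = one if ones >= 2 else zero
    return (one + answer) if carry == one else answer

# with 'b' playing the role of 0 and 'a' the role of 1:
print(add_symbols('aabaa', 'aabaab', zero='b', one='a'))   # 'ababbba'
```

The same call with zero='0', one='1' reproduces ordinary binary addition; the algorithm is indifferent to the symbols.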




(1) These types have many names: doubleword, longword, long long, quad, quadword, int64, or in specific languages, C/C++ long (on 64-bit Unix), C/C++ long long, C/C++ uint64_t, int64_t, C# long, ulong, Delphi Int64, T-SQL bigint.

Sunday, September 12, 2010

Collatz conjecture and binary palindromes

The Collatz conjecture, or 3n + 1 problem, is about unexplained number patterns.
Take any natural number n. If n is even, divide it by 2 to get n / 2; if n is odd, multiply it by 3 and add 1 to obtain 3n + 1. Repeat the process indefinitely. The conjecture is that no matter what number you start with, you will always eventually reach 1.
One wonders if the conjecture is true or false, and why. Might the existence of a path to 1 be undecidable, so that we can't know if the process will ever end? [In 2006, researchers Kurtz and Simon, building on earlier work by J.H. Conway in the 1970s, proved that a natural generalization of the Collatz problem is undecidable.]


But the appeal of the conjecture has more to do with the strange patterns that occur in the paths that numbers take on the way to 1. The length of the path, the stopping time, is highly variable and anything but random. What's the pattern?

It is easy to find the stopping time for any small integer. A short Matlab program was able to find the first 10,000,000 stopping times within a few minutes, without any optimization. Jacob programmed a Collatz path calculator in Scratch that shows the intermediate path values.

The first path that begs for an explanation is 27's. Integers lower than 27 have stopping times of less than 23 steps, but at 27 the number of steps jumps to 111! What's so special about 27, or about the other jumps in stopping time? Is it because 27 is a power of 3 (1000 base 3)? No: other powers of 3 are easily checked, with no obvious extremes or pattern.
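The calculation itself is tiny; here is a sketch of the stopping-time function in Python (my own program was in Matlab, but the logic is the same):

```python
def stopping_time(n):
    """Count the Collatz steps it takes n to reach 1."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

# the record-setting jump at 27:
print([stopping_time(n) for n in (26, 27, 28)])   # [10, 111, 18]
```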

Some patterns are easily explained. Any power of 2 (1000... in base 2) will have a small stopping time, because repeated division by 2 (removing 0's from the right repeatedly) brings it to 1 directly, without increases. Similarly, there are special binary palindromes (bit sequences that read the same backwards and forwards) that have short stopping times: any integer of the form 101010....1, with alternating 0's and 1's, immediately precedes a power of 2, so will also have a small stopping time. The reason for this is that 3n + 1, where n is of this form, is a power of two:


     101010...10        2*n
 +    101010...1          n
     -----------
 =   1111111...1        3*n
 +             1          1
     -----------
 =  10000000...0        3*n + 1
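This is easy to confirm numerically; a quick check (a sketch in Python) that 3n + 1 has a single 1-bit, i.e. is a power of two, whenever n has the alternating form:

```python
# build the alternating-bit integers 1, 101, 10101, ... (base 2)
# and verify that 3n + 1 is a power of two (exactly one 1-bit)
for k in range(10):
    n = int('10' * k + '1', 2)     # k = 2 gives 0b10101 = 21
    m = 3 * n + 1
    assert bin(m).count('1') == 1, (n, m)
    print(n, '->', m)
```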

Do all binary palindromes have short paths? No. In fact, 27 = 11011 base 2 is a binary palindrome and has a very long path relative to its length. Another binary palindrome, 22 bits long:
3732423 base 10 = 1110001111001111000111 base 2
also has an extreme stopping time, larger than that of any integer before it.

Is it a coincidence that some palindromes have long stopping times? Probably not. There are 2^11 palindromic 22-bit patterns (one for every possible leading 11 bits), but 2^22 22-bit patterns in all, so palindromes make up only 1 in 2^11 = 2048 of them. Accounting for the first bit always being 1, there are 2^10 = 1024 palindromes among the 2^21 integers that are exactly 22 bits long -- still roughly a 1-in-2000 chance that a given integer of this length is a palindrome. 3732423 is only the 10th extreme stopping time integer. That is long odds: out of 10 randomly picked 22-bit integers, there is less than a 1% chance that any one of them is a palindrome.
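The palindrome count is easy to confirm by brute force; a Python sketch counting palindromes among the integers that are exactly 22 bits long:

```python
def is_bit_palindrome(n):
    """True if n's base-2 representation reads the same both ways."""
    b = bin(n)[2:]
    return b == b[::-1]

# integers with exactly 22 bits run from 2**21 to 2**22 - 1
count = sum(is_bit_palindrome(n) for n in range(2**21, 2**22))
print(count)   # 1024 = 2**10, since the leading bit is pinned to 1
```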

Is it also a coincidence that this long stopping time palindrome (1110001111001111000111) also has long palindromes within its two halves (111000111)? Is there a way to predict which palindromes will have long stopping times?

I see the algorithm as a symbolic dynamics system on binary words, the base 2 representations of the integers. Each step transforms a string of 1's and 0's into a new string of 1's and 0's, and the process is repeated until a single 1 remains. For example, multiplication by 2 just shifts bits left, appending a 0, and division by 2 removes a trailing 0. The conjecture is that any string of 1's and 0's will end up as the single-letter word '1'.

Why are palindromes special with respect to these string operations?

To do: Write a program that shows every step of an integer's stopping sequence as a binary word.
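A Python sketch of exactly that, printing the path of 27 as binary words:

```python
def collatz_binary_words(n):
    """Return the base-2 word at every step of n's path down to 1."""
    words = [bin(n)[2:]]
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        words.append(bin(n)[2:])
    return words

path = collatz_binary_words(27)
for word in path[:4] + ['...'] + path[-2:]:
    print(word)
print(len(path), 'words in all')
```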

[ From Collatz conjecture as a tag system
The conjecture can also be mapped to a different symbolic dynamics system, a very short 2-tag system with production rules a → bc, b → a, c → aaa. In this system, the positive integer n is represented by a string of n a's, and iteration of the tag operation halts on any word of length less than 2. The Collatz conjecture equivalently states that this tag system, with an arbitrary finite string of a's as the initial word, eventually halts. See Example: Computation of Collatz sequences for a worked example.  
 Adapted from De Mol, "Tag systems and Collatz-like functions", Theoretical Computer Science, 390:1, 92-101, January 2008 (Preprint Nr. 314)   ]
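The tag system is tiny to simulate. A Python sketch using the production rules a → bc, b → a, c → aaa: starting from three a's (n = 3), the run passes through the all-a word aaaaa (n = 5, the next odd value on 3's Collatz path) before halting:

```python
RULES = {'a': 'bc', 'b': 'a', 'c': 'aaa'}

def run_tag_system(n):
    """2-tag step: delete two symbols, append the first one's production."""
    word, history = 'a' * n, []
    while len(word) >= 2:
        history.append(word)
        word = word[2:] + RULES[word[0]]
    history.append(word)
    return history

words = run_tag_system(3)
print(words[-1])   # the system halts, as the conjecture predicts for n = 3
```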

Tuesday, July 27, 2010

Palette color cycling

Old School Color Cycling with HTML5
A good description of the technique with fantastic examples, including JavaScript and C++ code.

Sunday, July 18, 2010

Sourcerer code

“A computational process is indeed much like a sorcerer’s idea of a spirit. It cannot be seen or touched. It is not composed of matter at all. However, it is very real. It can perform intellectual work. It can answer questions. It can affect the world by disbursing money at a bank or by controlling a robot arm in a factory. The programs we use to conjure processes are like a sorcerer’s spells. They are carefully composed from symbolic expressions in arcane and esoteric programming languages that prescribe the tasks we want our processes to perform.”
from Structure and Interpretation of Computer Programs by Abelson, Sussman and Sussman.
Below is the html code (with embedded javascript) that our programming group tinkered with. All it does is draw a few rectangles, two filled with a linear gradient. The interesting bit is that this can be done; using the HTML5 canvas element, anything can be dynamically drawn on a web page. This Conway's Game of Life is a simple example of using HTML5's canvas tag to make an interactive program.

Note that I can't post the exact code because blogger interprets well-formed tags as html. I've replaced the initial "less than symbol" with a "[" and "greater than symbol" with "]", and these will need to be replaced. To tinker with the program, copy it to a text file and change the extension to .htm or .html, replace the symbols, and open the file in any browser. If this page is publicly posted, any browser can run the program. (I haven't posted the page, but I will soon.)

[!DOCTYPE html]
[html lang="en"]

[head]
    [title]My test canvas[/title]
[/head]

[body]


[!-- comment --]


[canvas id = "canvas_1" width = "800" height = "400" ]
    This text is displayed if your browser does not support HTML5 Canvas.
[/canvas]



[script type="text/javascript"]

    var example_canvas = document.getElementById( 'canvas_1' );
    var draw_context   = example_canvas.getContext( '2d' );

     var lingrad = draw_context.createLinearGradient( 0, 0, 0, 150 );
     lingrad.addColorStop( 0.0, '#00ABEB' );
     lingrad.addColorStop( 0.5, '#fff'    );
     lingrad.addColorStop( 0.5, '#26C000' );
     lingrad.addColorStop( 1.0, '#fff'    );

    draw_context.fillStyle = lingrad;
    draw_context.fillRect( 30, 30, 150,  50 );
    draw_context.fillRect( 110, 110, 130, 130 );

    draw_context.strokeRect(  50,  50,  50,  50);
    draw_context.strokeRect( 150, 150, 150, 150 );

[/script]



[/body]
[/html]

Saturday, July 10, 2010

Teletype gaming and LOVE

Short text-based interactive game programs, at the dawn of gaming as an industry, were in a sense simple and trivial. But their influence on the narrow group of people who encountered them was profound. What was behind the curtain? Was this high-tech timesharing system a toy, or just being used as a toy? The answer surprised everyone.

For retro-programmers the full content of the book "Basic Computer Games" is available online.
The classic book BASIC Computer Games, published by Creative Computing, inspired a generation of programmers. The games were written by many people, and compiled by David H. Ahl. The fabulous illustrations accompanying each game were done by George Beker.

I've included all the games here for your tinkering pleasure. I've tested and tweaked each one of them to make sure they'll run with Vintage BASIC, though you may see a few oddities. That's part of the fun of playing with BASIC: it never works quite the same on two machines. The games will play better if you keep CAPS LOCK on, as they were designed to be used with capital-letter input.
These text games were designed for teletypewriters or line printers, in an era when memory was valued, succinctness was necessary, and GOTO was considered useful. (Nowadays, "Go To Statement Considered Harmful".)

I've only browsed the contents so far: more than 100 games. The style is short and sweet: a description of the game, an illustration of a session, and the BASIC code itself.

One of my favorites, although I didn't encounter it back in the days of teletype terminals, is LOVE, an instance of ASCII art programmed by David Ahl, the author of the book. The program generates a facsimile of the iconic pop-art piece LOVE, by Robert Indiana, with a dose of recursion.

Here is Artsy's gallery of Indiana's corpus of work, which gives a good sense of his range.
"I think of my peace paintings as one long poem, with each painting being a single stanza."
-Robert Indiana
 

A younger generation might more easily recognize Indiana's HOPE graphic, created for and used in Obama's 2008 presidential campaign.


Wednesday, June 30, 2010

AI in the New York Times

The New York Times, although no longer wholly a dead-tree company, is not known for reporting on cutting-edge technology issues. When it does feature technology, it usually focuses on what works in practice, not on futuristic speculation. Its recent article about the technological singularity, "Merely Human? That’s So Yesterday", is a bit dismissive of an idea that is far from new, and uses Sergey Brin as a lead element because of his fabulous success in business as much as his embrace of the future.


So it is remarkable that three NYT articles appeared this month that take AI seriously. All focus, more or less, on business aspects of AI. While the topic of artificial intelligence has grown up with electronic digital computing, its reputation has been tarnished by its perceived failures and shortcomings. This attitude is changing, and the Times is catching up:

What Is I.B.M.’s Watson?

 For the last three years, I.B.M. scientists have been developing what they expect will be the world’s most advanced “question answering” machine, able to understand a question posed in everyday human elocution — “natural language,” as computer scientists call it — and respond with a precise, factual answer. In other words, it must do more than what search engines like Google and Bing do, which is merely point to a document where you might find the answer. It has to pluck out the correct answer itself. Technologists have long regarded this sort of artificial intelligence as a holy grail, because it would allow machines to converse more naturally with people, letting us ask questions instead of typing keywords.

Computers Learn to Listen, and Some Talk Back

For decades, computer scientists have been pursuing artificial intelligence — the use of computers to simulate human thinking. But in recent years, rapid progress has been made in machines that can listen, speak, see, reason and learn, in their way. The prospect, according to scientists and economists, is not only that artificial intelligence will transform the way humans and machines communicate and collaborate, but will also eliminate millions of jobs, create many others and change the nature of work and daily routines. 

Technology Innovator’s Mobile Move

“We are looking to augment human capability,” said Norman Winarsky, vice president for licensing and strategic programs at SRI. “But with artificial intelligence.”
Established in 1946 by Stanford University, SRI created early prototypes of the computer mouse and the technologies involved in ultrasound and HDTV.
Although SRI does roughly 80 percent of its work for the federal government, many of its technologies have been adapted for commercial purposes. Recently, the institute has set its sights on the mobile phone and Web market, especially on creating applications that perform personal functions.

Wednesday, June 23, 2010

Physics simulation games

Ragdoll Cannon is a flash game that uses 2-D physics in a very nice way. Its graphical design is beautiful too. A Boing Boing comment indicated that it looks like a knock-off of the (commercial) Crayon Physics.

What are the other games we've run across that use physics simulation in the same sense?

Tuesday, April 27, 2010

Google question relative frequency

Frances posted lists of odd things that people asked of Google.

I wondered how often people ask questions of Google. Of course Google has good statistics on this, plotted nicely with Google Trends.

There are some firm conclusions:
  • More questions are asked during the Western school year, with a sharp drop near Christmas.
  • In the last few years the frequency of asking Google has increased dramatically.
  • While "why is" is historically less frequent than "how does", it is currently nearly identical in frequency. Why is that? Perhaps it is because of the increased incidence of "why is my poop green".
 

 


Tuesday, April 6, 2010

Getting closer to AI vision

I've been admiring the book "Eye, Brain, and Vision" by David H. Hubel (Scientific American Library series no. 22, 1988) for a while now. The illustrations are very good, and the discussion of what is (was) known and unknown is well written and complete. Not too much of the fundamental understanding of vision has changed since then.

[1/14/2011 This book, with supplementary material, is available online: David Hubel's "Eye, Brain, and Vision"]

Hubel (with Weisel) has deeply influenced neuroscience, particularly visual and cognitive neuroscience, for half a century now. This has coincided with the development of electronic computers, and the interaction of ideas has been fruitful.

After admiring video of a general purpose robot with a vision system informing a towel folding program, this section in the "Present and Future" chapter (ch. 10, page 220) caught my eye:
    This is where we are, in 1987, in the step-by-step analysis of the visual path. In terms of numbers of synapses (perhaps eight or ten) and complexity of transformations, it may seem a long way from the rods and cones in the retina to areas MT or visual area 2 in the cortex, but it is surely a far longer way from such processes as orientation tuning, end-stopping, disparity tuning, or color opponency to the recognition of any of the shapes that we perceive in our everyday life. We are far from understanding the perception of objects, even such comparatively simple ones as a circle, a triangle, or the letter A--indeed, we are far from even being able to come up with plausible hypotheses.
    We should not be particularly surprised or disconcerted over our relative ignorance in the face of such mysteries. Those who work in the field of artificial intelligence (AI) cannot design a machine that begins to rival the brain at carrying out such special tasks as processing the written word, driving a car along a road, or distinguishing faces. They have, however, shown that the theoretical difficulties in accomplishing any of these tasks are formidable. It is not that the difficulties cannot be solved--the brain clearly has solved them--but rather that the methods the brain applies cannot be simple: in the lingo of AI, the problems are "nontrivial". So the brain solves nontrivial problems.

The understanding of lower (neuronal) level sensory processing in animals has served as inspiration for many image processing techniques and algorithms, including scale space techniques and feature detection/description tools like SIFT.



Wednesday, March 24, 2010

Tuesday, March 23, 2010

Seeing and drawing

This comic strip is a beautiful meta-illustration of how to communicate visually:

Seeing the Future! A Guide to Visual Communication

The 17th panel illustrates the primary visualization styles in video games.

How does one draw unseen things, like extinct animals based on inference? This NY Times article has some tips:

Artists Mine Scientific Clues to Paint Intricate Portraits of the Past

It links to this collection of Conrad Gessner's Animals from the mid 1500's, including a unicorn and monkfish, drawn from sketchy verbal information.

Sunday, March 21, 2010

Collaborative tic-tac-toe

Problem Solved, LOL: A Complex Tic-Tac-Toe Puzzle Falls Thanks to Blog Comments
Community-based online efforts point to a new, faster approach in mathematics


...In some cases, citizen-scientists such as bird-watchers or amateur astronomers collectively can make significant contributions. “What about the solving of a problem that does not naturally split up into a vast number of subtasks?” he asked. Could such a problem be solved by the readers of his blog—simply by posting comments?

For a first experiment, Gowers chose the so-called density Hales-Jewett theorem. This problem, Gowers says, is akin to “playing a sort of solitaire tic-tac-toe and trying to lose.” The theorem states that if your tic-tac-toe board is multidimensional and has sufficiently many dimensions, after a short while it is impossible to avoid arranging X’s into a line—you cannot avoid winning no matter how hard you try. Mathematicians have known since 1991 that the theorem was true, but the existing proof used sophisticated tools from other branches of math. Gowers challenged his blog’s readers to help him find a more elementary proof, a problem generally considered quite hard.

Efficient space navigation

Erika DeBenedictis took the top prize (and $100K) in the Intel Science Talent Search.

Her project appears to be primarily based on a computational problem -- efficient navigation through the solar system using gravitational interactions:
Working at home and building on existing research, Erika developed an original optimizing search algorithm that discovers energy minimizing routes in specified regions of space and would allow a spacecraft to adjust its flight path en route. She believes her novel single-step method of repeated orbit refinement could work with essentially autonomous spacecraft, and may be a practical step forward in space exploration.
Many of the other finalists and top prizes also went to computational projects, including Yale Fan
who:
...demonstrated the power of quantum computing in solving challenging "NP-complete" (NPC) problems. His work may offer scientists another tool for exploring theoretical physics.
Thomas Friedman's take on attending the awards dinner was that legal immigration should be encouraged:
America's Real Dream Team

Saturday, March 20, 2010

Ada Lovelace history on BrainPOP

A scripted animated story gives a simplified history of Ada Lovelace's part in programming Babbage's analytical engine:
Ada Lovelace on BrainPOP
BrainPOP creates animated, curriculum-based content that engages students, supports educators, and bolsters achievement. Our award-winning online educational resources include BrainPOP Jr. (K-3), BrainPOP, BrainPOP Español, and the newly launched BrainPOP ESL. All are supported by BrainPOP Educators, which features free lesson plans, video tutorials, professional development tools, graphic organizers, and best practices for our teacher community.
Here's BrainPOP's explanation of how they make digital animations using Adobe's Flash suite.

The style reminds me of Noah and Jacob's Scratch projects:
Scratch Project


Scratch Project

Monday, March 15, 2010

Chloe Fan's Arduino Mario

Chloe Fan programmed an Arduino board to control an 8x8 display that plays a simplified Mario scrolling game, with a second Arduino board playing suitable music, transcribed from sheet music. The video below is Chloe's concise demo, and PC Mag wrote up a good general description: Mario Goes Open-Source with Arduino.


Super Mario Bros on an 8x8 LED matrix from Chloe Fan on Vimeo.

Tic Tac Toe mod

A modification of trivia's tic tac toe Scratch program that can't be beat:

Scratch Project

Programmers anonymous blogs

A couple blogs from people in our programming group:

Friday, March 12, 2010

Trivia's Scratch program with scrolling

trivia's incomplete Scratch program, experimenting with scrolling. Use arrow keys to navigate and jump:
Scratch Project

My modification that reorganizes the scrolling sprites, and adds a background sprite:
Scratch Project