Just make a quantitative model

What is the best method for researching anything? History, biology, physics, politics? The answer is, from my perspective: just make a quantitative model. Or at least a semi-quantitative model.

Only once one starts to account for the flow of sufficiently immutable things, such as mass, energy, people, time, atoms, or monetary value, does it become apparent where the gaps in one’s reasoning are. Good rhetoric hides these gaps, and a story might feel convincing. However, if you account for the immutable things, and keep in mind the sufficiently invariant things, which could be the total number of people, or atoms, or grain, or energy in the system, then questions suddenly emerge. Where was someone for a few years in the 14th century? Who else was there, since three people cannot possibly do this much in such a short amount of time? How exactly did they get the amount of revenue they needed? Which chemical reactions exist besides the well-known biochemical pathway? Where does this little bit of energy go?

No doubt this is surprisingly laborious and may even seem unnecessary. The reward is the discovery of stories that do not go the expected way, and the realisation that there is more to a question. Perhaps it even brings some intellectual humility, if thanks to the model one knows the gaps with some certainty, and can perhaps even state the precise conditions under which story A must be the case, and the conditions for story B to be true. We may never know what the unknown conditions are, or not for a long time. But we know with certainty what we don’t know. If deciding between the stories is important, then we will know exactly what to research next, which is a most valuable result.

Human neural network model collapse

AI models show “collapse” if repeatedly trained on the outputs of other neural networks. This means losing knowledge about the rare events in the modelled probability distribution, misjudging the probability of others, and thus becoming dull, stereotypical, simple, and wrong. The worry is that only the first generation of language models could have been trained on the human-produced text corpus of the internet. Now chatbots and other large language models produce so much text on the internet that the next generations of models might mostly be trained on their output. Analogous problems arise with AI-generated images and videos that become training material for the next models. Imagine generation after generation of image generators mostly trained on the output of previous generations of image generators. They would learn and amplify the mannerisms of these models, and incorporate their errors without check. Surely, the number of fingers on hands in generated images would go from mostly five to perhaps eight, ten, or twelve, with more and more variability in each generation of models. Eyes would move around on heads, while the style would become more and more same-y.
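The loss of rare events can be illustrated with a toy simulation. This is only a sketch under strong assumptions, not a model of any real training pipeline: "training" is reduced to memorising the empirical token frequencies of a corpus, "generating" to sampling from them, and the vocabulary size, corpus size, and Zipf-like starting weights are made up for illustration.

```python
import random


def simulate_collapse(generations=20, corpus_size=200, vocab_size=50, seed=0):
    """Return the number of distinct tokens surviving in each generation.

    Generation 0 is drawn from a Zipf-like distribution (a few common
    tokens, many rare ones). Every later generation is sampled, with
    replacement, from the previous generation's corpus only.
    """
    rng = random.Random(seed)
    vocab = list(range(vocab_size))
    weights = [1.0 / (rank + 1) for rank in vocab]  # hypothetical Zipf-like weights
    corpus = rng.choices(vocab, weights=weights, k=corpus_size)

    support = [len(set(corpus))]
    for _ in range(generations):
        # Sampling from the corpus equals sampling from its empirical
        # distribution: a token that drops out can never come back.
        corpus = rng.choices(corpus, k=corpus_size)
        support.append(len(set(corpus)))
    return support


if __name__ == "__main__":
    print(simulate_collapse())
```

In this toy setting the number of distinct tokens can only stay the same or shrink from one generation to the next, because nothing outside the previous output ever enters the corpus.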

Neural networks are meant to emulate human behaviour, and they do it well. So both artificial and biological neural networks clearly share some behaviours. Why then do they not share this behaviour? Why doesn’t humanity collapse, even though each generation of the neural networks in our heads, which is what we are, has been trained on the previous generations’ output ever since language and writing have existed?

The answer is that it partially does. It does in areas where the particulars of behaviour don’t matter, such as language. Symbols, within some constraints of practicability, we are free to choose. And indeed, language and customs change between generations and evolve into something new all the time. Without active policing, language changes so much over generations that the relationship is hard to recognise after a while. Standard French vs Creole, or Medieval English vs modern English, are examples. If there are geographical boundaries, the changes evolve in various directions, with vastly different outcomes, as demonstrated by the Indo-European language family that includes Latin, English, and Sanskrit, all derived from the same origin within a few thousand years of separate evolution. Some of the changes are clear improvements that simplify use and learning, such as dispensing with grammatical genders. Others introduce unnecessary complexity. Overall, from cursory observation, the total difficulty seems to be kept roughly constant, perhaps at a level that can be acquired by humans during a usual childhood.

The constraints on change seem to be practicality and usability. There is a set of contents that speakers want to convey and for which they need efficient means. Words and sentences should be not too long, easy to form, and easy to distinguish for the users. Examples of content that humans might want to express elegantly are future tense and potentials, such as “I will go,” and “I might go,” as opposed to “I go”. Pidgin, as a simplified derivative of English and French, lost the originating language’s specific ways of expressing such concepts through incomplete and careless transmission, and could only express the more basic concepts elegantly. However, the next generation of users, the children of the original pidgin speakers, came up with new ways of expressing the more complex concepts. These new ways have nothing to do with the original ways, but are just as powerful, while they are surely as ad hoc as the original means were at the time of their invention. So the needs of speakers are relatively fixed, being a constraint on language evolution, while the means to reach those aims are largely free.

In this sense we might expect new AI languages to develop, but they will only be useful if we apply the constraints that efficient communication requires. Then they might evolve into something better than humans have produced thus far. Given that humans feel strongly about their own languages, they might not want to learn these new ways of expression though.

The second way in which human neural networks avoid becoming maladaptive is by training on external experience, which does not depend on other neural networks’ models of reality. It is important to honour and make one’s own experiences, and not only with humans, but with nature itself. This is playing and exploring and climbing for a child. It is observing nature and experimenting with real things for a learner. Man-made models and learning apps are no substitute, because those are products of other neural networks, and will degrade over a few generations of models. We have seen that with the state of knowledge about human anatomy from late antiquity to the renaissance. In that time period dissections of humans were widely banned, and even animal dissections were seen as dispensable, since medical professionals already had the ultimate source: Galen’s writings on anatomy. Every new generation learned from these texts, and from derivative texts that others wrote without checking against reality. What they did not know was that Galen had already got much of his “knowledge” from dissections of animals, not humans. So following generations learned wrong concepts that were a mix of animal and human anatomy. This faulty knowledge further degraded over the following centuries, through misunderstandings perpetuated and expanded by generations of scholars, through manual text and image reproductions becoming corrupted over generations of book copies, and through unchecked mixing with the fanciful and superstitious, such as astrology. After centuries of learning merely from neural network outputs, when people began to look at actual human bodies again in the renaissance, a bizarre thing happened. The dissection master would show an organ from a human body, say a kidney, while the professor would read from Galen and make tall claims, such as that the kidney has lobes (dog kidneys have lobes, human kidneys do not).
But students and everyone else would somehow still believe they saw lobes on the human kidney in front of them, because the text said so. Or at least they claimed to see them, because there was an exam to be passed. This indeed was the time when neural network outputs were more highly valued than one’s own observations. Only with the founder of modern anatomy, Andreas Vesalius, and his book De Humani Corporis Fabrica did real observation start to win, against much resistance, as the primary source of learning about human anatomy. Only after this could the development of modern medicine, which today is mostly helpful, begin. It therefore seems counterproductive if “modern” medical schools believe the direct experience of cadaveric dissections could be replaced with instruction using textbooks and models, because those are easier and cheaper.

Similar things seem to have happened in rocketry, where the early experimentation of Goddard, the Verein für Raumschiffahrt, and the Jet Propulsion Laboratory was replaced with sterile computer work that made space rocket development slow, unreliable, and expensive. The renewed direct experiments of SpaceX, a company which builds rockets and sees how they fail in testing, have again proved superior. They managed to outclass all other competitors, who are now in terminal decline. In conclusion, although it seems cruel or wasteful, dissecting a frog or human bodies, going to the chemistry lab, and building a rocket and launching it cannot be substituted for long by plastic models or 3D models, or simulations or animations.

Without any possibility of grounding in experience, areas of human discourse are known to quickly fragment and collapse. Examples are religions that reject one’s own observation (“hybris”) and rely mostly on texts, and on opinions of others based on these texts. Some straightforward examples are Christianity and Judaism, with all the commentary written based on previous texts, and with minimal perceived need of grounding in the outside world or usefulness. Practices based on these commentaries fragmented within a few generations into a variety of denominations such as Catholic, Orthodox, Protestant, Methodist, … or Progressive, Hasidic, …. Sometimes the differences are seen as large enough to warrant violence against each other.

Other examples are parts of the humanities that evolved into only tangentially useful models of reality, such as postmodernism or some currents of moral philosophy. The latter sometimes fully reject any concern about the outside world and are only concerned with tradition and introspection.

The law must serve as a last example. Outside of times of revolution and existential crisis, legal discourse prefers to be only tenuously connected to reality, and rather uses the transmitted products of neural networks. The latter are almost always preferred for training and as input for transformation. The jargon calls these preferences for model outputs “precedent”, “scholarly opinion”, “morality”, “common practice”, and “fiat law”. The economic analysis of law and legal anthropology, as observation-based correctives, could contribute hugely to keeping legal models useful, but they are marginal in the formation and transmission of law.

Outside of textual discourses, the visual arts are not immune to model collapse either. Byzantine iconography, the making of devotional pictures of Orthodox holy topics, became wholly stereotypical and unchanging within a relatively short time. Seeing a large collection of Orthodox icons can be a disconcerting experience because of the sameness. Without a fine eye for detail it is hard to even guess the century in which a picture was made. A mainstream 20th-century icon is hardly different from a well-preserved 13th-century image of the same topic. The newer images faithfully replicate every feature of older images, even stereotypical mannerisms such as a questionable realism in depicting hands, which surely must have originated in limited skill early in the history of these images. This is an example of learning purely from a limited and derivative set of training material, with the only aim of emulating it faithfully. A different approach would be to follow a higher principle, such as emulating nature well (some ancient Greek painting), emulating well but also evoking certain emotions (e.g. Italian landscape painting or French academic painting), delighting in ways not seen before (e.g. pointillism), or discussing injustice in a strongly appealing way (some activist art of the 20th century). All these approaches, however, required input beyond the output of previous generations of neural networks.

While music clearly also shows model collapse and renewal, and living in a bubble produces odd political obsessions, I would like to discuss the positive side of model collapse as well. Sometimes living in a bubble can produce wonderful flourishes, and can be great fun. The high artifice of courtly life in renaissance Italy or at Versailles was formed through amplification of the odd or pleasurable features of previous neural network outputs, without much regard to the outside world. The courts could afford this disconnect from worldly needs. Undoubtedly those were great times for the arts and entertainment, grotesque as they may have been. It is therefore not always the case that more outside input is better. If the outside world is too grim or dysfunctional, being judicious about what is allowed into the model is of benefit. The same holds if the aim is the grotesque. Nonetheless, while model collapse can produce great art and lots of fun for a while, without external input it all inevitably becomes stale and stereotypical. Then input from the outside world is urgently needed, as the resulting models of these times go out of fashion and live on as historical curiosities.

In conclusion, thanks to observing human-like behaviour in neural networks we have a mirror onto our own behaviour. We can study what works and what does not. We can learn a lot about how to keep the human neural network collective in a useful state. The first step, I suppose, is seeing the need to do so, so that fields of human endeavour do not end up like the odd couple who spend their lives without speaking with anyone else, and whose conversations become bored and barren.

For artificial models, the usual solution is probably grounding in reality, for example using real photos and videos, which is already happening. Ultimately it is necessary to choose aims that go beyond faithful or pleasant replication of past results. Most aims we might choose, I suspect, will require real curiosity, play, and experimenting on the part of the model, which means the ability of the models to collect their own sensor data, to reject some of the output of earlier generations (teenage rebellion?), and to manipulate reality to see what happens. The only exceptions might be fields of pure and rigorous thought, such as mathematics, which has its own means of purification from the absurd, mainly by demanding internal consistency according to fixed, unambiguous rules. But even then some taste is required to choose which question is worth pursuing.

CAJOE radiation detector & ESP8266: better connection via the headphone port

The cheap CAJOE Geiger tube radiation detector can be connected to an ESP8266 with WiFi in order to connect to Grafana and other consumers of data. I found however that the headphone port of the CAJOE works better than the VIN output, which all projects I saw use for connection.

The projects I saw recommend connecting the 5V signal output on the CAJOE labeled “VIN” to a 3.3V digital input of the ESP8266. Although 5V can potentially damage the ESP8266, these projects report, correctly, that this does not happen, possibly because upon connection the voltage drops to about 3V.

This way however I could detect no counts on the microcontroller. The CAJOE continued to click, and the indicator LED did blink, but the ESP8266 did not detect any changes on its input pin. Using an oscilloscope it became clear immediately why: if connected this way, there are no impulses on VIN, at least with my device. The voltage remains steadily at about 3V, and does not drop briefly to zero upon each click of the detector.
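Why a steady 3V line yields no counts can be sketched with a small edge counter over a sampled voltage trace. This is an illustration in plain Python, not ESP8266 firmware; the function name, the threshold voltages, and the example traces are assumptions chosen for the sketch, not measurements from the device.

```python
def count_falling_pulses(samples_v, low_v=1.0, high_v=2.0):
    """Count pulses in which the line drops from high to low.

    A pulse is counted when the voltage falls below low_v, and the
    counter re-arms only after the voltage has risen above high_v again
    (simple hysteresis, so a noisy low level is not counted twice).
    """
    count = 0
    armed = True  # the idle level is assumed to be high
    for v in samples_v:
        if armed and v < low_v:
            count += 1
            armed = False
        elif not armed and v > high_v:
            armed = True
    return count


if __name__ == "__main__":
    flat_line = [3.0] * 20                                   # steady ~3V, as on the oscilloscope
    with_pulses = [3.0, 3.0, 0.1, 0.1, 3.0, 3.0, 0.2, 3.0]   # hypothetical brief drops to ~0V
    print(count_falling_pulses(flat_line), count_falling_pulses(with_pulses))
```

A digital input behaves similarly in spirit: without a voltage swing that crosses its switching thresholds, there is no edge to detect, however often the detector clicks.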

This clarified the importance of a board modification on the CAJOE that some projects obliquely recommend as necessary for “more reliable” detection: changing the 470 kΩ resistor R19 on the board to 1 kΩ, or similar. Bypassing the resistor did lead to detectable signals, but I found the work to be unnecessary.

Using the oscilloscope I found that the headphone jack of the CAJOE already produces a perfectly fine 3.6V output signal on both channels of a stereo headphone plug. Upon connecting the ESP8266 I could directly count the impulses without any further modification of the CAJOE.

Fast as an arrow

An arrow is really fast: about 100 m/s when it leaves the bow, which is about 360 km/h. A microprocessor runs at perhaps 3 GHz, so how far does an arrow move during one CPU cycle? 3 GHz means 3 thousand million cycles per second, and therefore an arrow moves about 30 nanometres in one cycle. That’s the order of magnitude of structure elements on microchips, and corresponds to roughly a hundred silicon atoms end-to-end.

Light, the fastest thing in the universe, itself moves only 10 cm in one CPU cycle, which for most people is about the width of their hand. Since electric signals travel at most at the speed of light, this is the total length that impulses can travel within the CPU to accomplish whatever needs to be done in one cycle. If your memory module is 10 cm away from your CPU on the mainboard, just sending a signal out and back takes at least two cycles. This explains one of the speed limits computers had when they were as large as cupboards and built from discrete components. If the cycles were too short, it would be difficult to keep distant parts of the computer in sync, because signals back and forth took longer than a cycle. This also gives a good reason why in Apple’s M processors the memory is right next to the processing units in the encapsulation. That’s bad for modularity and for adding memory later, but signal paths are so much shorter. As processing speed is reaching the limits of physics, it seems that such tight packages of processing and memory are going to be the common form of future processors, as they already are in processing units for mobile phones and tablets.
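The arithmetic above can be re-run with a few lines of Python. This is a back-of-the-envelope sketch using the figures from the text (100 m/s, 3 GHz, 10 cm); the function names are made up for illustration, and the signal is assumed to travel at the vacuum speed of light, which real traces on a board do not reach.

```python
CLOCK_HZ = 3e9                 # 3 GHz clock, as in the text
ARROW_SPEED_M_S = 100.0        # arrow leaving the bow
LIGHT_SPEED_M_S = 299_792_458  # speed of light in vacuum


def distance_per_cycle_m(speed_m_s, clock_hz=CLOCK_HZ):
    """Distance covered during a single clock cycle, in metres."""
    return speed_m_s / clock_hz


def round_trip_cycles(distance_m, clock_hz=CLOCK_HZ, signal_speed_m_s=LIGHT_SPEED_M_S):
    """Clock cycles that elapse while a signal travels out and back."""
    return 2 * distance_m / signal_speed_m_s * clock_hz


if __name__ == "__main__":
    print(f"arrow: {distance_per_cycle_m(ARROW_SPEED_M_S) * 1e9:.1f} nm per cycle")
    print(f"light: {distance_per_cycle_m(LIGHT_SPEED_M_S) * 100:.1f} cm per cycle")
    print(f"round trip over 10 cm: {round_trip_cycles(0.10):.2f} cycles")
```

Any slower signal speed only increases the number of cycles for the round trip, so the two cycles are a lower bound.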

AI-Box adversarial training

An AI with unknown motivations of its own, and abilities vastly superior to ours is a mortal danger to the world. It could attack us in ways which we cannot even comprehend. Could we contain such a potentially dangerous superhuman artificial intelligence in a locked computer (“a box”) once it is created, or could it convince one of us in a dialog to let it loose on the world? Could the AI learn to manipulate humans, and get out of the containment just by asking the keepers in the right way?

Turning climate change to stone

Excess carbon dioxide in the atmosphere makes Earth’s climate far less hospitable to us. However, Earth’s atmosphere is but a thin sliver of gas around a huge ball of rock. How large is our carbon dioxide problem, measured on geological scales?

Earth’s thin layer of atmosphere with the rising sun shining through, as seen from the International Space Station. (NASA)

Chalk is calcium carbonate or CaCO3, formally a compound of calcium oxide and carbon dioxide. For much of Earth’s history, marine creatures from microscopic free-floating algae, to oysters, to coral reefs used chalk to build their shells and houses. When these creatures died, their shells sank to the bottom of the sea, or were built upon by new generations of creatures. All CO2 in them was safely stashed away from the atmosphere.

Over aeons, thick layers of soft chalk formed around the globe. When we happen to live on land where there used to be oceans in Earth’s history, we might be walking on layers of soft white matter tens or hundreds of metres deep. We may admire these layers when they are exposed, as in the white cliffs of Dover or in the mountains of Lebanon.

The white chalk cliffs of Dover. (Immanuel Giel)

Even today, the ocean floor is being covered every day with more chalk. Assuming through some force we could encourage the creatures of the oceans to take up all the excess carbon dioxide from the atmosphere and store it away as chalk, how thick a layer would that form on the ocean floor?

Since 1850, humanity emitted about 4.4 * 10^17 g carbon as CO2. Chalk or CaCO3 is a sink of carbon dioxide if chalk is produced from quicklime, CaO, according to the following formula:

CaO + CO2 → CaCO3

Humanity emitted enough carbon dioxide to form 3.67 * 10^18 g of chalk this way:

M_CaCO3 = M_C / Mw_C * (Mw_C + Mw_Ca + 3 Mw_O) = 4.4 * 10^17 g / (12 g/mol) * (12 + 40 + 3 * 16) g/mol = 3.67 * 10^18 g

The volume of chalk would be about 2000 cubic kilometres. This follows from the mean density of chalk, which is 1.79 * 10^6 g/m3:

M_CaCO3 / ρ_CaCO3 = 2.05 * 10^12 m3 = 2.05 * 10^3 km3.

That seems a lot, but how thick would this layer be if spread out evenly on the sea-bed, which occupies 3.61 * 10^8 km2? It would be 2.05 * 10^3 km3 / (3.61 * 10^8 km2) = 5.7 * 10^-6 km. In more familiar units, it would be a layer of 5.7 mm of chalk.

In consequence, what is the largest threat to humanity is, for geology, a humblingly slim layer of 5.7 mm of chalk over the ocean floor. If only we knew how to entice sea animals and plants to use our emissions in the atmosphere for building shells fast enough.
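The chain of estimates above can be re-run in a few lines. This is a sketch that takes the emitted carbon, chalk density, and sea-bed area as quoted in the text, and assumes rounded molar masses of 12, 40, and 16 g/mol for C, Ca, and O, so that CaCO3 comes to 40 + 12 + 3 * 16 = 100 g/mol; the function and constant names are made up for illustration.

```python
# Inputs as quoted in the text
CARBON_EMITTED_G = 4.4e17        # carbon emitted as CO2 since 1850, in g
CHALK_DENSITY_G_PER_M3 = 1.79e6  # mean density of chalk
SEABED_AREA_KM2 = 3.61e8         # area of the sea-bed

# Rounded molar masses in g/mol (assumption of this sketch)
MW_C, MW_CA, MW_O = 12.0, 40.0, 16.0


def chalk_layer_mm(carbon_g=CARBON_EMITTED_G,
                   density_g_per_m3=CHALK_DENSITY_G_PER_M3,
                   area_km2=SEABED_AREA_KM2):
    """Thickness in mm of a chalk layer binding all the emitted carbon."""
    moles_carbon = carbon_g / MW_C                       # one C atom per CaCO3 unit
    chalk_mass_g = moles_carbon * (MW_CA + MW_C + 3 * MW_O)
    volume_km3 = chalk_mass_g / density_g_per_m3 / 1e9   # m3 -> km3
    thickness_km = volume_km3 / area_km2
    return thickness_km * 1e6                            # km -> mm


if __name__ == "__main__":
    print(f"{chalk_layer_mm():.1f} mm")
```

Changing any single input, for example a larger emissions estimate, scales the thickness proportionally, so the result stays in the range of millimetres.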

AI-Driven UI

User interfaces connect users to software. A good UI gives access to all functions, pleases the eye, and radiates power and trustworthiness. It unobtrusively does its job, and sometimes it even makes users happy. Powerful software amounts to nothing if it can’t speak with its users. How can we make the best possible UI, and how can artificial intelligence help computers talk to humans?

Being made for humans, a UI must work with human faculties and limitations. Keeping those faculties and limitations in mind, we can go a long way towards making a good UI without using any AI. But we will see that only AI can ultimately solve the fundamental dilemma of how to make powerful software that is also easy to use.

Being naughty with R

If you sneak up to the unguarded R-session of a friend and enter this, some might soon consider exorcism:

`(` <- function(x) if(is.numeric(x)) runif(1)*x else x

In R the bracket, as in (1+2), is also a function, just as the + part of the expression is. The same is true of {}, as used in function declarations, if-clauses and for-loops. Since R is so flexible, you may well reassign these functions.

This is a great way to make enemies, or rather, to set a delightful intellectual challenge. Bonus points for entering the definition into an .Rprofile so it is not lost too easily when restarting R.

The formation of a convolutional layer: feature detectors in computer vision

Convolutional layers of artificial neural networks are analogous to the feature detectors of animal vision in that they both search for pre-defined patterns in a visual field. Convolutional layers form during network training, while the question of how animal feature detectors form is a matter of debate. I observed and manipulated developing convolutional layers during training in a simple convolutional network: the MNIST hand-written digit recognition example of Google’s TensorFlow.

Convolutional layers, usually many different in parallel, each use a kernel, which they move over the image like a stencil. For each position the degree of overlap is noted, and the end result is a map of pattern occurrences. Classifier network layers combine these maps to decide what the image represents.

The reason why convolutional layers can work as feature detectors is that discrete convolution, covariance and correlation are mathematically very similar. In one dimension:

Multiply two images pixel-by-pixel, then add up all products to convolve them: $$(f * g)[n] = \sum_{m=-\infty}^{\infty} f_m g_{(n - m)}$$

Subtract their means E(f) first to instead get their covariance: $$\mathrm{cov} (f,g)=\frac{1}{n}\sum_{m=1}^n (f_m-E(f))(g_m-E(g))$$

Then divide by the standard deviations σ_f and σ_g to get their Pearson correlation: $$\mathrm{corr}(f,g)={\mathrm{cov}(f,g) \over \sigma_f \sigma_g} $$
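The three one-dimensional formulas can be written out as a short sketch in plain Python, to make the family resemblance concrete. The function names are made up for illustration, the signals are finite lists rather than infinite sequences, and σ is taken as the population standard deviation, matching the 1/n in the covariance formula.

```python
import math


def convolve(f, g):
    """Full discrete convolution: out[n] = sum over m of f[m] * g[n - m]."""
    out = [0.0] * (len(f) + len(g) - 1)
    for m, f_m in enumerate(f):
        for k, g_k in enumerate(g):
            out[m + k] += f_m * g_k  # k = n - m, so n = m + k
    return out


def mean(x):
    return sum(x) / len(x)


def cov(f, g):
    """Covariance: mean of the element-wise products after subtracting the means."""
    e_f, e_g = mean(f), mean(g)
    return sum((a - e_f) * (b - e_g) for a, b in zip(f, g)) / len(f)


def corr(f, g):
    """Pearson correlation: covariance divided by both standard deviations."""
    sigma_f = math.sqrt(cov(f, f))
    sigma_g = math.sqrt(cov(g, g))
    return cov(f, g) / (sigma_f * sigma_g)


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0]
    print(convolve(signal, [0.0, 1.0]))   # kernel that shifts the signal by one step
    print(cov(signal, [2.0, 4.0, 6.0]))
    print(corr(signal, [2.0, 4.0, 6.0]))
```

All three boil down to sums of element-wise products; convolution additionally slides (and flips) one signal against the other, which is what a kernel moving over an image does.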

Explore the state of the UK, October 2017

An earlier post explored the winning and losing parts of London, as measured by the success of different kinds of cheap and expensive food-selling enterprises.

Assayed the same way (with the same shortcomings, too!), how did the rest of the UK do?  A first answer is: outside London, not many places have done very well, but the Sheffield area is a clear winner. Less densely populated areas are the ones losing most.

Everyone has different questions though: are cheap or expensive venues becoming more successful where I (want to) live? What kind of shops are opening in the South-West? Which parts of the country is Pizza Chain X focusing on? In the linked interactive map I created you can look for yourself. The map answers both questions about the whole of the UK, and about favourite counties, cities or boroughs (just zoom in!).

As a practical note, sometimes a far out region appears surprisingly full of activity. It is worth double-checking this. It may be because the local authority dumped or purged a lot of businesses at the same time (usually in smaller places). It helps to just shift this area to one side, and then the heat map will return to a more useful scale for the rest.

The list below the map shows the businesses coming and going in the area you are looking at. In the left pane you can filter by business name or type, e.g. if you were wondering what supermarkets in general, or Tesco specifically, are doing in a particular area.

Feedback is most welcome below this post. For the future I am considering adding comparisons between different time periods to make this tool even more useful. Please be kind to the tiny server if it is slow. My thanks to the makers of R, Shiny, ggmap and leaflet, and to EC2 for the server.

New and closing restaurants and sandwich shops in the UK.