



Southampton ARM SILISTIX THALES


#### Self-Tuning Bio-Inspired Massively-Parallel Computing

Steve Furber The University of Manchester <u>steve.furber@manchester.ac.uk</u>

> University Of









- 63 years of progress
- Many cores make light work
- Building brains
- The *SpiNNaker* project
- The networking challenge
- A generic neural modelling platform
- Plans & conclusions















#### SpiNNaker CPU (2011)



EXADAPT Mar 2012





University Of

Sheffield





# 63 years of progress

- Baby:
  - filled a medium-sized room
  - used 3.5 kW of electrical power
  - executed 700 instructions per second
- SpiNNaker ARM968 CPU node:
  - fills ~3.5mm<sup>2</sup> of silicon (130nm)
  - uses 40 mW of electrical power
  - executes 200,000,000 instructions per second










# Energy efficiency

- Baby:
  - 5 Joules per instruction
- SpiNNaker ARM968:
  - 0.000 000 000 2 Joules per instruction
- **25,000,000,000** times better than Baby!
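
The per-instruction energies follow from dividing electrical power by instruction rate, using the figures quoted for the Baby and the SpiNNaker ARM968 node. A minimal check of that arithmetic (the function name is illustrative, not from the talk):

```python
def joules_per_instruction(power_watts: float, instructions_per_second: float) -> float:
    """Energy per instruction = power / instruction rate."""
    return power_watts / instructions_per_second

# Baby: 3.5 kW at 700 instructions per second.
baby = joules_per_instruction(3.5e3, 700)
# SpiNNaker ARM968 node: 40 mW at 200,000,000 instructions per second.
spinnaker = joules_per_instruction(40e-3, 200e6)

print(f"Baby:      {baby:g} J per instruction")
print(f"SpiNNaker: {spinnaker:g} J per instruction")
print(f"Ratio:     {baby / spinnaker:.3g}")
```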

UNIVERSITY OF CAMBRIDGE



*(James Prescott Joule born Salford, 1818)* 







#### Transistors per Intel chip











- atomic scales
  - less predictable
  - less reliable

EPSRC








#### Multi-core CPUs


- High-end uniprocessors
  - diminishing returns from complexity
  - wire vs transistor delays
- Multi-core processors
  - cut-and-paste
  - *simple* way to deliver more MIPS
- Moore's Law
  - more transistors
  - more cores

#### ... but what about the software?




#### **Multi-core CPUs**

- General-purpose parallelization
  - an unsolved problem
  - the 'Holy Grail' of computer science for half a century?
  - but imperative in the many-core world
- Once solved
  - few complex cores, or many simple cores?
  - simple cores win hands-down on power-efficiency!



#### **Back to the future**

- Imagine...
  - a limitless supply of (free) processors
  - load-balancing is irrelevant
  - all that matters is:
    - the energy used to perform a computation
    - formulating the problem to avoid synchronisation
    - abandoning determinism
- How might such systems work?








# **Building brains**

- Brains demonstrate
  - massive parallelism (10<sup>11</sup> neurons)
  - massive connectivity (10<sup>15</sup> synapses)
  - excellent power-efficiency
    - much better than today's microchips
  - low-performance components (~ 100 Hz)
  - low-speed communication (~ metres/sec)
  - adaptivity: tolerant of component failure
  - autonomous learning



#### **Bio-inspiration**


- How can massively parallel computing resources accelerate our understanding of brain function?
- How can our growing understanding of brain function point the way to more efficient parallel, fault-tolerant computation?




## **Building brains**

- Neurons
  - multiple inputs, single output (cf. logic gate)
  - useful across multiple scales (10<sup>2</sup> to 10<sup>11</sup>)
- Brain structure
  - regularity
  - e.g. 6-layer cortical 'microarchitecture'






#### Spike Timing Dependent Plasticity

*(Figure: synaptic weight change dw plotted against spike-time difference dt (ms). Potentiation when the presynaptic spike leads the postsynaptic spike; depression when the presynaptic spike lags it.)*
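
The shape of the STDP curve can be sketched with a pair-based exponential window, a common textbook form of the rule. This is an illustration only: the sign convention (`dt_ms = t_post - t_pre`), the amplitudes and the time constants are all assumptions, not values read from the slide.

```python
import math

def stdp_dw(dt_ms: float,
            a_plus: float = 0.3, a_minus: float = 0.5,
            tau_plus_ms: float = 20.0, tau_minus_ms: float = 20.0) -> float:
    """Weight change for one pre/post spike pair.

    dt_ms = t_post - t_pre (assumed convention); all parameters are illustrative.
    """
    if dt_ms > 0:
        # Presynaptic spike leads the postsynaptic spike: potentiation.
        return a_plus * math.exp(-dt_ms / tau_plus_ms)
    if dt_ms < 0:
        # Presynaptic spike lags the postsynaptic spike: depression.
        return -a_minus * math.exp(dt_ms / tau_minus_ms)
    return 0.0

for dt in (-40, -10, 10, 40):
    print(f"dt = {dt:+d} ms -> dw = {stdp_dw(dt):+.3f}")
```

The closer the two spikes are in time, the larger the magnitude of the change, in either direction.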




#### Learning patterns

- Spot the pattern?



#### Learning patterns

- Now you see it!



#### Learning patterns








# Self-tuning: in brains

- With STDP, and no other reinforcement
  - neurons learn the statistics of their inputs
- and, with just a little mutual inhibition
  - populations distribute themselves across the range of presented inputs.
- New inputs are interpreted against these learnt statistics.
  - Bayes would be very proud!

Masquelier & Thorpe, 2007








# SpiNNaker project

- Multi-core CPU node
  - 18 ARM968 processors
  - to model large-scale systems of spiking neurons
- Scalable up to systems with 10,000s of nodes
  - over a million processors
  - >10<sup>8</sup> MIPS total
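
The scaling figures can be checked with simple arithmetic. The sketch below takes 18 cores per node from this slide and 200,000,000 instructions per second per ARM968 from the earlier Baby comparison; it assumes every core runs at that peak rate, which is an idealisation.

```python
CORES_PER_NODE = 18   # ARM968 processors per node (from the slide)
MIPS_PER_CORE = 200   # 200,000,000 instructions/s per core, assumed peak

def machine_totals(nodes: int) -> tuple[int, int]:
    """Return (processors, total MIPS) for a machine with the given node count."""
    processors = nodes * CORES_PER_NODE
    return processors, processors * MIPS_PER_CORE

# Smallest node count that reaches one million processors (ceiling division).
min_nodes = -(-1_000_000 // CORES_PER_NODE)
processors, mips = machine_totals(min_nodes)

print(f"{min_nodes:,} nodes -> {processors:,} processors, {mips:,} MIPS")
```

Under these assumptions a million processors needs a node count in the tens of thousands, and the aggregate rate is above 10<sup>8</sup> MIPS.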




## **Design principles**

- Virtualised topology
  - physical and logical connectivity are decoupled
- Bounded asynchrony
  - time models itself
- Energy frugality
  - processors are free
  - the real cost of computation is energy




#### SpiNNaker system




#### CMP node








#### SpiNNaker chip










Biologically Inspired Massively Parallel Architectures

#### **SpiNNaker SiP**






- Strategy: for all components consider:
  - fault insertion: how do we test the FT feature?
  - fault detection: we have a problem!
  - fault isolation: contain the damage
  - reconfiguration: repair the damage
- Goal: minimize performance deficit × time
  - real-time system, so checkpoint & restart inapplicable





#### **Circuit-level fault-tolerance**


- Delay-insensitive comms
  - 3-of-6 RTZ on chip
  - 2-of-7 NRZ off chip
- Deadlock resistance
  - Tx & Rx circuits have high deadlock immunity
  - Tx & Rx can be reset independently
    - each injects a token at reset
    - true transition detector filters surplus token
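
An m-of-n code signals each symbol by activating exactly m of n wires, so a receiver can tell that a symbol is complete by counting active wires rather than relying on a clock. The sketch below simply enumerates the codewords of the two codes named above; the helper name is illustrative, and any mapping from codewords to data values is outside what the slide states.

```python
from itertools import combinations

def m_of_n_codewords(m: int, n: int) -> list[int]:
    """All n-bit words with exactly m wires active, encoded as integers."""
    words = []
    for wires in combinations(range(n), m):
        words.append(sum(1 << w for w in wires))
    return words

three_of_six = m_of_n_codewords(3, 6)   # on-chip code
two_of_seven = m_of_n_codewords(2, 7)   # off-chip code

print(len(three_of_six), "codewords in 3-of-6")
print(len(two_of_seven), "codewords in 2-of-7")
```

Both codes offer more than 16 codewords, so each would have room for a 4-bit data symbol plus a few control symbols; that use of the spare codewords is an assumption here.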




#### System-level fault-tolerance

- Breaking symmetry
  - any processor can be Monitor Processor
    - local 'election' on each chip, after self-test
  - all nodes are identical at start-up
    - addresses are computed relative to node with host connection (0,0)
  - system initialised using flood-fill
    - nearest-neighbour packet type
    - boot time (almost) independent of system scale
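
The relative addressing step can be illustrated with a small simulation: every node starts identical, the node with the host connection calls itself (0,0), and each neighbour derives its own address from the sender's address plus the offset of the link the message arrived on. The square four-link mesh, the function name and the grid size are simplifying assumptions for illustration, not the machine's actual link topology.

```python
from collections import deque

# Link offsets for a simplified square mesh (an assumption for this sketch).
LINK_OFFSETS = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def assign_addresses(width: int, height: int, host_node: tuple[int, int]) -> dict:
    """Flood outwards from the host-connected node.

    Returns a map from physical grid position to the address each node
    computes for itself, relative to the host-connected node at (0, 0).
    """
    addresses = {host_node: (0, 0)}
    frontier = deque([host_node])
    while frontier:
        px, py = frontier.popleft()
        ax, ay = addresses[(px, py)]
        for dx, dy in LINK_OFFSETS:
            nx, ny = px + dx, py + dy
            inside = 0 <= nx < width and 0 <= ny < height
            if inside and (nx, ny) not in addresses:
                # Nearest-neighbour message: "your address is mine plus this offset".
                addresses[(nx, ny)] = (ax + dx, ay + dy)
                frontier.append((nx, ny))
    return addresses

addresses = assign_addresses(4, 3, host_node=(1, 1))
for position in sorted(addresses):
    print(position, "->", addresses[position])
```

No node needs a preassigned identity: the only asymmetry in the system is which node has the host connection.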




- Cross-system delay << 1 ms
  - hardware routing
  - 'emergency' routing
    - failed links
    - congestion
  - permanent fault
    - reroute (s/w)









- Emulate the very high connectivity of real neurons
- A spike generated by a neuron firing must be conveyed efficiently to >1,000 inputs
- On-chip and inter-chip spike communication should use the same delivery mechanism












- Four packet types
  - MC (multicast): source routed; carry events (spikes)
  - P2P (point-to-point): used for bootstrap, debug, monitoring, etc
  - NN (nearest neighbour): build address map, flood-fill code
  - FR (fixed route): carry 64-bit debug data to host
- Timestamp mechanism removes errant packets
  - which could otherwise circulate forever

| Header (8 bits)    | Event ID (32 bits) |
|--------------------|--------------------|
| T, ER, TS, 0, -, P |                    |

| Header (8 bits)    | Address (16+16 bits) |
|--------------------|----------------------|
| T, SQ, TS, 1, -, P | Dest, Srce           |
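
One way a timestamp field can stop packets circulating forever is for each router to compare the packet's stamp with a global time phase and drop anything too old. The sketch below shows that idea only; the 2-bit field width, the age threshold and the function names are assumptions for illustration, not details given in the slides.

```python
PHASES = 4    # assumed: a 2-bit timestamp, so the phase counter wraps at 4
MAX_AGE = 2   # assumed: packets at least this many phases old are dropped

def packet_age(stamp: int, current_phase: int) -> int:
    """Phases elapsed since the packet was stamped, modulo the counter size."""
    return (current_phase - stamp) % PHASES

def should_drop(stamp: int, current_phase: int) -> bool:
    """True if a router should discard the packet as errant."""
    return packet_age(stamp, current_phase) >= MAX_AGE

# A packet stamped in phase 3 survives into phase 0 but is dropped in phase 1.
for phase in (3, 0, 1):
    print(f"phase {phase}: drop = {should_drop(3, phase)}")
```

Because the comparison is modular, the scheme needs only a few header bits, at the cost of requiring every packet to be delivered or dropped before the counter wraps.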






- All MC spike event packets are sent to a router
- Ternary CAM keeps router size manageable at 1024 entries (but careful network mapping also essential)
- CAM 'hit' yields a set of destinations for this spike event
  - automatic multicasting
- CAM 'miss' routes event to a 'default' output link
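
The lookup described above can be sketched in software as a masked comparison: each ternary entry holds a key, a mask marking which bits matter, and a set of destinations. The entry layout, destination labels and example keys are hypothetical, and a real CAM compares all entries in parallel rather than in a loop.

```python
from dataclasses import dataclass

MAX_ENTRIES = 1024  # router table size quoted on the slide

@dataclass(frozen=True)
class RouteEntry:
    key: int                 # bit pattern to match
    mask: int                # 1 = compare this bit, 0 = "don't care"
    destinations: frozenset  # output links and local cores (illustrative labels)

def route(event_id: int, table: list, default_link: str) -> frozenset:
    """Return the set of destinations for one multicast spike event."""
    assert len(table) <= MAX_ENTRIES
    for entry in table:
        if (event_id & entry.mask) == entry.key:
            return entry.destinations      # hit: automatic multicast
    return frozenset({default_link})       # miss: 'default' output link

table = [
    RouteEntry(0x00010000, 0xFFFF0000, frozenset({"link_E", "core_3"})),
    RouteEntry(0x00020000, 0xFFFF0000, frozenset({"link_N"})),
]

print(sorted(route(0x00010042, table, default_link="link_W")))
print(sorted(route(0x00030001, table, default_link="link_W")))
```

The "don't care" bits are what keep the table small: one entry can cover a whole block of event IDs, which is why careful mapping of neurons to IDs matters.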










## **Bisection performance**



- 1,024 links
  - in each direction
- ~10 billion packets/s
- 10 Hz mean firing rate
- 250 Gbps bisection bandwidth









#### PACMAN: Partitioning and Configuration Manager

- High Level Interface (PyNN, NEST, Lens, Damson etc.)
- Splitting and Grouping using SQL-defined rules
- Mapping: available cores + constraints
- Binary File Generation

*(Diagram labels: Model Level, "and projections"; PACMAN Level, part populations and projections, group/core map; System Library; Model Library.)*
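
The splitting and mapping stages can be pictured with a toy example: a population too large for one core is cut into parts, and each part is assigned to a free core. This is a sketch of the general idea only; the function names, the per-core capacity and the `(x, y, core)` tuples are hypothetical, and it does not reproduce PACMAN's SQL-defined rules.

```python
def split_population(name: str, size: int, max_per_core: int) -> list:
    """Split a population into (part_name, first_neuron, count) parts."""
    parts = []
    for index, start in enumerate(range(0, size, max_per_core)):
        count = min(max_per_core, size - start)
        parts.append((f"{name}[{index}]", start, count))
    return parts

def map_to_cores(parts: list, free_cores: list) -> dict:
    """Assign each part to one free core, in order."""
    if len(parts) > len(free_cores):
        raise ValueError("not enough free cores for this model")
    return {part[0]: core for part, core in zip(parts, free_cores)}

# Hypothetical example: 500 neurons, at most 200 per core.
parts = split_population("excitatory", 500, max_per_core=200)
core_map = map_to_cores(parts, free_cores=[(0, 0, 1), (0, 0, 2), (0, 0, 3)])

for part in parts:
    print(part, "->", core_map[part[0]])
```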



## Self-tuning: software

- PACMAN: extrinsic configuration
  - good for small systems
- 1000-processor system
  - move table creation into SpiNNaker
- 10,000-100,000 processors
  - increasingly intrinsic configuration
- Million processor system
  - application loaded in one place
  - relax configuration across machine
  - continue relaxation at run-time to relax hot-spots


#### We don't know how to do this!




# **PyNN integration**














## **PyNN integration**



*(Figure: Izhikevich neuron model; axis labels: Input Current (nA/nF), Time (ms).)*



# **PyNN integration**

- Vogels-Abbott benchmark
  - 500 LIF neurons

*(Figure legend: SpiNNaker, NEST.)*

### **SpiNNaker robot control**















#### Hexagonal PCB structure




#### **Hexagonal PCB structure**






### **Current status...**

- Full 18-core chip: arrived 20 May 2011
- Test card: 4 chips, 72 processors
  - Cards can be linked together
- Neuron models: LIF, Izhikevich, MLP
- Synapse models: STDP, NMDA
- Networks: PyNN -> SpiNNaker, various small tools to build Router tables, etc
- ...and the next steps:
  - 48-chip 10<sup>3</sup> machine (Q1 2012)
  - 500-chip 10<sup>4</sup> machine (Q2 2012)
  - 5,000-chip 10<sup>5</sup> machine (H2 2012)
  - 50,000-chip 10<sup>6</sup> machine (end H2 2012)



## Conclusions


- Brains represent a significant computational challenge
  - now coming within range?
- SpiNNaker is driven by the brain modelling objective
  - virtualised topology, bounded asynchrony, energy frugality
- The major architectural innovation is the multicast communications infrastructure
- Self-tuning at many levels
  - hardware (for fault-tolerance), software and, most effectively, in the neurons themselves!
- We have prototype working hardware!














## SpiNNaker team



#### Southampton



Manchester






The University Of Sheffield

