Device fabrication
MZI array
The fabrication started from a silicon-on-insulator wafer (SOITEC) with a 220-nm silicon (Si) device layer and a 2-µm buried oxide layer. A 200-nm-thick positive e-beam resist (CSAR 62) was spin-coated on a diced 1 cm × 1 cm silicon-on-insulator chip, followed by a 3-min pre-bake at 150 °C. The e-beam resist was patterned by e-beam lithography (EBL; JEOL JBX-5500, 50 kV) and developed in AR 600-546 for 30 s, MIBK for 15 s and IPA for 15 s in sequence. The waveguide patterns were transferred to the Si device layer (etch depth = 110 nm) by reactive ion etching (Oxford Instruments PlasmaPro) with SF_{6} and CHF_{3} gases, followed by O_{2} plasma cleaning of the CSAR resist. A 1-µm-thick silicon dioxide (SiO_{2}) layer was deposited by plasma-enhanced chemical vapour deposition (Oxford Instruments PlasmaPro) as the upper cladding layer to isolate the waveguides from the thermo-optic phase shifters. Next, a 2-µm-thick double-layer PMMA (PMMA 495 A8 and PMMA 950 A4) was spin-coated on the chip, followed by EBL patterning and development in MIBK:IPA = 1:3 for 1 min to define the heater patterns. A 200-nm-thick NiCr layer was sputtered using a magnetron sputtering system (physical vapour deposition, AJA International), followed by PMMA lift-off to form the NiCr heaters. Gold pads of 100 nm thickness were fabricated using a process similar to that used for the NiCr heaters, but with e-beam evaporation (Plassys MEB550S). A 3–5-nm Cr layer was deposited before gold deposition to serve as an adhesion layer. An optical image of the fabricated MZI array is shown in Supplementary Fig. 1.
Photonic memory crossbar array
The Si photonic circuit was fabricated using the foundry multi-project wafer service provided by CORNERSTONE. The detailed specifications of CORNERSTONE standard waveguide components can be found at https://cornerstone.sotonfab.co.uk/. The fabricated Si photonic circuit has a 1-µm-thick SiO_{2} upper cladding. SiO_{2} windows were patterned by EBL and opened by hydrogen fluoride etching for the subsequent deposition of the Ge_{2}Sb_{2}Te_{5} (GST)/indium tin oxide (ITO) stack. Next, GST/ITO stack windows were opened by the above-mentioned PMMA process. A 10-nm-thick/10-nm-thick GST/ITO stack was deposited on the waveguide using a magnetron sputtering system (physical vapour deposition, AJA International). The GST target was sputtered at 30 W RF power with 3 sccm Ar flow and the ITO target at 40 W RF power with 3 sccm Ar flow, both at a base pressure of 10^{−7} torr. The stack was then lifted off in acetone for 180 min at 50 °C. Next, the thermo-optic phase shifters were fabricated using the method described for the MZI array. Finally, the chip was annealed on a hotplate for 5 min at 250 °C to fully crystallize the GST. The fabricated photonic memory crossbar array is shown in Fig. 3a.
Photonic EAM tensor core
The photonic EAM tensor core was fabricated using the foundry multi-project wafer service provided by IMEC (iSiPP50G platform), with details at https://www.imeciclink.com/en/asicfabrication/si. This platform provides the monolithic integration of passive waveguide circuits, integrated EAMs and integrated photodetectors used in the photonic EAM tensor core.
Measurement setup
Coherence property measurement
The coherent light was generated by a tunable coherent laser (Santec, TSL-550) operating at 1,550 nm. The 0.8-nm-bandwidth C34 partially coherent light was generated by filtering the ASE from an EDFA (Pritel FA33) with a passive DEMUX module (Gezhi, DWDM100GDEMUX) operating at channel C34 of the ITU grid. The 2.0-, 4.0-, 8.0- and 16.0-nm-bandwidth partially coherent light sources were generated by filtering the same ASE with an optical tunable bandpass filter (Santec, OTF-350) operating at a centre wavelength of 1,550 nm. The spectra were measured by an optical spectrum analyser (Anritsu, MS9710C). For eye diagrams, light was modulated by a pulse generator (Agilent, 8133A) through an electro-optic modulator (Lucent 2623N) and received by a photodetector (Newport New Focus 1611) connected to an oscilloscope (Tektronix, TDS7404B).
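As a quick sanity check on the channel assignment, the centre wavelength of an ITU-grid channel can be estimated from its frequency. The sketch below assumes the common vendor numbering in which channel Cn of the 100-GHz grid sits at 190.0 THz + n × 0.1 THz (so C34 is at 193.4 THz); this numbering and the function name `itu_channel_wavelength_nm` are illustrative assumptions and are not taken from the DEMUX module documentation.

```python
C_M_PER_S = 299_792_458  # speed of light in vacuum (m/s)


def itu_channel_wavelength_nm(channel: int) -> float:
    """Vacuum centre wavelength (nm) of DWDM channel C<channel>.

    Assumes the 100-GHz grid numbering f = 190.0 THz + channel * 0.1 THz.
    """
    frequency_hz = (1900 + channel) * 1e11
    return C_M_PER_S / frequency_hz * 1e9


# C34 and its two lower-frequency neighbours on the assumed grid.
for ch in (34, 33, 32):
    print(f"C{ch}: {itu_channel_wavelength_nm(ch):.2f} nm")
```

Under this assumed numbering, C34 evaluates to about 1,550.12 nm and adjacent channels are about 0.8 nm apart, which matches the channel bandwidth quoted above.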
System setup for parallel convolutional processing
The experimental setup for parallel convolutional processing on two gait signals is shown in Fig. 4a. The photonic memory crossbar array has three input channels and three output channels, representing a d_{3×3} matrix consisting of three d_{1×3} kernels. The input light was switchable between an EDFA (Pritel FA33) and a tunable pump laser (Santec, TSL-550) using an optical switch (Gezhi GZ12C1×2SM). The phase-change-material photonic memory in each cell of the photonic memory crossbar array was first set to the desired weight to correctly define the kernels. The tunable pump laser was used for setting the phase-change-material weights. The amplified pump light passed through a DEMUX module (Gezhi, DWDM100GDEMUX) so that different wavelengths were routed to different input channels (λ_{1} = 1,550.12 nm to Ch 1, λ_{2} = 1,550.92 nm to Ch 2 and λ_{3} = 1,551.72 nm to Ch 3). After setting all phase-change-material weights, parallel convolution was performed using the ASE from the EDFA. The DEMUX module was used to separate two wavelengths with a spacing of 0.8 nm into two different channels (λ_{1} = 1,550.12 nm and λ_{2} = 1,550.92 nm). Each wavelength was split into three channels by an optical splitter (FS PLC splitter). The three channels serve as the input light to the three respective input waveguide channels of the photonic memory tensor core. Adjacent channels have a 1-m path difference, introduced by a further 1-m-long fibre, to eliminate the coherence among all three input light sources. The gait-signal data were loaded into each channel using a variable optical attenuator (VOA; Thorlabs V1550A). The VOAs were driven by a digital signal processor (DSP; NI USB-6259). The polarization of the output light from each VOA was controlled by a polarization controller (Thorlabs FPC032).
Different wavelengths carrying the gait signal at the same time index from different patients were then grouped by a MUX array (Gezhi, DWDM100GMUX) to form three inputs to the respective input channels of the photonic memory tensor core. Convolutions were performed naturally as light propagated through the photonic memory crossbar array. Each output channel of the photonic memory tensor core contained both wavelengths λ_{1} and λ_{2}. The two wavelengths were demultiplexed to obtain the outputs and detected by a photodetector array (Newport New Focus 2011) and finally read out from the DSP.
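The role of the 1-m path difference can be illustrated with an order-of-magnitude estimate of the coherence length of the filtered ASE, L_c ≈ λ²/(nΔλ). The sketch below is not part of the reported setup: the exact prefactor depends on the spectral shape, and the fibre group index of 1.468 is an assumed typical value for standard single-mode fibre.

```python
def coherence_length_m(center_nm: float, bandwidth_nm: float,
                       group_index: float = 1.468) -> float:
    """Rough coherence length in fibre, L_c ~ lambda^2 / (n * delta_lambda).

    group_index is an assumed typical value, not a measured one.
    """
    center_m = center_nm * 1e-9
    bandwidth_m = bandwidth_nm * 1e-9
    return center_m ** 2 / (group_index * bandwidth_m)


PATH_DIFFERENCE_M = 1.0  # delay between adjacent channels

l_c = coherence_length_m(center_nm=1550.12, bandwidth_nm=0.8)
ratio = PATH_DIFFERENCE_M / l_c
print(f"coherence length ~ {l_c * 1e3:.1f} mm")
print(f"path difference / coherence length ~ {ratio:.0f}")
```

With these assumed numbers the coherence length is a few millimetres, so a 1-m delay exceeds it by more than two orders of magnitude, consistent with treating the three inputs as mutually incoherent.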
System setup for high-speed convolutional processing
The experimental setup for high-speed convolutional processing on the MNIST datasets is shown in Supplementary Fig. 13. The whole system, operating at 2 GSa s^{−1}, was controlled by an FPGA evaluation board (Xilinx, Zynq UltraScale+ RFSoC ZCU216) with a processing system unit, a programmable logic unit, 16 digital-to-analogue converters (DACs) and 16 analogue-to-digital converters (ADCs). The optical input was the 8.0-nm-bandwidth partially coherent light, equally split into nine input grating couplers. The MNIST data were read by the processing system unit, stored in its DDR4 memory and accessed by the programmable logic unit to output at nine DACs that modulated the optical signals through the input EAM array. The weights on the photonic EAM crossbar array were set by a low-speed DSP. The three convolutional processing outputs were received by the integrated photodetector array connected to three transimpedance amplifiers and ADCs, routed back to the processing system unit and stored in DDR4 memory.
Mapping non-negative transmission to negative convolution results
The input gait signals and image data presented in this work are non-negative, that is, x ∈ [0, 1]. The kernels involve negative values, that is, w ∈ [−1, 1]. The measurable outputs from the photonic system are non-negative because they are physical quantities. We therefore need to map these non-negative outputs to convolution results in the range [−1, 1]. This is done by the following steps:

(a)
We normalize every gait signal or image data to [0, 1] using software and load these normalized data to the photonic tensor core using modulators.

(b)
We represent the input data x using the output power of the modulator by setting P = x(P_{max} − P_{min}) + P_{min}, in which P_{max} and P_{min} are the maximum and minimum outputs from the modulator, respectively.

(c)
We represent the weight w using the transmission level of the phase-change material or the EAM by setting \(T=w\left(\frac{T_{\max}-T_{\min}}{2}\right)+\frac{T_{\max}+T_{\min}}{2}\), in which T_{max} and T_{min} are the maximum and minimum transmission levels of the weight-setting device, respectively.

(d)
We set the input vector x to the target input data and set the kernel w to the target weights. The measured output is:
$$\sum_{i}P_{i}\times T_{i}=\sum_{i}\left[(P_{\max}-P_{\min})\left(\frac{T_{\max}-T_{\min}}{2}\right)x_{i}w_{i}+(P_{\max}-P_{\min})\frac{T_{\max}+T_{\min}}{2}x_{i}+P_{\min}\left(\frac{T_{\max}-T_{\min}}{2}\right)w_{i}+P_{\min}\frac{T_{\max}+T_{\min}}{2}\right]\tag{1}$$
Step (d) should be performed for every input vector x.

(e)
We set all x = 0 and all w = 0. Thus all P = P_{min} and all \(T=\frac{T_{\max}+T_{\min}}{2}\). The measured output is:
$$\sum_{i}P_{\min}\frac{T_{\max}+T_{\min}}{2}\tag{2}$$
Step (e) only needs to be performed once for the whole system.

(f)
We set all x = 0 and set w to the target weights. Thus all P = P_{min} and \(T_{i}=w_{i}\left(\frac{T_{\max}-T_{\min}}{2}\right)+\frac{T_{\max}+T_{\min}}{2}\). The measured output is:
$$\sum_{i}\left[P_{\min}\left(\frac{T_{\max}-T_{\min}}{2}\right)w_{i}+P_{\min}\frac{T_{\max}+T_{\min}}{2}\right]\tag{3}$$
Step (f) needs to be performed once for each kernel.

(g)
We set x to the target input data and set all w = 0. Thus P_{i} = x_{i}(P_{max} − P_{min}) + P_{min} and all \(T=\frac{T_{\max}+T_{\min}}{2}\). The measured output is:
$$\sum_{i}\left[\left(P_{\max}-P_{\min}\right)\frac{T_{\max}+T_{\min}}{2}x_{i}+P_{\min}\frac{T_{\max}+T_{\min}}{2}\right]\tag{4}$$
Step (g) should be performed for every input vector x.

(h)
We perform post-processing on a computer using the measured outputs from steps (d)–(g) as:
$$\mathrm{Result}=\left(1\right)-\left(3\right)-\left(4\right)+\left(2\right)=\left(P_{\max}-P_{\min}\right)\left(\frac{T_{\max}-T_{\min}}{2}\right)\sum_{i}x_{i}w_{i}\tag{5}$$

(i)
We normalize the results to [−1, 1] using software, because all results share the same factor of \((P_{\max}-P_{\min})\left(\frac{T_{\max}-T_{\min}}{2}\right)\), with x ∈ [0, 1] and w ∈ [−1, 1].
This mapping approach doubles the hardware computation, because steps (d) and (g) must both be performed for every input vector. The doubling can be avoided with a hardware implementation based on a balanced photodetection scheme (Supplementary Text 2).
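The mapping steps can be checked numerically with a short simulation. The following is a minimal sketch in which the photonic system is idealized as a noiseless sum of power × transmission products; the function names and the values chosen for P_{min}, P_{max}, T_{min} and T_{max} are hypothetical and serve only to illustrate steps (b)–(h).

```python
import random


def measure(powers, transmissions):
    """Idealized detector reading: sum over channels of P_i * T_i."""
    return sum(p * t for p, t in zip(powers, transmissions))


def encode_input(x, p_min, p_max):
    """Step (b): map x in [0, 1] to a modulator output power."""
    return [xi * (p_max - p_min) + p_min for xi in x]


def encode_weight(w, t_min, t_max):
    """Step (c): map w in [-1, 1] to a transmission level."""
    return [wi * (t_max - t_min) / 2 + (t_max + t_min) / 2 for wi in w]


def signed_dot(x, w, p_min, p_max, t_min, t_max):
    """Recover sum_i x_i * w_i from four non-negative measurements."""
    zeros = [0.0] * len(x)
    p_x = encode_input(x, p_min, p_max)
    p_0 = encode_input(zeros, p_min, p_max)
    t_w = encode_weight(w, t_min, t_max)
    t_0 = encode_weight(zeros, t_min, t_max)
    m1 = measure(p_x, t_w)  # step (d), equation (1)
    m2 = measure(p_0, t_0)  # step (e), equation (2)
    m3 = measure(p_0, t_w)  # step (f), equation (3)
    m4 = measure(p_x, t_0)  # step (g), equation (4)
    raw = m1 - m3 - m4 + m2  # step (h), equation (5)
    shared_factor = (p_max - p_min) * (t_max - t_min) / 2
    return raw / shared_factor


random.seed(0)
x = [random.random() for _ in range(3)]
w = [random.uniform(-1, 1) for _ in range(3)]
recovered = signed_dot(x, w, p_min=0.1, p_max=1.0, t_min=0.2, t_max=0.9)
expected = sum(xi * wi for xi, wi in zip(x, w))
print(abs(recovered - expected) < 1e-9)
```

Dividing by the shared factor recovers Σ x_i w_i directly; any further rescaling to [−1, 1] in step (i) is a constant factor and is left out of the sketch.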
Generation, convolution and output of gait signals
The properties of the original gait-signal data collected by force sensors (Ultraflex Computer Dyno Graphy, Infotronic) are described in the next section ‘CNN model; Gait-signal dataset’.
For parallel convolution of the middle three time-domain data of two gait signals, the input matrix is a d_{3×2} matrix: \(X=\left[\begin{array}{cc}x_{11} & x_{12}\\ x_{21} & x_{22}\\ x_{31} & x_{32}\end{array}\right]\). The jth column of X contains the middle three time-domain data of the jth gait signal (Fig. 4). The ith row of X contains the ith time-domain data of the two gait signals. A DSP drove the VOAs to load the gait signals into the optical domain. The photonic memory tensor core was then effectively performing:
$$\begin{array}{c}Y=W\times X={\left[\begin{array}{ccc}w_{11} & w_{12} & w_{13}\\ w_{21} & w_{22} & w_{23}\\ w_{31} & w_{32} & w_{33}\end{array}\right]}^{\mathrm{T}}\left[\begin{array}{cc}x_{11} & x_{12}\\ x_{21} & x_{22}\\ x_{31} & x_{32}\end{array}\right]\\ \,\,\,\,=\,\left[\begin{array}{cc}\sum\limits_{n=1}^{3}w_{n1}x_{n1} & \sum\limits_{n=1}^{3}w_{n1}x_{n2}\\ \sum\limits_{n=1}^{3}w_{n2}x_{n1} & \sum\limits_{n=1}^{3}w_{n2}x_{n2}\\ \sum\limits_{n=1}^{3}w_{n3}x_{n1} & \sum\limits_{n=1}^{3}w_{n3}x_{n2}\end{array}\right]=\left[\begin{array}{cc}y_{11} & y_{12}\\ y_{21} & y_{22}\\ y_{31} & y_{32}\end{array}\right]\end{array}$$
in which \(y_{ij}=\sum_{n=1}^{3}w_{ni}x_{nj}\) represents the convolution result of the middle three time-domain data of the jth gait signal using the ith kernel. Each row of Y was output from the respective photonic memory tensor core output channel.
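The index convention y_{ij} = Σ_n w_{ni} x_{nj} can be made concrete with a few lines of code. This is an illustrative sketch only: the function name `crossbar_output` and the numerical entries of `w` and `x` are made up, with each column of `w` playing the role of one kernel and each column of `x` one gait signal.

```python
def crossbar_output(w, x):
    """Return Y with y[i][j] = sum_n w[n][i] * x[n][j], that is, Y = w^T x."""
    n_inputs = len(x)
    n_kernels = len(w[0])
    n_signals = len(x[0])
    return [
        [sum(w[n][i] * x[n][j] for n in range(n_inputs))
         for j in range(n_signals)]
        for i in range(n_kernels)
    ]


# Hypothetical weights: column i of w holds the ith 1x3 kernel.
w = [[1.0, 0.0, -1.0],
     [0.0, 1.0, 0.0],
     [-1.0, 0.0, 1.0]]

# Hypothetical inputs: column j of x holds three samples of the jth signal.
x = [[0.2, 0.9],
     [0.5, 0.4],
     [0.7, 0.1]]

Y = crossbar_output(w, x)
for row in Y:
    print([round(value, 3) for value in row])
```

Each row of `Y` corresponds to one output channel (one kernel), and each column to one wavelength (one gait signal), so both signals are processed in a single pass.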
CNN model
Gait-signal dataset
Gait signals from ten patients with Parkinson’s disease were taken from the ‘Gait in Parkinson’s Disease’ database in PhysioNet^{51,52}. This database includes the vertical ground reaction force records of individuals as they walked at their usual, self-selected pace for approximately 2 min on level ground. The corresponding clinical information of the ten patients is provided in Supplementary Table 1. Fifty gait pulses were extracted from each patient, leading to a total of 500 gait pulses. Each pulse has a 1.2-s duration. The original gait signals have a 0.01-s time resolution. Gait pulses were extracted with a time interval of 0.04 s (that is, one out of every four original data points), leading to 31 data points in each extracted gait pulse. The 0.04-s time interval was chosen to minimize the size of the extracted dataset while maintaining the key features of the original gait pulses. Eighty per cent of the pulses were used for training and 20% were used for testing, that is, a total of 400 pulses for training and 100 pulses for testing.
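The sample counts quoted above follow from simple arithmetic, sketched below. The sketch assumes that a 1.2-s pulse includes both end points (121 samples at 0.01-s resolution), which is what yields 31 points after keeping every fourth sample; the placeholder `pulse` is a dummy ramp, not real gait data.

```python
ORIGINAL_STEP_S = 0.01    # time resolution of the original records
PULSE_DURATION_S = 1.2    # duration of one gait pulse
DECIMATION = 4            # keep one out of every four samples (0.04 s)

# Assumption: both end points of the 1.2-s window are included.
n_original = round(PULSE_DURATION_S / ORIGINAL_STEP_S) + 1

pulse = [float(k) for k in range(n_original)]  # dummy samples, not real data
extracted = pulse[::DECIMATION]                # indices 0, 4, 8, ..., 120

n_patients, pulses_per_patient = 10, 50
n_total = n_patients * pulses_per_patient
n_train = n_total * 80 // 100                  # 80% for training
n_test = n_total - n_train                     # remaining 20% for testing

print(n_original, len(extracted), n_total, n_train, n_test)
# prints: 121 31 500 400 100
```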
MNIST dataset
The test datasets of MNIST handwritten digits and MNIST fashion products were taken from https://gitdisl.github.io/GTDLBench/datasets/mnist_datasets/ and https://developer.ibm.com/exchanges/data/all/fashionmnist/, respectively. In both cases, the 10,000 test images were split into a training set with 8,000 images and a testing set with 2,000 images.
CNN architecture
The CNN architecture for the classification of the gait-signal dataset is shown in Fig. 4d. The input layer takes the gait signal, which is in the form of a d_{31×1} 1D array. The 1D array is passed to a convolution layer consisting of three d_{1×3} kernels. Convolution operations were implemented with a stride of 1 and ‘valid padding’, resulting in a d_{3×(31−3+1)} output. The output was activated by a rectified linear unit layer and flattened to a d_{87×1} vector. The flattened activated output was then fed to a fully connected layer with ten neurons. The output from the fully connected layer was converted to probabilities by a softmax layer, from which the classification result was obtained. The gait signals were classified into ten categories, representing ten patients with Parkinson’s disease. The convolution operations were implemented using the photonic memory tensor core. The convolution results were processed by the subsequent CNN layers using the MATLAB R2021b Deep Learning Toolbox. The weights of the fully connected layer were trained by the Adam optimizer. A hundred epochs were used to reach the final CNN outcomes. The CNN architecture for the MNIST datasets is similar to that for the gait-signal dataset, as shown in Fig. 5d, so only the key differences are mentioned here. For the MNIST datasets, besides the trivial difference in layer dimensions, the images were convolved with ‘same padding’ implemented by the photonic EAM tensor core. We used 50 epochs to reach the final CNN outcomes.
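The layer dimensions of the gait-signal CNN can be traced with a small forward-pass sketch. This is not the MATLAB implementation used in the work: it is a plain-Python illustration with random placeholder weights, intended only to show how a 31-point input becomes a 3 × 29 feature map, an 87-element vector and ten class probabilities.

```python
import math
import random


def conv1d_valid(signal, kernel):
    """Slide the kernel over the signal with stride 1 and no padding."""
    k = len(kernel)
    return [sum(kernel[m] * signal[t + m] for m in range(k))
            for t in range(len(signal) - k + 1)]


def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


def softmax(logits):
    """Convert logits to probabilities (maximum subtracted for stability)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(signal, kernels, fc_weights, fc_bias):
    feature_maps = [relu(conv1d_valid(signal, k)) for k in kernels]  # 3 x 29
    flat = [v for fmap in feature_maps for v in fmap]                # 87
    logits = [sum(wt * v for wt, v in zip(row, flat)) + b
              for row, b in zip(fc_weights, fc_bias)]                # 10
    return feature_maps, flat, softmax(logits)


# Placeholder data and weights (random, untrained).
random.seed(0)
signal = [random.random() for _ in range(31)]
kernels = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
fc_weights = [[random.uniform(-0.1, 0.1) for _ in range(87)] for _ in range(10)]
fc_bias = [0.0] * 10

feature_maps, flat, probs = forward(signal, kernels, fc_weights, fc_bias)
print(len(feature_maps), len(feature_maps[0]), len(flat), len(probs))
# prints: 3 29 87 10
```

In the sketch, as in most CNN frameworks, the 'convolution' is computed as a sliding dot product without flipping the kernel, which matches the weighted sums performed by the crossbar array.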