New memristor design uses built-in oxygen gradient to bring stability to reinforcement learning

April 3, 2026 by Tejasri Gururaj, Phys.org

Collected at: https://techxplore.com/news/2026-04-memristor-built-oxygen-gradient-stability.html

In a recent study published in Nature Communications, researchers created a memristor that uses a built-in oxygen gradient to produce slow, stable conductance changes, enabling a reinforcement learning (RL) algorithm to learn faster and more stably than conventional approaches.

Reinforcement learning stands as one of the most promising ways to achieve continual learning in AI. The idea is to replicate how biological systems acquire and adapt knowledge slowly over time. The brain achieves this via ion gradients that regulate slow, directional signaling across cell membranes. Replicating this in hardware is a key goal of neuromorphic computing.

With their ability to mimic synaptic behavior, memristors have long been considered strong candidates for this. However, most existing devices suffer from unpredictable, abrupt conductance changes, making sustained and stable learning difficult.

The new work comes from a team across China and Hong Kong, who tackle this challenge by building a memristor with the stable, temporally correlated internal states that continual learning requires. Tech Xplore spoke to co-author Haifeng Ling from Nanjing University of Posts and Telecommunications.

Speaking of the core challenge motivating the work, Ling said, “The ionic configuration can change abruptly, typically through sudden formation or rupture of conductive filaments. This leads to stochastic switching behavior and abrupt conductance transitions.”

To solve this, the team looked to biology and aimed to recreate in hardware how living cells use ion gradients to produce slow, gradual state changes.

How biology does it

Biological cells utilize ion gradients across their membranes to establish a resting potential that precisely regulates the flow of ions.

The gradient creates slow and directional changes to cellular state, letting neurons maintain an internal memory of past activity or cellular states. Recreating this gradient-regulated stability in memristors has proven harder than it sounds.

Most oxide-based devices lack any internal structure to guide ion motion, so ions redistribute randomly under an applied electric field. Attempts to introduce artificial gradients have also fallen short, as repeated electrical operation tends to erode the gradient over time.

“In many cases, these gradients are not stable,” explained Ling. “Repeated electrical operation can gradually deform or erase the gradient because ionic redistribution continuously reshapes the internal structure of the device.”

The result is a hardware landscape where stable, gradient-regulated dynamics—the very thing that makes biological learning so robust—have remained out of reach.

Creating the gradient

To recreate gradient-regulated dynamics in hardware, the team fabricated a memristor with a carefully engineered device stack consisting of indium tin oxide (ITO), zinc-porphyrin (ZnTPP), atomic layer-deposited aluminum oxide (ALD-AlOₓ), and aluminum (Al). The ZnTPP molecular layer is the key ingredient, sandwiched between the ITO electrode and the aluminum oxide layer.

The thin ZnTPP layer does two important things:

During fabrication, it provides chemically active sites for the atomic layer deposition of aluminum oxide, causing the interface region to become oxygen-rich and naturally establishing an intrinsic oxygen concentration gradient across the oxide layer.
During operation, it participates in reversible coordination interactions with oxygen ions, helping regulate their migration and generating a stabilizing interfacial electric field that prevents the gradient from eroding.

“Without the gradient, ion motion is like a ball moving on a flat surface,” said Ling. “With the gradient, the system behaves more like a sloped landscape that guides motion in a predictable direction.”
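Ling's analogy can be illustrated with a toy model. The sketch below is not the device physics from the paper. It is a one-dimensional random walk in which a hypothetical `bias` parameter stands in for the oxygen gradient, and the function names and step counts are assumptions chosen for illustration. With no bias, the average displacement stays near zero (the flat surface). With a small bias, the walker drifts steadily in one direction (the sloped landscape).

```python
import random

def simulate_walk(steps, bias, seed=0):
    """1D random walk: each step is +1 with probability 0.5 + bias, else -1.

    `bias` is a hypothetical stand-in for the built-in gradient.
    """
    rng = random.Random(seed)
    position = 0
    for _ in range(steps):
        position += 1 if rng.random() < 0.5 + bias else -1
    return position

def mean_displacement(steps, bias, trials=200):
    """Average final position over several independently seeded walks."""
    return sum(simulate_walk(steps, bias, seed=s) for s in range(trials)) / trials

flat = mean_displacement(steps=1000, bias=0.0)     # no gradient: pure diffusion
sloped = mean_displacement(steps=1000, bias=0.05)  # gradient: diffusion plus drift

print(f"flat surface:     mean displacement = {flat:.1f}")
print(f"sloped landscape: mean displacement = {sloped:.1f}")
```

In this toy picture the expected drift per step is twice the bias, so the sloped case should end up around 100 steps from the origin while the flat case hovers near zero.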

The result is a device whose conductance evolves slowly and continuously after electrical stimulation, with a relaxation timescale exceeding 100 seconds—far longer than the nanosecond-scale decay typical of other second-order memristors. This slow, stable evolution is what makes the device useful for continual learning tasks that unfold over extended timescales.
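To get a feel for the gap between those timescales, consider a rough sketch that assumes the conductance relaxes as a single exponential, exp(-t / tau). That functional form is an assumption made here for illustration and is not taken from the paper. The time constants follow the figures quoted above, with 100 seconds used as the lower bound for the new device.

```python
import math

def remaining_fraction(t_seconds, tau_seconds):
    """Fraction of the initial conductance change left after t seconds,
    assuming single-exponential relaxation (an illustrative assumption)."""
    return math.exp(-t_seconds / tau_seconds)

# One second after stimulation:
slow = remaining_fraction(1.0, 100.0)  # tau = 100 s, the quoted lower bound
fast = remaining_fraction(1.0, 1e-9)   # tau = 1 ns, nanosecond-scale decay

print(f"tau = 100 s: {slow:.4f} of the change remains")
print(f"tau = 1 ns:  {fast:.4f} of the change remains")
```

Under this assumption, about 99% of the change is still present after one second in the slow device, while in the fast case it has vanished entirely on that timescale.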

Learning from the device

The fabricated device demonstrated a conductance modulation range of 98.1% across 40 distinct pseudo-nonvolatile (PNV) conductance states—stable, temporally distinguishable plateaus that persist for a finite period after stimulation.

These were achieved using a pulse scheme the team developed called unipolar spike voltage-dependent plasticity (U-SVDP), which applies pairs of pulses at different amplitudes to precisely balance oxygen ion drift and diffusion along the intrinsic gradient.

“The device does not simply provide a passive relaxation signal,” said Ling. “Instead, through gradient-guided ionic dynamics and U-SVDP modulation, the memristor actively generates a biologically inspired temporal sequence of internal states that defines a dynamic learning-rate trajectory suitable for continual learning in non-stationary environments.”

These 40 PNV states were then mapped to learning rates in a Q-learning reinforcement learning algorithm, with the learning rate calculated from the relative conductance change between successive states. In a static pathfinding task, this approach reduced training iterations by 68.75% compared to conventional strategies.

In dynamic environments with sequentially increasing complexity, the reduction was 35.65%. The smaller gain in dynamic settings points to a current limitation.

“Although the memristor can provide a physically grounded adaptive learning rate, the temporal dynamics of the device are still relatively fixed,” noted Ling.
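The mapping from conductance states to learning rates can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes the relative change takes the form |G(k+1) - G(k)| / G(k), the 40 conductance values are made up, and the task is a toy six-cell corridor rather than the pathfinding tasks from the study. The function names `learning_rates` and `train_corridor` are likewise hypothetical.

```python
import random

def learning_rates(conductances):
    """Relative change between successive states: |G[k+1] - G[k]| / G[k].
    The exact formula is an assumption; the article only says 'relative change'."""
    return [abs(b - a) / a for a, b in zip(conductances, conductances[1:])]

# Hypothetical conductance plateaus (arbitrary units): 40 states, shrinking steps.
conductances = [1.0]
for k in range(39):
    conductances.append(conductances[-1] * (1 + 0.5 * 0.9 ** k))
alphas = learning_rates(conductances)  # 39 rates, decaying from 0.5

def train_corridor(alphas, n_states=6, episodes=60, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on a 1D corridor with the goal in the last cell.
    Each episode uses the next learning rate from the device-derived schedule."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # actions: 0 = left, 1 = right
    for episode in range(episodes):
        alpha = alphas[min(episode, len(alphas) - 1)]  # hold the last rate
        state = 0
        for _ in range(200):  # step cap per episode
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore
            else:
                action = 0 if q[state][0] > q[state][1] else 1  # exploit
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == n_states - 1 else 0.0
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if state == n_states - 1:
                break
    return q

q_table = train_corridor(alphas)
for s, (left, right) in enumerate(q_table[:-1]):
    print(f"state {s}: Q(left) = {left:.3f}, Q(right) = {right:.3f}")
```

The point of the sketch is the schedule: early episodes use large learning rates for fast initial progress, and later episodes use smaller ones for stability. In the study that trajectory comes from the device's physical state sequence rather than from a hand-tuned decay rule.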

What’s next

The team’s immediate next step is scaling up from single-device demonstrations toward larger neuromorphic systems.

“One important direction is integrating these memristors into crossbar arrays so that the intrinsic device dynamics can be directly utilized in hardware implementations of reinforcement learning,” said Ling.

Looking further ahead, the team is interested in exploring the role of these devices in embodied intelligence systems, where intelligence arises from the interaction between a physical body, its environment, and the learning algorithm.

“By embedding physically adaptive memristive devices into larger neuromorphic platforms, we hope to move toward hardware systems where learning behavior is partially shaped by the intrinsic properties of the devices themselves,” explained Ling.

Publication details

Jianyu Ming et al, Intrinsic gradient oxygen-driven second-order memristors for continual reinforcement learning, Nature Communications (2026). DOI: 10.1038/s41467-026-70014-0

Journal information: Nature Communications
