May 20, 2025 by Okinawa Institute of Science and Technology

Collected at: https://techxplore.com/news/2025-05-peak-efficiency-optimizing-tutelage.html

The information age is built on mathematics. From finding the best route between two points, predicting the future load on a national power grid or tomorrow’s weather, to identifying ideal treatment options for diseases, algorithms share a common structure: they take input data, process it through a series of calculations, and deliver an output.

Powering the ongoing AI revolution are increasingly sophisticated algorithms, often composed of millions of lines of code. And the more steps a model goes through before presenting a solution, the costlier it is in the number of physical computing units, time, and energy required.

Optimizing these mathematical models is at the heart of the work of the Machine Learning and Data Science Unit (MLDS) at the Okinawa Institute of Science and Technology (OIST). Led by Professor Makoto Yamada, the unit strives to unlock the full potential of machine learning (ML) and improve efficiency, optimizing not just data science but also education and the scholarly output within the unit through a distributed hierarchy.

Reducing costs

Imagine you are tasked with moving goods from several warehouses to different stores. Each warehouse has a different quantity of goods available, and each store requires a different number of goods. To minimize transportation costs, you need to decide how much each warehouse should send to each store so that the total distance travelled by all the needed goods is as small as possible.

This is the basic setup of Optimal Transport (OT) problems, which, while simple on the surface, are challenging at scale: as the number of distribution points (warehouses and stores) increases, so too does the complexity—and cost—of the algorithm.
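The warehouse-and-store setup can be made concrete with a small sketch. The example below is illustrative only and is not the unit's method: the function name cheapest_plan, the supplies, the demands, and the distances are all made up, and it assumes goods are shipped in whole units. It simply tries every possible shipping plan, which is exactly the kind of approach that stops being feasible as the number of distribution points grows.

```python
from itertools import product


def cheapest_plan(supply, demand, cost):
    """Brute-force the cheapest whole-unit shipping plan (tiny instances only).

    supply[i]  : units available at warehouse i
    demand[j]  : units required by store j
    cost[i][j] : cost of moving one unit from warehouse i to store j
    """
    if sum(supply) != sum(demand):
        raise ValueError("total supply must equal total demand")

    best = {"cost": float("inf"), "plan": None}

    def search(w, remaining, plan, cost_so_far):
        if w == len(supply):
            # Supply equals demand, so every store is exactly satisfied here.
            if cost_so_far < best["cost"]:
                best["cost"] = cost_so_far
                best["plan"] = [row[:] for row in plan]
            return
        # Try every way of splitting warehouse w's stock across the stores.
        options = [range(min(supply[w], r) + 1) for r in remaining]
        for row in product(*options):
            if sum(row) != supply[w]:
                continue
            row_cost = sum(q * c for q, c in zip(row, cost[w]))
            new_remaining = [r - q for r, q in zip(remaining, row)]
            search(w + 1, new_remaining, plan + [list(row)], cost_so_far + row_cost)

    search(0, list(demand), [], 0)
    return best["cost"], best["plan"]


supply = [3, 2]            # units held at hypothetical warehouses A and B
demand = [2, 2, 1]         # units needed by hypothetical stores X, Y and Z
distance = [[1, 4, 6],     # distance from A to X, Y, Z
            [5, 2, 3]]     # distance from B to X, Y, Z
total, plan = cheapest_plan(supply, demand, distance)
print(total, plan)  # 11 [[2, 1, 0], [0, 1, 1]]
```

Even in this toy case the search has to enumerate every split of every warehouse's stock, so the work explodes as warehouses and stores are added.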

“We are focused on optimizing and designing new tools for solving Optimal Transport problems,” says Prof. Yamada. OT is central to data science as a way of discovering the most efficient method of moving data between distribution points.

Consider single-cell gene expression data as an example: “This data can be very highly dimensional with a large number of samples: we might investigate 20,000 genes with 100,000 cells. Computing the relationship between each gene in the context of drug discovery or disease classification is extremely complex, especially if we want to train ML on the data.

“One of our goals is to accurately solve OT in such cases with linear complexity: a computational cost that only scales by the number of distribution points.”

For the International Conference on Learning Representations (ICLR 2025), five papers from the MLDS were accepted. Two of these feature models that focus specifically on bringing down OT cost in computationally expensive ML. Both papers are available on the arXiv preprint server.

One paper introduces an OT method that better captures class relationships by comparing full feature distributions instead of simple averages, improving both accuracy and efficiency.

The second paper addresses the challenge of efficient unsupervised learning on unlabeled data. In this setting, ML models must learn both the structure of the data (how each feature and sample are related) and the rules for measuring similarity, here quantified through Wasserstein distance, which captures the minimal effort needed to morph one data distribution into another.
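For intuition about what Wasserstein distance measures, consider the simplest case: two equally sized sets of points on a line. There, the minimal effort is found by sorting both sets and pairing them in order. The sketch below covers only this one-dimensional special case, with made-up numbers and a hypothetical function name; it is not the setting of the paper, where both the data structure and the similarity measure have to be learned.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equally sized samples on a line.

    In one dimension the optimal matching pairs the sorted samples in order,
    so the distance is the average gap between matched points.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes samples of equal size")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in pairs) / len(xs)


# Shifting every point by 5 costs exactly 5 per point.
print(wasserstein_1d([0, 1, 3], [8, 5, 6]))  # 5.0
```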

To overcome the prohibitive cost of computing this distance, the team introduces a novel method based on tree-Wasserstein distance. Instead of calculating distances across a full grid of all the distribution points, both samples (like cells) and features (like genes) are mapped onto nodes in a branching tree, where any two nodes are connected by exactly one path. This structure drastically reduces the number of comparisons required, and with it the computational cost.
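To see why a tree helps, note that once the tree is fixed, the distance between two distributions has a simple closed form: for every edge, take the difference in mass sitting below that edge and multiply it by the edge length. The sketch below computes only this standard formula on a hand-built tree; the tree, the edge lengths, and the function name are assumptions for illustration, and learning the tree itself (the subject of the paper) is not shown.

```python
def tree_wasserstein(parent, weight, mu, nu):
    """Tree-Wasserstein distance between two distributions on tree nodes.

    parent[v] : index of the parent of node v (node 0 is the root, parent -1)
    weight[v] : length of the edge between v and parent[v] (weight[0] unused)
    mu, nu    : mass placed on each node by the two distributions
    Nodes are assumed to be numbered so that parent[v] < v.
    """
    # Net mass difference at each node; children are folded in below.
    diff = [m - n for m, n in zip(mu, nu)]
    total = 0.0
    # Visit children before parents so diff[v] covers v's whole subtree.
    for v in range(len(parent) - 1, 0, -1):
        total += weight[v] * abs(diff[v])
        diff[parent[v]] += diff[v]
    return total


#        0
#       / \
#      1   2
#     / \
#    3   4
parent = [-1, 0, 0, 1, 1]
weight = [0, 1, 1, 1, 1]
mu = [0, 0, 0, 1, 0]   # all mass on node 3
nu = [0, 0, 1, 0, 0]   # all mass on node 2
print(tree_wasserstein(parent, weight, mu, nu))  # 3.0
```

The loop touches each node once, so the cost grows linearly with the number of nodes rather than with the number of node pairs.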

The prediction module of an autonomous vehicle can be manipulated to include a specific trigger that may cause it to intentionally mispredict its trajectory and potentially lead to an accident. Credit: Mohammad Sabokrou (OIST)

Expanding accessibility

Another key focus of the unit is improving the reliability and safety of models by reducing errors and the potential for harm during model development, use, and output. One challenge here is open set recognition: after training, a model will encounter inputs that are messy and often entirely unrelated to anything it was trained on, and it needs to recognize when an input falls outside the classes it knows.

Dr. Mohammad Sabokrou, who leads the unit’s research on trustworthy ML, explains, “If you’re using an ML model to detect different types of cars from images, and you give it a photo of a tiger, it should know that it cannot classify the image—it’s problematic if it confidently classifies objects it shouldn’t be able to.”

Teaching a model to draw this boundary while recognizing known objects in unusual configurations, like a car viewed from different angles, is difficult. Detecting samples that are irrelevant or dissimilar to the training data, also known as out-of-distribution samples, is generally framed as an anomaly detection problem.
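One widely used baseline for this kind of rejection, shown here only as an illustration and not as the unit's approach, is to look at how confident the classifier is and refuse to answer when its top probability is too low. In the sketch below the class labels, the raw scores, and the 0.8 threshold are all made up.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_or_reject(logits, labels, threshold=0.8):
    """Return the most likely label, or "unknown" if confidence is too low."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best] if probs[best] >= threshold else "unknown"


labels = ["sedan", "truck", "bus"]           # hypothetical car classes
print(classify_or_reject([6.0, 1.0, 0.5], labels))  # sedan
print(classify_or_reject([1.1, 1.0, 0.9], labels))  # unknown
```

A fixed threshold like this is easy to fool, since a model can be confidently wrong, which is part of why the problem remains difficult.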

One approach that the unit is exploring here is contrastive learning, where a model is taught to pull similar inputs together and push dissimilar (i.e., anomalous) ones apart in a feature space.
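The pull-together, push-apart idea can be written as a loss on pairs of feature vectors. The sketch below uses one classic pairwise form of contrastive loss, chosen for illustration; the article does not say which formulation the unit uses, and the margin and the example vectors are made up.

```python
import math


def contrastive_loss(a, b, similar, margin=1.0):
    """Pairwise contrastive loss on two feature vectors.

    Similar pairs are penalized for being far apart; dissimilar pairs are
    penalized only while they are closer than the margin.
    """
    distance = math.dist(a, b)
    if similar:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


# A dissimilar pair that sits too close together is penalized...
print(contrastive_loss((0.0, 0.0), (0.0, 0.5), similar=False))  # 0.25
# ...while one that is already far apart costs nothing.
print(contrastive_loss((0.0, 0.0), (3.0, 4.0), similar=False))  # 0.0
```

Minimizing this loss over many pairs moves similar inputs closer and pushes dissimilar ones at least a margin apart in the feature space.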

“Anomaly detection is closely related to tasks such as novelty detection, irregularity detection, open-set recognition, and out-of-distribution detection. While these tasks are similar in nature, they differ primarily in their testing settings,” says Dr. Sabokrou.

“We are working to unify the metrics across these different task types, which would allow for much greater knowledge sharing.”

The unit also works to make models more trustworthy by probing them with various attacks that expose their vulnerabilities. Adversarial attacks use subtle tweaks to an input to provoke errors.
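How small such a tweak can be is easiest to see on a toy linear classifier. The sketch below nudges each input feature by a fixed small amount in the direction that lowers the score, in the spirit of gradient-sign attacks; the weights, the input, and the step size are made up, and real attacks target far larger models.

```python
def sign(value):
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def score(weights, bias, x):
    """Linear classifier: a positive score means the positive class."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def adversarial_example(weights, x, epsilon):
    """Shift every feature by epsilon in the direction that lowers the score.

    For a linear model the gradient of the score with respect to the input
    is the weight vector, so only the sign of each weight is needed.
    """
    return [xi - epsilon * sign(w) for w, xi in zip(weights, x)]


weights, bias = [1.0, -2.0, 0.5], 0.0      # hypothetical trained model
x = [0.3, 0.1, 0.2]                        # hypothetical clean input
x_adv = adversarial_example(weights, x, epsilon=0.1)

print(score(weights, bias, x) > 0)      # True: classified as positive
print(score(weights, bias, x_adv) > 0)  # False: prediction flipped
```

No feature moved by more than 0.1, yet the prediction changed.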

Backdoor attacks exploit hidden triggers in the training data, whether deliberately introduced or unintentionally inherited through spurious correlations or societal bias: a model might wrongly infer family ties in photos based on lighting or pick up bias against underrepresented groups from non-inclusive training sets.

And finally, membership inference attacks test whether a given input was in a model’s training set to detect or exploit data leakage, with major implications for privacy and safety.
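A simple form of membership inference relies on the observation that models often fit their training examples more closely than unseen ones. The sketch below guesses that an input was in the training set when the model's loss on it is very low. It is an illustrative baseline with a made-up threshold and made-up probabilities, not a description of the unit's experiments.

```python
import math


def cross_entropy(prob_of_true_label):
    """Loss of a single prediction, given the probability of the true label."""
    return -math.log(prob_of_true_label)


def guess_membership(prob_of_true_label, loss_threshold=0.1):
    """Guess "was in the training set" when the loss is suspiciously low."""
    return cross_entropy(prob_of_true_label) < loss_threshold


# Hypothetical model outputs for two inputs.
print(guess_membership(0.99))  # True: near-perfect fit suggests a member
print(guess_membership(0.60))  # False: an ordinary prediction
```

If such guesses are right much more often than chance, the model is leaking information about its training data.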

These attacks can reveal if a cancer screening model mistakenly learns from artifacts like image scale bars, or if generative models reproduce copyrighted material. Together, these strategies provide a powerful diagnostic to improve the safety and reliability of AI systems.

The Machine Learning and Data Science Unit recently held a machine learning conference at OIST. On the left is a panel discussion between several professors and industry professionals. On the right is Professor Makoto Yamada during his talk. Credit: Adrian Skov (OIST)

Promoting growth

Data science is at the foundation of most scientific fields, and as such, improving the methods by which researchers can extract knowledge from data improves the efficiency of the scientific process. And with the explosion in general AI usage, reducing computational cost and maximizing model safety and reliability have become all the more important.

The principle of efficiency permeates the unit, which is characterized by a flat hierarchy and a distributed approach to mentorship. For one, Prof. Yamada encourages unit members to be corresponding authors of their papers: “it’s good for your career and for your learning. That is why it is often our post-docs who take on this role, so that they get the experience.”

In the same vein, mentorship is delegated across the unit, rather than centered on Prof. Yamada, with post-docs and staff scientists usually tasked with directly supervising graduate students and interns—though Prof. Yamada remains within reach of everyone, and he keeps tabs on every project.

“It’s much more efficient to learn by doing. Plus, I like to talk,” he says. Similarly, Dr. Sabokrou is motivated by collaboration, working closely together with interns, companies, former colleagues, and external researchers from around the world. “You naturally build a network through your academic career and helping each other out diffuses knowledge and contributes to progress.”

The flat, high-trust unit culture pays off, with four of the five papers accepted for ICLR 2025 being penned by interns. And by taking on a steady stream of university interns and committing resources to various science outreach activities (such as an ongoing math cafe for Okinawan junior high schoolers and the popular Machine Learning Summer School, which last year saw more than 200 participants from across the world), Prof. Yamada and the unit enact their commitment to science through their foundational work.

As he says, “education is the best, long-term investment for both science and society. Our goal is to maximize the efficiency of that investment.”

More information: Siqi Zeng et al, Learning Structured Representations by Embedding Class Hierarchy with Fast Optimal Transport, arXiv (2024). DOI: 10.48550/arxiv.2410.03052

Kira M. Düsterwald et al, Fast unsupervised ground metric learning with tree-Wasserstein distance, arXiv (2024). DOI: 10.48550/arxiv.2411.07432

Journal information: arXiv 
