Unsupervised AI for Big-Data Analytics

SmartTensors is a groundbreaking, unsupervised Artificial Intelligence (AI) methodology and software suite for latent feature discovery and predictions in big data.

The SmartTensors AI Platform is a patented, scalable, unsupervised machine learning software suite capable of identifying and extracting essential hidden features and efficiently compressing information in massive datasets. SmartTensors autonomously analyzes and discovers hidden features, signatures, and patterns otherwise undetectable and buried in tens of terabytes of data.

The only tensor codes to:

  • identify and extract latent features in very large data sets
  • offer explainable machine learning
  • make informative, robust predictions
  • determine dependencies automatically

Applications

Large-scale Text Mining

Semantic topic modeling, topic evolution, scientific knowledge graph generation with a human-in-the-loop procedure, and scientific leadership identification and characterization.

High Performance Computing

Exascale data analytics, dimension reduction, hidden feature extraction, and efficient and scalable algorithms for emerging computing architectures.

Computer Security

Anomaly detection, user-behavior analysis, malware analysis, and novel threat discovery.

Applied Mathematics

Ultra-fast solution of extra-large partial differential equations, high-dimensional integrals, and integro-differential equations.

Dynamic Networks and Ranking

Detection of latent communities in directed and undirected graphs and networks, ranking of latent research communities hidden in temporal multilayer networks.

Biology

Latent patterns in genomics, transcriptomics, metabolomics, proteomics, and cell membranes.

Material Science

Analysis of combinatorial material libraries based on their X-ray, hyperspectral X-ray imaging, Raman fluorescence, and other spectra.

Medicine

Latent patterns in medical research.

Chemistry

Discovering new chemical pathways and reactions, radioisotope characterization, and phase separation analysis in complex liquids and co-polymers.

Data Compression

Compression of large images and videos (e.g. asteroid water impacts), scientific computer-generated data, and more.

Climate

Transient patterns in ice and water masses, micro-climate patterns.

Relational Databases

Boolean factorization analysis of categorical patterns

Privacy

Data privacy with federated learning, and recommender systems.

Economy

Macro-economy analyses, and marketing.

Agriculture

Estimating the role of water, salt, and fertilizer content on the yield.

Software Packages

SmartTensors AI delivers a comprehensive suite of software for in-depth analysis of vast datasets and accurate extraction of hidden patterns, harnessing high-performance computing and modern GPU architectures. Our approach is underpinned by scalable and highly efficient algorithms. We provide an array of libraries targeting a diverse set of problems, including data compression, computer security, and pattern analysis.

 

  • T-ELF

    Tensor Extraction of Latent Features (T-ELF) is one of the machine learning software packages developed as part of the R&D 100 winning SmartTensors AI project at Los Alamos National Laboratory (LANL). T-ELF presents an array of customizable software solutions crafted for analysis of datasets.

  • pyCP-APR

    pyCP-APR is a Python library for tensor decomposition and anomaly detection, developed as part of the R&D 100 award-winning SmartTensors project. It is designed for fast analysis of large datasets, using GPUs to accelerate computation.

    3D scatter plot showing user, source, and destination with background traffic and anomalies.
  • pyDNMFk

    pyDNMFk is a software package for applying non-negative matrix factorization in a distributed fashion to large datasets. It can minimize the difference between the reconstructed data and the original data under several measures (Frobenius norm, KL divergence).

  • AdversarialTensors

    Tensor-based framework for adversarial robustness. The library implements a variety of tensor factorization methods for defending artificial intelligence (AI) models against adversarial attacks.

    Adtensors (black box adversarial training) input image denoised, attacked, and classified to fool a victim model.
  • pyDNTNK

    pyDNTNK is a software package for applying non-negative hierarchical tensor decompositions, such as Tensor Train and Hierarchical Tucker decompositions, in a distributed fashion to large datasets. It is built on top of pyDNMFk.

  • cuda-pyDNMFk

    Cuda Python Distributed Non Negative Matrix Factorization with determination of hidden features. cuda-pyDNMFk is a dynamic software platform tailored for the decomposition of large datasets that surpass the limitations of in-memory processing.

    Cuda-pyDNMFk capabilities: visualizing asteroid impact data, extracting key frames, and topic modeling from security publications.
  • pyDRESCALk

    pyDRESCALk is a software package for applying non-negative RESCAL decomposition in a distributed fashion to large datasets. It can be utilized for decomposing relational datasets.

  • pyHNMFk

    Identifying the sources of advection-diffusion transport usually requires solving complex, ill-posed inverse models against the available state-variable data records. pyHNMFk decomposes the recorded mixtures, finds the number of unknown sources, and uses the Green's function of the advection-diffusion equation to identify their characteristics.

    pyHNMFk reconstruction error percentage and silhouette score versus the number of sources.
  • pyQBTNs

    pyQBTNs is a Python library for Boolean matrix and tensor factorization using D-Wave quantum annealers.
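
To make the pyDNMFk entry above more concrete, the sketch below shows non-negative matrix factorization under the two measures it mentions, the Frobenius norm and the KL divergence. It is a single-machine NumPy illustration using the classic multiplicative update rules, not the distributed implementation in pyDNMFk; the function names `nmf` and `relative_error` and all parameters are assumptions made for this example.

```python
import numpy as np


def nmf(X, k, loss="frobenius", n_iter=200, seed=0, eps=1e-9):
    """Approximate a non-negative matrix X (m x n) as W (m x k) @ H (k x n).

    Illustrative sketch only: classic multiplicative updates on one machine.
    This is NOT the pyDNMFk API; names and defaults are hypothetical.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    # Random strictly positive starting factors.
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        if loss == "frobenius":
            # Updates that reduce ||X - WH||_F while keeping W, H >= 0.
            H *= (W.T @ X) / (W.T @ W @ H + eps)
            W *= (X @ H.T) / (W @ H @ H.T + eps)
        elif loss == "kl":
            # Updates that reduce the generalized KL divergence D(X || WH).
            H *= (W.T @ (X / (W @ H + eps))) / (W.sum(axis=0)[:, None] + eps)
            W *= ((X / (W @ H + eps)) @ H.T) / (H.sum(axis=1)[None, :] + eps)
        else:
            raise ValueError("loss must be 'frobenius' or 'kl'")
    return W, H


def relative_error(X, W, H):
    """Relative Frobenius reconstruction error ||X - WH|| / ||X||."""
    return np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

Calling `nmf(X, k)` on a non-negative array returns the two factors, and `relative_error(X, W, H)` gives a quick check of reconstruction quality. The number of latent features `k` is supplied by hand here; choosing it automatically is a separate step that this sketch does not attempt.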

Revolutionizing Industries with SmartTensors AI:

A Multifaceted Approach to Complex Challenges

The SmartTensors AI Platform stands as a testament to innovation, offering a patented, scalable, and unsupervised machine learning software suite that transcends boundaries. Our platform doesn’t just provide a single solution; it offers a diverse set of capabilities that span a wide range of interdisciplinary fields, thereby advancing science and tackling some of the most challenging problems of our time.

Scientific Leadership Identification and Characterization

  • Performs robust unsupervised learning (no training or labeled data required) on an arbitrary text corpus and extracts topics and subtopics hierarchically, while considering the semantics of the text.
  • Uses a LANL patent to determine the number of topics, which is vital for explainability.
  • Is an HPC tool that can analyze exascale data (sparse or dense) with unique scaling on heterogeneous CPU/GPU clusters.
  • Can build unique and specific corpora and knowledge graphs through SME interactions (human on the loop).
  • Can rank authors/institutions (based on their research on a specific topic) using their network interactions, e.g., co-authoring and co-citation data.
  • Can determine the roles of the authors, such as brain, working bee, mediator, and others, based on graph centrality.
  • Can build and analyze scientific ecosystems of a) a country, b) an institution, or c) a group of authors.
  • Can build a temporal, topic-specific author profile, which includes social scientific interactions (such as citation and co-author networks) as well as the evolution of professional affiliations.
  • Can determine changes/evolution of a specific technology trend of interest, related to a country, or institution, or a group of authors.
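
As a rough illustration of ranking authors by their network interactions, the sketch below builds a co-authorship graph from a list of papers and scores each author by degree centrality. Degree centrality is only a stand-in chosen for this example, since the text above does not say which centrality measure is used, and the function names and input format are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def coauthor_degree_centrality(papers):
    """Score authors by the fraction of other authors they have co-authored with.

    `papers` is an iterable of author-name lists, one list per paper.
    Hypothetical helper for illustration; not part of any SmartTensors package.
    """
    neighbors = defaultdict(set)
    for authors in papers:
        unique = set(authors)
        for author in unique:
            neighbors[author]  # register authors of single-author papers too
        # Connect every pair of authors who share this paper.
        for a, b in combinations(sorted(unique), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n <= 1:
        return {author: 0.0 for author in neighbors}
    return {author: len(nb) / (n - 1) for author, nb in neighbors.items()}


def rank_authors(papers):
    """Return (author, score) pairs, highest centrality first, ties by name."""
    scores = coauthor_degree_centrality(papers)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Given a list such as `[["A", "B"], ["A", "C"]]`, `rank_authors` places the author with the most distinct collaborators first. Other measures, such as betweenness centrality, could be substituted to highlight mediator-like roles.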

Publications:

High Performance Computing

Publications

Partial Differential Equations

With the boom in the development of computer hardware and software, social media, IoT platforms, and communications, there has been exponential growth in the volume of data produced worldwide. We provide scalable and efficient high performance computing (HPC) solutions for dimensionality reduction on emerging architectures. Our HPC solutions have been demonstrated on record-breaking 350-terabyte (TB) dense and 10-exabyte (EB) sparse synthetic datasets.

Publications

Cyber Security

Machine Learning (ML) holds a pivotal position in the realm of cyber defense, particularly given the expanding scale of networks, the proliferation of software and malware, and the deluge of data they generate. One of the paramount challenges faced by cyber defenders is the ability to differentiate between malicious anomalies and benign yet uncommon activities. This task has taken on heightened significance as the attack surfaces within large enterprise networks continue to expand. In this context, anomaly detection systems grounded in statistical and large-scale analysis/modeling of user and device behavior have emerged as indispensable tools for identifying and mitigating malicious activities.

Publications

Media Coverage

R&D 100 winner of the day: SmartTensors AI Platform

SmartTensors autonomously analyzes and discovers hidden features, signatures and patterns otherwise undetectable and buried in tens of terabytes of data.

Using AI to develop enhanced cybersecurity measures

New research helps identify an unprecedented number of malware families

Not too big: Machine learning tames huge datasets

Using the Summit supercomputer, Los Alamos algorithm breaks the exabyte barrier

Tensor network approach achieves record yottabyte compression solving neutron transport equations

Innovative method solves gigantic partial differential equations with artificial intelligence methods

PERSONNEL

Boian Alexandrov

Boian Alexandrov is a senior scientist in the Theoretical Division at Los Alamos National Laboratory. He holds an MS in Theoretical Physics, a PhD in Nuclear Engineering, and a second PhD in Computational Biophysics. Alexandrov specializes in Big Data analytics, Non-negative Matrix and Tensor Factorization, Unsupervised Learning, and Latent Feature Extraction.

Kim Rasmussen

Kim Rasmussen holds M.S. and Ph.D. degrees from the Technical University of Denmark, in Electrical Engineering and Applied Mathematics, respectively. He has been at Los Alamos National Laboratory for 25 years, during which time he has been an active researcher (~200 publications, >5,800 citations, h-index 42 per Google Scholar), in addition to holding various management positions. Rasmussen’s main research is in Applied Mathematics, ranging from numerical analysis of nonlinear and stochastic problems, through condensed matter and biophysics, to, most recently, data science. He was the Co-PI of the Tensor Networks for Big-Data Analytics LDRD DR 2019-2021 and was instrumental in the creation of the SmartTensors AI platform, which received a 2021 R&D 100 award in IT and a bronze medal in the 2021 R&D 100 Market Disruptor category.

Hristo Djidjev

Hristo Djidjev is a computer scientist in the Information Sciences (CCS-3) group at Los Alamos National Laboratory (LANL). Before joining LANL as a scientist, Hristo worked as an Assistant Professor at Rice University and as a Senior Lecturer at Warwick University. He is currently a Research Adjunct Professor at Carleton University, Ottawa, Canada. Hristo holds an MSc in applied mathematics and a PhD in computer science from Sofia University, Bulgaria.

Erik Skau

Erik Skau received the B.Sc. degree in applied mathematics and physics, and the M.Sc. and Ph.D. degrees in applied mathematics from North Carolina State University, Raleigh, NC, USA. His research expertise includes optimization techniques for matrix and tensor decompositions. Erik is a scientist in the Information Sciences Group at Los Alamos National Laboratory.

Ben Nebgen

Benjamin Nebgen received the B.A. degree in Chemistry from Cornell University, Ithaca, NY, USA, and the Ph.D. degree in Chemistry from Purdue University, West Lafayette, IN, USA. He previously held postdoctoral appointments at the University of Southern California, Los Angeles, CA, USA, and in the Theoretical Division at Los Alamos National Laboratory (LANL). He is currently a scientist in the Theoretical Division at the Laboratory. His research expertise includes Quantum Chemistry and optimization techniques for matrix and tensor decompositions.

Raviteja Vangara

Raviteja Vangara holds both Ph.D. and M.S. degrees from the University of New Mexico. He is currently employed as a postdoctoral researcher in the Department of Cellular and Molecular Medicine at the University of California, San Diego. His research is primarily dedicated to developing diverse machine learning and deep learning techniques for various scientific applications.

Manish Bhattarai

Manish Bhattarai is a staff scientist at Los Alamos National Laboratory in Los Alamos, NM, where he is an integral member of the tensor factorizations group within the Theoretical Division. At the Laboratory, he specializes in large-scale data factorization, playing a pivotal role in enhancing the Laboratory’s capabilities in high-performance processing and computing. Dr. Bhattarai has developed high-performance computing (HPC) empowered machine learning frameworks, notably pyDNMFk, pyDNTNK, and pyDRESCALk, tailored for mining extensive data through distributed matrix and tensor factorization. Currently, his research portfolio encompasses adversarial machine learning, generative AI, tensor factorizations, and the broader realm of high-performance computing.

Duc Truong

Duc Truong is a staff scientist in Los Alamos National Laboratory's Theoretical Division. He received his Ph.D. in Computational and Applied Mathematics from Southern Methodist University, specializing in numerical analysis and computational neuroscience. At the Laboratory, his research focuses on developing tensor factorization algorithms and engineering machine learning techniques, with applications in high-dimensional scientific simulations and in solving ultra-large PDEs. Duc was instrumental in the creation of the SmartTensors AI platform, which received a 2021 R&D 100 award in IT and a bronze medal in the 2021 R&D 100 Market Disruptor category.

Derek DeSantis

Derek DeSantis received his PhD in mathematics from the University of Nebraska-Lincoln. He is currently a staff scientist in the Computational, Computer and Statistical Science division. He works on the mathematical theory of machine learning, specifically tensor factorizations, with applications broadly within the climate sciences.

Gianmarco Manzini

G. Manzini is a scientist with 30 years of experience in designing, developing and implementing numerical methods for PDEs, with special focus on the Mimetic Finite Difference (MFD) method and the Virtual Element Method (VEM). He coauthored 120+ journal papers and two books. He is now working on the application of tensor network-based methods to high-dimensional PDEs’ numerical approximations.

Maksim Eren

Maksim E. Eren is an early career scientist in Los Alamos National Laboratory’s Advanced Research in Cyber Systems division. His interdisciplinary research interests lie at the intersection of machine learning and cybersecurity, with a concentration in tensor decomposition. His tensor decomposition-based research projects include large-scale malware detection and characterization, cyber anomaly detection, data privacy, text mining, and high performance computing.

Tom Tierney

Tom Tierney is a Scientist-5 and Team Leader at Los Alamos National Laboratory who focuses on bringing together the best and brightest to tackle some of the nation’s toughest global security problems. He received his Ph.D. in plasma physics in 2002 from the University of California, Irvine, based on research he performed at the Laboratory. Tom’s expertise is in the analysis of emerging technologies and net assessments of scientific competitiveness, drawing on his broad background in physics, including pulsed power plasmas, inertial confinement fusion, radiation transport, dynamic materials sciences, scientific computing, and nuclear weapons sciences.

Ismael Boureima

Ismael Boureima is a scientist in the Theoretical Division at Los Alamos National Laboratory. His research interests are turbulence modeling, physics-informed machine learning, tensor methods, and distributed HPC algorithms.

Nicholas Solovyev

Nicholas Solovyev received his M.S. in Computer Science & Systems from the University of Washington, Tacoma. His research areas include natural language processing, topic modeling, and applications of non-negative tensor factorization.

Ryan Barron

Ryan is a current Ph.D. student at UMBC focusing on Natural Language Processing (NLP) applications to robotics and big data through High-Performance Computing (HPC). His research started in the cross-disciplinary area of grounded language robotics incorporating NLP. Ryan’s current research interests encompass leveraging HPC methodologies to solve intricate NLP challenges, aiming to enable faster, more accurate language processing tasks on a large scale.

Vesselin Grantcharov

Vesselin Grantcharov is a Machine Learning engineer at Fitch Ratings and will soon start his PhD at University of New Mexico. As an engineer he has primarily focused on NLP and Information Retrieval systems. He holds a BS in Discrete Mathematics from Georgia Tech, where he performed undergraduate research in Additive Combinatorics. He also developed software for Non-negative Matrix Factorization during an undergraduate internship in LANL’s Theoretical Division.

James Ahrens
Gopinath Chennupati
Namita Karat
Dan O’Malley
John Patchett
Elijah Pelofske
Jesus Pulido
Lakshman Prasad

Publications