Strong data quality checks reduce bias, drift and inconsistencies that can distort analytics and AI outcomes before datasets ...
QVAC launches Genesis II, expanding the world’s largest synthetic AI dataset to 148B tokens and 19 domains for better ...
Researchers created a nationally consistent, publicly available dataset estimating car-based driving times and distances to ...
China is accelerating efforts to replace Europe’s ERA5 weather dataset with a domestic alternative built for AI forecasting.
The dataset is built from 10 real-world simulated environments in the RealMan Beijing Humanoid Robot Data Training Center.
Facebook has built a dataset of thousands of "hateful memes" for researchers, who will be able to use it to learn how to identify online hate speech and better protect against it. The ...
Apple has released Pico-Banana-400K, a highly curated 400,000-image research dataset which, interestingly, was built using Google’s Gemini-2.5 models. Here are the details. Apple’s research team has ...
Data collected under the Death in Custody Reporting Act has some serious problems. Here’s how we fixed some of them.
The new dataset, published in Earth System Science Data by 16 scientists, shows a significantly cooler Earth from the late ...