Table 1: Summary of existing object detection benchmarks in retail stores. "C" indicates the image classification task, "S" indicates the single-class object detection task, and "M" indicates the multi-class object detection task.
To facilitate data usage, we split the dataset into two subsets: a *training* set of 34,022 images and a *testing* set of 16,372 images (50,394 images in total).
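As a quick sanity check on the split, the short sketch below adds the two subset sizes stated above and computes the fraction of images in each. The constant names are illustrative only. The result is roughly a two-thirds / one-third train/test split.

```python
# Subset sizes as stated in the text; the total is derived by simple addition.
TRAIN_IMAGES = 34_022
TEST_IMAGES = 16_372

total = TRAIN_IMAGES + TEST_IMAGES
train_frac = TRAIN_IMAGES / total
test_frac = TEST_IMAGES / total

print(f"total images: {total}")             # total images: 50394
print(f"train fraction: {train_frac:.3f}")  # train fraction: 0.675
print(f"test fraction: {test_frac:.3f}")    # test fraction: 0.325
```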
The dataset covers 9 superclasses: Baby Stuffs (e.g., *Baby Diapers* and *Baby Slippers*), Drinks (e.g., *Juice* and *Ginger Tea*), Food Stuff (e.g., *Dried Fish* and *Cake*), Daily Chemicals (e.g., *Soap* and *Shampoo*), Clothing (e.g., *Jacket* and *Adult hats*),
Electrical Appliances (e.g., *Microwave Oven* and *Socket*), Storage Appliances (e.g., *Trash* and *Stool*), Kitchen Utensils (e.g., *Forks* and *Food Box*), and Stationery and Sporting Goods (e.g., *Skate* and *Notebook*).
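The two-level category hierarchy can be represented as a simple mapping from each top-level category to its subclasses. The sketch below is a hypothetical encoding, not the dataset's official label file. It includes only the example subclasses named above, not the full label set. It also inverts the mapping so that a fine-grained label can be rolled up to its top-level category, for instance to aggregate per-category detection results.

```python
# Hypothetical encoding of the category hierarchy. Only the example
# subclasses named in the text are listed, not the full label set.
SUPERCLASSES: dict[str, list[str]] = {
    "Baby Stuffs": ["Baby Diapers", "Baby Slippers"],
    "Drinks": ["Juice", "Ginger Tea"],
    "Food Stuff": ["Dried Fish", "Cake"],
    "Daily Chemicals": ["Soap", "Shampoo"],
    "Clothing": ["Jacket", "Adult hats"],
    "Electrical Appliances": ["Microwave Oven", "Socket"],
    "Storage Appliances": ["Trash", "Stool"],
    "Kitchen Utensils": ["Forks", "Food Box"],
    "Stationery and Sporting Goods": ["Skate", "Notebook"],
}

# Invert the mapping: subclass name -> top-level category name.
SUBCLASS_TO_SUPER = {
    sub: sup for sup, subs in SUPERCLASSES.items() for sub in subs
}


def superclass_of(subclass: str) -> str:
    """Return the top-level category of a subclass (KeyError if unknown)."""
    return SUBCLASS_TO_SUPER[subclass]


print(len(SUPERCLASSES))         # 9
print(superclass_of("Shampoo"))  # Daily Chemicals
```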
The dataset poses various challenges for detection algorithms, such as scale changes, illumination variations, occlusion, similar appearance across products, cluttered backgrounds, blurring, and deformation.