@@ -27,15 +27,17 @@ the bounding boxes. Different colors denotes different object categories.</p>
To solve the above issues, we collect a large-scale object localization and counting dataset at 28 different stores and apartments, which consists of 50,394 images with the JPEG image resolution of 1920x1080 pixels.
More than 1.9 million object instances in 140 categories (including *Jacket*, *Shoes*, *Oven*, etc.) are annotated.

Table 1: Summary of existing object detection benchmarks in retail stores. “C” indicates the image classification task, “S”
indicates the single-class object detection task, and “M” indicates the multi-class object detection task.
To facilitate data usage, we divide the dataset into two subsets, i.e., *training* and *testing* sets, including 34,022 images for training and 16,372 images for testing.
The dataset includes 9 big subclasses, i.e., Baby Stuffs (e.g., *Baby Diapers* and *Baby Slippers*), Drinks (e.g., *Juice* and *Ginger Tea*), Food Stuff (e.g., *Dried Fish* and *Cake*), Daily Chemicals (e.g., *Soap* and *Shampoo*), Clothing (e.g., *Jacket* and *Adult hats*),
Electrical Appliances (e.g., *Microwave Oven* and *Socket*), Storage Appliances (e.g., *Trash* and *Stool*), Kitchen Utensils (e.g., *Forks* and *Food Box*), and Stationery and Sporting Goods (e.g., *Skate* and *Notebook*).
There are various factors challenging the performance of algorithms, such as scale changes, illumination variations, occlusion, similar appearance, clutter background, blurring and deformation, *etc*.
<palign="center">Figure 2: Category hierarchy of the large-scale localization and counting dataset in
the shelf scenarios.</p>
@@ -73,12 +75,13 @@ Thus, the counting task is formulated as the multi-class classification task, wh
We conduct several experiments of the state-of-the-art object detectors and the proposed CLCNet method on the proposed dataset, to demonstrate the effectiveness of CLCNet, Table 2 and Fig. 4.
