<p align="center">Figure 1: Previous object recognition datasets in grocery stores have focused on image classification, i.e., (a) Supermarket Produce (Rocha et al. 2010) and (b) Grozi-3.2k (George and Floerkemeier 2014), and object detection, i.e., (c) D2S (Follmann et al. 2018), (d) Freiburg Groceries (Jund et al. 2016), and (e) Sku110k (Goldman et al. 2019). We introduce the Locount task, which aims to localize groups of objects of interest together with the numbers of instances, which is natural in grocery store scenarios, as shown in the last row, i.e., (f), (g), (h), (i), and (j). The numbers on the right indicate the numbers of object instances enclosed in the bounding boxes. Different colors denote different object categories. Best viewed in color and zoomed in.</p>
## Locount dataset
To address the above issues, we collect a large-scale object localization and counting dataset in 28 different stores and apartments, consisting of 50,394 JPEG images at a resolution of 1920×1080 pixels.
The dataset includes 9 big subclasses, i.e., Baby Stuffs (e.g., *Baby Diapers*),
Electrical Appliances (e.g., *Microwave Oven* and *Socket*), Storage Appliances (e.g., *Trash* and *Stool*), Kitchen Utensils (e.g., *Forks* and *Food Box*), and Stationery and Sporting Goods (e.g., *Skate* and *Notebook*).
Various factors challenge the performance of algorithms, such as scale changes, illumination variations, occlusion, similar appearance, cluttered background, blurring, and deformation.
<p align="center">Figure 2: Category hierarchy of the large-scale localization and counting dataset in shelf scenarios.</p>
## Evaluation protocol
CLCNet uses multiple stages for localization and counting, i.e., S_{1}, ..., S_{N} are cascaded to progressively predict the bounding boxes and the numbers of instances enclosed in them, where N is the total number of stages. For more detailed definitions, please refer to the [paper](http://arxiv.org/abs/2003.08230).
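The cascaded stages can be sketched as a simple loop in which each stage consumes the previous stage's boxes and counts and emits refined ones. This is a minimal structural sketch in plain Python; the stage callables and the initial count of 1 are illustrative assumptions, not CLCNet's actual components.

```python
def run_cascade(stages, initial_boxes):
    """Run N cascaded stages S_1, ..., S_N in sequence: each stage takes
    the previous stage's boxes and per-box counts, and returns refined
    boxes together with updated instance counts."""
    boxes = initial_boxes
    counts = [1] * len(initial_boxes)  # trivial starting count estimate (assumption)
    for stage in stages:
        boxes, counts = stage(boxes, counts)
    return boxes, counts
```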
The counting accuracy threshold for the positive/negative sample generation is determined by the architecture design of CLCNet, which is described as follows.
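As an illustration of such a protocol, a prediction could be matched to a ground-truth group only when both the box overlap (IoU) and the counting agreement pass thresholds. The helper names, the symmetric `min/max` counting score, and the default thresholds below are illustrative assumptions for a sketch, not the paper's exact definitions.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def counting_accuracy(pred_count, gt_count):
    """Symmetric agreement between predicted and ground-truth counts, in (0, 1]."""
    return min(pred_count, gt_count) / max(pred_count, gt_count)

def is_positive(pred_box, pred_count, gt_box, gt_count,
                iou_thr=0.5, cnt_thr=0.5):
    """A prediction counts as a positive match only if both criteria pass."""
    return (iou(pred_box, gt_box) >= iou_thr
            and counting_accuracy(pred_count, gt_count) >= cnt_thr)
```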

<p align="center">Figure 3: The architecture of our CLCNet for the Locount task. The cubes indicate the output feature maps from the convolutional layers or the RoIAlign operation. The numbers in the brackets indicate the range of counting numbers in each stage.</p>
We use the same architecture and configuration as Cascade R-CNN for the box-regression and box-classification layers. For the instance counting layer, a direct strategy is to use an FC layer to regress a floating-point number indicating the number of instances, called the *count-regression strategy*.
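For intuition, the count-regression strategy can be contrasted with treating counting as classification over discrete count ranges (as suggested by the per-stage ranges in Figure 3). Both functions below are illustrative plain-Python sketches, not CLCNet's actual layers, and the bin layout in the usage is an assumed example.

```python
def count_regression(head_output):
    """Count-regression strategy: a single FC output regresses a float,
    which is rounded to the nearest integer count (at least 1)."""
    return max(1, round(head_output))

def count_classification(class_scores, count_bins):
    """Count-classification sketch: treat counting as classification over
    discrete count ranges and report the midpoint of the winning bin."""
    best = max(range(len(class_scores)), key=lambda i: class_scores[i])
    lo, hi = count_bins[best]
    return (lo + hi) // 2
```

For example, with assumed bins `[(1, 1), (2, 5), (6, 10)]`, a prediction whose highest score falls on the second bin would be reported as a count of 3.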
We conduct several experiments with state-of-the-art object detectors and the proposed CLCNet on the Locount dataset.

## Citation
If you find this dataset useful for your research, please cite