The conventional standard for object detection uses a bounding box to represent each individual object instance.
However, this is impractical in industry-relevant warehouse applications because of severe occlusions among groups of instances of the same category.
For example, as shown in [Fig. 1(g)](https://isrc.iscas.ac.cn/gitlab/research/locount-dataset/-/tree/master/Images/dataset-comparison.jpg), stacked dinner plates are extremely difficult to annotate, even for a well-trained annotator.
Meanwhile, it is almost impossible even for state-of-the-art object detectors to detect all stacked dinner plates accurately.
Thus, it is necessary to rethink the definition of object detection in such scenarios.
@@ -25,7 +25,7 @@ To solve the above issues, we collect a large-scale object localization and coun
More than 1.9 million object instances in 140 categories (including *Jacket*, *Shoes*, *Oven*, etc.) are annotated.
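
To make the annotation unit concrete, the minimal sketch below represents one annotated group as a bounding box, a category, and the number of instances enclosed in the box. The class name `GroupAnnotation`, its field names, the validation rules, and the example values are assumptions for illustration only; they do not describe the dataset's actual file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupAnnotation:
    """Hypothetical record: one box enclosing a group of same-category instances."""
    x1: float
    y1: float
    x2: float
    y2: float
    category: str
    count: int  # number of instances enclosed in the box

    def __post_init__(self):
        # Assumed sanity checks, not taken from the dataset specification.
        if self.x2 <= self.x1 or self.y2 <= self.y1:
            raise ValueError("box must have positive width and height")
        if self.count < 1:
            raise ValueError("count must be at least 1")


def total_instances(annotations):
    """Sum the per-box counts to get the number of annotated instances."""
    return sum(a.count for a in annotations)


# Illustrative values only, not taken from the dataset.
example = [
    GroupAnnotation(10, 20, 110, 220, "Jacket", 3),
    GroupAnnotation(150, 40, 300, 200, "Shoes", 2),
]
print(total_instances(example))  # 5
```

Under this assumed layout, the number of annotated instances is the sum of the per-box counts rather than the number of boxes.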

![… indicates the single-class object detection task, and “M” indicates the multi-class object detection task.](Images/dataset-summary.png)
To facilitate data usage, we divide the dataset into two subsets, i.e., *training* and *testing* sets, including 34,022 images for training and 16,372 images for testing.
The dataset includes 9 big subclasses, i.e., Baby Stuffs (e.g., *Baby Diapers* and *Baby Slippers*), Drinks (e.g., *Juice* and *Ginger Tea*), Food Stuff (e.g., *Dried Fish* and *Cake*), Daily Chemicals (e.g., *Soap* and *Shampoo*), Clothing (e.g., *Jacket* and *Adult hats*),
@@ -53,7 +53,7 @@ instances enclosed in the bounding box, where N is the total number of stages. F
The counting accuracy threshold for the positive/negative sample generation is determined by the architecture design of CLCNet, which is described as follows.

![The numbers in the brackets indicate the range of counting number in each stage.](Images/framework.png)
We use the same architecture and configuration as Cascade R-CNN for the box-regression and box-classification layers. For the instance counting layer,
a direct strategy is to use an FC layer to regress a floating-point number indicating the number of instances, called the *count-regression strategy*.
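
The count-regression strategy can be sketched as follows. This is a minimal, framework-free illustration, assuming a single fully connected layer with hypothetical weights `w` and bias `b` applied to a pooled feature vector. The function names, the rounding rule, and the example values are assumptions, not the authors' implementation.

```python
def fc_regress_count(features, w, b):
    """One FC layer with a single output: a floating-point count estimate."""
    if len(features) != len(w):
        raise ValueError("features and weights must have the same length")
    return sum(f * wi for f, wi in zip(features, w)) + b


def to_integer_count(value):
    """Convert the regressed float to a count (assumed rule: round half up, at least 1)."""
    return max(1, int(value + 0.5))


# Illustrative numbers only.
features = [0.5, 1.0, 2.0]
w = [1.0, 0.5, 1.0]
b = 0.25
raw = fc_regress_count(features, w, b)  # 0.5 + 0.5 + 2.0 + 0.25 = 3.25
print(raw, to_integer_count(raw))       # 3.25 3
```

Because the regressed value is continuous, some rounding rule is needed before it can be compared with an integer ground-truth count.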
@@ -68,9 +68,9 @@ Thus, the counting task is formulated as the multi-class classification task, wh
We evaluate several state-of-the-art object detectors and the proposed CLCNet method on the proposed dataset to demonstrate the effectiveness of CLCNet; see Table 2 and Fig. 4.

![… proposed dataset. The mark lc on the upper right corner indicates that its value is computed by the proposed metrics](Images/Experiment-results.png)


## Citation
If you find this dataset useful for your research, please cite