We provide pre-trained models for selected FBNet architectures.
* All models are trained from scratch with BN, using the training schedule specified below.
* Evaluation is performed on a single NVIDIA V100 GPU with `MODEL.RPN.POST_NMS_TOP_N_TEST` set to `200`.
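For reference, an evaluation run with this setting could look like the sketch below; the config path is a placeholder, and `tools/test_net.py` with yacs-style command-line overrides is assumed.

```bash
# Hypothetical evaluation command; the config file path is a placeholder.
# MODEL.RPN.POST_NMS_TOP_N_TEST and TEST.IMS_PER_BATCH are passed as
# command-line overrides of the yacs config.
python tools/test_net.py \
    --config-file "path/to/fbnet/config.yaml" \
    MODEL.RPN.POST_NMS_TOP_N_TEST 200 \
    TEST.IMS_PER_BATCH 8
```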
The following inference times are reported:
* inference total batch=8: Total inference time, including data loading, model inference, and pre/post-processing, using 8 images per batch.
* inference model batch=8: Model inference time only, using 8 images per batch.
* inference model batch=1: Model inference time only, using 1 image per batch.
* inference caffe2 batch=1: Model inference time for the model in Caffe2 format, using 1 image per batch. The Caffe2 models have BN fused into the convolutions and run purely on C++/CUDA, using Caffe2 ops for RPN/detection post-processing.
The pre-trained models are available via the link in the model id column.
backbone | type | resolution | lr sched | im / gpu | train mem (GB) | train time (s/iter) | total train time (hr) | inference total batch=8 (s/im) | inference model batch=8 (s/im) | inference model batch=1 (s/im) | inference caffe2 batch=1 (s/im) | box AP | mask AP | model id
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
This follows the [scheduling rules from Detectron](https://github.com/facebookresearch/Detectron/blob/master/configs/getting_started/tutorial_1gpu_e2e_faster_rcnn_R-50-FPN.yaml#L14-L30).
Note that, relative to the default 8-GPU schedule, we have multiplied the number of iterations by 8x (as well as the learning rate step schedule)
and divided the learning rate by 8x, following the linear scaling rule.
We also changed the batch size during testing, but that is generally not necessary because testing
requires much less memory than training.
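As an illustration of this scaling, a single-GPU training command could look like the following sketch; the config path is a placeholder, `tools/train_net.py` with yacs-style overrides is assumed, and the values correspond to an 8x-scaled standard 1x schedule.

```bash
# Hypothetical single-GPU training command; the config path is a placeholder.
# 90k iterations x 8 = 720k, LR 0.02 / 8 = 0.0025, steps (60k, 80k) x 8.
python tools/train_net.py \
    --config-file "path/to/config/file.yaml" \
    SOLVER.IMS_PER_BATCH 2 \
    SOLVER.BASE_LR 0.0025 \
    SOLVER.MAX_ITER 720000 \
    SOLVER.STEPS "(480000, 640000)" \
    TEST.IMS_PER_BATCH 1
```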
## Multi-GPU training
We use `torch.distributed.launch` internally to launch
multi-GPU training. This PyTorch utility spawns as many
Python processes as the number of GPUs we want to use, and each Python
process drives a single GPU.
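A launch command along these lines could be used; the GPU count and paths below are placeholders.

```bash
# Illustrative multi-GPU launch; adjust NGPUS and the paths to your setup.
export NGPUS=8
python -m torch.distributed.launch --nproc_per_node=$NGPUS \
    tools/train_net.py --config-file "path/to/config/file.yaml"
```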
## Citations

Please consider citing this project in your publications if it helps your research. The following is a BibTeX reference. The BibTeX entry requires the `url` LaTeX package.