Dummy thoughts
Use a Gaussian mask as the GT mask when training Mask R-CNN on WIDER FACE; the segmentation branch works as an auxiliary loss.
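A minimal sketch of how such a Gaussian pseudo-mask could be generated per RoI; the function name and `sigma_scale` are illustrative assumptions, not part of the original note.

```python
import numpy as np

def gaussian_pseudo_mask(mask_size=28, sigma_scale=0.25):
    """Soft 'GT' target for the mask head: a 2-D Gaussian centred in the RoI.
    mask_size matches the usual 28x28 Mask R-CNN mask target; sigma_scale is
    an assumed hyper-parameter."""
    ys, xs = np.mgrid[0:mask_size, 0:mask_size]
    c = (mask_size - 1) / 2.0
    sigma = sigma_scale * mask_size
    mask = np.exp(-((xs - c) ** 2 + (ys - c) ** 2) / (2 * sigma ** 2))
    # train the mask head against this with e.g. BCE/MSE as an auxiliary loss
    return mask.astype(np.float32)
```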
Incorporate low-level features into the Mask R-CNN segmentation branch.
Use an external memory to process each video. The memory is built online; for each video, we consult both the memory and the current frame when doing detection, segmentation, or tracking.
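A rough sketch of the per-video online memory, assuming dot-product attention as the read mechanism; the class and method names are hypothetical.

```python
import torch
import torch.nn.functional as F

class OnlineVideoMemory:
    """Features of past frames are written online; the current frame reads
    from them via attention. Illustrative only."""

    def __init__(self, max_items=256):
        self.keys, self.values = [], []
        self.max_items = max_items

    def write(self, key, value):
        # key/value: (N, C) feature vectors from an already-processed frame
        self.keys.append(key)
        self.values.append(value)
        if len(self.keys) > self.max_items:  # drop the oldest entries
            self.keys.pop(0)
            self.values.pop(0)

    def read(self, query):
        # query: (M, C) features of the current frame
        if not self.keys:
            return query  # nothing stored yet
        K, V = torch.cat(self.keys), torch.cat(self.values)
        attn = F.softmax(query @ K.t() / K.shape[1] ** 0.5, dim=-1)
        return query + attn @ V  # memory-augmented features for the task head
```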
Semantic segmentation + embedding -> panoptic segmentation.
Supervise the SPP module with hand-coded GT.
Supervise spatial attention with hand-coded GT.
ShuffleNet V2 x0.5 trains well, but x1.0 does not. Copy the x0.5 weights identically as the initialization for x1.0.
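One possible reading of "copy identically", sketched below: tile each x0.5 tensor along its channel dimensions to fill the x1.0 shape, leaving non-tileable tensors at their random init. The scheme is an assumption; the note does not specify one.

```python
import torch

def inflate_state_dict(small_sd, big_model):
    """Fill x1.0 weights by tiling the corresponding x0.5 weights along the
    channel dimensions; tensors whose shapes are not exact multiples keep
    their random init. Purely an illustrative interpretation."""
    big_sd = big_model.state_dict()
    for name, big_w in big_sd.items():
        small_w = small_sd.get(name)
        if small_w is None:
            continue
        if small_w.shape == big_w.shape:
            big_sd[name] = small_w.clone()
        elif small_w.dim() == big_w.dim() and \
                all(b % s == 0 for b, s in zip(big_w.shape, small_w.shape)):
            reps = [b // s for b, s in zip(big_w.shape, small_w.shape)]
            big_sd[name] = small_w.repeat(*reps)
    big_model.load_state_dict(big_sd)
```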
Like BERT's pre-training: do unsupervised pre-training on masked-out images to get a feature extractor that performs better than one trained on ImageNet. Steps (a minimal sketch follows):
Collect more than 1 million images.
Randomly mask out regions and add noise to the rest of the pixels.
Train an encoder-decoder hole-filling network, e.g. a GAN for in-painting.
Use the encoder part of the GAN as a pure backbone.
Then fine-tune this backbone on any other CV task: classification, detection, segmentation, etc.
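A minimal sketch of the masking and reconstruction steps under assumed hyper-parameters (patch-wise masking, Gaussian pixel noise, a toy conv encoder-decoder, loss on masked regions only); an adversarial loss could be added on top for the GAN variant.

```python
import torch
import torch.nn as nn

def mask_and_noise(images, mask_ratio=0.4, noise_std=0.05, patch=16):
    """Zero out random patches and add Gaussian noise to the remaining
    pixels. All hyper-parameters here are assumptions."""
    B, C, H, W = images.shape
    keep = (torch.rand(B, 1, H // patch, W // patch) > mask_ratio).float()
    keep = keep.repeat_interleave(patch, 2).repeat_interleave(patch, 3)
    corrupted = (images + noise_std * torch.randn_like(images)) * keep
    return corrupted, keep

# toy encoder-decoder; the encoder is what gets reused as a backbone later
encoder = nn.Sequential(nn.Conv2d(3, 64, 3, 2, 1), nn.ReLU(),
                        nn.Conv2d(64, 128, 3, 2, 1), nn.ReLU())
decoder = nn.Sequential(nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),
                        nn.ConvTranspose2d(64, 3, 4, 2, 1))

images = torch.randn(8, 3, 224, 224)
corrupted, keep = mask_and_noise(images)
recon = decoder(encoder(corrupted))
loss = ((recon - images) ** 2 * (1 - keep)).mean()  # reconstruct masked regions only
```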
Samsung's one-shot deepfake (talking-head generation from a single image).
A student at a Beijing university used DDPG to generate paintings stroke by stroke.
Distillation from [high-performance models' Grad-CAM results] to [the supervised spatial attention of efficient architectures].
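A hedged sketch of one way to implement this distillation: the teacher's Grad-CAM map and the student's attention map are resized to match, normalized, and compared with MSE. The matching and loss choices are assumptions, not from the note.

```python
import torch
import torch.nn.functional as F

def attention_distill_loss(teacher_gradcam, student_attention):
    """teacher_gradcam, student_attention: (B, 1, H, W) spatial maps.
    Resize the teacher map to the student's resolution, L1-normalise both,
    and match with MSE as an auxiliary distillation loss."""
    t = F.interpolate(teacher_gradcam, size=student_attention.shape[-2:],
                      mode='bilinear', align_corners=False)
    t = t / (t.sum(dim=(-2, -1), keepdim=True) + 1e-6)
    s = student_attention / (student_attention.sum(dim=(-2, -1), keepdim=True) + 1e-6)
    return F.mse_loss(s, t)
```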
Applications
Use a crawler to get music lyrics and apply a Chinese (NetEase) / English (Spotify) BERT to them. Do clustering and music-genre recommendation (a sketch follows below).
Could be personalized:
input your favorite list, cluster on it, and recommend according to the nearest distance to the cluster centers.
Could be generated offline:
create content categories offline [life, love, depression, happiness, etc.].
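A sketch of the personalized variant, using sentence-transformers and scikit-learn as stand-ins for the Chinese/English lyric BERTs; the model name and hyper-parameters are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sentence_transformers import SentenceTransformer  # stand-in for a lyrics BERT

def recommend(favourite_lyrics, catalogue_lyrics, n_clusters=5, top_k=10):
    """Cluster a user's favourite lyrics and recommend catalogue songs
    closest to the cluster centres."""
    model = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')
    fav = model.encode(favourite_lyrics)       # (n_fav, D) embeddings
    cat = model.encode(catalogue_lyrics)       # (n_cat, D) embeddings
    centres = KMeans(n_clusters=n_clusters).fit(fav).cluster_centers_
    # distance of every catalogue song to its nearest favourite-cluster centre
    d = np.min(np.linalg.norm(cat[:, None, :] - centres[None, :, :], axis=-1), axis=1)
    return np.argsort(d)[:top_k]               # indices of recommended songs
```

For the offline variant, the same clustering run over the whole catalogue would yield the content categories (life, love, etc.) once the clusters are manually labeled.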
Use BERT to embed paper content, then compute similarities between papers and check them against the citation relationships.
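A small sketch of the consistency check, assuming precomputed embeddings and a 0/1 citation matrix; all names are illustrative.

```python
import numpy as np

def similarity_vs_citations(embeddings, cites):
    """embeddings: (N, D) BERT paper embeddings; cites: (N, N) matrix with
    cites[i, j] = 1 if paper i references paper j. Compares cosine
    similarity of citing pairs against non-citing pairs."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = e @ e.T
    linked = sim[cites == 1].mean()
    unlinked = sim[(cites == 0) & ~np.eye(len(e), dtype=bool)].mean()
    return linked, unlinked  # embeddings are consistent if linked > unlinked
```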
Use BERT for paper summarization: the abstract is the summary, and the paper itself can be treated as the full text.