mirror of
https://github.com/liudaizong/Awesome-LVLM-Attack.git
synced 2026-06-07 08:13:53 +02:00
update 5.31.2024
@@ -1,10 +1,10 @@
# Awesome-LVLM-Attack [](https://github.com/sindresorhus/awesome)
A **continual** collection of papers related to Text-guided 3D Visual Grounding (T-3DVG).
A **continual** collection of papers related to Attacks on Large-Vision-Language-Models (LVLMs).

Text-guided 3D visual grounding (T-3DVG) aims to locate a specific object that semantically corresponds to a language query from a complicated 3D scene, has drawn increasing attention in the 3D research community over the past few years.
T-3DVG presents great potential and challenges due to its closer proximity to the real world and the complexity of data collection and 3D point cloud source processing.
Large vision-language models (LVLMs) have achieved significant success and demonstrated promising capabilities in various multimodal downstream tasks.
Despite their remarkable capabilities, the increased complexity and deployment of LVLMs have also exposed them to various security threats and vulnerabilities, making the study of attacks on these models a critical area of research.

In the T-3DVG community, we've summarized existing T-3DVG methods in our survey paper👍.
Here, we've summarized existing LVLM Attack methods in our survey paper👍.


> If you find some important work missed, it would be super helpful to let me know (`dzliu@stu.pku.edu.cn`). Thanks!
@@ -12,25 +12,33 @@ In the T-3DVG community, we've summarized existing T-3DVG methods in our survey
> If you find our survey useful for your research, please consider citing:

```
@article{liu2024survey,
title={A Survey on Text-guided 3D Visual Grounding: Elements, Recent Advances, and Future Directions},
author={Liu, Daizong and Liu, Yang and Huang, Wencan and Hu, Wei},
@article{liu2024attack,
title={A Survey of Attacks on Large Vision-Language Models: Resources, Recent Advances, and Future Trends},
author={Liu, Daizong and Yang, Mingyu and Qu, Xiaoye and Hu, Wei},
journal={arXiv preprint arXiv},
year={2024}
}
```

**Table of Contents**
- [Fully-supervised two-stage approach](#Fully-Supervised-Two-Stage)
- [Fully-supervised one-stage approach](#Fully-Supervised-One-Stage)
- [Weakly-supervised approach](#Weakly-supervised)
- [Approaches using No Point Cloud Input](#Other-Modality)
- [Approaches using LLMs](#LLMs-based)
- [Adversarial Attacks](#Adversarial-Attack)
- [Jailbreack Attacks](#Jailbreack-Attack)
- [Prompt Injection](#Prompt-Injection)
- [Data Poisoning](#Data-Poisoning)
---

## Fully-Supervised-Two-Stage
## Adversarial-Attack
* **ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language** | [Github](https://github.com/daveredrum/ScanRefer)
* Dave Zhenyu Chen, Angel X. Chang, Matthias Nießner
* Technical University of Munich, Simon Fraser University
* [ECCV2020] https://arxiv.org/abs/1912.08830
* A dataset, two-stage approach, proposal-then-selection


## Jailbreak-Attack


## Prompt-Injection


## Data-Poisoning