paper-reading

Formal Algorithms for Transformers

Introduction Transformers have been tremendously successful in natural language processing tasks and other domains. Many variants have been suggested. Descriptions of transformers are usually graphical, verbal, partial, or incremental. No pseudocode has ever been published for any variant.

Spectral GNN

Preface To understand the basics of graph neural networks, one cannot avoid the topic of “Spectral-GNN”. Today, I’d like to explore in depth what the spectral GNN is and how it works. Table of contents Theory part Basic theory 1 <The Emerging Field of Signal Processing on Graphs> challenges: a “classical” signal $f(t)$ has a concept of “translate to the right by 3” to get $f(t-3)$. But for a graph signal, it is not clear what “translate by 3” means....
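The translation issue mentioned in this excerpt can be made concrete on a ring graph, the one case where "translate by 3" still has an obvious meaning. The sketch below is an illustration under stated assumptions and is not taken from the post: the helper names (`ring_laplacian`, `shift`, `fourier_mode`) are hypothetical, and the Laplacian is assumed to be the combinatorial one, $L = D - A$. It checks that a classical cosine mode is an eigenvector of the ring Laplacian, which is the usual motivation for using Laplacian eigenvectors as a Fourier basis on graphs that have no natural shift.

```python
import math

def ring_laplacian(n):
    """Combinatorial Laplacian L = D - A of an undirected ring graph (n >= 3)."""
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        L[i][i] = 2.0                  # every node has exactly two neighbours
        L[i][(i - 1) % n] -= 1.0       # edge to the previous node
        L[i][(i + 1) % n] -= 1.0       # edge to the next node
    return L

def matvec(M, v):
    """Plain matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def shift(signal, steps):
    """Translate a ring signal to the right: g[i] = f[i - steps] (indices mod n)."""
    n = len(signal)
    return [signal[(i - steps) % n] for i in range(n)]

def fourier_mode(n, k):
    """Classical cosine mode of frequency k sampled on n ring nodes."""
    return [math.cos(2 * math.pi * k * i / n) for i in range(n)]

n, k = 8, 1
L = ring_laplacian(n)
u = fourier_mode(n, k)

# On a ring, the cosine mode satisfies L u = (2 - 2 cos(2*pi*k/n)) u.
eigenvalue = 2 - 2 * math.cos(2 * math.pi * k / n)
residual = max(abs(a - eigenvalue * b) for a, b in zip(matvec(L, u), u))

shifted = shift(list(range(n)), 3)     # [5, 6, 7, 0, 1, 2, 3, 4]
print(residual, shifted)
```

On an irregular graph, `shift` has no counterpart because there is no canonical "next node", while the eigenvectors of the Laplacian remain well defined.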

pedestrian-trajectory

Pedestrian trajectory Questions: 1. Are only pedestrian relations considered? How about the environment? 2. What is the general framework for pedestrian trajectory prediction? SGCN: Sparse Graph Convolution Network for Pedestrian Trajectory Prediction (CVPR2021) SGCN framework superfluous interactions: dense interaction -> one pedestrian is related to all other pedestrians, while in fact it is not; sparse undirected -> equal interactions for a pair of pedestrians; spatial GCN -> sparse directed -> not all pedestrians + not equal interaction; temporal GCN -> motion tendency. Disentangled Multi-Relational Graph Convolutional Network for Pedestrian Trajectory Prediction (AAAI2021) Use CNN to generalize complex interpersonal relations; a graph representation: node as pedestrian, edges correspond to distance. Challenges: only simple social relationships like collision avoidance are aggregated; modelling social norms is not suitable for determining the end-points of pedestrians in the last frame (over-avoidance). Contributions: disentangled multi-scale aggregation to clearly distinguish between relevant pedestrians; multi-relational GCN to extract sophisticated social interaction in a scene....

point-cloud-CVPR2022

Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection From Point Clouds; 3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds; Multi-Instance Point Cloud Registration by Efficient Correspondence Clustering; Contrastive Boundary Learning for Point Cloud Segmentation; Lepard: Learning Partial Point Cloud Matching in Rigid and Deformable Scenes; CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding; Density-Preserving Deep Point Cloud Compression; Robust Structured Declarative Classifiers for 3D Point Clouds: Defending Adversarial Attacks With Implicit Gradients; Neural Points: Point Cloud Representation With Neural Fields for Arbitrary Upsampling; Not All Points Are Equal: Learning Highly Efficient Point-Based Detectors for 3D LiDAR Point Clouds; Equivariant Point Cloud Analysis via Learning Orientations for Message Passing; Point Cloud Pre-Training With Natural 3D Structures; A Unified Query-Based Paradigm for Point Cloud Understanding; REGTR: End-to-End Point Cloud Correspondences With Transformers; 3DeformRS: Certifying Spatial Deformations on Point Clouds; IDEA-Net: Dynamic 3D Point Cloud Interpolation via Deep Embedding Alignment; AziNorm: Exploiting the Radial Symmetry of Point Cloud for Azimuth-Normalized 3D Perception; Surface Reconstruction From Point Clouds by Learning Predictive Context Priors; Point2Cyl: Reverse Engineering 3D Objects From Point Clouds to Extrusion Cylinders; RigidFlow: Self-Supervised Scene Flow Learning on Point Clouds by Local Rigidity Prior; Deterministic Point Cloud Registration via Novel Transformation Decomposition; Surface Representation for Point Clouds; 3D-VField: Adversarial Augmentation of Point Clouds for Domain Generalization in 3D Object Detection; An MIL-Derived Transformer for Weakly Supervised Point Cloud Segmentation; Why Discard if You Can Recycle?...

graph and machine learning

1 Jure Leskovec, Stanford CS224W: Machine Learning with Graphs, http://cs224w.stanford.edu Why graphs: relations of entities; similar data points; arbitrary sizes, no spatial index, no reference order. Representation learning: map nodes to d-dimensional embeddings -> similar nodes in the network are embedded close together. Applications of Graph ML, different tasks: node classification; link prediction: knowledge graph completion; graph classification: molecule property prediction; clustering; graph generation; graph evolution: physical simulation. Examples: node-level: protein folding; recommender system: recommend related pins to users by edge-level classification; subgraph-level: traffic prediction: nodes: road segments, edges: connectivity between nodes -> predict time of arrival etc.; graph-level: drug discovery: nodes: atoms, edges: chemical bonds....

visual text review

Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods Abstract Focuses on ten prominent tasks that integrate language and vision by discussing their problem formulation, methods, existing datasets, and evaluation measures, and by comparing results with the SOTA methods. Introduction Multimodal learning models: generate comprehensible but concise and grammatically well-formed descriptions of the visual content, or vice versa by generating the visual content for a given textual description in a natural language of choice, identify objects in the visual content and infer their relationships to reason about, or answer arbitrary questions about them, navigate through an environment by leveraging input from both vision and natural language instructions, translate textual content from one language to another while leveraging the visual content for sense disambiguation, generate stories about the visual content, and so on....

Logical Syntax

Content Background Preface Introduction What is logical syntax Language as calculi THE DEFINITE LANGUAGE 1 Rules of formation for language 1 Predicates and functors Syntactical gothic symbols The junction symbols Content Background What is syntax In logic, syntax is anything having to do with formal languages or formal systems without regard to any interpretation or meaning given to them. Syntax is concerned with the rules used for constructing or transforming the symbols and words of a language, as contrasted with the semantics of a language, which is concerned with its meaning...

Paper summary

2022/3/31 ~ 2022/4/6 Order-Embeddings of Images and Language Core idea: explicitly modeling the partial order structure of the hierarchy over language and image -> visual semantic hierarchy. How to do it: penalize order violations $$ E(x,y) = \lVert \max(0, y-x) \rVert^2 $$ where $E(x,y)=0 \Leftrightarrow x \preceq y$ Modeling heterogeneous hierarchies with relation-specific hyperbolic cones Core idea: embeds entities into hyperbolic cones & models relations as transformations between the cones. How to do it: Poincaré entailment cone at apex $x$ $$\zeta_x = \left\{ y\in \mathbb{B}^d \;\middle|\; \angle_x y\leq\sin^{-1}\left(K\frac{1-\lVert x\rVert^2}{\lVert x\rVert}\right) \right\}$$ embed entity: $h=(h_1,h_2,\cdots,h_{d})$, where $h_i\in\mathbb{B}^2$ is the apex of the $i$-th 2D hyperbolic cone....
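The order-violation penalty $E(x,y)$ quoted in this summary is small enough to check by hand. The following is a minimal sketch, assuming embeddings are plain lists of floats; the function name `order_violation` and the example vectors are hypothetical and not taken from the summarized paper. It follows the formula literally, so the penalty is zero exactly when every coordinate of $y$ is at most the matching coordinate of $x$.

```python
def order_violation(x, y):
    """E(x, y) = ||max(0, y - x)||^2, computed coordinate by coordinate."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same dimension")
    return sum(max(0.0, yi - xi) ** 2 for xi, yi in zip(x, y))

# Hypothetical 2-D embeddings chosen only for illustration.
x = [3.0, 2.0]
y = [1.0, 2.0]

no_violation = order_violation(x, y)   # every y_i <= x_i, so the penalty is 0
violation = order_violation(y, x)      # first coordinate violates by 2, penalty 4
print(no_violation, violation)
```

Because the penalty is asymmetric, swapping the arguments changes the result, which is what lets it encode a direction in the hierarchy rather than a symmetric distance.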