
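After several rounds of industry consolidation and the continued upgrading of the e-commerce ecosystem, digital-intelligent transformation has become the prevailing trend in the apparel industry. Building on Alibaba Cloud's mature product base, Zhiyi Technology (知衣科技) draws on in-depth insight into user needs to address the industry's pain points directly, and provides apparel enterprises with a series of full-link digital intelligence solutions.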


- Product Layer: Zhiyi currently has a number of apps, such as the flagship product Zhiyi (知衣) and Meinian (美念), which enhances design collaboration. In addition, we provide customized APIs that open data interface services and image search capabilities to third parties. A one-stop apparel supply chain platform, covering everything from digital style selection to mass delivery of finished products, is also a core capability.
- Service Layer: The front-end and back-end systems of the related products have been containerized and deployed in Alibaba Cloud's ACK container service cluster.
- Data Layer: Mainly stores the original images and the business data generated by the business systems, and provides OLAP data analysis services.
  - Object Storage Service (OSS): stores original images and builds a billion-level style library for the apparel industry.
  - MySQL database: OLTP business data.
  - HBase: data accessed in KV format, such as product details and offline-computed ranking lists.
  - Feature vector library: the vectors extracted by image recognition are cleaned and stored in the Proxima vector retrieval engine developed by Alibaba DAMO Academy.
  - ElasticSearch: point lookups, and metric statistics and calculations over small- and medium-scale data. There are more than 1,000 labels for design elements; the label dimensions mainly include category, fabric, texture, process, accessories, style, silhouette, collar shape, color, and so on.
- Big data platform
  - Log Service (SLS): buffers the massive volume of vector data produced by image recognition. SLS also provides SQL-query-based alerting: if no vector data arrives, an alert is triggered, which is very useful for detecting problems promptly.
  - Offline data warehouse (DataWorks + MaxCompute): through DataWorks data integration, the image feature vectors buffered in Log Service serve as a data source. A data development task cleans the original feature vectors (for example, deduplication) and stores them in MaxCompute, and DataWorks then writes the cleaned vector data to Proxima in ElasticSearch.
  - Data mining and algorithm recommendation: Python tasks deployed in ACK that mainly handle recommendation-related work, such as computing user feature embeddings, recommending style images based on user behavior, and recommending similar bloggers.
  - Image recognition service: at present, the image recognition service is mainly deployed in the IDC data center, where 5~6 GPU servers perform batch image recognition.
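The "no data arrived" alert described for Log Service can be illustrated with a minimal Python sketch. This is not the SLS alerting API or its query syntax; the function name `no_data_alert` and the 15-minute window are hypothetical and only show the logic of firing an alert when no vector record arrives within a trailing window.

```python
from datetime import datetime, timedelta


def no_data_alert(arrival_times, now, window=timedelta(minutes=15)):
    """Return True when no vector record arrived within the trailing window.

    The 15-minute default is an illustrative assumption, not a value
    taken from the article.
    """
    cutoff = now - window
    recent = [t for t in arrival_times if cutoff < t <= now]
    return len(recent) == 0


now = datetime(2024, 1, 1, 12, 0)
# Healthy stream: records arrived 1 and 5 minutes ago.
healthy = [now - timedelta(minutes=m) for m in (1, 5, 40)]
# Stalled stream: the most recent record is 30 minutes old.
stalled = [now - timedelta(minutes=m) for m in (30, 45)]

healthy_alert = no_data_alert(healthy, now)
stalled_alert = no_data_alert(stalled, now)
```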
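Zhiyi's big data solution has also evolved through several stages to meet our goals for cost, efficiency, and technology; in essence, it still serves business needs.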
Phase 1: Self-managed CDH cluster in the data center
Our business systems were deployed on Alibaba Cloud from the very beginning. At the same time, we deployed 10 servers in the IDC data center to build a CDH cluster and a Hive data warehouse. During computation, data from the production environment on the cloud is synchronized to CDH, and once the CDH cluster finishes computing, the results are transmitted back to Alibaba Cloud to provide data services.
Although the self-managed CDH cluster saved computing costs, it also caused many problems. The most important one was that O&M was complex and required dedicated personnel to manage the cluster. When a problem occurred, we had to search the Internet to troubleshoot the cause, which was relatively inefficient.
Phase 2: DataWorks MaxCompute replaces the CDH cluster

Phase 3: Building ad hoc queries on ElasticSearch
Zhiqian focuses on quickly discovering fashion trend inspiration. It integrates five major image sources (social platforms, brand runway shows, retail and wholesale markets, Taobao-ecosystem e-commerce, and fashion street snaps) and offers a large number of design inspiration references, helping clothing brands and designers predict fashion trends quickly and accurately and keep up with market dynamics. Within it, the trend analysis section needs to statistically analyze the design element labels under various combinations of conditions in a given quarter, and output metrics such as rises, declines, and pie charts. This is also the query scenario with the largest amount of data: the amount of data scanned and analyzed approaches one million.
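The kind of statistic the trend analysis section produces can be sketched in plain Python. This is an illustration under assumptions, not the ElasticSearch query used in production: the record fields (`quarter`, `category`, `style`) and the function names are hypothetical. It computes each label's share within a quarter (the pie chart) and whether that share rose or fell against the previous quarter.

```python
from collections import Counter


def label_share(records, quarter, category):
    """Share of each style label among records matching quarter and category."""
    counts = Counter(
        r["style"]
        for r in records
        if r["quarter"] == quarter and r["category"] == category
    )
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


def trend(records, prev_quarter, cur_quarter, category):
    """Classify each label as 'up', 'down', or 'flat' between two quarters."""
    prev = label_share(records, prev_quarter, category)
    cur = label_share(records, cur_quarter, category)
    result = {}
    for label in set(prev) | set(cur):
        delta = cur.get(label, 0.0) - prev.get(label, 0.0)
        result[label] = "up" if delta > 0 else "down" if delta < 0 else "flat"
    return result


# Hypothetical labeled images: (quarter, category, style).
rows = [
    ("2023Q1", "dress", "casual"), ("2023Q1", "dress", "casual"),
    ("2023Q1", "dress", "retro"), ("2023Q1", "dress", "retro"),
    ("2023Q2", "dress", "casual"), ("2023Q2", "dress", "casual"),
    ("2023Q2", "dress", "casual"), ("2023Q2", "dress", "retro"),
]
records = [{"quarter": q, "category": c, "style": s} for q, c, s in rows]

shares_q2 = label_share(records, "2023Q2", "dress")
trends = trend(records, "2023Q1", "2023Q2", "dress")
```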


Our core feature is image search by image, which first requires recognizing the massive image library. We run machine learning analysis on all images in the library offline, abstracting each image into a high-dimensional (256-dimensional) feature vector, and then use Proxima to build all the features into an efficient vector index.
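To make the idea of searching by feature vector concrete, here is a minimal sketch that ranks a tiny in-memory library by similarity to a query vector. It is a linear scan, not Proxima's index; the use of cosine similarity, the 4-dimensional toy vectors (standing in for the 256-dimensional ones), and the names `cosine` and `search` are assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, library, top_k=2):
    """Return the ids of the top_k library vectors most similar to query."""
    scored = [(cosine(query, vec), image_id) for image_id, vec in library.items()]
    scored.sort(reverse=True)
    return [image_id for _, image_id in scored[:top_k]]


# Toy feature library; real vectors would have 256 dimensions.
library = {
    "img_a": [1.0, 0.0, 0.0, 0.0],
    "img_b": [0.9, 0.1, 0.0, 0.0],
    "img_c": [0.0, 0.0, 1.0, 0.0],
}
result = search([1.0, 0.0, 0.0, 0.0], library)
```

A vector index exists precisely to avoid this full scan once the library grows to hundreds of millions of vectors.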

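Batch image recognition
Once the model is generated, it is packaged into a Docker image. Running the container service on GPU nodes then recognizes the massive volume of apparel images and extracts high-dimensional feature vectors. Because the extracted feature vectors are large in volume and need cleaning, we first buffer them in Alibaba Cloud Log Service (SLS). A data development task orchestrated by DataWorks then synchronizes the feature vectors from SLS and performs cleaning operations, including deduplication, before finally writing them to the Proxima vector retrieval engine.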
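The cleaning step can be sketched as follows. The article only states that cleaning includes deduplication; the rule of keeping the latest record per image, the dimension check, and the field names (`image_id`, `ts`, `vector`) are assumptions for illustration, and the real task runs in DataWorks rather than as a Python function.

```python
def clean_vectors(records, dim=256):
    """Drop vectors with the wrong dimension and keep the latest record per image.

    Both rules are illustrative assumptions about what "cleaning" involves.
    """
    latest = {}
    for rec in records:
        if len(rec["vector"]) != dim:
            continue  # malformed vector: skip it
        prev = latest.get(rec["image_id"])
        if prev is None or rec["ts"] > prev["ts"]:
            latest[rec["image_id"]] = rec
    return sorted(latest.values(), key=lambda r: r["image_id"])


# Toy input with 3-dimensional vectors to keep the example short.
raw = [
    {"image_id": "a", "ts": 1, "vector": [0.1, 0.2, 0.3]},
    {"image_id": "a", "ts": 2, "vector": [0.4, 0.5, 0.6]},  # duplicate of "a", newer
    {"image_id": "b", "ts": 1, "vector": [0.7, 0.8]},       # wrong dimension
    {"image_id": "c", "ts": 1, "vector": [0.0, 0.0, 1.0]},
]
cleaned = clean_vectors(raw, dim=3)
```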
Single image recognition


Faiss


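- Poor stability: The distributed GPU cluster has 5~6 machines. When one machine goes down, the response time of the entire interface lengthens; the visible symptom is that the search service waits a long time before returning results.
- Insufficient GPU resources: We use the most basic brute-force matching algorithm, and 200 million 256-dimensional feature vectors need to be loaded into GPU memory, which puts a lot of pressure on the offline GPU resources.
- High O&M costs: The feature library shards are maintained entirely by hand, which is cumbersome to manage. Data shards are deployed across multiple GPU nodes, and when incremental shard data exceeds GPU memory, it has to be manually split onto a new GPU node.
- Bandwidth contention: The image recognition service and the image search service are both deployed in the offline data center and share the 300Mb leased-line bandwidth from the data center to Alibaba Cloud.
- Insufficient recall result set in certain scenarios: Because the feature library is large, we manually split it into 20 shards deployed on multiple GPU servers. However, Faiss limits each shard to returning at most 1024 recall results, which does not meet the business needs of some scenarios.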
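The figures above can be turned into a quick back-of-the-envelope calculation. The sketch assumes each dimension is stored as a 4-byte float32 and that label filtering is applied after recall; the 1% filter pass rate is purely hypothetical and only shows how a capped recall set can shrink below what a page of results needs.

```python
NUM_VECTORS = 200_000_000   # from the article
DIM = 256                   # from the article
BYTES_PER_VALUE = 4         # assumption: float32 storage
SHARDS = 20                 # from the article
PER_SHARD_LIMIT = 1024      # from the article

# Raw memory needed to hold every vector for brute-force matching.
raw_gb = NUM_VECTORS * DIM * BYTES_PER_VALUE / 1e9
per_shard_gb = raw_gb / SHARDS

# Upper bound on candidates available before any label filtering.
max_candidates = SHARDS * PER_SHARD_LIMIT


def surviving_results(candidates, pass_percent):
    """Candidates left after a label filter that keeps pass_percent of them."""
    return candidates * pass_percent // 100


# Hypothetical filter that only 1% of the recalled images satisfy.
after_filter = surviving_results(max_candidates, 1)
```

Under these assumptions the raw vectors alone take roughly 205 GB (about 10 GB per shard), and a selective filter leaves only a couple of hundred results from at most 20,480 candidates.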
Proxima
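- High stability: As an out-of-the-box product, its service SLA is guaranteed by Alibaba Cloud, and it uses a high-availability architecture with multi-node deployment. So far, interface timeouts have been rare.
- Algorithm optimization: The graph-based HNSW algorithm does not require a GPU, and its integration with Proxima includes engineering optimizations that improve performance considerably (recall over 10 million records takes only 5 milliseconds). With business growth, the number of feature vectors has now reached 300 million.
- Low O&M costs: Sharding is based on the ES engine, so when the data volume is large, it is enough to scale out the ElasticSearch compute nodes.
- No bandwidth contention: The image search service is deployed directly on the cloud, so it does not occupy leased-line bandwidth, and the image search scenario no longer triggers query-timeout alerts.
- Recall result set meets business needs: Proxima also takes the Top N most similar results per segment, aggregates them, and then filters by label. Because there are many segments, far more data can be searched than before.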
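The per-segment Top N, aggregate, then filter flow described in the last item can be sketched as below. This is an illustration of the flow only, not Proxima's or ElasticSearch's implementation: the item structure (`id`, a precomputed similarity `score`, and a `labels` set) and the function names are hypothetical.

```python
import heapq


def segment_top_n(segment, n):
    """Top n items of one segment by precomputed similarity score."""
    return heapq.nlargest(n, segment, key=lambda item: item["score"])


def search(segments, n, label):
    """Take top n per segment, merge by score, then keep items with the label."""
    merged = []
    for segment in segments:
        merged.extend(segment_top_n(segment, n))
    merged.sort(key=lambda item: item["score"], reverse=True)
    return [item["id"] for item in merged if label in item["labels"]]


# Two toy segments; scores stand in for similarity to the query image.
segments = [
    [
        {"id": "a", "score": 0.9, "labels": {"dress"}},
        {"id": "b", "score": 0.8, "labels": {"coat"}},
        {"id": "c", "score": 0.1, "labels": {"dress"}},
    ],
    [
        {"id": "d", "score": 0.7, "labels": {"dress"}},
        {"id": "e", "score": 0.6, "labels": {"dress"}},
    ],
]
hits = search(segments, n=2, label="dress")
```

Because every segment contributes its own Top N before filtering, adding segments enlarges the candidate pool, which matches the article's observation that more data can be searched than before.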
Optimizing and iterating on OLAP analysis scenarios
Standardizing data modeling and data governance

Deeper cooperation on image search solutions

