Zhiyi Technology X Alibaba Cloud: Building Big Data Core Competitiveness for the AI Apparel Industry
2022-09-19 Zhiyi Technology

After several rounds of industry reshuffling and continuous upgrades of the e-commerce ecosystem, digital-intelligent transformation has become the general trend in the apparel industry. Building on Alibaba Cloud's mature product foundation, Zhiyi Technology gains deep insight into user needs, targets the industry's pain points directly, and provides apparel enterprises with a series of full-link digital-intelligence solutions.

 

Solution architecture
At present, Zhiyi's overall solution architecture on Alibaba Cloud is as follows. It is roughly divided into a product layer, a service layer, a data layer, and a big data platform.
 
  • Product Layer: Zhiyi currently offers a number of apps, such as the flagship product 知衣 (Zhiyi) and the design-collaboration tool 美念 (Meinian). In addition, we provide customized APIs that expose data interface services and image search capabilities to third parties. A one-stop apparel supply-chain platform, covering everything from digital style selection to mass delivery of finished garments, is also a core capability we offer.
  • Service Layer: the front-end and back-end systems of the related products are containerized and deployed in an Alibaba Cloud ACK container service cluster.
  • Data Layer: mainly stores the original images, the business data generated by the business systems, and the data behind OLAP analysis services:
    • Object Storage Service (OSS): saves original images and builds a billion-level style library for the apparel industry
    • MySQL database: OLTP business data
    • HBase: data accessed in KV format, such as product details and offline calculation lists
    • Feature vector library: the vectors extracted by image recognition are cleaned and stored in the Proxima vector retrieval engine developed by Alibaba DAMO Academy
    • ElasticSearch: point queries and metric statistics over small- and medium-scale data. There are more than 1,000 design-element labels, whose dimensions mainly include category, fabric, texture, process, accessories, style, silhouette, collar shape, color, and so on.
  • Big Data Platform:
    • Log Service (SLS): caches the massive vector data produced by image recognition. SLS also provides SQL-based alerting: if vector data stops arriving, an alert fires, which is very useful for discovering problems in time.
    • Offline data warehouse (DataWorks + MaxCompute): through DataWorks data integration, the image feature vectors cached in Log Service serve as a data source; a data development task cleans the original feature vectors (deduplication, for example) and stores them in MaxCompute, and DataWorks then writes the cleaned vector data into Proxima in ElasticSearch.
    • Data mining & algorithm recommendation: Python tasks deployed in ACK mainly handle recommendation work, such as computing user feature embeddings, recommending style images based on user behavior, and recommending similar bloggers.
    • Image recognition service: currently deployed mainly in the IDC machine room, where 5–6 GPU servers perform batch image recognition.
Evolution of big data solutions

Zhiyi's big data solution has evolved through several stages, driven by our goals for cost, efficiency, and technology; in essence it always serves business needs.

 

Phase 1: Self-managed CDH cluster in the data center

Our business systems were deployed on Alibaba Cloud from the start, while ten servers in the IDC machine room formed a CDH cluster hosting a Hive data warehouse. During computation, data from the cloud production environment was synchronized to CDH; once the CDH cluster finished computing, the results were transferred back to Alibaba Cloud to provide data services.

Although a self-managed CDH cluster saves computing costs, it brings many problems. The biggest is complex O&M, which requires dedicated staff to manage the cluster; when something went wrong, we had to search the internet to troubleshoot the cause, which is inefficient.

 

Phase 2: DataWorks MaxCompute replaces the CDH cluster

To reduce O&M complexity, we migrated the computing tasks to MaxCompute and used DataWorks to orchestrate and schedule them.
 

Phase 3: Ad hoc queries with ElasticSearch

知款 (Zhikuan) focuses on quickly discovering fashion-trend inspiration. It integrates five major image sources (social platforms, brand runway shows, retail and wholesale markets, Taobao-ecosystem e-commerce, and fashion street snaps) into a large pool of design inspiration references, helping apparel brands and designers predict fashion trends quickly and accurately and stay on top of market dynamics. Its trend-analysis section must statistically analyze the design-element labels under various combinations of conditions within a given quarter and output indicators such as rises, falls, and pie charts. This is also the query scenario with the largest data volume: a single analysis can scan close to one million records.
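As a rough illustration, a quarter's label statistics like this can be expressed as an Elasticsearch terms aggregation. The field names (`category`, `published_at`, `collar_shape`) and sizes below are invented for illustration, not Zhiyi's actual schema:

```python
def trend_agg_query(category: str, quarter_start: str, quarter_end: str) -> dict:
    """Build a terms-aggregation request body: count images per design-element
    label (here, collar shape) for one category within one quarter."""
    return {
        "size": 0,  # only aggregation buckets are needed, not individual hits
        "query": {
            "bool": {
                "filter": [
                    {"term": {"category": category}},
                    {"range": {"published_at": {"gte": quarter_start, "lt": quarter_end}}},
                ]
            }
        },
        "aggs": {
            "by_collar_shape": {"terms": {"field": "collar_shape", "size": 50}}
        },
    }

body = trend_agg_query("dress", "2022-04-01", "2022-07-01")
```

The bucket counts returned for such a query are what feed the rise/fall indicators and pie charts.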

Compared with the open-source version, the biggest advantage of Alibaba Cloud managed ElasticSearch is that it is out of the box and free of O&M; in particular, it supports DAMO Academy's Proxima vector retrieval engine, which suits our multi-dimensional query and statistical analysis scenarios very well. We expand on Proxima in the image recognition discussion below.
 

 

Image recognition

Our core functional scenario is search-by-image, which first requires recognizing the massive image library. We run machine-learning analysis over all images in the gallery offline, abstract each image into a high-dimensional (256-dimensional) feature vector, and then build all the features into an efficient vector index with Proxima.
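At its core, matching a query vector against a feature library is a nearest-neighbor search. The minimal brute-force sketch below (pure Python, toy 3-dimensional vectors instead of 256, invented image IDs) shows the idea that engines like Proxima implement at scale:

```python
import heapq
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_n_similar(query, library, n=3):
    """library maps image_id -> feature vector; returns [(score, image_id)] best first."""
    scored = ((cosine(query, vec), img_id) for img_id, vec in library.items())
    return heapq.nlargest(n, scored)

library = {
    "img_a": [1.0, 0.0, 0.0],
    "img_b": [0.9, 0.1, 0.0],
    "img_c": [0.0, 1.0, 0.0],
}
results = top_n_similar([1.0, 0.05, 0.0], library, n=2)
# closest first: "img_a", then "img_b"
```

A real engine avoids scanning every vector by building an index (e.g. a graph such as HNSW), but the scoring contract is the same.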

 

Model training
A model must be trained before images can be recognized. Professional apparel-industry annotators label the image library, and the offline GPU cluster then pulls the annotated images from Alibaba Cloud Object Storage Service (OSS) for training. To reduce annotation costs we adopted active learning: a model is trained on a portion of already-labeled images and used to predict labels for the unlabeled ones; humans then confirm and review the predictions, and the newly confirmed data feeds supervised training to keep improving the model step by step.
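The selection step of such an active-learning loop can be sketched as least-confident sampling: images whose predicted probability sits closest to 0.5 are routed to annotators first. The function and probabilities below are illustrative, not Zhiyi's actual model output:

```python
def pick_for_annotation(predictions, k=2):
    """predictions maps image_id -> predicted probability of a label.
    Probabilities nearest 0.5 are the images the model is least sure about,
    so those are sent to human annotators first."""
    return sorted(predictions, key=lambda img: abs(predictions[img] - 0.5))[:k]

preds = {"img1": 0.97, "img2": 0.52, "img3": 0.10, "img4": 0.45}
to_label = pick_for_annotation(preds, k=2)
# "img2" (0.02 from 0.5) and "img4" (0.05 from 0.5) go to annotators
```

Confident predictions (img1, img3) only need a quick review, which is where the annotation cost saving comes from.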

 

Batch image recognition

 

Once the model is trained, it is packaged into a Docker image; running the container on GPU nodes then recognizes the massive library of apparel images and extracts high-dimensional feature vectors. Because the extracted feature vectors are voluminous and need cleaning, we first cache them in Alibaba Cloud Log Service (SLS); a data development task orchestrated by DataWorks then synchronizes the feature vectors from SLS, performs cleaning (including deduplication), and finally writes them into the Proxima vector retrieval engine.
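The deduplication step of that cleaning task might look like the following minimal sketch; in the real pipeline this runs as a DataWorks task between SLS and Proxima, and the record shapes here are invented:

```python
def dedup_vectors(records):
    """records: iterable of (image_id, feature_vector) pairs read from the
    vector cache; keeps the first occurrence of each identical pair."""
    seen = set()
    cleaned = []
    for image_id, vec in records:
        key = (image_id, tuple(vec))  # tuples are hashable, lists are not
        if key not in seen:
            seen.add(key)
            cleaned.append((image_id, vec))
    return cleaned

raw = [
    ("img1", [0.1, 0.2]),
    ("img1", [0.1, 0.2]),  # duplicate produced by a re-run of recognition
    ("img2", [0.3, 0.4]),
]
cleaned = dedup_vectors(raw)
# two records survive: one for img1, one for img2
```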

 
Because one batch of image recognition is a large workload and the computing performance of the offline GPU servers is the bottleneck, we use elastic GPU resources on the cloud to supplement computing power. On-premises GPUs and cloud GPUs form one resource pool that jointly consumes the same batch of image recognition tasks, greatly improving efficiency. On the cloud we buy GPU preemptible instances, generally at 20–30% of the pay-as-you-go price, which further reduces costs.
 

Single image recognition

 
We also serve single-image recognition through the web front end in online-serving mode: when a user uploads an image, the model's inference outputs the following result.
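A minimal shape for such an online-serving endpoint, assuming a hypothetical `recognize` function standing in for real model inference (the labels returned below are placeholders, not actual model output):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def recognize(image_bytes: bytes) -> dict:
    """Placeholder for real model inference; the production service would run
    the trained model here and return the extracted design-element labels."""
    return {"category": "dress", "collar_shape": "round", "color": "red"}

class RecognizeHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # read the uploaded image bytes from the request body
        length = int(self.headers.get("Content-Length", 0))
        image_bytes = self.rfile.read(length)
        body = json.dumps(recognize(image_bytes)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# To serve: HTTPServer(("0.0.0.0", 8000), RecognizeHandler).serve_forever()
```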
 
Search by image

 

Once the feature vector library of apparel images has been built, we can implement search-by-image. When a user uploads a new image, we analyze it with the same machine-learning method to produce a representation vector, then use that vector to look up the most similar results in the previously built vector index, completing one round of content-based image retrieval. Choosing the right vector retrieval engine is therefore very important.
 
 

 

Faiss

Faiss (Facebook AI Similarity Search) is an open-source vector retrieval library from the Facebook AI team. Initially we deployed Faiss as a distributed service: feature-vector search-and-match services ran on multiple GPU servers, each search request was distributed to every GPU sub-service for processing, and the top-N similar results were aggregated and returned to the caller.
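The aggregation step of that scatter-gather design can be sketched in a few lines: each shard returns its own top-N hits, and the dispatcher keeps the global top-N by score. The scores and IDs below are invented:

```python
import heapq

def merge_shard_results(shard_results, n):
    """shard_results: one hit list per GPU sub-service, each [(score, image_id)].
    The dispatcher aggregates all shards and keeps the global top-N by score."""
    return heapq.nlargest(n, (hit for shard in shard_results for hit in shard))

shards = [
    [(0.99, "a1"), (0.80, "a2")],  # top hits from shard 1
    [(0.95, "b1"), (0.60, "b2")],  # top hits from shard 2
]
top = merge_shard_results(shards, 3)
# [(0.99, "a1"), (0.95, "b1"), (0.80, "a2")]
```

The catch with this design is operational: the dispatcher must wait for the slowest shard, which is why a single failing GPU node stretches the whole interface's response time.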
 
We ran into practical difficulties while using Faiss. These are not problems with Faiss itself; rather, matching our business needs would have required investing much more effort in developing and operating the distributed system.
 
  • Poor stability: with a distributed cluster of 5–6 GPU machines, when one machine goes down the response time of the whole interface stretches; in practice, the search service waits a long time before results come back.
  • Insufficient GPU resources: we used the most basic brute-force matching algorithm, and 200 million 256-dimensional feature vectors had to be loaded into GPU memory, putting heavy pressure on the offline GPU resources.
  • High O&M costs: sharding of the feature library was entirely manual and tedious to manage. The data shards are distributed across multiple GPU nodes, and whenever incremental shard data exceeded GPU memory, it had to be manually re-sliced onto new GPU nodes.
  • Bandwidth contention: the image recognition service and the image search service were both deployed in the offline data center, sharing the 300 Mb leased-line bandwidth between the data center and Alibaba Cloud.
  • Insufficient recall sets in certain scenarios: because the feature library is large, we manually split it into 20 shards deployed across multiple GPU servers, but Faiss caps each shard at 1,024 recall results, which fell short of some business requirements.

 

Proxima

 
Proxima is a vector retrieval engine developed in-house by Alibaba DAMO Academy (https://developer.aliyun.com/article/782391) that delivers high-performance similarity search over big data. It is also integrated into the Alibaba Cloud managed ElasticSearch we were already using. Compared with Faiss it has its own strengths and weaknesses in functionality and performance, but ElasticSearch + Proxima solved precisely the difficulties we had with Faiss.
 
  • High stability: an out-of-the-box product whose service SLA is guaranteed by Alibaba Cloud, with a highly available multi-node deployment architecture. Interface timeouts have been rare so far.
  • Algorithm optimization: the graph-based HNSW algorithm needs no GPU, and its engineering integration into Proxima brings a large performance improvement (recall over 10 million records takes only 5 milliseconds). As the business has grown, the feature vectors have reached 300 million.
  • Low O&M costs: sharding is handled by the ES engine; when the data volume grows, we simply scale out the ElasticSearch compute nodes.
  • No bandwidth contention: the image search service is deployed directly on the cloud, so it does not occupy leased-line bandwidth, and there have been no query timeout alarms in image search scenarios.
  • Recall sets meet business needs: Proxima also takes the top-N similar results per segment shard, aggregates them, and then filters by label. Because there are many segments, far more data can be searched than before.

 

Outlook for technical architecture upgrades

Optimizing and iterating OLAP analysis scenarios

As data volume keeps growing and business requirements change, OLAP analysis scenarios are becoming more and more complex, and the bar for selecting algorithms and technical solutions keeps rising. Take one business scenario as an example:
 
100,000 bloggers have posted more than 100 million images in total. A user can follow and subscribe to bloggers, up to a limit of 2,000. The images corresponding to the 2,000 bloggers a user follows number around 2 million, and we need real-time multi-condition statistical analysis of the images each user follows (every user follows a different set of bloggers).
The query in the example above takes 9 seconds in Elasticsearch, which clearly does not meet business needs. Is there a better solution? After recently investigating ClickHouse, we preprocess the data into one large wide table and query that instead. Query latency has dropped below 2 seconds, satisfying the business requirement well. Alibaba Cloud ClickHouse works out of the box, lowering the cost of trial and error and helping us respond to business needs quickly.
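A hypothetical shape for a query against that pre-aggregated wide table in ClickHouse; the table and column names are invented for illustration, not Zhiyi's real schema:

```python
# Illustrative ClickHouse query: the preprocessing step denormalizes one row
# per (user, followed image) with the design-element labels, so per-user label
# statistics become a single filtered GROUP BY instead of a multi-way join.
WIDE_TABLE_QUERY = """
SELECT collar_shape, count() AS n
FROM user_followed_images_wide
WHERE user_id = {user_id:UInt64}
  AND published_at >= '2022-07-01'
GROUP BY collar_shape
ORDER BY n DESC
"""
```

The trade-off of the wide-table approach is storage and preprocessing cost in exchange for query latency, which fits this read-heavy analysis scenario.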
 

Standardizing data modeling and data governance

 
At present, DataWorks is mainly used for data integration and task scheduling, with a small number of rule-based data quality checks; the team's internal conventions are mostly documented development specifications, lacking the support of effective tooling. As business scenarios grow more complex, integrated data sources become more abundant, and the number of data developers increases, a unified development specification across the whole department becomes essential. DataWorks data modeling establishes data standards through tools and processes, enabling structured, orderly, and unified management. Its data governance module can detect development workflows that violate the data standards via configurable check items, and measures project health and governance effectiveness with a health score built from multiple governance items. We are currently trialing data modeling and governance on our own business, and we expect them to help us manage data better and maximize its value.
 
 

Deeper cooperation on image search solutions

 
Image recognition and search-by-image in the apparel domain are our core competitiveness. Alibaba Cloud Machine Learning Platform for AI (PAI) also provides an image retrieval solution for similar-image matching (https://help.aliyun.com/document_detail/313270.html). It only requires the raw image data to be configured and can build models online without annotation, which is quite attractive to us; we may test and compare it later and expand cooperation in apparel image modeling.
 
About Zhiyi Technology
Hangzhou Zhiyi Technology Co., Ltd. is a national high-tech enterprise driven by artificial intelligence, committed to building an intelligent apparel-design supply-chain platform on data-driven trend discovery, hit-style mining, and standardized supply-chain organization capabilities. Founded in February 2018, Zhiyi received tens of millions of US dollars in Series A financing the same year; in 2021 it completed a 200 million yuan Series B round from Hillhouse Ventures and Wanwu Capital, and was shortlisted for the "Hangzhou Quasi-Unicorn Enterprise List" that year.
 
Leveraging core technical capabilities such as image recognition, data mining, and intelligent recommendation, Zhiyi keeps upgrading its service system. It has independently developed a series of data-intelligence SaaS products for the apparel industry, including 知衣 (Zhiyi), 知款 (Zhikuan), and 美念 (Meinian), providing apparel enterprises and designers with core functions such as trend forecasting, design enablement, and intelligent style recommendation. Through this SaaS entry point it extends downstream along the industry chain, offering a one-stop supply-chain platform service combining design with flexible production. It has served thousands of fashion brands and platforms, including UR, Vipshop, Bestseller, Trendy Group, Peacebird, HLA, and Semir.
