Steps
- Prepare products-all.json and the image_data folder by downloading them with JavaScript. Save both in a new folder, ./data/{BATCH_SOURCE}, and assign a new batch_source ID to each incoming batch of data.
- Run process_item.py to classify the category, gender, and occasions of each item. The output is written to ./data/{BATCH_SOURCE}/metadata_extraction.json. This step should run on an H200 device.
- Organize all the data, then embed it into the local database with run_ingestion.py.
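The folder layout described in the steps can be checked before the H200 job is launched. The following is a minimal sketch and not part of the repository: the helper names `batch_paths` and `missing_inputs` are hypothetical, and the only things assumed are the file and folder names listed above.

```python
from pathlib import Path


def batch_paths(batch_source: str, root: str = "./data") -> dict[str, Path]:
    """Return the paths the steps mention for one batch (hypothetical helper)."""
    base = Path(root) / batch_source
    return {
        "products": base / "products-all.json",         # raw product records
        "images": base / "image_data",                  # downloaded image folder
        "metadata": base / "metadata_extraction.json",  # written by process_item.py
    }


def missing_inputs(batch_source: str, root: str = "./data") -> list[str]:
    """List the inputs that must exist before process_item.py is run."""
    paths = batch_paths(batch_source, root)
    missing = []
    if not paths["products"].is_file():
        missing.append("products-all.json")
    if not paths["images"].is_dir():
        missing.append("image_data")
    return missing
```

For example, `missing_inputs("2025_q4")` returns an empty list once both inputs are in place under ./data/2025_q4.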
Raw Data Structure
## products-all.json
{
  "id": "BUL808",
  "name": "SARAH ZHUANG - 'Click & Link' diamond 18k gold earrings",
  "brand": "SARAH ZHUANG",
  "category": "Fine Jewellery And Watches",
  "subcategory": "General",
  "price": 17500,
  "currency": "HKD",
  "description": "Sarah Zhuang's Click & Link earrings embrace the allure of geometry. Forged into elegant rectangles with one side encrusted with diamonds, this gold pair will certainly elevate your cocktail ensembles.",
  "tags": [
    "sarah zhuang",
    "fine jewellery and watches",
    "in-stock",
    "new",
    "sarah",
    "zhuang",
    "'click",
    "link'",
    "diamond"
  ],
  "imageUrl": "https://media.lanecrawford.com/B/U/L/BUL808_in_xl.jpg",
  "url": "https://www.lanecrawford.com.hk/product/sarah-zhuang/-click-link-diamond-18k-gold-earrings/_/BUL808/product.lc?utm_medium=embed&utm_source=ai-recommended&utm_campaign=2025-christmas_lc_ai-recommended",
  "color": "YELLOW GOLD",
  "groupName": "Fine Jewellery",
  "deptName": "Women's Fine Jewellery",
  "onlineBU": "Fine Jewellery",
  "stockAvailability": true
}
Example in metadata_extraction.json
"EOJ367": {
  "category": "shoes",
  "gender": "female",
  "applicable_occasions": [
    "Casual",
    "Outdoor",
    "Travel / Transit"
  ],
  "inappropriate_occasions": [
    "Formal",
    "Black Tie / White Tie",
    "Bridal / Wedding",
    "Business / workwear",
    "Cocktail / Semi-Formal"
  ]
}
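A quick consistency check on these entries can catch extraction mistakes before ingestion. The sketch below rests on two assumptions that the source does not state: that metadata_extraction.json is a single JSON object keyed by item ID (as the "EOJ367" key suggests), and that an occasion should never appear in both lists. The function name `find_conflicts` is hypothetical.

```python
def find_conflicts(extraction: dict) -> dict[str, list[str]]:
    """Map item ID -> occasions listed as both applicable and inappropriate."""
    conflicts = {}
    for item_id, entry in extraction.items():
        applicable = set(entry.get("applicable_occasions", []))
        inappropriate = set(entry.get("inappropriate_occasions", []))
        overlap = sorted(applicable & inappropriate)
        if overlap:
            conflicts[item_id] = overlap
    return conflicts


# The EOJ367 entry from the example above, wrapped in the assumed top-level object.
sample = {
    "EOJ367": {
        "category": "shoes",
        "gender": "female",
        "applicable_occasions": ["Casual", "Outdoor", "Travel / Transit"],
        "inappropriate_occasions": [
            "Formal",
            "Black Tie / White Tie",
            "Bridal / Wedding",
            "Business / workwear",
            "Cocktail / Semi-Formal",
        ],
    }
}

print(find_conflicts(sample))  # prints {} because the two lists do not overlap
```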
Metadata in Vector Database
{
  'item_id': 'EOJ128',
  'category': 'sunglasses',
  'gender': 'unisex',
  'modality': 'image',
  'brand': 'CELINE',
  'color': 'BROWN',
  'description': "Immerse yourself in the depth of classic style with CELINE\'s Tortoiseshell Logo Sunglasses. Featuring a rich, tortoiseshell acetate frame and adorned with the iconic CELINE logo in gold, these sunglasses are a testament to timeless elegance and luxury. Perfect for those who appreciate a sophisticated aesthetic, they offer optimal UV protection while ensuring you remain at the forefront of fashion.",
  'tags': 'celine,accessories,in-stock,new,maxi,triomphe,acetate,round',
  'price': 4500,
  'url': 'https://www.lanecrawford.com.hk/product/celine/maxi-triomphe-acetate-round-sunglasses/_/EOJ128/product.lc?utm_medium=embed&utm_source=ai-recommended&utm_campaign=2025-christmas_lc_ai-recommended',
  'batch_source': '2025_q4',
  'Outdoor': 0,
  'Ski / Snow / Mountain': 0,
  'Festival / Concert': 0,
  'Activewear': 0,
  'Casual': 1,
  'Cocktail / Semi-Formal': -1,
  'Formal': -1,
  'Party / Clubbing': 0,
  'Evening': 0,
  'Travel / Transit': 0,
  'Beach / Swim': 0,
  'Garden Party / Daytime Event': 1,
  'Black Tie / White Tie': -1,
  'Resort': 1,
  'Athleisure': 0,
  'Business / workwear': -1,
  'Bridal / Wedding': -1,
}
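The record above appears to combine a raw product entry with its metadata_extraction.json entry: tags are joined into one comma-separated string, and each occasion becomes a flag. The sketch below shows one way such a record could be built. It is not taken from run_ingestion.py. The encoding (1 for applicable, -1 for inappropriate, 0 otherwise) is inferred from the examples in this document, and the names `OCCASIONS`, `occasion_flags`, and `build_metadata` are hypothetical.

```python
# Occasion labels copied from the vector-database record above.
OCCASIONS = [
    "Outdoor", "Ski / Snow / Mountain", "Festival / Concert", "Activewear",
    "Casual", "Cocktail / Semi-Formal", "Formal", "Party / Clubbing",
    "Evening", "Travel / Transit", "Beach / Swim",
    "Garden Party / Daytime Event", "Black Tie / White Tie", "Resort",
    "Athleisure", "Business / workwear", "Bridal / Wedding",
]


def occasion_flags(extraction: dict) -> dict[str, int]:
    """Encode each occasion as 1 (applicable), -1 (inappropriate), or 0 (neither)."""
    applicable = set(extraction.get("applicable_occasions", []))
    inappropriate = set(extraction.get("inappropriate_occasions", []))
    flags = {}
    for occasion in OCCASIONS:
        if occasion in applicable:
            flags[occasion] = 1
        elif occasion in inappropriate:
            flags[occasion] = -1
        else:
            flags[occasion] = 0
    return flags


def build_metadata(product: dict, extraction: dict, batch_source: str,
                   modality: str = "image") -> dict:
    """Flatten one raw product and its extraction entry into a single record."""
    record = {
        "item_id": product["id"],
        "category": extraction["category"],  # extracted category, not the raw one
        "gender": extraction["gender"],
        "modality": modality,
        "brand": product["brand"],
        "color": product["color"],
        "description": product["description"],
        "tags": ",".join(product["tags"]),   # list -> comma-separated string
        "price": product["price"],
        "url": product["url"],
        "batch_source": batch_source,
    }
    record.update(occasion_flags(extraction))
    return record
```

Keeping every occasion as its own integer field, rather than storing two lists, fits vector stores that only allow scalar metadata values and filter on them; whether that is the reason for this layout is not stated in the source.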