## Steps

1. Prepare `products-all.json` and the `image_data` folder, using JavaScript to download them. Save both in a new folder, `./data/{BATCH_SOURCE}`, and assign a new `batch_source` ID to each incoming batch of data.
2. Run `process_item.py` to classify the category, gender, and occasions of each item. The output is written to `./data/{BATCH_SOURCE}/metadata_extraction.json`. This step should run on an H200 device.
3. Organize all the data, then embed it into the database locally using `run_ingestion.py`.

## Raw Data Structure

Example record in `products-all.json`:

```json
{
  "id": "BUL808",
  "name": "SARAH ZHUANG - 'Click & Link' diamond 18k gold earrings",
  "brand": "SARAH ZHUANG",
  "category": "Fine Jewellery And Watches",
  "subcategory": "General",
  "price": 17500,
  "currency": "HKD",
  "description": "Sarah Zhuang's Click & Link earrings embrace the allure of geometry. Forged into elegant rectangles with one side encrusted with diamonds, this gold pair will certainly elevate your cocktail ensembles.",
  "tags": [
    "sarah zhuang",
    "fine jewellery and watches",
    "in-stock",
    "new",
    "sarah",
    "zhuang",
    "'click",
    "link'",
    "diamond"
  ],
  "imageUrl": "https://media.lanecrawford.com/B/U/L/BUL808_in_xl.jpg",
  "url": "https://www.lanecrawford.com.hk/product/sarah-zhuang/-click-link-diamond-18k-gold-earrings/_/BUL808/product.lc?utm_medium=embed&utm_source=ai-recommended&utm_campaign=2025-christmas_lc_ai-recommended",
  "color": "YELLOW GOLD",
  "groupName": "Fine Jewellery",
  "deptName": "Women's Fine Jewellery",
  "onlineBU": "Fine Jewellery",
  "stockAvailability": true
}
```

## Example in `metadata_extraction.json`

```json
{
  "EOJ367": {
    "category": "shoes",
    "gender": "female",
    "applicable_occasions": [
      "Casual",
      "Outdoor",
      "Travel / Transit"
    ],
    "inappropriate_occasions": [
      "Formal",
      "Black Tie / White Tie",
      "Bridal / Wedding",
      "Business / workwear",
      "Cocktail / Semi-Formal"
    ]
  }
}
```

## Metadata in Vector Database

```python
{
    'item_id': 'EOJ128',
    'category': 'sunglasses',
    'gender': 'unisex',
    'modality': 'image',
    'brand': 'CELINE',
    'color': 'BROWN',
    'description': "Immerse yourself in the depth of classic style with CELINE's Tortoiseshell Logo Sunglasses. Featuring a rich, tortoiseshell acetate frame and adorned with the iconic CELINE logo in gold, these sunglasses are a testament to timeless elegance and luxury. Perfect for those who appreciate a sophisticated aesthetic, they offer optimal UV protection while ensuring you remain at the forefront of fashion.",
    'tags': 'celine,accessories,in-stock,new,maxi,triomphe,acetate,round',
    'price': 4500,
    'url': 'https://www.lanecrawford.com.hk/product/celine/maxi-triomphe-acetate-round-sunglasses/_/EOJ128/product.lc?utm_medium=embed&utm_source=ai-recommended&utm_campaign=2025-christmas_lc_ai-recommended',
    'batch_source': '2025_q4',
    'Outdoor': 0,
    'Ski / Snow / Mountain': 0,
    'Festival / Concert': 0,
    'Activewear': 0,
    'Casual': 1,
    'Cocktail / Semi-Formal': -1,
    'Formal': -1,
    'Party / Clubbing': 0,
    'Evening': 0,
    'Travel / Transit': 0,
    'Beach / Swim': 0,
    'Garden Party / Daytime Event': 1,
    'Black Tie / White Tie': -1,
    'Resort': 1,
    'Athleisure': 0,
    'Business / workwear': -1,
    'Bridal / Wedding': -1,
}
```
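
The occasion columns in the vector database record appear to be derived from the two occasion lists in `metadata_extraction.json`. The following is a minimal sketch of that mapping, assuming `1` means applicable, `-1` means inappropriate, and `0` means neither. That encoding is inferred from the examples above, not confirmed by `run_ingestion.py`, and the names `OCCASIONS` and `occasion_flags` are hypothetical.

```python
# Occasion labels as they appear in the vector database example above.
OCCASIONS = [
    "Outdoor", "Ski / Snow / Mountain", "Festival / Concert", "Activewear",
    "Casual", "Cocktail / Semi-Formal", "Formal", "Party / Clubbing",
    "Evening", "Travel / Transit", "Beach / Swim",
    "Garden Party / Daytime Event", "Black Tie / White Tie", "Resort",
    "Athleisure", "Business / workwear", "Bridal / Wedding",
]


def occasion_flags(entry: dict) -> dict:
    """Turn the two occasion lists into one flag per occasion.

    Assumed encoding: 1 = applicable, -1 = inappropriate, 0 = neither.
    """
    applicable = set(entry.get("applicable_occasions", []))
    inappropriate = set(entry.get("inappropriate_occasions", []))
    flags = {}
    for occasion in OCCASIONS:
        if occasion in applicable:
            flags[occasion] = 1
        elif occasion in inappropriate:
            flags[occasion] = -1
        else:
            flags[occasion] = 0
    return flags


# The EOJ367 entry from the metadata_extraction.json example.
entry = {
    "category": "shoes",
    "gender": "female",
    "applicable_occasions": ["Casual", "Outdoor", "Travel / Transit"],
    "inappropriate_occasions": [
        "Formal",
        "Black Tie / White Tie",
        "Bridal / Wedding",
        "Business / workwear",
        "Cocktail / Semi-Formal",
    ],
}

flags = occasion_flags(entry)
print(flags)
```

Storing one numeric field per occasion, rather than two lists, would allow a simple metadata filter at query time, for example keeping only items whose `Formal` value is not `-1`.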