Modify this regex to select ONLY the immediate surrounding specific tags and not siblings


I’m using Visual Studio Code to run a find/replace on a folder full of XML files.

I need to remove every LineItem node that contains a specific string, “S00BBABHK”.

NOTE: This code is formatted for clarity. The actual XML in the files I’m working with is on one single line and is not formatted in any way.

            <LineItem sequence="0">
                ...
                    <Item id="GOOD_1" />
                ...
            </LineItem>     
            <LineItem sequence="0">
                ...
                    <Item id="GOOD_2" />
                ...
            </LineItem>
            <LineItem sequence="0">
                ...
                    <Item id="S00BBABHK" />
                ...
            </LineItem>
            <LineItem sequence="0">
                ...
                    <Item id="Good3" />
                ...
            </LineItem>     

Here are the contents of an example file. NOTE: Each file contains one single line and is not formatted in any way.

<ICSMXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://www.icsm.com/icsmxml"><Header><MsgDelivery><To><Credential><Domain>ICSMID</Domain><Identity>196M</Identity></Credential></To><From><Credential><Domain>ICSMID</Domain><Identity>LVS</Identity></Credential></From><Sender><Credential><Domain>DNS</Domain><Identity>BOSSAPI</Identity></Credential></Sender></MsgDelivery><MsgHeader><MessageId>40cb9826-e2e0-4ca2-8de7-08ab9f601cc4</MessageId><Timestamp>2024-04-10T07:12:52</Timestamp><DocumentId>ICSMPO</DocumentId></MsgHeader><ConversationState><ConversationId>9790f6aa-2225-461b-8c85-cd97e7ee2577</ConversationId></ConversationState></Header><Request deploymentMode="production"><OrderRequest><OrderRequestHeader orderDate="2024-04-09" orderID="SA1403865" type="new" orderType="regular" shippingType="item"><CustomerIdentification><CustomerInfo name="PONumber">33475712</CustomerInfo><CorporateInfo name="AccountNumber">Lovesac</CorporateInfo></CustomerIdentification></OrderRequestHeader><LineItem sequence="0"><CustomerIdentification><CorporateInfo name="unitOfMeasureQty">1</CorporateInfo><CorporateInfo name="unitTransferCost">0.000000</CorporateInfo><CorporateInfo name="SellCode">E</CorporateInfo></CustomerIdentification><LineItemConfiguration /><LineItemShipping domain="ICSM" carrier="P" serviceLevel="PF"><ShipTo><Address><Name>Brianna Eales</Name><PostalAddress addressID=""><DeliverTo>Brianna Eales</DeliverTo><Street>5010 RENO DR</Street><Street /><City>PASADENA</City><State>TX</State><PostalCode>77505-2156</PostalCode><Country>United States</Country></PostalAddress></Address></ShipTo></LineItemShipping><PartnerItemDetail type="point" lineNumber="5" quantity="1" unitOfMeasure="EA"><Item id="S00BBABHK" /><Price valueType="unit"><Money currency="USD">0.000000</Money></Price><Price valueType="extended"><Money currency="USD">0.00</Money></Price></PartnerItemDetail><PartnerItemDetail type="production" lineNumber="5" 
quantity="1" unitOfMeasure="EA"><Item id="S00BBABHK" /><Price valueType="unit"><Money currency="USD">0.000000</Money></Price><Price valueType="extended"><Money currency="USD">0.00</Money></Price></PartnerItemDetail></LineItem><LineItem sequence="0"><CustomerIdentification><CorporateInfo name="unitOfMeasureQty">1</CorporateInfo><CorporateInfo name="unitTransferCost">0.000000</CorporateInfo><CorporateInfo name="SellCode">E</CorporateInfo></CustomerIdentification><LineItemConfiguration /><LineItemShipping domain="ICSM" carrier="P" serviceLevel="PF"><ShipTo><Address><Name>Brianna Eales</Name><PostalAddress addressID=""><DeliverTo>Brianna Eales</DeliverTo><Street>5010 RENO DR</Street><Street /><City>PASADENA</City><State>TX</State><PostalCode>77505-2156</PostalCode><Country>United States</Country></PostalAddress></Address></ShipTo></LineItemShipping><PartnerItemDetail type="point" lineNumber="6" quantity="1" unitOfMeasure="EA"><Item id="UM6494" /><Price valueType="unit"><Money currency="USD">0.000000</Money></Price><Price valueType="extended"><Money currency="USD">0.00</Money></Price></PartnerItemDetail><PartnerItemDetail type="production" lineNumber="6" quantity="1" unitOfMeasure="EA"><Item id="UM6494" /><Price valueType="unit"><Money currency="USD">0.000000</Money></Price><Price valueType="extended"><Money currency="USD">0.00</Money></Price></PartnerItemDetail></LineItem><LineItem sequence="0"><CustomerIdentification><CorporateInfo name="unitOfMeasureQty">1</CorporateInfo><CorporateInfo name="unitTransferCost">0.000000</CorporateInfo><CorporateInfo name="SellCode">E</CorporateInfo></CustomerIdentification><LineItemConfiguration /><LineItemShipping domain="ICSM" carrier="P" serviceLevel="PF"><ShipTo><Address><Name>Brianna Eales</Name><PostalAddress addressID=""><DeliverTo>Brianna Eales</DeliverTo><Street>5010 RENO DR</Street><Street /><City>PASADENA</City><State>TX</State><PostalCode>77505-2156</PostalCode><Country>United 
States</Country></PostalAddress></Address></ShipTo></LineItemShipping><PartnerItemDetail type="point" lineNumber="7" quantity="1" unitOfMeasure="EA"><Item id="UM5562" /><Price valueType="unit"><Money currency="USD">0.000000</Money></Price><Price valueType="extended"><Money currency="USD">0.00</Money></Price></PartnerItemDetail><PartnerItemDetail type="production" lineNumber="7" quantity="1" unitOfMeasure="EA"><Item id="UM5562" /><Price valueType="unit"><Money currency="USD">0.000000</Money></Price><Price valueType="extended"><Money currency="USD">0.00</Money></Price></PartnerItemDetail></LineItem><LineItem sequence="0"><CustomerIdentification><CorporateInfo name="unitOfMeasureQty">1</CorporateInfo><CorporateInfo name="unitTransferCost">0.000000</CorporateInfo><CorporateInfo name="SellCode">E</CorporateInfo></CustomerIdentification><LineItemConfiguration /><LineItemShipping domain="ICSM" carrier="P" serviceLevel="PF"><ShipTo><Address><Name>Brianna Eales</Name><PostalAddress addressID=""><DeliverTo>Brianna Eales</DeliverTo><Street>5010 RENO DR</Street><Street /><City>PASADENA</City><State>TX</State><PostalCode>77505-2156</PostalCode><Country>United States</Country></PostalAddress></Address></ShipTo></LineItemShipping><PartnerItemDetail type="point" lineNumber="8" quantity="1" unitOfMeasure="EA"><Item id="UM5467" /><Price valueType="unit"><Money currency="USD">0.000000</Money></Price><Price valueType="extended"><Money currency="USD">0.00</Money></Price></PartnerItemDetail><PartnerItemDetail type="production" lineNumber="8" quantity="1" unitOfMeasure="EA"><Item id="UM5467" /><Price valueType="unit"><Money currency="USD">0.000000</Money></Price><Price valueType="extended"><Money currency="USD">0.00</Money></Price></PartnerItemDetail></LineItem><LineItem sequence="0"><CustomerIdentification><CorporateInfo name="unitOfMeasureQty">1</CorporateInfo><CorporateInfo name="unitTransferCost">0.000000</CorporateInfo><CorporateInfo 
name="SellCode">E</CorporateInfo></CustomerIdentification><LineItemConfiguration /><LineItemShipping domain="ICSM" carrier="P" serviceLevel="PF"><ShipTo><Address><Name>Brianna Eales</Name><PostalAddress addressID=""><DeliverTo>Brianna Eales</DeliverTo><Street>5010 RENO DR</Street><Street /><City>PASADENA</City><State>TX</State><PostalCode>77505-2156</PostalCode><Country>United States</Country></PostalAddress></Address></ShipTo></LineItemShipping><PartnerItemDetail type="point" lineNumber="9" quantity="1" unitOfMeasure="EA"><Item id="UM3925" /><Price valueType="unit"><Money currency="USD">0.000000</Money></Price><Price valueType="extended"><Money currency="USD">0.00</Money></Price></PartnerItemDetail><PartnerItemDetail type="production" lineNumber="9" quantity="1" unitOfMeasure="EA"><Item id="UM3925" /><Price valueType="unit"><Money currency="USD">0.000000</Money></Price><Price valueType="extended"><Money currency="USD">0.00</Money></Price></PartnerItemDetail></LineItem></OrderRequest></Request></ICSMXML>

I’m using the following regex:

`<LineItem.*?sequence="[^>]*>.*(?=S00BBABHK).*</LineItem>`

which selects ALL LineItem nodes in a given file whenever the query string is found in any one of them.
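To illustrate the over-matching, here is a minimal Python sketch that runs the regex above against a trimmed-down, hypothetical one-line sample (three LineItem nodes, only the middle one containing the string). Python is used purely for demonstration; the sample data and variable names are assumptions, not part of the real files. The likely cause is that both `.*` parts are greedy, so the match stretches from the first `<LineItem` to the last `</LineItem>` on the line.

```python
import re

# Hypothetical, trimmed-down single-line sample: three LineItem nodes,
# only the middle one carries the unwanted id.
xml = (
    '<OrderRequest>'
    '<LineItem sequence="0"><Item id="GOOD_1" /></LineItem>'
    '<LineItem sequence="0"><Item id="S00BBABHK" /></LineItem>'
    '<LineItem sequence="0"><Item id="GOOD_2" /></LineItem>'
    '</OrderRequest>'
)

# The regex exactly as written in the question.
original = re.compile(r'<LineItem.*?sequence="[^>]*>.*(?=S00BBABHK).*</LineItem>')

matched = original.search(xml).group(0)

# Count how many closing tags ended up inside the single match.
swallowed = matched.count('</LineItem>')
print(swallowed)  # 3: every LineItem on the line is inside one match
```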

I am stuck on how to modify it to select ONLY the LineItem node that contains the query string.
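One possible direction, offered as a sketch rather than a confirmed fix, is to stop the wildcard from ever crossing a closing `</LineItem>` tag by using a negative lookahead at each step. The Python below demonstrates the idea on a hypothetical one-line sample; `TARGET_ID`, `pattern`, and the sample data are illustrative names, and the approach assumes LineItem nodes are never nested inside each other.

```python
import re

TARGET_ID = "S00BBABHK"

# (?:(?!</LineItem>).)*? consumes one character at a time, but only while
# the upcoming text is NOT the closing tag, so a match cannot spill into a
# sibling node. \b keeps <LineItem from matching <LineItemShipping etc.
pattern = re.compile(
    r'<LineItem\b[^>]*>'
    r'(?:(?!</LineItem>).)*?'
    + re.escape(TARGET_ID) +
    r'(?:(?!</LineItem>).)*?'
    r'</LineItem>'
)

# Hypothetical single-line sample with the id appearing twice in one node.
xml = (
    '<OrderRequest>'
    '<LineItem sequence="0"><LineItemConfiguration /><Item id="GOOD_1" /></LineItem>'
    '<LineItem sequence="0"><LineItemConfiguration /><Item id="S00BBABHK" />'
    '<Item id="S00BBABHK" /></LineItem>'
    '<LineItem sequence="0"><LineItemConfiguration /><Item id="GOOD_2" /></LineItem>'
    '</OrderRequest>'
)

# Replace each matching node with nothing, leaving the siblings untouched.
cleaned = pattern.sub('', xml)
print(cleaned)
```

Written out as a single find string, the same pattern is `<LineItem\b[^>]*>(?:(?!</LineItem>).)*?S00BBABHK(?:(?!</LineItem>).)*?</LineItem>`, with an empty replacement. This should carry over to an editor's regex search provided its engine supports negative lookahead, which is worth verifying on a copy of one file first.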

This is not part of a script. I’m using the regex as a search/replace query in VS Code (macOS 12.7.4, Visual Studio Code 1.86.2). I will have a batch of these files to modify daily until the problem is fixed by the upstream source.

Thank you, in advance, for your guidance and your patience.

New contributor

Darin Murray is a new contributor to this site. Take care in asking for clarification, commenting, and answering.