Several customers have recently asked how to search for text within Microsoft Word documents, and this post outlines how to do it.
I’m going to break the post down into a few sections, as the right technique for locating text depends on your requirements. In every case, however, the first step is to obtain the text from within the document.
Obtain text content from a Word document
For example, I will check whether new documents added to a SharePoint list contain email addresses and then update the record’s metadata accordingly.
NOTE:
The following steps can be performed on DOC, DOCX, DOTX and RTF Files.
1. Create a new ‘Automated cloud flow’ in Power Automate
1.a. Flow name: Provide a name for your flow
1.b. Trigger: Select the ‘When a file is created in a folder’ SharePoint trigger action
1.c. Click ‘Create‘
2. Configure the ‘When a file is created in a folder’ SharePoint trigger action as required
NOTE: Before adding the Encodian action, you may wish to add some logic to your flow (or a trigger condition) so that only certain types of files are handled, for example:
3. Add the Encodian ‘Convert Word‘ action
3.a. Output Format: Select ‘TXT‘
3.b. Filename: Select the ‘x-ms-file-name-encoded‘ property provided by the ‘When a file is created in a folder’ SharePoint trigger action
3.c. File Content: Select the ‘File Content‘ property provided by the ‘When a file is created in a folder’ SharePoint trigger action
Our flow is now configured to obtain the contents of the Microsoft Word document as plain text (TXT). The Encodian action returns the TXT file as base64-encoded content, which we can decode to a string using the base64ToString() expression as follows:
4. Add a ‘Compose‘ action and configure it as follows:
At this stage, your flow is ready to be tested! Add a Microsoft Word file to the SharePoint folder you are monitoring, and then check the run history:
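To illustrate what the decoding in step 4 does, here is a minimal Python sketch of the same idea outside Power Automate. The function name `base64_to_string`, the sample text, and the choice of UTF-8 are assumptions made for illustration; the encoding of the TXT file returned by the Encodian action is not stated in this post.

```python
import base64


def base64_to_string(encoded: str) -> str:
    """Decode base64 file content to text (hypothetical helper).

    Assumes the underlying TXT file is UTF-8; "utf-8-sig" also strips
    a leading byte order mark if one is present.
    """
    return base64.b64decode(encoded).decode("utf-8-sig")


# Hypothetical stand-in for the base64 file content returned by a conversion step.
sample = base64.b64encode("Contact us at hello@example.com".encode("utf-8")).decode("ascii")

print(base64_to_string(sample))
```

The decoded string is what the later search steps operate on.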
Search Text with Power Automate
There are a couple of options which can be used to search text content within Power Automate:
- Contains() Expression
- ‘Search Text – Regex‘ Action
The contains() expression can be used to check whether a specific value exists within a string. The following example shows how to check whether the text obtained from the document contains the word ‘Encodian’:
Expression Reference: contains(outputs('Compose_-_TXT_to_Text'),'Encodian')
This returns a boolean value indicating whether the text contains the specified string. Alternatively, we can use the ‘Search Text – Regex’ action as follows (continuing from step 4):
This regex query will search for any contained email addresses.
5. Add the Encodian ‘Search Text – Regex‘ action
5.a. Text: Select the ‘Outputs‘ property provided by the ‘Compose‘ action
5.b. Regex Query:
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
The ‘Search Text – Regex’ action finds all text that matches the query and returns the matches as an array; see below for an example:
We can now add logic to our flow to do something based on the result. For this example, I will update the metadata on the source file to indicate whether the document contains sensitive data (such as an email address).
6. Add a condition action, and configure it as follows:
The configured condition checks whether the number of matches returned by the Encodian action is greater than 0, i.e. it confirms that at least one match has been found.
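The two search options and the condition in step 6 can be sketched in Python as follows. This is an illustration of the logic only, not how the Encodian action is implemented: the function names are hypothetical, and `EMAIL_PATTERN` is a deliberately simplified pattern assumed for readability, not the full regex query from step 5.b.

```python
import re
from typing import List

# Simplified email pattern, assumed for illustration only.
# It is NOT the full regex query shown in step 5.b.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def contains_word(text: str, word: str) -> bool:
    """Substring check, similar in spirit to a contains() expression."""
    return word in text


def find_emails(text: str) -> List[str]:
    """Return every non-overlapping match as a list, like an array of matches."""
    return EMAIL_PATTERN.findall(text)


def has_sensitive_data(text: str) -> bool:
    """Mirror of the condition: true when the number of matches is greater than 0."""
    return len(find_emails(text)) > 0


# Hypothetical document text, standing in for the output of the Compose action.
document_text = "Encodian support: contact sales@example.com or support@example.org."

print(contains_word(document_text, "Encodian"))
print(find_emails(document_text))
print(has_sensitive_data(document_text))
```

A plain substring check is enough when you know the exact word you are looking for; a regex search is the better fit when you are looking for a pattern, such as any email address.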
7. Inside the ‘If yes‘ thread, add the ‘Get file metadata‘ SharePoint action:
7.a. Site Address: Configure as per step #2
7.b. File Identifier: Select the ‘x-ms-file-id‘ property provided by the ‘When a file is created in a folder’ SharePoint trigger action
8. Add the ‘Update file properties‘ SharePoint action
8.a. Site Address: Configure as per step #2
8.b. Library Name: Configure as per step #2
8.c. Id: Select the ‘ItemId‘ property provided by the ‘Get file metadata‘ SharePoint action
The flow is now complete and can be used to check whether Microsoft Word documents contain sensitive data!
Finally
We hope you’ve found this guide useful, and as ever, please share any feedback or comments – all are welcome!
You can find further documentation and guidance on the Encodian support portal: Convert Word