Content migration burns hours that could be used for design and development. This is especially true if you're migrating from one platform to another or from a static site to a dynamic one. (We're seeing less and less of the latter as the Internet moves toward content management systems.) We helped migrate 60+ blog posts from Craft CMS to WordPress for the Edmonton Community Foundation in under two hours, and this is how we did it!

In our case, we only needed to transfer over certain content pieces that fell under a particular category. (Who wants to transfer everything only to end up deleting what you don't need?) This ruled out a straight migration using existing tools. We had a copy of the database and could have written a script to extract what we needed from it, but we weren't familiar with the entity relationships in Craft CMS, and after an initial assessment that seemed like more work than necessary for what we wanted to do. Our plan instead:

- Alter Commander Vee to be specific to our project.
- Write a supplemental script to scrape the database for missing information.
- Add importing featured images to the Commander Vee import script.

We ran the Screaming Frog SEO Spider Tool on the existing site to get all of the site URLs. We'd need them for redirects anyway, so this was already part of our process. For the post import, we only needed certain content pieces that fell under a particular category and, thankfully, those pieces were easy to parse out based on the URL structure. As you can see below, we needed content pieces that fell under '/about/features/'.

These extraction functions will help populate the JSON array that you'll be using to import data into WordPress. This is where you'll have a bit of work to do! You'll have to inspect the source of the URLs you're importing and find which elements/classes/IDs will help you target the title, author, author slug, publish date, featured image, and content. The crawler.php file, together with crawler.js, will go through every *.html file in your 'html' folder and use JavaScript to pull data from the scraped content into a JSON array, which will be called json.txt. You can change the name of your json.txt file by editing json-to-file.php on line 5.

Now you can run crawler.php in the browser! In our case, that URL was. You'll see a flash of content for each *.html file on the screen, and when the script finishes, the screen will stop on the last file crawled. It took us a couple of tries before we had it outputting what we needed, so we highly recommend running a test with only a few files to make sure you've adjusted the code correctly before you run the crawler script on a directory with hundreds of files. If you do have hundreds of files, look at splitting them into smaller groups and making the json.txt file name dynamic, so you end up with several smaller JSON files rather than one monster file.

If everything worked, you will have a json.txt file with a malformed JSON array. To correctly form this JSON array, you'll need to add square brackets around the entire array and remove the last trailing comma. There are online services that will help you test to ensure your JSON array is formed correctly.

When using the crawler.js extraction functions, you may find that you don't have all of the information you need available on a rendered page! Don't panic.
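The URL-filtering step described above can be sketched as follows. This is a minimal sketch, not our actual script: the URLs are placeholders standing in for a Screaming Frog export, and `filterByPath` is a hypothetical helper name.

```javascript
// Assumed sample of a Screaming Frog URL export (placeholders, not real URLs).
const allUrls = [
  'https://example.com/about/features/some-post',
  'https://example.com/about/staff',
  'https://example.com/about/features/another-post',
  'https://example.com/contact',
];

// Keep only the content pieces whose path falls under a given prefix,
// e.g. '/about/features/'.
function filterByPath(urls, prefix) {
  return urls.filter((u) => new URL(u).pathname.startsWith(prefix));
}

const postUrls = filterByPath(allUrls, '/about/features/');
console.log(postUrls); // the two '/about/features/' URLs
```

Because the category was encoded in the URL path, a one-line `filter` like this is all the parsing the post list needs.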
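An extraction function of the kind crawler.js uses might look like the sketch below. The selectors (`h1.entry-title`, `.byline .author`, and so on) are assumptions for illustration; you'll swap in whatever elements/classes/IDs your own inspection of the source markup turns up. It takes the document as a parameter so it can run against any page the crawler loads.

```javascript
// Hypothetical sketch of one extraction function: pull the post fields
// out of a loaded page using DOM selectors. All selectors are assumed --
// replace them with the ones from your own markup.
function extractPost(doc) {
  // Return trimmed text content for a selector, or '' if it's missing.
  const text = (sel) => {
    const el = doc.querySelector(sel);
    return el ? el.textContent.trim() : '';
  };
  // Return an attribute value for a selector, or '' if it's missing.
  const attr = (sel, name) => {
    const el = doc.querySelector(sel);
    return (el && el.getAttribute(name)) || '';
  };
  return {
    title: text('h1.entry-title'),
    author: text('.byline .author'),
    date: attr('time.published', 'datetime'),
    image: attr('.featured-image img', 'src'),
    content: text('.entry-content'),
  };
}
```

In the browser you'd call `extractPost(document)` for each crawled page and push the result onto the array that eventually becomes json.txt. Falling back to `''` for missing fields keeps one badly-formed page from breaking the whole run.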
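The "several smaller JSON files" idea for large crawls can be sketched like this. `chunkFilenames` is a hypothetical helper, and the `json-1.txt` naming scheme is an assumption; the point is just to group the posts and derive a dynamic file name per group.

```javascript
// Sketch: split a big list of crawled posts into fixed-size chunks,
// each with its own dynamic output file name (json-1.txt, json-2.txt, ...),
// instead of one monster json.txt.
function chunkFilenames(posts, chunkSize) {
  const out = [];
  for (let i = 0; i < posts.length; i += chunkSize) {
    out.push({
      filename: `json-${out.length + 1}.txt`,
      posts: posts.slice(i, i + chunkSize),
    });
  }
  return out;
}

// e.g. 250 posts in chunks of 100 -> json-1.txt, json-2.txt, json-3.txt
const chunks = chunkFilenames(new Array(250).fill({}), 100);
console.log(chunks.map((c) => c.filename));
```

Smaller files also make a failed import cheaper to retry, since you only re-run the chunk that broke.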
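The json.txt cleanup step (wrap the output and drop the last trailing comma) is mechanical enough to script rather than do by hand. A minimal sketch, assuming the crawler emits comma-separated JSON objects with one trailing comma at the end:

```javascript
// Sketch: turn the crawler's raw output (comma-separated objects with a
// trailing comma) into a valid JSON array by stripping the final comma
// and wrapping everything in square brackets.
function fixJsonArray(raw) {
  const body = raw.trim().replace(/,\s*$/, ''); // drop the last trailing comma
  return '[' + body + ']';
}

const raw = '{"title":"First post"},\n{"title":"Second post"},\n';
console.log(JSON.parse(fixJsonArray(raw)).length); // parses as a 2-element array
```

Running the result through `JSON.parse` (or one of the online validators mentioned above) is a quick sanity check before you hand the file to the import script.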