How I Plan and Write a CLI Script
This article walks you through a regular working week in the life of a Senior Software Engineer at one of my employers. It not only details how a technical problem was solved, but also describes the thought process behind the decisions and how the optimal approach was chosen.
Let me quickly start with what I have on my plate at the beginning of the week:
- VictorOps on-call support for the different services my team's vertical owns.
- QA support for testing the code written last week. This is the QA week of our current sprint.
- Migrating public videos hosted on YouTube to Kaltura (a third-party SaaS platform for video processing and hosting, already integrated with the Video micro-service our team owns).
The items are listed in order of priority. The first two drive my focus because of their urgency and are my primary deliverables for the week. The third is from the next sprint; we picked it up early to ensure a smooth delivery, and it is the highlight of this post.
Before doing anything, as per protocol, an engineer needs to be clear on why we need this.
What is the motivation behind this? YouTube provides a way to render a video (hosted on their servers) on our web page using the IFrame embed option. But since the video actually lives on YouTube, all video suggestions by Google (or any other search engine) end up on YouTube directly rather than on my employer's domain. Yes, it is an SEO impact. Using YouTube for public video hosting was a legacy decision the team made, and Kaltura came into the picture as a media handling solution a year back, when we needed to host private videos safely without worrying about processing them. A quick reminder: we are an education tech firm whose primary focus is solving student problems. If this were a media streaming firm, the priorities would have been different. To be clear once again: Kaltura is a SaaS platform, whereas YouTube is a video site with an appetite big enough to swallow all the traffic directly, since viewers never need to come to our domain (whose definition pages merely iframe the original source) to watch the video.
How can we achieve the migration now?
Let me trim the course of action down to a few items:
- Collect all videos the content team had uploaded to YouTube.
- Bulk upload them to the Kaltura platform.
- Update the database entries in the Video API micro-service to point to the new location.
- Make some front-end code adjustments for a new requirement: making our version of the Kaltura video player responsive.
Collecting all videos:
We have two options here. The first is to ask the content team for the media container files (mp4) along with a CSV sheet that maps each old YouTube video URL to its mp4 file, plus the selected YouTube thumbnails. As far as I know, the content team had already started this manual process last week, and around 480 of the 591 videos were mapped by Monday of this week. Before choosing this kind of solution, always keep one important line in mind: “humans are prone to errors”. Hence the second option: write an automated script to do it.
To find out the scope of the script, we need to check the following first:
> How do we download our own authored videos from YouTube directly?
Here we have two options; as a wise man once said, “an engineer is one who always has a variety of solutions to a specific problem”. First, ask the respective teams for access to the YouTube account the videos were uploaded with, and download them manually using the download button in the UI (or use the APIs). Second, download them directly using youtube-dl.
I went with the youtube-dl option, considering the time it would take to get permission from a team working in a different timezone. It is in fact an optimal solution in this case, given that it is our own content!
> How does bulk upload in Kaltura work? Consider video metadata migration as well (title, selected thumbnail, categories, etc.)
Yes, Kaltura has bulk upload functionality: you give it a list of videos and their related data in CSV or XML format, with links to the videos and thumbnails. More details are available in Kaltura's bulk upload documentation. FYI, Kaltura also has a direct YouTube import, but it iframe-embeds the media rather than downloading it. Our main target is to do the migration and remove the existing content from YouTube, ultimately driving all traffic to the pages with videos hosted on my employer's domain.
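To make the XML route concrete, here is a minimal sketch that builds one <item> per video with Python's standard library. The element names (mrss, channel, item, contentAssets, urlContentResource and so on) follow my recollection of Kaltura's bulk upload XML schema and should be treated as assumptions; validate them against the official XSD before relying on them. The VideoRow fields and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class VideoRow:
    """Hypothetical shape of one row exported from the video micro-service."""
    title: str
    description: str
    video_url: str      # link Kaltura can fetch the mp4 from
    thumbnail_url: str  # link Kaltura can fetch the thumbnail from


def build_bulk_upload_xml(rows):
    """Return a bulk upload XML string with one <item> per video.

    Element names are assumed from Kaltura's bulk upload schema;
    check them against the official XSD.
    """
    mrss = ET.Element("mrss")
    channel = ET.SubElement(mrss, "channel")
    for row in rows:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "action").text = "add"
        ET.SubElement(item, "type").text = "1"  # assumed: 1 = media clip
        ET.SubElement(item, "name").text = row.title
        ET.SubElement(item, "description").text = row.description
        media = ET.SubElement(item, "media")
        ET.SubElement(media, "mediaType").text = "1"  # assumed: 1 = video
        assets = ET.SubElement(item, "contentAssets")
        content = ET.SubElement(assets, "content")
        ET.SubElement(content, "urlContentResource", url=row.video_url)
        thumbnails = ET.SubElement(item, "thumbnails")
        thumbnail = ET.SubElement(thumbnails, "thumbnail", isDefault="true")
        ET.SubElement(thumbnail, "urlContentResource", url=row.thumbnail_url)
    # ElementTree escapes characters such as "&" in the URLs for us.
    return ET.tostring(mrss, encoding="unicode")
```

Building the tree with ElementTree instead of string concatenation means query strings full of "&" characters cannot break the file.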
> How can I share video and thumbnail links with Kaltura? Note that we have the metadata of the YouTube-hosted videos in our team-owned video micro-service database.
Since we rely heavily on AWS, the cheapest and most efficient way of doing this is to store the related objects in a private S3 bucket and share expiring signed URLs with Kaltura as entries in the XML sheet. I personally prefer XML over CSV for this kind of work, because it is easier to generate and manipulate from code.
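A minimal sketch of the signed URL idea, assuming boto3 as the AWS client library (this post does not show which one the real script uses). generate_presigned_url is boto3's standard call for this; the bucket name, key layout and helper names are hypothetical, and the 24-hour expiry is just an illustrative default.

```python
def signed_urls_for(s3_client, bucket, base_name, expires_in=24 * 3600):
    """Return expiring GET links for <base_name>.mp4 and <base_name>.jpg.

    `s3_client` is anything exposing boto3's generate_presigned_url().
    """
    urls = {}
    for ext in ("mp4", "jpg"):
        urls[ext] = s3_client.generate_presigned_url(
            "get_object",
            Params={"Bucket": bucket, "Key": f"{base_name}.{ext}"},
            ExpiresIn=expires_in,  # link lifetime in seconds
        )
    return urls


def make_s3_client():
    """Create a real S3 client.

    boto3 is imported lazily so the helper above can also be exercised
    with a stand-in client.
    """
    import boto3
    return boto3.client("s3")
```

The bucket itself stays private: only someone holding a link can read an object, and only until the link expires. The expiry should be long enough for Kaltura to finish fetching every file in the batch.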
Finally, the script:
To write any kind of code (or script), I personally consider the following items:
> Language?
This was not a question for me some 4–5 years back, but now, after spending 8 long years in this industry and getting wide exposure to different languages, it is the foremost question. Every language has its own charm; it all depends on the kind of problem you are solving. In this case, I need the ability to download and upload in parallel (not just concurrently), strong community support for youtube-dl operations and XML generation, and something that can run anywhere. So I didn't waste even a single minute and chose Python directly.
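The parallelism requirement can be sketched with Python's multiprocessing module, which runs the work in separate OS processes rather than threads. This is only an illustration of the idea: migrate_one is a hypothetical placeholder for the real download, merge and upload step, and the row shape is assumed.

```python
from multiprocessing import Pool


def migrate_one(row):
    """Hypothetical worker; the real one would download, merge and upload."""
    return {"id": row["id"], "status": "migrated"}


def migrate_all(rows, workers=4):
    """Run migrate_one over all rows, spread across worker processes."""
    if workers <= 1:
        # Serial fallback, handy when debugging a single video.
        return [migrate_one(row) for row in rows]
    with Pool(processes=workers) as pool:
        # map() keeps the input order and blocks until every row is done.
        return pool.map(migrate_one, rows)
```

On platforms that spawn fresh worker processes (Windows, and macOS by default), call migrate_all from under an `if __name__ == "__main__":` guard so the workers do not re-run the script's entry point.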
> Machine and Music?
We are officially provided with the latest high-powered MacBook Pro, a height-adjustable desk, an ergonomic chair, and two FHD Dell screens; I placed one horizontally (for IDEs and editors) and one vertically (for the terminal only). Add to that an okay-ish TVS mechanical keyboard and a normal wired mouse. I say “okay-ish” because I have worked on a much better one, the Das Keyboard. The way fingers glide over those clicky Cherry MX Blue switches is unforgettable. It is a human tendency to always compare 😉 Use one once and you will feel the difference, says “an honest programmer”.
^^ Plus the last piece, blessings of Lord Shiva ❤ !
For music I went with different tracks by “Naxatras”: quite rhythmic and psychedelic, yet good for concentration. Link to one of their albums: https://www.youtube.com/watch?v=WNjyvtjAmUo
> IDE?
I like JetBrains (IntelliJ) products, hence PyCharm Community Edition is a good option here.
> Scope of script:
- It should be a CLI, so I picked a CLI boilerplate available on GitHub: https://github.com/tkant/python-cli-tool-boilerplate
- What arguments is it going to take? Just the input CSV extracted from the video micro-service, containing metadata for the YouTube provider type only. Yes, we used the provider pattern there.
- Whenever the keyword “batch” comes up in the requirements, I always think of multiple threads/processes. It is something I always ask DevOps interview candidates about.
- youtube-dl supports downloading the media container files in the following format for video up to 1080p HD:
- The best quality (format code 22) in mp4 format, which this notorious little script reports as actually being 480p. Maybe that is because YouTube creates the video and audio files separately and merges them in the YouTube player while rendering. This is the simplest example of a “divide and conquer” approach: keeping video and audio separate decreases the time needed to process them. For instance, they can process multiple formats of video and audio separately in different threads to generate multiple versions of the same input video. Multiple versions here means multiple formats: for video we have mp4 (the widely accepted format) and webm files, while for audio we have m4a (AAC) and webm. This is what YouTube generates on its processing end. You can list the formats available for any video, including a 4K one, with youtube-dl's -F (--list-formats) option.
- Similarly, different products process video in their own way on their side. Another example is pornhub, which generates only mp4 files containing both video and audio for a 4K video input.
- So here we need to download the files separately and merge them using ffmpeg. Luckily, youtube-dl supports this as well. The main ingredient is the set of options passed to youtube_dl.YoutubeDL: it picks the best video in mp4 and the best audio in m4a, then combines them in a post-processor using FFmpeg. Note that we also pass a progress hook, basically to store the base name of the video title. Using this base name as a prefix, plus hardcoded extension strings (mp4, jpg), we upload the files to the S3 bucket.
- That's it, we are done. Now we can run the script and generate the input XML for Kaltura's bulk import functionality. Kaltura first validates the whole input XML file, then starts importing and processing each listed <item> on its platform.
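The scope above can be sketched roughly as follows. This is not the actual script, only an illustration under a few assumptions: argparse stands in for the boilerplate's argument handling, the format selector and output template are my guess at options matching the description, and base_name_from, build_ydl_opts, download and parse_args are hypothetical helper names. The option keys themselves (format, merge_output_format, outtmpl, writethumbnail, progress_hooks) are regular youtube_dl.YoutubeDL options.

```python
import argparse
import os
import re


def base_name_from(filename):
    """Strip the directory, the extension and a '.f<format-id>' suffix.

    youtube-dl typically names the parts it downloads before merging
    like 'Title.f137.mp4' and 'Title.f140.m4a'.
    """
    stem = os.path.splitext(os.path.basename(filename))[0]
    return re.sub(r"\.f\d+$", "", stem)


def build_ydl_opts(output_dir, finished):
    """Build options for youtube_dl.YoutubeDL; `finished` collects base names."""

    def hook(status):
        # youtube-dl calls each progress hook with a status dictionary.
        if status.get("status") == "finished":
            finished.add(base_name_from(status["filename"]))

    return {
        # Best mp4 video plus best m4a audio, else a single mp4 file.
        "format": "bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]",
        "merge_output_format": "mp4",  # merging needs ffmpeg installed
        "outtmpl": os.path.join(output_dir, "%(title)s.%(ext)s"),
        "writethumbnail": True,  # saved in whatever image type YouTube serves
        "progress_hooks": [hook],
    }


def download(url, output_dir):
    """Download one video and return the base names that finished."""
    import youtube_dl  # third-party dependency, imported lazily

    finished = set()
    with youtube_dl.YoutubeDL(build_ydl_opts(output_dir, finished)) as ydl:
        ydl.download([url])
    return finished


def parse_args(argv=None):
    """Parse the command line: an input CSV plus an optional output folder."""
    parser = argparse.ArgumentParser(
        description="Download YouTube videos listed in a CSV.")
    parser.add_argument("input_csv", help="CSV exported from the video micro-service")
    parser.add_argument("--output-dir", default="downloads")
    return parser.parse_args(argv)
```

Collecting the base names in the hook is what later lets the upload step look for <base name>.mp4 and <base name>.jpg without asking youtube-dl again.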
The end result of all this was:
Without depending on any internal team (for example, the content team), an engineer successfully migrated the videos on his own.
Experience++ in doing things like this.
Concepts used:
Command line input parsing
Multi-processing
Scraping
Handling external dependencies
File handling, XML type
Simple, safe and precise file sharing mechanism
A little bit of video processing.
Due to my job responsibilities, I have shared only a glimpse of the whole code. For a learner, it will be fun connecting all the dots mentioned 😉
Finally, feel free to ask any questions in the comments section below. Namaste!