In this article, I am going to show you how we can cut the total run time of a simple API script from 6 minutes 17 seconds to 1 minute 14 seconds (about five times faster).
I will share with you one of my favourite simple techniques, which I like to use especially for data science tasks such as data visualization, data analysis, code optimization, and big data processing.
Processing a task sequentially can take a long time, especially when we are talking about a huge amount of data (e.g. big inputs).
This technique takes advantage of parallelization capabilities in order to reduce the processing time.
The idea is to divide the data into chunks so that each engine takes care of classifying the entries in its own chunk. Once the data is split, each engine reads, processes, and writes its chunk independently, and because the chunks are about the same size, each one should take roughly the same amount of time.
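To make the idea concrete, here is a minimal Python sketch of chunked, parallel processing. It is not the script from the repo: `classify_chunk` is a hypothetical stand-in for the real per-name work, and using a thread pool is an assumption that suits I/O-bound tasks such as HTTP requests.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def classify_chunk(chunk):
    """Hypothetical stand-in for the real work (e.g. one API call per name)."""
    return [(name, name.upper()) for name in chunk]


def process_in_parallel(items, chunk_size, workers=4):
    """Process every chunk on its own worker and merge results in input order."""
    chunks = chunked(items, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the order the chunks were submitted
        parts = list(pool.map(classify_chunk, chunks))
    return [row for part in parts for row in part]
```

For example, `process_in_parallel(["aa", "ab", "ac"], 2)` handles two chunks concurrently and returns the three results in input order.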
The example I chose for this article is Genderize, applied to names that consist of 2 alphabetic characters.
Output Analysis Chart
Clone the GitHub repo and follow the instructions in the Usage section.
Let’s generate all alphabetic names that consist of 2 characters (to keep the testing process easy). We can use a Kali Linux penetration testing tool such as crunch:
$ crunch 2 2 > names.txt
This generates all possible alphabetic names of length 2 (676 lines).
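If crunch is not installed, the same list can be produced in a few lines of Python. This is a sketch of an equivalent, assuming crunch's default lowercase a-z character set; `two_letter_names` and `write_names` are names chosen for illustration only.

```python
from itertools import product
from string import ascii_lowercase


def two_letter_names():
    """Return every 2-letter lowercase combination, from 'aa' to 'zz'."""
    return ["".join(pair) for pair in product(ascii_lowercase, repeat=2)]


def write_names(path="names.txt"):
    """Write one name per line to `path`."""
    with open(path, "w") as handle:
        handle.write("\n".join(two_letter_names()) + "\n")
```

The count matches the crunch output: 26 × 26 = 676 names.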
Then let’s create the directories needed for the splitting process:
$ mkdir subs/ subs/inputs subs/outputs subs/outputs/parts subs/outputs/all
Now we can split our input data. There are many ways to do that, but I prefer to use the Unix split command:
$ split -l 100 -d names.txt ./subs/inputs/
This splits the names.txt file into small files of up to 100 lines each (the last file holds the remaining 76).
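For readers without the split command, the sketch below mirrors `split -l 100 -d` in Python: it groups lines into chunks of at most 100 and labels them with two-digit numeric suffixes (00, 01, ...), which is what split does by default with `-d`. `split_lines` and `write_chunks` are hypothetical helper names, not part of the repo.

```python
from pathlib import Path


def split_lines(lines, per_file=100):
    """Map two-digit suffixes ('00', '01', ...) to chunks of at most `per_file` lines."""
    return {
        f"{index:02d}": lines[start:start + per_file]
        for index, start in enumerate(range(0, len(lines), per_file))
    }


def write_chunks(source="names.txt", out_dir="subs/inputs", per_file=100):
    """Write each chunk to out_dir/<suffix>; out_dir must already exist."""
    lines = Path(source).read_text().splitlines()
    for suffix, chunk in split_lines(lines, per_file).items():
        Path(out_dir, suffix).write_text("\n".join(chunk) + "\n")
```

With 676 names this yields seven chunks, 00 through 06, the last holding 76 lines.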
Now let’s run all the processes with ./init.bash. After they finish, use the merger.py script to merge all the outputs. The merging step is kept separate to avoid conflicting behaviours between processes and to handle sorting and saving in one place.
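The repo's merger.py is not reproduced here. As a rough sketch of what a separate merge step could look like, the code below collects every part file, sorts the combined lines, and writes a single file. The layout (one result per line under subs/outputs/parts, merged output in subs/outputs/all/merged.txt) and the function names are assumptions made for illustration.

```python
from pathlib import Path


def merge_sorted(parts):
    """Combine lists of result lines into one sorted list."""
    return sorted(line for part in parts for line in part)


def merge_files(parts_dir="subs/outputs/parts",
                out_file="subs/outputs/all/merged.txt"):
    """Read every part file, merge, sort, and save in a single pass."""
    parts = [
        path.read_text().splitlines()
        for path in sorted(Path(parts_dir).iterdir())
        if path.is_file()
    ]
    Path(out_file).write_text("\n".join(merge_sorted(parts)) + "\n")
```

Running the merge only after every worker has finished means no two processes ever write to the same file at once.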
2 Amazon SDE interview invitations in 1 week, a new experience!
I used to get rejections from Amazon at the CV screening stage :’( but I never gave up! I also used to feel low on energy after finishing contests, e.g. interesting Codeforces rounds. Although in contests I sometimes solve problems that are much harder than multinational companies’ interview questions, I have to admit that after these 2 interviews my battery was not just low as usual, it literally died :’)
Let’s analyze briefly:
Total time: 1.5 to 2.5 hours.
Question topics: DP, BT, recursion, string manipulation, basic math, building and sorting complex data structures.
A session for algorithms & data-structures coding questions.
A session for open-ended questions to discuss your solutions and complexity.
A session for reasoning questions (very tricky).
A session for code debugging ability (not hard).
A session for working-style questions (focused on soft skills + psychological dimensions).
A session for a survey.
Tricky corner test cases.
DS coding questions are annoying; you need to start with wise choices and smart ideas from the very beginning.
Open-ended questions get very little time, so you need to think through and organize your answer in your mind before the session or during the coding session.
Overall, someone at roughly Codeforces Div. 2 problem D level can nail it.
For sure there are hundreds of other question topics, sessions, and styles, but this was my own experience!
If you are still an undergraduate, my advice: “problem-solving” & “practice”