Warning: We adapted the SDLC to our own condition based on limited timeline and resources. Trade-offs exist: we wanted to maximize output, which came at the expense of other factors. For example, we didn't implement comprehensive testing in our software.
The foundational documents were prepared, including the API Documentation, Product Requirements Document (PRD), and Architecture Design. Trivial, right? Not at all. First, we were given highly abstract requirements. We had to understand the problem, opportunity, objective, and all the tedious business-related requirements to better grasp the product we were building. We needed to define user requirements (user stories with their acceptance criteria), as well as functional and non-functional requirements, such as 99.9% availability and what users should see upon clicking a specific button. To tell you the truth, questions like "Is this even necessary?" popped up in my mind countless times throughout this process. But I had to endure because these documents are, I hate to break it to you, actually useful. Having a reference during development helps clarify what users expect from each feature and enables informed decision-making later in the development phase.
Beyond documentation, we created setup guides for the development environment to help future engineers joining our team. Additionally, this phase involved purchasing domains, preparing contracts for future team members, creating the GitHub repository and knowledge base, among other administrative tasks.
A very abstract and basic UI design of the platform had been created prior to these processes, but a more thorough design was needed that included all user flows. This was outside my scope of expertise. Luckily, our team had an engineer who also understood the inner workings of product design. Iqbal, a good friend of mine, led this process and finished it without any significant hindrances worth noting. Thanks to him, the development process could proceed smoothly.
With our requirements documented and our team aligned, we moved into the technical planning phase. This is where abstract ideas began taking concrete shape through technology choices and architectural decisions.
We decided to use technologies that were mainly familiar to us or could be quickly learned, rather than the most modern or optimal for specific use cases.
This included selecting the technologies needed for development. We chose Next.js for the front end (with Zustand for state management) and Go for the back end. Notice that I just typed "Go," whereas usually there'd be a framework for the back end like Express.js or Spring Boot. In our case, we wanted to challenge ourselves by using the Go standard library.
During the preparation phase, our team applied for the GCP and AWS startup programs and was fortunate to be selected for both. As mentioned earlier, we decided to move forward with AWS due to familiarity.
Since our app heavily relied on AI models, we chose Fireworks AI as the model provider, allowing us to select between various models while complying with security standards such as SOC 2 Type II and HIPAA Compliance.
As with all good system designs, we started by understanding scale and metrics.
We calculated the estimated total users by approximating the average number of employees that KJRI had and multiplying it by the number of branches in the US. Although it might not be best practice to rely solely on this estimation due to scalability concerns, it was a good starting point for a relatively small-to-medium sized project. Then came the hard part: calculating metrics regarding meetings. This involved estimating the number of meetings they'd hold monthly, how many would actually be online, the average meeting length, storage requirements, and many more factors. After these technical calculations, we could extrapolate how much each feature would cost. I won't get into detail on these calculations, but feel free to reach out if you'd like to know more.
These metrics would greatly affect the pricing of each plan (extrapolating from cloud + development costs to determine feature costs) and how the data models would be designed. For example, if everyone were to record in each meeting, the design would differ from a scenario where only one person was allowed to record and edit per meeting (affecting RBAC for a meeting object).
Afterwards, we had to decide upon the technologies actually needed to make our app operational. EC2 would be the conventional way to deploy our application, at least for startups at this stage. However, after considering trade-offs and advantages (mainly the convenience of scaling further down the line), we decided to use ECS Fargate. It offered better resource utilization and automatic scaling compared to EC2, which had static specifications. In a scenario where we allocated far greater specifications to the EC2 instance than needed and traffic was low, it would still consume the same amount of resources and costs as when traffic was high. With Fargate, we only pay for the resources we actually use. For the database, RDS with PostgreSQL was our straightforward choice. We considered using ScyllaDB after reading an article by tiket.com but deemed it an unnecessary hassle to implement at this early stage. The other technologies were typical selections: ECR for container images, S3 to store meeting recordings, CloudWatch for monitoring and logging, among others (VPC & Route 53). Finally, we ensured proper policies were configured for each service, such as retention periods for non-latest images in the ECR Repository and similar lifecycle policies for S3 and other services. Beyond cloud infrastructure, we decided to use Firebase OAuth for authentication via Google, Vercel to host the front end, and Redis (hosted on Redis.io) for our caching and queue management.
We had a solid plan for the system and infrastructure. Now, all we had to do was build it. Sidenote: I greatly credit this process to another good friend of mine, Jeremy. Without his assistance and knowledge, it would have been a pain in the rear.
With our architecture defined and our infrastructure planned, it was time to transform our designs into working software. This phase tested not just our coding abilities, but our problem-solving skills and adaptability.
Nothing particularly special happened in the beginning. Initialize the project, structure it as planned, and… well, code. For the back end, that involved integrations with the database and external services, and creating boilerplate code such as the actual CRUD functions, handlers and endpoints, and application logic. Similar work applied to the front end and infrastructure code. I'd say 70% of the hard work was in the planning phases, another 20% was dealing with problems, and only 10% was actually allocated to coding.
As you've noticed after reading the beginning of this section, there's nothing particularly special about the coding itself. I've purposely done this to keep you—and more importantly, myself—grounded. Without reading all the other sections, it would seem like developing yet another CRUD application, which, in all technicalities, it was. However, software engineering is more than just coding and CRUD, as you've read from the other sections. Furthermore, even at this phase, we faced a few challenges that required knowledge beyond basic CRUD applications.
There are many other processes that occurred during development that I haven't mentioned, such as setting up the CI/CD pipeline (including configuring the different stages, handling environment-specific configurations for dev and prod, etc.) and deep diving into the particulars of both the front end and back end applications. I'd be more than happy to share those details, but for now, we'll keep the focus on the most impactful challenges we encountered.
Our platform revolves around transcribing meeting audio and later processing it into summaries, emotional analysis, and more complex features such as diarization, translation, and chat. These features presented many problems of their own. I've selected a few to share. We'll delve into each challenge one by one.
Starting off, let's discuss the transcription process. As you'd imagine, meetings of lengthy duration took a significant amount of time to transcribe, and we wouldn't want users stuck in the process, unable to navigate or perform other tasks. As the age-old saying goes, time is money, and money gets you chicken, and y'all know damn well I love chicken. Wait, what was I talking about again? Jokes aside, a solution was quickly needed, and it came in the form of a Discord DM from one of our interns, Ravel. I could see an upcoming 10x developer when I saw one. The solution was to delegate these transcript-related processes as tasks to an external worker via a Redis-based queue system. In short, processes that took significant time and could potentially block other operations are sent to the worker to be executed in the background. That begs the question: how does the client know when the process finishes or is still in progress? Well, each task has metadata attached to it, including but not limited to: the ID, status, and payload. We then exposed an endpoint for the front end to poll and check a task's status based on the ID. Additionally, using SSE (Server-Sent Events), we're able to stream real-time updates to the client. Pretty neat, huh?
Beyond technical challenges, there were abstract requirements that needed further clarity. Mainly, the credit system. Questions like "How can we manage user credits?" and "Are user credits separate from team credits?" could drastically change our data models. Questions like "What feature(s), when used by the user, count as the use of one credit?" arose as well. Long story short, we decided to create a new internal platform to manage and distribute credits for teams and individual users. Explaining this requires an article of its own, so I'll keep it at that.
Lastly, one of the unique challenges we faced was designing a robust prompt for each feature. Even slight variations in phrasing, or the addition of a single sentence, could determine whether the feature was vulnerable to jailbreaks, produced the correct data format, or ran into other unexpected issues. This process, known as prompt engineering, presents a set of challenges you'd rarely encounter in conventional CRUD applications. It took countless prompt iterations and hands-on testing to finally reach versions that balanced flexibility with security.
Most of the problems we faced during this phase could be resolved through online research—reading documentation, understanding guides, watching YouTube tutorials, and occasionally asking ChatGPT or Claude. I say "occasionally" because I prefer to avoid using these tools during a learning process (and I consider these processes learning) unless I'm stuck (or lazy). Like sending prompts to AI, searching for quality resources takes skill, so it's best to hone it.
This whole process has been of great importance to me as it has drastically increased my understanding of various technologies and software engineering as a whole. However, a junior engineer working with resources found through online articles can only do so much. As you may have noticed, aspects such as high-traffic handling, notification systems, complex message queues, and comprehensive monitoring have yet to be fully implemented or optimized. In my opinion, gaining understanding of those concepts can only be done by being directly involved in the development of such systems and working closely with seasoned engineers, things that I have yet to experience. These are opportunities I'm eager to pursue as the next step in my growth.
For recruiters, founders, or any engineer involved in hiring: I hope I've shown not only my technical capability to architect systems from the ground up (albeit at a small scale), but also my leadership, problem-solving, user-centered thinking, and the ability to turn abstract ideas into tangible, usable products for real users.