Data science doesn’t fit well into existing process life cycles. It isn’t enough like software development to fit the software development life cycle (SDLC), and the data mining process, CRISP-DM, is a little too rigid. That doesn’t mean that a data science team should just work in whatever way feels right. There is real value in these life cycles. One benefit is that a life cycle gives you a high-level map of where you’re going. This is useful when you’re starting a data science team.
You get a general sense of the path forward, so you can start with the end in mind. The danger with a life cycle is that it becomes the primary focus of the work. You want to use the life cycle as a vehicle for better data science. You don’t want to follow the process for the sake of following the process. A good life cycle should be like a handrail. You want to know it’s there when you’re going up and down the stairs.
You don’t want to cling to it with every step. After a while, you shouldn’t even notice that it’s there. For data science projects, you can use a data science life cycle (DSLC). This process framework is lightweight and less rigid than the SDLC or CRISP-DM. The DSLC has six steps.
The SDLC and CRISP-DM each work well for the right kind of project. The problem with these life cycles is that they require you to know a lot about what you’re doing before you start. In software development, you have to have a clear scope. With data mining, you have to know a lot about the data and business needs.
Data science is empirical. You don’t know what you’ll find. You might not even know what you’re looking for. Instead, you have to focus on interesting questions, and then create a feedback loop to make sure those questions are tied to business value.
Life Cycle of Data Science
Nevertheless, a life cycle can be very useful. It’s like a high-level map that helps keep the team on track. That’s why for a data science team, you’ll want to try a different approach. You can use a data science life cycle (DSLC) as a way to set some direction for the team.
You’ll explore the SDLC and CRISP-DM so you can understand how they differ from a DSLC. Then you’ll learn how to use the DSLC and how to effectively loop through the DSLC questions.
This life cycle is loosely based on the scientific method.
As a data science team, start by identifying the key roles in your story. In the end, you want to be able to tell an interesting story with your data. The best way to start a story is to identify key players. Think about it as a scene in a play. Who walks into the room? Is there a main character or protagonist? Is there a backstory that helps make sense of his or her actions? Let’s go back to the running shoe web site. Who are your key players? There’s the runner. Maybe the runner has a partner who influences his or her running habits. Maybe your runner’s partner is a doctor, a blogger, or a trainer. Each of these players could be a part of your data science story.
After you’ve identified your key players, you can ask some interesting questions. Your team’s research lead might start by asking, “Is there a blogger who influences your runner?” Maybe the trainer plays a big role in influencing what your runner purchases.
Next, the data analyst works closely with the team to come up with strategies for researching the questions. Say the team decides to explore the relationship between the runners and their partners. Here, the research lead would ask the data analyst how they could get this information.
After you have your research topic, you want to create your first reports. These results are for the team. They should be quick and dirty. Hopefully, your data science team will go through a lot of questions and a lot of reports. Most of these will be duds.
Finally, your data science team should look at the results to see if there are any interesting insights. Maybe the data suggests that most of your customers run with partners. That insight might be very valuable to the marketing team.
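A quick and dirty report of this kind can be very small. The sketch below shows one way a data analyst might check whether most customers run with partners. The records, the field names (`customer_id`, `runs_with_partner`), and the function `partner_share` are all invented for illustration; they are assumptions, not data from the running shoe web site.

```python
# Hypothetical sample records: each row is one customer of the running shoe site.
# The field names and values are invented for illustration only.
customers = [
    {"customer_id": 1, "runs_with_partner": True},
    {"customer_id": 2, "runs_with_partner": True},
    {"customer_id": 3, "runs_with_partner": False},
    {"customer_id": 4, "runs_with_partner": True},
]


def partner_share(rows):
    """Return the fraction of customers who report running with a partner."""
    if not rows:
        return 0.0  # no data yet, so report zero rather than dividing by zero
    with_partner = sum(1 for row in rows if row["runs_with_partner"])
    return with_partner / len(rows)


share = partner_share(customers)
print(f"{share:.0%} of sampled customers run with a partner")
```

A report like this is meant for the team, not for a presentation. If the number looks interesting, the team can dig deeper; if not, it is one of the duds and the team moves on to the next question.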
In the end, your team will bundle up a bunch of these insights and try to create organizational knowledge. It’s here that your team will tell the customer’s story. You might want to use data visualizations to back up your story. This new knowledge is what adds value to the rest of the organization.
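The loop described above (ask a question, produce a quick report, keep only what turns out to be interesting) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a prescribed implementation: `run_cycle`, `make_report`, `is_interesting`, the example questions, and the canned numbers are all hypothetical.

```python
def run_cycle(questions, make_report, is_interesting):
    """Loop through questions, build a quick report for each, and keep the interesting ones."""
    insights = []
    for question in questions:
        report = make_report(question)   # quick and dirty result for the team
        if is_interesting(report):       # most reports will be duds and are dropped
            insights.append((question, report))
    return insights


def make_report(question):
    # Hypothetical stand-in for real research: canned, invented numbers.
    canned = {
        "Do runners run with partners?": 0.75,
        "Does a blogger influence the runner?": 0.10,
    }
    return canned.get(question, 0.0)


def is_interesting(report):
    # Assumed rule of thumb for this sketch: a share of one half or more is worth a look.
    return report >= 0.5


questions = [
    "Do runners run with partners?",
    "Does a blogger influence the runner?",
]
insights = run_cycle(questions, make_report, is_interesting)
print(insights)
```

The insights that survive the loop are the raw material the team bundles into a story for the rest of the organization.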