Is S3 a data lake
Amazon launches service for data lakes
AWS Lake Formation is now generally available on Amazon Web Services. Amazon announced the service at re:Invent 2018. It offers managed data lakes on Amazon's cloud platform. Lake Formation itself is free of charge; AWS bills only the usual fees for the underlying services that store and transfer the data.
When it first announced AWS Lake Formation at re:Invent in November 2018, Amazon spoke of 10,000 data lakes running on Amazon Simple Storage Service (S3). The new service automates the setup and management of the lakes and helps customers prepare data before it flows into the data lake. AWS also provides dedicated security functions.
The data lake with its inflows and outflows
Relational or NoSQL databases and other S3 buckets can serve as sources. AWS Lake Formation offers a source crawler that takes care of ingesting the data. In the data lake, the service organizes the inputs according to frequently used query terms and divides them into data blocks, which is intended to ensure efficient processing. AWS Lake Formation uses machine learning to deduplicate records and to match entries that differ in form but refer to the same thing.
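The idea of matching entries that look different but mean the same thing can be illustrated with a small sketch. It does not reproduce the machine learning that AWS Lake Formation uses, which the announcement does not detail; it swaps in plain string similarity from Python's standard library as a stand-in. The function names, the sample records and the threshold of 0.85 are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, drop simple punctuation and collapse whitespace."""
    cleaned = name.lower().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())


def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """Treat two strings as the same entity if their similarity ratio is high enough."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


def deduplicate(records, threshold: float = 0.85):
    """Keep the first occurrence of each entity, dropping near-duplicates."""
    kept = []
    for record in records:
        if not any(similar(record, existing, threshold) for existing in kept):
            kept.append(record)
    return kept


# Hypothetical sample data: two spellings each of two customers.
customers = ["Acme Corp.", "ACME  Corp", "Globex Inc.", "globex inc"]
unique = deduplicate(customers)
print(unique)
```

A pairwise comparison like this scales quadratically with the number of records, so it only serves to show the principle on small inputs.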
The data stored in the data lake can be passed to Amazon Redshift, Athena, AWS Glue or Amazon Elastic MapReduce (EMR) for further processing. The EMR integration is still in beta at launch. Support for Amazon QuickSight and SageMaker is to follow in the next few months. Access to the data lake can be controlled via AWS Identity and Access Management (IAM) and AWS Key Management Service (KMS), among other things.
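As a rough illustration of access control via IAM, the following sketch builds a generic read-only policy document for one prefix of an S3 bucket. It is a plain IAM policy, not the configuration that AWS Lake Formation itself generates; the bucket name, the prefix and the function name are hypothetical.

```python
import json

# Hypothetical bucket and prefix, used only for illustration.
BUCKET = "example-data-lake"
PREFIX = "curated/sales/"


def read_only_policy(bucket: str, prefix: str) -> dict:
    """Build an IAM policy document granting read access to one S3 prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow listing, but only for keys under the given prefix.
                "Sid": "ListPrefix",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}*"]}},
            },
            {
                # Allow reading the objects under the prefix.
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
        ],
    }


policy_json = json.dumps(read_only_policy(BUCKET, PREFIX), indent=2)
print(policy_json)
```

The resulting JSON could be attached to a user or role; encryption keys managed in KMS would need their own permissions, which this sketch leaves out.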
A reservoir for all data
The term data lake goes back to Pentaho founder James Dixon. The concept is designed for large analysis systems. The data initially flows into the lake unprocessed and may change there. The name reflects the fact that the lake receives data from numerous tributaries and combines structured data with unstructured and raw data. The term does not imply any specific storage technology.
One advantage of this approach is that administrators do not have to define formats or structures in advance. However, they must make sure that the data remains manageable and accessible. If they lose control of the data lake or can access it only poorly, the fitting term is data swamp: the lake has turned marshy.
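The principle that no format has to be fixed in advance is often called schema-on-read: the structure is applied only when the data is queried. A minimal sketch, assuming a tiny in-memory "lake" that holds raw JSON and CSV payloads side by side (all names and sample values are hypothetical):

```python
import csv
import io
import json

# Raw payloads are stored as received, each tagged only with its format.
RAW_LAKE = [
    ("json", '{"user": "ana", "amount": 12.5}'),
    ("csv", "user,amount\nben,7\n"),
]


def read_amounts(lake):
    """Apply a (user, amount) schema at read time, whatever the raw format."""
    for fmt, payload in lake:
        if fmt == "json":
            row = json.loads(payload)
            yield row["user"], float(row["amount"])
        elif fmt == "csv":
            for row in csv.DictReader(io.StringIO(payload)):
                yield row["user"], float(row["amount"])
        # Unknown formats are skipped here; a real system would catalog them.


amounts = dict(read_amounts(RAW_LAKE))
print(amounts)
```

The sketch also hints at the swamp problem: every new raw format needs a reader, and without a catalog of what is stored, nobody knows which reader to use.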
In fact, orderly access to the data and its optimization are a major challenge when creating and managing data lakes. It is precisely these difficulties that Amazon wants to address with the new offering. Further details can be found in the announcement. (rme)