Jonathan Lalou's blog

[Oracle Dev Days 2025] From JDK 21 to JDK 25: Jean-Michel Doudoux on Java’s Evolution

Jean-Michel Doudoux, a renowned Java Champion and Sciam consultant, delivered a session, charting Java’s evolution from JDK 21 to JDK 25. As the next Long-Term Support (LTS) release, JDK 25 introduces transformative features that redefine Java development. Jean-Michel’s talk provided a comprehensive guide to new syntax, APIs, JVM enhancements, and security measures, equipping developers to navigate Java’s future with confidence.

Enhancing Syntax and APIs

Jean-Michel began by exploring syntactic improvements that streamline Java code. JEP 456 in JDK 22 introduces unnamed variables using _, improving clarity for unused variables. JDK 23’s JEP 467 adds Markdown support for Javadoc, easing documentation. In JDK 25, JEP 511 simplifies module imports, while JEP 512’s implicit classes and simplified main methods make Java more beginner-friendly. JEP 513 enhances constructor flexibility, enabling pre-constructor logic. These changes collectively minimize boilerplate, boosting developer efficiency.

Expanding Capabilities with New APIs

The session highlighted APIs that broaden Java’s scope. The Foreign Function & Memory API (JEP 454) enables safer native code integration, replacing sun.misc.Unsafe. Stream Gatherers (JEP 485) enhance data processing, while the Class-File API (JEP 484) simplifies bytecode manipulation. Scope Values (JEP 506) improve concurrency with lightweight alternatives to thread-local variables. Jean-Michel’s practical examples demonstrated how these APIs empower developers to craft modern, robust applications.

Strengthening JVM and Security

Jean-Michel emphasized JVM and security advancements. JEP 472 in JDK 25 restricts native code access via --enable-native-access, enhancing system integrity. The deprecation of sun.misc.Unsafe aligns with safer alternatives. The removal of 32-bit support, the Security Manager, and certain JMX features reflects Java’s modern focus. Performance boosts in HotSpot JVM, Garbage Collectors (G1, ZGC), and startup times via Project Leyden (JEP 483) ensure Java’s competitiveness.

Boosting Productivity with Tools

Jean-Michel covered enhancements to Java’s tooling ecosystem, including upgraded Javadoc, JCMD, and JAR utilities, which streamline workflows. New Java Flight Recorder (JFR) events improve diagnostics. He urged developers to test JDK 25’s early access builds to prepare for the LTS release, highlighting how these tools enhance efficiency and scalability in application development.

Navigating JDK 25’s LTS Future

Jean-Michel wrapped up by emphasizing JDK 25’s role as an LTS release with extended support. He encouraged proactive engagement with early access programs to adapt to new features and deprecations. His session offered a clear, actionable roadmap, empowering developers to leverage JDK 25’s innovations confidently. Jean-Michel’s expertise illuminated Java’s trajectory, inspiring attendees to embrace its evolving landscape.

Links:

Hashtags: #Java #JDK25 #LTS #JVM #Security #Sciam #JeanMichelDoudoux

Posted in en-US | Tags: Java, JDK, JEP, JSR | No Comments »

Script to clean WSL and remove Ubuntu from Windows 11

Author: Jonathan Lalou

Here is a fully automated PowerShell script that will:

Unregister and remove all WSL distros
Reset WSL to factory defaults
Optionally reinstall WSL cleanly (commented out)

⚠️ You must run this script as Administrator

# =====================================================
# WSL Full Reset Script for Windows 11
# Removes all distros and resets WSL system features
# MUST BE RUN AS ADMINISTRATOR
# =====================================================

Write-Host "`n== STEP 1: List and remove all WSL distros ==" -ForegroundColor Cyan

$distros = wsl --list --quiet
foreach ($distro in $distros) {
    Write-Host "Unregistering WSL distro: $distro" -ForegroundColor Yellow
    wsl --unregister "$distro"
}

Start-Sleep -Seconds 2

Write-Host "`n== STEP 2: Disable WSL-related Windows features ==" -ForegroundColor Cyan

dism.exe /online /disable-feature /featurename:VirtualMachinePlatform /norestart
dism.exe /online /disable-feature /featurename:Microsoft-Windows-Subsystem-Linux /norestart

Start-Sleep -Seconds 2

Write-Host "`n== STEP 3: Uninstall WSL kernel update (if present) ==" -ForegroundColor Cyan
$wslUpdate = Get-AppxPackage -AllUsers | Where-Object { $_.Name -like "*Microsoft.WSL2*" }
if ($wslUpdate) {
    winget uninstall --id "Microsoft.WSL2" --silent
} else {
    Write-Host "No standalone WSL kernel update found." -ForegroundColor DarkGray
}

Start-Sleep -Seconds 2

Write-Host "`n== STEP 4: Clean leftover configuration files ==" -ForegroundColor Cyan
$paths = @(
    "$env:USERPROFILE\.wslconfig",
    "$env:APPDATA\Microsoft\Windows\WSL",
    "$env:LOCALAPPDATA\Packages\CanonicalGroupLimited*",
    "$env:LOCALAPPDATA\Docker",
    "$env:USERPROFILE\.docker"
)
foreach ($path in $paths) {
    Write-Host "Removing: $path" -ForegroundColor DarkYellow
    Remove-Item -Recurse -Force -ErrorAction SilentlyContinue $path
}

Write-Host "`n== STEP 5: Reboot Required ==" -ForegroundColor Magenta
Write-Host "Please restart your computer to complete the WSL reset process."

# Optional: Reinstall WSL cleanly (after reboot)
# Uncomment the lines below if you want the script to also reinstall WSL
<# 
Write-Host "`n== STEP 6: Reinstall WSL ==" -ForegroundColor Cyan
wsl --install
#>

Posted in en-US | Tags: Linux, Windows, WSL | No Comments »

Mastering Information Structure: A Deep Dive into Lists and Nested Lists Across Document Formats

Author: Jonathan Lalou

In the vast and ever-evolving landscape of digital content creation, software development, and technical documentation, the ability to organize information effectively is not just a best practice—it’s a critical skill. Among the most fundamental tools for achieving clarity, enhancing readability, and establishing logical hierarchies are lists and, more powerfully, nested lists.

But how do these seemingly simple, yet incredibly effective, structural elements translate across the myriad of markup languages and sophisticated document formats that we interact with on a daily basis? Understanding the nuances of their representation can significantly streamline your workflow, improve content portability, and ensure your information is consistently and accurately rendered, regardless of the platform.

In this comprehensive article, we’ll take a single, representative nested list and embark on a fascinating journey to demonstrate its representation in several widely used and highly relevant formats: Markdown, HTML, WordprocessingML (the XML behind DOCX files), LaTeX, AsciiDoc, and reStructuredText. By comparing these implementations, you’ll gain a deeper appreciation for the unique philosophies and strengths inherent in each system.

The Sample List: A Structured Overview

To provide a consistent point of reference, let’s establish our foundational nested list. This example is meticulously designed to showcase four distinct levels of nesting, seamlessly mixing both ordered (numbered) and unordered (bulleted) entries. Furthermore, it incorporates common text formatting such as bolding, italics, and preformatted/code snippets, which are essential for rich content presentation.

Visual Representation of Our Sample List:

Main Category One
- Sub-item A: Important detail
  1. Sub-sub-item A.1: Normal text
  2. Sub-sub-item A.2: Code snippet example()
  3. Sub-sub-item A.3: Another detail
- Sub-item B: More information
- Sub-item C: Additional notes
Main Category Two
- Sub-item D: Configuration value
  - Sub-sub-item D.1: First option
  - Sub-sub-item D.2: Second option
  - Sub-sub-item D.3: Final choice
- Sub-item E: Relevant point
- Sub-item F: Last entry
Main Category Three
- Sub-item G: Item with inline code
- Sub-item H: Bolded item: Critical Task
- Sub-item I: Just a regular item

Now, let’s peel back the layers and explore how this exact structure is painstakingly achieved in the diverse world of markup and document formats.

1. Markdown: The Champion of Simplicity and Readability

Markdown has surged in popularity due to its remarkably simple and intuitive syntax, making it incredibly human-readable even in its raw form. It employs straightforward characters for list creation and basic inline formatting, making it a go-to choice for READMEs, basic documentation, and blog posts.

1.  **Main Category One**
    * Sub-item A: *Important detail*
        * 1. Sub-sub-item A.1: Normal text
        * 2. Sub-sub-item A.2: `Code snippet example()`
        * 3. Sub-sub-item A.3: Another detail
    * Sub-item B: More information
    * Sub-item C: *Additional notes*

2.  **Main Category Two**
    * Sub-item D: `Configuration value`
        * -   Sub-sub-item D.1: _First option_
        * -   Sub-sub-item D.2: Second option
        * -   Sub-sub-item D.3: _Final choice_
    * Sub-item E: *Relevant point*
    * Sub-item F: Last entry

3.  **Main Category Three**
    * Sub-item G: Item with `inline code`
    * Sub-item H: Bolded item: **Critical Task**
    * Sub-item I: Just a regular item

2. HTML: The Foundational Language of the Web

HTML (HyperText Markup Language) is the backbone of almost every webpage you visit. It uses distinct tags to define lists: <ol> for ordered (numbered) lists and <ul> for unordered (bulleted) lists. Each individual item within a list is encapsulated by an <li> (list item) tag. The beauty of HTML’s list structure lies in its inherent nesting capability—simply place another <ul> or <ol> inside an <li> to create a sub-list.

<ol>
  <li><strong>Main Category One</strong>
    <ul>
      <li>Sub-item A: <em>Important detail</em>
        <ol>
          <li>Sub-sub-item A.1: Normal text</li>
          <li>Sub-sub-item A.2: <code>Code snippet example()</code></li>
          <li>Sub-sub-item A.3: Another detail</li>
        </ol>
      </li>
      <li>Sub-item B: More information</li>
      <li>Sub-item C: <em>Additional notes</em></li>
    </ul>
  </li>
  <li><strong>Main Category Two</strong>
    <ul>
      <li>Sub-item D: <code>Configuration value</code>
        <ul>
          <li>Sub-sub-item D.1: <em>First option</em></li>
          <li>Sub-sub-item D.2: Second option</li>
          <li>Sub-sub-item D.3: <em>Final choice</em></li>
        </ul>
      </li>
      <li>Sub-item E: <em>Relevant point</em></li>
      <li>Sub-item F: Last entry</li>
    </ul>
  </li>
  <li><strong>Main Category Three</strong>
    <ul>
      <li>Sub-item G: Item with <code>inline code</code></li>
      <li>Sub-item H: Bolded item: <strong>Critical Task</strong></li>
      <li>Sub-item I: Just a regular item</li>
    </ul>
  </li>
</ol>

3. WordprocessingML (Flat OPC for DOCX): The Enterprise Standard

When you save a document in Microsoft Word as a DOCX file, you’re actually saving an archive of XML files. This underlying XML structure, known as WordprocessingML (part of Office Open XML or OPC), is incredibly detailed, defining not just the content but also every aspect of its visual presentation, including complex numbering schemes, bullet types, and precise indentation. Representing a simple list in WordprocessingML is far more verbose than in other formats because it encapsulates all these rendering instructions.

Below is a simplified snippet focusing on the list content. A complete, runnable WordprocessingML document would also include extensive definitions for abstract numbering (`<w:abstractNums>`) and number instances (`<w:nums>`) within the `w:document`’s root, detailing the specific styles, indents, and bullet/numbering characters for each list level. The `w:numPr` tag within each paragraph links it to these definitions.

<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
  <w:body>

    <!-- List Definition (Abstract Num) and Instance (Num) would be here, defining levels, bullets, and numbering formats -->
    <!-- (Omitted for brevity, as they are extensive. See previous detailed output for full context) -->

    <!-- List Content -->

    <!-- 1. Main Category One -->
    <w:p>
      <w:pPr>
        <w:pStyle w:val="ListParagraph"/>
        <w:numPr><w:ilvl w:val="0"/><w:numId w:val="1"/></w:numPr>
      </w:pPr>
      <w:r><w:rPr><w:b/></w:rPr><w:t>Main Category One</w:t></w:r>
    </w:p>

    <!--   * Sub-item A -->
    <w:p>
      <w:pPr><w:pStyle w:val="ListParagraph"/><w:numPr><w:ilvl w:val="1"/><w:numId w:val="1"/></w:numPr></w:pPr>
      <w:r><w:t>Sub-item A: </w:t></w:r><w:r><w:rPr><w:i/></w:rPr><w:t>Important detail</w:t></w:r>
    </w:p>

    <!--     1. Sub-sub-item A.1 -->
    <w:p>
      <w:pPr><w:pStyle w:val="ListParagraph"/><w:numPr><w:ilvl w:val="2"/><w:numId w:val="1"/></w:numPr></w:pPr>
      <w:r><w:t>Sub-sub-item A.1: Normal text</w:t></w:r>
    </w:p>

    <!--     2. Sub-sub-item A.2 -->
    <w:p>
      <w:pPr><w:pStyle w:val="ListParagraph"/><w:numPr><w:ilvl w:val="2"/><w:numId w:val="1"/></w:numPr></w:pPr>
      <w:r><w:t>Sub-sub-item A.2: </w:t></w:r><w:r><w:rPr><w:rFonts w:ascii="Consolas" w:hAnsi="Consolas"/><w:sz w:val="20"/></w:rPr><w:t xml:space="preserve">Code snippet example()</w:t></w:r>
    </w:p>

    <!-- ( ... rest of the list items follow similar patterns ... ) -->

  </w:body>
</w:document>

4. LaTeX: The Gold Standard for Academic and Scientific Publishing

LaTeX is not just a markup language; it’s a powerful typesetting system renowned for producing high-quality documents, especially those with complex mathematical formulas, tables, and precise layouts. For lists, LaTeX employs environments: \begin{enumerate} for ordered lists and \begin{itemize} for unordered lists. Nesting is achieved by simply embedding one list environment within an `\item` of another.

\documentclass{article}
\begin{document}

\begin{enumerate} % Ordered List (Level 1)
    \item \textbf{Main Category One}
    \begin{itemize} % Unordered List (Level 2)
        \item Sub-item A: \textit{Important detail}
        \begin{enumerate} % Ordered List (Level 3)
            \item Sub-sub-item A.1: Normal text
            \item Sub-sub-item A.2: \texttt{Code snippet example()}
            \item Sub-sub-item A.3: Another detail
        \end{enumerate}
        \item Sub-item B: More information
        \item Sub-item C: \textit{Additional notes}
    \end{itemize}
    \item \textbf{Main Category Two}
    \begin{itemize} % Unordered List (Level 2)
        \item Sub-item D: \texttt{Configuration value}
        \begin{itemize} % Unordered List (Level 3)
            \item Sub-sub-item D.1: \textit{First option}
            \item Sub-sub-item D.2: Second option
            \item Sub-sub-item D.3: \textit{Final choice}
        \end{itemize}
        \item Sub-item E: \textit{Relevant point}
        \item Sub-item F: Last entry
    \end{itemize}
    \item \textbf{Main Category Three}
    \begin{itemize}
        \item Sub-item G: Item with \texttt{inline code}
        \item Sub-item H: Bolded item: \textbf{Critical Task}
        \item Sub-item I: Just a regular item
    \end{itemize}
\end{enumerate}

\end{document}

5. AsciiDoc: The Powerhouse for Technical Documentation

AsciiDoc offers a more robust set of features than basic Markdown, making it particularly well-suited for authoring complex technical documentation, books, and articles. It uses a consistent, visually intuitive syntax for lists: a dot (.) for ordered items and an asterisk (*) for unordered items. Deeper nesting is achieved by adding more dots or asterisks (e.g., .. or **) at the start of the list item line.

. Main Category One
* Sub-item A: _Important detail_
** 1. Sub-sub-item A.1: Normal text
** 2. Sub-sub-item A.2: `Code snippet example()`
** 3. Sub-sub-item A.3: Another detail
* Sub-item B: More information
* Sub-item C: _Additional notes_

. Main Category Two
* Sub-item D: `Configuration value`
** - Sub-sub-item D.1: _First option_
** - Sub-sub-item D.2: Second option
** - Sub-sub-item D.3: _Final choice_
* Sub-item E: _Relevant point_
* Sub-item F: Last entry

. Main Category Three
* Sub-item G: Item with `inline code`
* Sub-item H: Bolded item: *Critical Task*
* Sub-item I: Just a regular item

6. reStructuredText (RST): Python’s Preferred Documentation Standard

reStructuredText is a powerful yet readable markup language that plays a central role in documenting Python projects, often leveraging the Sphinx documentation generator. It uses simple numeric markers or bullet characters for lists, with nesting primarily dictated by consistent indentation. Its extensibility makes it a versatile choice for structured content.

1.  **Main Category One**
    * Sub-item A: *Important detail*
        1. Sub-sub-item A.1: Normal text
        2. Sub-sub-item A.2: ``Code snippet example()``
        3. Sub-sub-item A.3: Another detail
    * Sub-item B: More information
    * Sub-item C: *Additional notes*

2.  **Main Category Two**
    * Sub-item D: ``Configuration value``
        - Sub-sub-item D.1: *First option*
        - Sub-sub-item D.2: Second option
        - Sub-sub-item D.3: *Final choice*
    * Sub-item E: *Relevant point*
    * Sub-item F: Last entry

3.  **Main Category Three**
    * Sub-item G: Item with ``inline code``
    * Sub-item H: Bolded item: **Critical Task**
    * Sub-item I: Just a regular item

Why Such Diversity in List Formats?

The existence of so many distinct formats for representing lists and structured content isn’t arbitrary; it’s a reflection of the diverse needs and contexts in the digital world:

Markdown & AsciiDoc: These formats prioritize authoring speed and raw readability. They are ideal for rapid content creation, internal documentation, web articles, and scenarios where the content needs to be easily read and edited in plain text. They rely on external processors to render them into final forms like HTML or PDF.
HTML: The universal language of the World Wide Web. It’s designed for displaying content in web browsers, offering extensive styling capabilities via CSS and dynamic behavior through JavaScript. Its primary output is for screen display.
WordprocessingML (DOCX): This is the standard for office productivity and print-ready documents. It offers unparalleled control over visual layout, rich text formatting, collaborative features (like tracking changes), and is designed for a WYSIWYG (What You See Is What You Get) editing experience. It’s built for desktop applications and printing.
LaTeX: The academic and scientific community’s gold standard. LaTeX excels at typesetting complex mathematical formulas, scientific papers, and books where precise layout, consistent formatting, and high-quality print output are paramount. It’s a programming-like approach to document creation.
reStructuredText: A strong choice for technical documentation, especially prevalent in the Python ecosystem. It balances readability with robust structural elements and extensibility, making it well-suited for API documentation, user guides, and project manuals that can be automatically converted to various outputs.

Ultimately, understanding these varied representations empowers you to select the most appropriate tool for your content, ensuring that your structured information is consistently and accurately presented across different platforms, audiences, and end-uses. Whether you’re building a website, drafting a scientific paper, writing a user manual, or simply organizing your thoughts, mastering lists is a fundamental step towards clear and effective communication.

What are your go-to formats for organizing information with lists? Do you have a favorite, or does it depend entirely on the project? Share your thoughts and experiences in the comments below!

Posted in en-US | Tags: html, latex, markdown, opc | No Comments »

️ Prototype Pollution: The Silent JavaScript Vulnerability You Shouldn’t Ignore

Author: Jonathan Lalou

Prototype pollution is one of those vulnerabilities that many developers have heard about, but few fully understand—or guard against. It’s sneaky, dangerous, and more common than you’d think, especially in JavaScript and Node.js applications.

This post breaks down what prototype pollution is, how it can be exploited, how to detect it, and most importantly, how to fix it.

What Is Prototype Pollution?

In JavaScript, all objects inherit from Object.prototype by default. If an attacker can modify that prototype via user input, they can change how every object behaves.

This is called prototype pollution, and it can:

Alter default behavior of native objects
Lead to privilege escalation
Break app logic in subtle ways
Enable denial-of-service (DoS) or even remote code execution in some cases

Real-World Exploit Example

const payload = JSON.parse('{ "__proto__": { "isAdmin": true } }');
Object.assign({}, payload);

console.log({}.isAdmin); // → true

Now, any object in your app believes it’s an admin. That’s the essence of prototype pollution.

How to Detect It

✅ Static Code Analysis

ESLint
- Use plugins like eslint-plugin-security or eslint-plugin-no-prototype-builtins
Semgrep
- Detect unsafe merges with custom rules

Dependency Scanning

npm audit, yarn audit, or tools like Snyk, OWASP Dependency-Check
Many past CVEs (e.g., Lodash < 4.17.12) were related to prototype pollution

Manual Testing

Try injecting:

{ "__proto__": { "injected": true } }

Then check if unexpected object properties appear in your app.

️ How to Fix It

1. Sanitize Inputs

Never allow user input to include dangerous keys:

__proto__
constructor
prototype

2. Avoid Deep Merge with Untrusted Data

Use libraries that enforce safe merges:

deepmerge with safe mode
Lodash >= 4.17.12

3. Write Safe Merge Logic

function safeMerge(target, source) {
  for (let key in source) {
    if (!['__proto__', 'constructor', 'prototype'].includes(key)) {
      target[key] = source[key];
    }
  }
  return target;
}

4. Use Secure Parsers

secure-json-parse
@hapi/hoek

TL;DR

✅ Task	Tool/Approach
Scan source code	ESLint, Semgrep
Test known payloads	Manual JSON fuzzing
Scan dependencies	npm audit, Snyk
Sanitize keys before merging	Allowlist strategy
Patch libraries	Update Lodash, jQuery

‍ Final Thoughts

Prototype pollution isn’t just a theoretical risk. It has appeared in real-world vulnerabilities in major libraries and frameworks.

If your app uses JavaScript—on the frontend or backend—you need to be aware of it.

Share this post if you work with JavaScript.
️ Found something similar in your project? Let’s talk.

#JavaScript #Security #PrototypePollution #NodeJS #WebSecurity #DevSecOps #SoftwareEngineering

Posted in en-US | Tags: NodeJS, Security | No Comments »

Demystifying Parquet: The Power of Efficient Data Storage in the Cloud

Author: Jonathan Lalou

Unlocking the Power of Apache Parquet: A Modern Standard for Data Efficiency

In today’s digital ecosystem, where data volume, velocity, and variety continue to rise, the choice of file format can dramatically impact performance, scalability, and cost. Whether you are an architect designing a cloud-native data platform or a developer managing analytics pipelines, Apache Parquet stands out as a foundational technology you should understand — and probably already rely on.

This article explores what Parquet is, why it matters, and how to work with it in practice — including real examples in Python, Java, Node.js, and Bash for converting and uploading files to Amazon S3.

What Is Apache Parquet?

Apache Parquet is a high-performance, open-source file format designed for efficient columnar data storage. Originally developed by Twitter and Cloudera and now an Apache Software Foundation project, Parquet is purpose-built for use with distributed data processing frameworks like Apache Spark, Hive, Impala, and Drill.

Unlike row-based formats such as CSV or JSON, Parquet organizes data by columns rather than rows. This enables powerful compression, faster retrieval of selected fields, and dramatic performance improvements for analytical queries.

Why Choose Parquet?

✅ Columnar Format = Faster Queries

Because Parquet stores values from the same column together, analytical engines can skip irrelevant data and process only what’s required — reducing I/O and boosting speed.

Compression and Storage Efficiency

Parquet achieves better compression ratios than row-based formats, thanks to the similarity of values in each column. This translates directly into reduced cloud storage costs.

Schema Evolution

Parquet supports schema evolution, enabling your datasets to grow gracefully. New fields can be added over time without breaking existing consumers.

Interoperability

The format is compatible across multiple ecosystems and languages, including Python (Pandas, PyArrow), Java (Spark, Hadoop), and even browser-based analytics tools.

☁️ Using Parquet with Amazon S3

One of the most common modern use cases for Parquet is in conjunction with Amazon S3, where it powers data lakes, ETL pipelines, and serverless analytics via services like Amazon Athena and Redshift Spectrum.

Here’s how you can write Parquet files and upload them to S3 in different environments:

From CSV to Parquet in Practice

Python Example

import pandas as pd

# Load CSV data
df = pd.read_csv("input.csv")

# Save as Parquet
df.to_parquet("output.parquet", engine="pyarrow")

To upload to S3:

import boto3

s3 = boto3.client("s3")
s3.upload_file("output.parquet", "your-bucket", "data/output.parquet")

Node.js Example

Install the required libraries:

npm install aws-sdk

Upload file to S3:

const AWS = require('aws-sdk');
const fs = require('fs');

const s3 = new AWS.S3();
const fileContent = fs.readFileSync('output.parquet');

const params = {
    Bucket: 'your-bucket',
    Key: 'data/output.parquet',
    Body: fileContent
};

s3.upload(params, (err, data) => {
    if (err) throw err;
    console.log(`File uploaded successfully at ${data.Location}`);
});

☕ Java with Apache Spark and AWS SDK

In your pom.xml, include:

<dependency>
    <groupId>org.apache.parquet</groupId>
    <artifactId>parquet-hadoop</artifactId>
    <version>1.12.2</version>
</dependency>
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk-s3</artifactId>
    <version>1.12.470</version>
</dependency>

Spark conversion:

Dataset<Row> df = spark.read().option("header", "true").csv("input.csv");
df.write().parquet("output.parquet");

Upload to S3:

AmazonS3 s3 = AmazonS3ClientBuilder.standard()
    .withRegion("us-west-2")
    .withCredentials(new AWSStaticCredentialsProvider(
        new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY")))
    .build();

s3.putObject("your-bucket", "data/output.parquet", new File("output.parquet"));

Bash with AWS CLI

aws s3 cp output.parquet s3://your-bucket/data/output.parquet

Final Thoughts

Apache Parquet has quietly become a cornerstone of the modern data stack. It powers everything from ad hoc analytics to petabyte-scale data lakes, bringing consistency and efficiency to how we store and retrieve data.

Whether you are migrating legacy pipelines, designing new AI workloads, or simply optimizing your storage bills — understanding and adopting Parquet can unlock meaningful benefits.

When used in combination with cloud platforms like AWS, the performance, scalability, and cost-efficiency of Parquet-based workflows are hard to beat.

Posted in en-US | Tags: Java, NodeJS, parquet, Python, Spark | No Comments »

S	M	T	W	T	F	S
« May
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30